Automating System Monitoring with PowerShell and Python

2025-01-28

Creating custom monitoring solutions for hybrid Windows and Linux environments using scripting tools.

Modern IT environments are increasingly complex, often comprising a mix of Windows and Linux systems, on-premises and cloud resources. As a systems engineer, I've found that creating custom monitoring solutions combining PowerShell and Python can provide flexibility and depth that pre-packaged solutions sometimes lack, especially in heterogeneous environments.

Why Custom Monitoring Scripts?

While enterprise monitoring platforms like Nagios, Zabbix, or PRTG offer comprehensive solutions, there are several reasons to develop custom monitoring scripts:

Specific Use Cases: Monitoring unique applications or custom services
Cost Effectiveness: Avoiding licensing costs for smaller environments
Integration Flexibility: Easier integration with existing workflows and notification systems
Tailored Alerting: Precise control over alert thresholds and conditions
Learning Opportunity: Deepening understanding of system operations and scripting

Custom scripts aren't meant to replace enterprise monitoring platforms entirely, but rather to complement them by addressing specific needs or filling gaps.

PowerShell for Windows Monitoring

PowerShell is incredibly powerful for Windows monitoring due to its deep integration with the operating system. Here's a practical example of a script I use to monitor critical Windows services:

# ServiceMonitor.ps1
param(
    [string[]]$ServiceNames = @("DHCP", "DNS", "W32Time"),
    [string]$LogPath = "C:\Logs\ServiceMonitor.log",
    [string]$EmailTo = "admin@yourdomain.com",
    [string]$SMTPServer = "smtp.yourdomain.com"
)

# Ensure log directory exists
$LogDir = Split-Path -Path $LogPath -Parent
if (-not (Test-Path -Path $LogDir)) {
    New-Item -ItemType Directory -Path $LogDir -Force | Out-Null
}

function Write-Log {
    param (
        [string]$Message
    )
    $TimeStamp = (Get-Date).ToString("yyyy-MM-dd HH:mm:ss")
    "$TimeStamp - $Message" | Out-File -FilePath $LogPath -Append
    Write-Host "$TimeStamp - $Message"
}

function Send-AlertEmail {
    param (
        [string]$Subject,
        [string]$Body
    )
    
    $EmailParams = @{
        From = "monitoring@yourdomain.com"
        To = $EmailTo
        Subject = $Subject
        Body = $Body
        SmtpServer = $SMTPServer
    }
    
    try {
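        # -ErrorAction Stop turns SMTP failures into terminating errors so the catch block runs
        Send-MailMessage @EmailParams -BodyAsHtml -ErrorAction Stop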
        Write-Log "Alert email sent: $Subject"
    } catch {
        Write-Log "Failed to send email alert: $_"
    }
}

# Check each service
foreach ($ServiceName in $ServiceNames) {
    try {
        $Service = Get-Service -Name $ServiceName -ErrorAction Stop
        
        if ($Service.Status -ne 'Running') {
            $Subject = "ALERT: $ServiceName is not running on $env:COMPUTERNAME"
            $Body = @"
<h2>Service Monitoring Alert</h2>
<p>The following service is not in Running state:</p>
<ul>
    <li><strong>Service Name:</strong> $ServiceName</li>
    <li><strong>Current Status:</strong> $($Service.Status)</li>
    <li><strong>Server:</strong> $env:COMPUTERNAME</li>
    <li><strong>Time:</strong> $(Get-Date)</li>
</ul>
<p>Please investigate and take appropriate action.</p>
"@
            Send-AlertEmail -Subject $Subject -Body $Body
            
            # Attempt to start the service
            Write-Log "Attempting to start $ServiceName service..."
            Start-Service -Name $ServiceName
            
            # Check if service started successfully
            $Service = Get-Service -Name $ServiceName
            if ($Service.Status -eq 'Running') {
                Write-Log "Successfully started $ServiceName service."
                Send-AlertEmail -Subject "RESOLVED: $ServiceName successfully restarted on $env:COMPUTERNAME" -Body "The $ServiceName service was successfully restarted at $(Get-Date)."
            } else {
                Write-Log "Failed to start $ServiceName service."
            }
        } else {
            Write-Log "$ServiceName is running correctly."
        }
    } catch {
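        # ${ServiceName} is delimited because "$ServiceName:" would be parsed as a scoped variable reference
        Write-Log "Error monitoring ${ServiceName}: $_"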
        Send-AlertEmail -Subject "ERROR: Failed to monitor $ServiceName on $env:COMPUTERNAME" -Body "An error occurred while monitoring the $ServiceName service: $_"
    }
}

This script:

Checks the status of critical services
Logs the results to a file
Attempts to restart services that aren't running
Sends email alerts for both failures and successful restarts

You can easily expand this script to check other aspects of Windows systems, such as disk space, CPU usage, or event logs.

Python for Cross-Platform Monitoring

Python shines for monitoring Linux systems and creating cross-platform solutions. Here's a simple example for monitoring system resources across platforms, using the third-party psutil package (installed with pip install psutil):

#!/usr/bin/env python3
# system_monitor.py

import os
import platform
import psutil
import smtplib
import socket
import time
from email.mime.text import MIMEText
from datetime import datetime

# Configuration
CHECK_INTERVAL = 300  # seconds
CPU_THRESHOLD = 80  # percent
MEMORY_THRESHOLD = 80  # percent
DISK_THRESHOLD = 85  # percent
LOG_FILE = "/var/log/system_monitor.log" if platform.system() != "Windows" else "C:\\Logs\\system_monitor.log"
EMAIL_FROM = "monitoring@yourdomain.com"
EMAIL_TO = "admin@yourdomain.com"
SMTP_SERVER = "smtp.yourdomain.com"

# Ensure log directory exists
log_dir = os.path.dirname(LOG_FILE)
os.makedirs(log_dir, exist_ok=True)

def write_log(message):
    """Write message to log file and console"""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_message = f"{timestamp} - {message}"
    
    print(log_message)
    with open(LOG_FILE, "a") as log_file:
        log_file.write(log_message + "\n")

def send_alert(subject, message):
    """Send email alert"""
    try:
        msg = MIMEText(message)
        msg['Subject'] = subject
        msg['From'] = EMAIL_FROM
        msg['To'] = EMAIL_TO
        
        with smtplib.SMTP(SMTP_SERVER) as server:
            server.send_message(msg)
        
        write_log(f"Alert email sent: {subject}")
    except Exception as e:
        write_log(f"Failed to send email alert: {e}")

def check_system():
    """Check system resources and alert if thresholds are exceeded"""
    hostname = socket.gethostname()
    
    # Check CPU
    cpu_percent = psutil.cpu_percent(interval=1)
    if cpu_percent > CPU_THRESHOLD:
        subject = f"ALERT: High CPU usage on {hostname}"
        message = f"CPU usage is {cpu_percent}%, which exceeds the threshold of {CPU_THRESHOLD}%."
        send_alert(subject, message)
        write_log(f"High CPU alert: {cpu_percent}%")
    else:
        write_log(f"CPU usage normal: {cpu_percent}%")
    
    # Check memory
    memory = psutil.virtual_memory()
    memory_percent = memory.percent
    if memory_percent > MEMORY_THRESHOLD:
        subject = f"ALERT: High memory usage on {hostname}"
        message = f"Memory usage is {memory_percent}%, which exceeds the threshold of {MEMORY_THRESHOLD}%."
        send_alert(subject, message)
        write_log(f"High memory alert: {memory_percent}%")
    else:
        write_log(f"Memory usage normal: {memory_percent}%")
    
    # Check disk space for each mounted disk
    for partition in psutil.disk_partitions():
        if os.name == 'nt' and ('cdrom' in partition.opts or partition.fstype == ''):
            # Skip CD-ROM drives and drives with no media on Windows
            continue
        
        try:
            usage = psutil.disk_usage(partition.mountpoint)
            if usage.percent > DISK_THRESHOLD:
                subject = f"ALERT: Low disk space on {hostname}:{partition.mountpoint}"
                message = f"Disk space usage for {partition.mountpoint} is {usage.percent}%, which exceeds the threshold of {DISK_THRESHOLD}%."
                send_alert(subject, message)
                write_log(f"Low disk space alert: {partition.mountpoint} {usage.percent}%")
            else:
                write_log(f"Disk space normal: {partition.mountpoint} {usage.percent}%")
        except PermissionError:
            # This can happen if the disk is not ready
            write_log(f"Cannot check disk: {partition.mountpoint} (Permission denied)")

def main():
    write_log("System monitoring started")
    
    try:
        while True:
            check_system()
            time.sleep(CHECK_INTERVAL)
    except KeyboardInterrupt:
        write_log("System monitoring stopped by user")
    except Exception as e:
        write_log(f"System monitoring stopped due to error: {e}")
        send_alert("ALERT: Monitoring script failed", f"The system monitoring script on {socket.gethostname()} has stopped due to an error: {e}")

if __name__ == "__main__":
    main()

This script:

Works on both Windows and Linux
Monitors CPU, memory, and disk usage
Logs results to a file
Sends email alerts when thresholds are exceeded
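
The threshold comparisons in check_system() can also be factored into a pure function, which makes the alerting rules easy to test without touching psutil or SMTP. The following is a minimal sketch and not part of the script above; evaluate_thresholds and the metric names are hypothetical.

```python
# Hypothetical helper: compare readings against thresholds and return alert messages.
# The names used here are illustrative and do not appear in system_monitor.py.

def evaluate_thresholds(readings, thresholds):
    """Return one alert string for each reading that exceeds its threshold.

    readings   -- dict mapping a metric name to its current percentage
    thresholds -- dict mapping a metric name to its limit in percent
    Metrics without a configured threshold are ignored.
    """
    alerts = []
    for name, value in sorted(readings.items()):
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            alerts.append(f"{name} is {value}%, which exceeds the threshold of {limit}%")
    return alerts


if __name__ == "__main__":
    sample = {"cpu": 91.5, "memory": 42.0, "disk:/": 85.0}
    limits = {"cpu": 80, "memory": 80, "disk:/": 85}
    for alert in evaluate_thresholds(sample, limits):
        print(alert)
```

As in the original script, the comparison is strict, so a reading exactly at the threshold does not raise an alert.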

Integrating PowerShell and Python

For hybrid environments, we can leverage the strengths of both languages. One approach is to use PowerShell to handle Windows-specific tasks and Python for cross-platform operations and data aggregation.

Here's a simple example of PowerShell invoking Python to create a unified monitoring solution:

# HybridMonitor.ps1
param(
    [string]$PythonScript = "C:\Scripts\cross_platform_stats.py",
    [string]$OutputFile = "C:\Monitoring\system_stats.json"
)

# Windows-specific checks
$WindowsMetrics = @{
    "ComputerName" = $env:COMPUTERNAME
    "OSVersion" = [System.Environment]::OSVersion.VersionString
    "LastBootTime" = (Get-CimInstance -ClassName Win32_OperatingSystem).LastBootUpTime
    "InstalledUpdates" = (Get-HotFix).Count
    "RunningServices" = (Get-Service | Where-Object {$_.Status -eq "Running"}).Count
    "StoppedServices" = (Get-Service | Where-Object {$_.Status -eq "Stopped"}).Count
}

# Call Python for cross-platform metrics
$PythonOutput = & python $PythonScript

# Combine results
$CombinedOutput = @{
    "WindowsSpecific" = $WindowsMetrics
    "CrossPlatform" = $PythonOutput | ConvertFrom-Json
}

# Export to JSON
$CombinedOutput | ConvertTo-Json -Depth 10 | Out-File -FilePath $OutputFile

# Upload to central monitoring system or send via email
# ...

And the corresponding Python script:

# cross_platform_stats.py
import json
import psutil
import platform
import datetime

# Collect cross-platform metrics
stats = {
    "platform": platform.system(),
    "platform_release": platform.release(),
    "platform_version": platform.version(),
    "architecture": platform.machine(),
    "hostname": platform.node(),
    "cpu_count": psutil.cpu_count(),
    "cpu_percent": psutil.cpu_percent(interval=1),
    "memory_total": psutil.virtual_memory().total,
    "memory_available": psutil.virtual_memory().available,
    "memory_percent": psutil.virtual_memory().percent,
    "disk_partitions": [{"device": p.device, "mountpoint": p.mountpoint, "fstype": p.fstype} for p in psutil.disk_partitions()],
    "network_interfaces": list(psutil.net_if_addrs().keys()),
    "boot_time": datetime.datetime.fromtimestamp(psutil.boot_time()).strftime("%Y-%m-%d %H:%M:%S"),
}

# Add disk usage for each partition
for partition in stats["disk_partitions"]:
    try:
        usage = psutil.disk_usage(partition["mountpoint"])
        partition["total_size"] = usage.total
        partition["used"] = usage.used
        partition["free"] = usage.free
        partition["percent"] = usage.percent
    except OSError:
        # Covers PermissionError and drives that are not ready, without hiding unrelated bugs
        partition["error"] = "Could not get disk usage"

# Print as JSON for PowerShell to capture
print(json.dumps(stats))
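
On the consuming side, a Python script can read the combined file back for aggregation. One detail worth handling is the file encoding: Out-File's default differs between PowerShell versions (UTF-16 in Windows PowerShell 5.1, UTF-8 in PowerShell 7), so a reader should not assume either. The sketch below shows one way such a reader might look; load_report and summarize are hypothetical names, and the keys are the WindowsSpecific and CrossPlatform sections written by HybridMonitor.ps1.

```python
import codecs
import json


def load_report(path):
    """Load a JSON report whether it was saved as UTF-8 or UTF-16."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # the BOM selects the byte order
    else:
        text = raw.decode("utf-8-sig")   # also accepts UTF-8 without a BOM
    return json.loads(text)


def summarize(report):
    """Pick a few headline values out of the combined report."""
    windows = report.get("WindowsSpecific", {})
    cross = report.get("CrossPlatform", {})
    return {
        "host": windows.get("ComputerName") or cross.get("hostname"),
        "cpu_percent": cross.get("cpu_percent"),
        "memory_percent": cross.get("memory_percent"),
        "stopped_services": windows.get("StoppedServices"),
    }
```

Calling summarize(load_report(path)) on each host's file gives a list of small dictionaries that is straightforward to sort, filter, or feed into a report.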

Building a Dashboard

With these scripts collecting data, the next step is visualizing it. A simple approach is to generate HTML reports:

# generate_dashboard.py
import json
import os
from datetime import datetime
import matplotlib.pyplot as plt
import base64
from io import BytesIO

# Load monitoring data
with open('monitoring_data.json', 'r') as f:
    data = json.load(f)

# Generate CPU usage chart
def create_cpu_chart(data):
    timestamps = [entry['timestamp'] for entry in data]
    cpu_values = [entry['cpu_percent'] for entry in data]
    
    plt.figure(figsize=(10, 4))
    plt.plot(timestamps, cpu_values)
    plt.title('CPU Usage Over Time')
    plt.ylabel('CPU %')
    plt.ylim(0, 100)
    plt.grid(True)
    
    # Convert plot to base64 for HTML embedding
    buffer = BytesIO()
    plt.savefig(buffer, format='png')
    plt.close()  # release the figure so repeated runs don't accumulate memory
    buffer.seek(0)
    image_png = buffer.getvalue()
    buffer.close()
    
    return base64.b64encode(image_png).decode('utf-8')

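# Generate memory usage chart
def create_memory_chart(data):
    # Same approach as the CPU chart; assumes each entry has a memory_percent field
    timestamps = [entry['timestamp'] for entry in data]
    memory_values = [entry['memory_percent'] for entry in data]
    
    plt.figure(figsize=(10, 4))
    plt.plot(timestamps, memory_values)
    plt.title('Memory Usage Over Time')
    plt.ylabel('Memory %')
    plt.ylim(0, 100)
    plt.grid(True)
    
    # Convert plot to base64 for HTML embedding
    buffer = BytesIO()
    plt.savefig(buffer, format='png')
    plt.close()
    buffer.seek(0)
    image_png = buffer.getvalue()
    buffer.close()
    
    return base64.b64encode(image_png).decode('utf-8')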

# Create HTML dashboard
def generate_html_dashboard(data, cpu_chart, memory_chart):
    html = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>System Monitoring Dashboard</title>
        <style>
            body {{ font-family: Arial, sans-serif; margin: 0; padding: 20px; }}
            .dashboard {{ max-width: 1200px; margin: 0 auto; }}
            .card {{ background-color: #fff; border-radius: 5px; box-shadow: 0 2px 5px rgba(0,0,0,0.1);
                    padding: 20px; margin-bottom: 20px; }}
            .chart {{ margin-top: 20px; }}
            .status-ok {{ color: green; }}
            .status-warning {{ color: orange; }}
            .status-critical {{ color: red; }}
            table {{ width: 100%; border-collapse: collapse; }}
            th, td {{ padding: 8px; text-align: left; border-bottom: 1px solid #ddd; }}
            th {{ background-color: #f2f2f2; }}
        </style>
    </head>
    <body>
        <div class="dashboard">
            <h1>System Monitoring Dashboard</h1>
            <p>Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
            
            <div class="card">
                <h2>System Overview</h2>
                <table>
                    <tr>
                        <th>Hostname</th>
                        <td>{data['hostname']}</td>
                    </tr>
                    <tr>
                        <th>Platform</th>
                        <td>{data['platform']} {data['platform_version']}</td>
                    </tr>
                    <tr>
                        <th>Uptime</th>
                        <td>{data['uptime']}</td>
                    </tr>
                </table>
            </div>
            
            <div class="card">
                <h2>CPU Usage</h2>
                <div class="chart">
                    <img src="data:image/png;base64,{cpu_chart}" width="100%">
                </div>
            </div>
            
            <div class="card">
                <h2>Memory Usage</h2>
                <div class="chart">
                    <img src="data:image/png;base64,{memory_chart}" width="100%">
                </div>
            </div>
            
            <!-- Additional sections for disk usage, services, etc. -->
            
        </div>
    </body>
    </html>
    """
    
    with open('monitoring_dashboard.html', 'w') as f:
        f.write(html)

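# Main execution
if __name__ == "__main__":
    # data is a list of timestamped snapshots: the charts plot the whole history,
    # while the overview table shows the most recent entry (which must include
    # hostname, platform, platform_version and uptime)
    cpu_chart = create_cpu_chart(data)
    memory_chart = create_memory_chart(data)
    generate_html_dashboard(data[-1], cpu_chart, memory_chart)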
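
The dashboard script reads monitoring_data.json, but none of the earlier scripts write a file by that name. One possible way to build it, assuming the file holds a JSON list of timestamped snapshots (which is what create_cpu_chart iterates over), is sketched below. append_snapshot and max_entries are hypothetical, and the snapshot keys are an assumption about the format.

```python
import json
import os
from datetime import datetime


def append_snapshot(path, snapshot, max_entries=288):
    """Append one timestamped snapshot to a JSON list stored at path.

    Only the newest max_entries snapshots are kept; 288 corresponds to
    one day of samples taken every five minutes.
    """
    history = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            history = json.load(f)

    entry = {"timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
    entry.update(snapshot)
    history.append(entry)
    history = history[-max_entries:]

    # Write to a temporary file first so a reader never sees a half-written list
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(history, f)
    os.replace(tmp_path, path)
    return history
```

A collector could call append_snapshot("monitoring_data.json", {...}) on each pass, filling the dictionary with values such as psutil.cpu_percent(interval=1) and psutil.virtual_memory().percent alongside the hostname and platform fields the overview table expects.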

Scheduling the Monitoring Scripts

For Windows systems, you can use Task Scheduler to run PowerShell scripts regularly:

# Create a scheduled task
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-ExecutionPolicy Bypass -File C:\Scripts\ServiceMonitor.ps1"
$Trigger = New-ScheduledTaskTrigger -Daily -At 8am
$Settings = New-ScheduledTaskSettingsSet -ExecutionTimeLimit (New-TimeSpan -Hours 1) -RestartCount 3 -RestartInterval (New-TimeSpan -Minutes 5)
Register-ScheduledTask -TaskName "Service Monitoring" -Action $Action -Trigger $Trigger -Settings $Settings -RunLevel Highest -User "SYSTEM"

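For Linux systems, cron is the traditional choice. Note that system_monitor.py as written loops forever, so starting it from cron every five minutes would pile up processes; when scheduling it this way, have main() call check_system() once and exit: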

# Add to crontab
# Run system monitoring every 5 minutes
*/5 * * * * /usr/bin/python3 /path/to/system_monitor.py >> /var/log/system_monitor_cron.log 2>&1
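
One way to keep both modes in a single file, the long-running loop and a single pass suited to cron or Task Scheduler, is a small runner with a command-line switch. This is a sketch under assumptions: run_monitor and the --once and --interval flags are hypothetical, and check_system here is only a stand-in for the function of the same name in system_monitor.py.

```python
import argparse
import time


def check_system():
    """Stand-in for the real check_system() in system_monitor.py."""
    print("checking system...")


def run_monitor(check, once, interval=300, sleep=time.sleep):
    """Call check() once, or repeatedly with a pause between passes.

    Returns the number of completed passes; the loop only ends on Ctrl+C.
    """
    passes = 0
    try:
        while True:
            check()
            passes += 1
            if once:
                break
            sleep(interval)
    except KeyboardInterrupt:
        pass
    return passes


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run system checks")
    parser.add_argument("--once", action="store_true",
                        help="do a single pass and exit (for cron or Task Scheduler)")
    parser.add_argument("--interval", type=int, default=300,
                        help="seconds between passes when looping")
    args = parser.parse_args(argv)
    return run_monitor(check_system, once=args.once, interval=args.interval)
```

With main() wired into the script's entry point, the scheduled command would pass the flag (for example /usr/bin/python3 /path/to/system_monitor.py --once), while running the script by hand without it keeps the continuous loop.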

Benefits of This Approach

Creating custom monitoring with PowerShell and Python offers several advantages:

Complete Flexibility: You control exactly what gets monitored and how
No Vendor Lock-in: Your monitoring solution isn't tied to a specific vendor
Extensibility: Easy to add new checks or customizations
Learning Value: Developing these scripts enhances your systems knowledge
Cost Efficiency: No licensing costs for smaller environments

Conclusion

While enterprise monitoring solutions have their place, complementing them with custom PowerShell and Python scripts provides deeper insights and more flexibility, especially in heterogeneous environments. The approach I've outlined allows you to:

Use PowerShell for Windows-specific monitoring
Leverage Python for cross-platform capabilities
Integrate both for comprehensive coverage
Visualize the data for easy consumption

By combining these powerful scripting languages, you can build a monitoring solution tailored to your specific environment, providing exactly the visibility you need into your systems' health and performance.