Automating System Monitoring with PowerShell and Python
Creating custom monitoring solutions for hybrid Windows and Linux environments using scripting tools.
Modern IT environments are increasingly complex, often comprising a mix of Windows and Linux systems, on-premises and cloud resources. As a systems engineer, I've found that creating custom monitoring solutions combining PowerShell and Python can provide flexibility and depth that pre-packaged solutions sometimes lack, especially in heterogeneous environments.
Why Custom Monitoring Scripts?
While enterprise monitoring platforms like Nagios, Zabbix, or PRTG offer comprehensive solutions, there are several reasons to develop custom monitoring scripts:
- Specific Use Cases: Monitoring unique applications or custom services
- Cost Effectiveness: Avoiding licensing costs for smaller environments
- Integration Flexibility: Easier integration with existing workflows and notification systems
- Tailored Alerting: Precise control over alert thresholds and conditions
- Learning Opportunity: Deepening understanding of system operations and scripting
Custom scripts aren't meant to replace enterprise monitoring platforms entirely, but rather to complement them by addressing specific needs or filling gaps.
PowerShell for Windows Monitoring
PowerShell is incredibly powerful for Windows monitoring due to its deep integration with the operating system. Here's a practical example of a script I use to monitor critical Windows services:
# ServiceMonitor.ps1
param(
    [string[]]$ServiceNames = @("DHCP", "DNS", "W32Time"),
    [string]$LogPath = "C:\Logs\ServiceMonitor.log",
    [string]$EmailTo = "admin@yourdomain.com",
    [string]$SMTPServer = "smtp.yourdomain.com"
)

# Ensure log directory exists
$LogDir = Split-Path -Path $LogPath -Parent
if (-not (Test-Path -Path $LogDir)) {
    New-Item -ItemType Directory -Path $LogDir -Force | Out-Null
}

function Write-Log {
    param (
        [string]$Message
    )
    $TimeStamp = (Get-Date).ToString("yyyy-MM-dd HH:mm:ss")
    "$TimeStamp - $Message" | Out-File -FilePath $LogPath -Append
    Write-Host "$TimeStamp - $Message"
}

function Send-AlertEmail {
    param (
        [string]$Subject,
        [string]$Body
    )
    $EmailParams = @{
        From = "monitoring@yourdomain.com"
        To = $EmailTo
        Subject = $Subject
        Body = $Body
        SmtpServer = $SMTPServer
    }
    try {
        Send-MailMessage @EmailParams -BodyAsHtml
        Write-Log "Alert email sent: $Subject"
    } catch {
        Write-Log "Failed to send email alert: $_"
    }
}
# Check each service
foreach ($ServiceName in $ServiceNames) {
    try {
        $Service = Get-Service -Name $ServiceName -ErrorAction Stop
        if ($Service.Status -ne 'Running') {
            $Subject = "ALERT: $ServiceName is not running on $env:COMPUTERNAME"
            # Here-string: the closing "@ must start at the beginning of its line
            $Body = @"
<h2>Service Monitoring Alert</h2>
<p>The following service is not in Running state:</p>
<ul>
<li><strong>Service Name:</strong> $ServiceName</li>
<li><strong>Current Status:</strong> $($Service.Status)</li>
<li><strong>Server:</strong> $env:COMPUTERNAME</li>
<li><strong>Time:</strong> $(Get-Date)</li>
</ul>
<p>Please investigate and take appropriate action.</p>
"@
            Send-AlertEmail -Subject $Subject -Body $Body
            # Attempt to start the service
            Write-Log "Attempting to start $ServiceName service..."
            Start-Service -Name $ServiceName
            # Check if service started successfully
            $Service = Get-Service -Name $ServiceName
            if ($Service.Status -eq 'Running') {
                Write-Log "Successfully started $ServiceName service."
                Send-AlertEmail -Subject "RESOLVED: $ServiceName successfully restarted on $env:COMPUTERNAME" -Body "The $ServiceName service was successfully restarted at $(Get-Date)."
            } else {
                Write-Log "Failed to start $ServiceName service."
            }
        } else {
            Write-Log "$ServiceName is running correctly."
        }
    } catch {
        # Braces are required here: "$ServiceName:" would be parsed as a scoped variable reference
        Write-Log "Error monitoring ${ServiceName}: $_"
        Send-AlertEmail -Subject "ERROR: Failed to monitor $ServiceName on $env:COMPUTERNAME" -Body "An error occurred while monitoring the $ServiceName service: $_"
    }
}
This script:
- Checks the status of critical services
- Logs the results to a file
- Attempts to restart services that aren't running
- Sends email alerts for both failures and successful restarts
You can easily expand this script to check other aspects of Windows systems, such as disk space, CPU usage, or event logs.
Python for Cross-Platform Monitoring
Python shines for monitoring Linux systems and creating cross-platform solutions. Here's a simple example for monitoring system resources across platforms; it relies on the third-party psutil package (pip install psutil):
#!/usr/bin/env python3
# system_monitor.py
import os
import platform
import psutil
import smtplib
import socket
import time
from email.mime.text import MIMEText
from datetime import datetime

# Configuration
CHECK_INTERVAL = 300  # seconds
CPU_THRESHOLD = 80  # percent
MEMORY_THRESHOLD = 80  # percent
DISK_THRESHOLD = 85  # percent
LOG_FILE = "/var/log/system_monitor.log" if platform.system() != "Windows" else "C:\\Logs\\system_monitor.log"
EMAIL_FROM = "monitoring@yourdomain.com"
EMAIL_TO = "admin@yourdomain.com"
SMTP_SERVER = "smtp.yourdomain.com"

# Ensure log directory exists
log_dir = os.path.dirname(LOG_FILE)
if not os.path.exists(log_dir):
    os.makedirs(log_dir)

def write_log(message):
    """Write message to log file and console"""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_message = f"{timestamp} - {message}"
    print(log_message)
    with open(LOG_FILE, "a") as log_file:
        log_file.write(log_message + "\n")

def send_alert(subject, message):
    """Send email alert"""
    try:
        msg = MIMEText(message)
        msg['Subject'] = subject
        msg['From'] = EMAIL_FROM
        msg['To'] = EMAIL_TO
        with smtplib.SMTP(SMTP_SERVER) as server:
            server.send_message(msg)
        write_log(f"Alert email sent: {subject}")
    except Exception as e:
        write_log(f"Failed to send email alert: {e}")
def check_system():
    """Check system resources and alert if thresholds are exceeded"""
    hostname = socket.gethostname()

    # Check CPU
    cpu_percent = psutil.cpu_percent(interval=1)
    if cpu_percent > CPU_THRESHOLD:
        subject = f"ALERT: High CPU usage on {hostname}"
        message = f"CPU usage is {cpu_percent}%, which exceeds the threshold of {CPU_THRESHOLD}%."
        send_alert(subject, message)
        write_log(f"High CPU alert: {cpu_percent}%")
    else:
        write_log(f"CPU usage normal: {cpu_percent}%")

    # Check memory
    memory = psutil.virtual_memory()
    memory_percent = memory.percent
    if memory_percent > MEMORY_THRESHOLD:
        subject = f"ALERT: High memory usage on {hostname}"
        message = f"Memory usage is {memory_percent}%, which exceeds the threshold of {MEMORY_THRESHOLD}%."
        send_alert(subject, message)
        write_log(f"High memory alert: {memory_percent}%")
    else:
        write_log(f"Memory usage normal: {memory_percent}%")

    # Check disk space for each mounted disk
    for partition in psutil.disk_partitions():
        # Parentheses make the intent explicit: both conditions apply only on Windows
        if os.name == 'nt' and ('cdrom' in partition.opts or partition.fstype == ''):
            # Skip CD-ROM drives and drives without media on Windows
            continue
        try:
            usage = psutil.disk_usage(partition.mountpoint)
            if usage.percent > DISK_THRESHOLD:
                subject = f"ALERT: Low disk space on {hostname}:{partition.mountpoint}"
                message = f"Disk space usage for {partition.mountpoint} is {usage.percent}%, which exceeds the threshold of {DISK_THRESHOLD}%."
                send_alert(subject, message)
                write_log(f"Low disk space alert: {partition.mountpoint} {usage.percent}%")
            else:
                write_log(f"Disk space normal: {partition.mountpoint} {usage.percent}%")
        except PermissionError:
            # This can happen if the disk is not ready
            write_log(f"Cannot check disk: {partition.mountpoint} (Permission denied)")
def main():
    write_log("System monitoring started")
    try:
        while True:
            check_system()
            time.sleep(CHECK_INTERVAL)
    except KeyboardInterrupt:
        write_log("System monitoring stopped by user")
    except Exception as e:
        write_log(f"System monitoring stopped due to error: {e}")
        send_alert("ALERT: Monitoring script failed", f"The system monitoring script on {socket.gethostname()} has stopped due to an error: {e}")

if __name__ == "__main__":
    main()
This script:
- Works on both Windows and Linux
- Monitors CPU, memory, and disk usage
- Logs results to a file
- Sends email alerts when thresholds are exceeded
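The threshold comparisons inside check_system() can also be factored into a small pure function, which makes the alert logic testable without psutil or an SMTP server. The following is only a minimal sketch: the function name find_breaches and the dictionary shapes are assumptions made for illustration and are not part of the script above.

```python
def find_breaches(readings, thresholds):
    """Return (metric, value, limit) tuples for readings above their limit.

    Both arguments are plain dicts keyed by metric name (a hypothetical
    shape chosen for this sketch). Metrics without a limit are ignored.
    """
    breaches = []
    for metric, value in sorted(readings.items()):
        limit = thresholds.get(metric)
        if limit is not None and value > limit:
            breaches.append((metric, value, limit))
    return breaches


# Same limits as the configuration block in system_monitor.py
THRESHOLDS = {"cpu": 80, "memory": 80, "disk": 85}

if __name__ == "__main__":
    sample = {"cpu": 91.5, "memory": 42.0, "disk": 85.0}
    for metric, value, limit in find_breaches(sample, THRESHOLDS):
        print(f"ALERT: {metric} at {value}% exceeds {limit}%")
```

As in the original script, the comparison is strict, so a reading exactly at the threshold does not raise an alert.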
Integrating PowerShell and Python
For hybrid environments, we can leverage the strengths of both languages. One approach is to use PowerShell to handle Windows-specific tasks and Python for cross-platform operations and data aggregation.
Here's a simple example of PowerShell invoking Python to create a unified monitoring solution:
# HybridMonitor.ps1
param(
    [string]$PythonScript = "C:\Scripts\cross_platform_stats.py",
    [string]$OutputFile = "C:\Monitoring\system_stats.json"
)

# Windows-specific checks
$WindowsMetrics = @{
    "ComputerName" = $env:COMPUTERNAME
    "OSVersion" = [System.Environment]::OSVersion.VersionString
    "LastBootTime" = (Get-CimInstance -ClassName Win32_OperatingSystem).LastBootUpTime
    "InstalledUpdates" = (Get-HotFix).Count
    "RunningServices" = (Get-Service | Where-Object {$_.Status -eq "Running"}).Count
    "StoppedServices" = (Get-Service | Where-Object {$_.Status -eq "Stopped"}).Count
}

# Call Python for cross-platform metrics
$PythonOutput = & python $PythonScript

# Combine results
$CombinedOutput = @{
    "WindowsSpecific" = $WindowsMetrics
    "CrossPlatform" = $PythonOutput | ConvertFrom-Json
}

# Export to JSON
$CombinedOutput | ConvertTo-Json -Depth 10 | Out-File -FilePath $OutputFile

# Upload to central monitoring system or send via email
# ...
And the corresponding Python script:
# cross_platform_stats.py
import json
import psutil
import platform
import datetime

# Collect cross-platform metrics
stats = {
    "platform": platform.system(),
    "platform_release": platform.release(),
    "platform_version": platform.version(),
    "architecture": platform.machine(),
    "hostname": platform.node(),
    "cpu_count": psutil.cpu_count(),
    "cpu_percent": psutil.cpu_percent(interval=1),
    "memory_total": psutil.virtual_memory().total,
    "memory_available": psutil.virtual_memory().available,
    "memory_percent": psutil.virtual_memory().percent,
    "disk_partitions": [{"device": p.device, "mountpoint": p.mountpoint, "fstype": p.fstype} for p in psutil.disk_partitions()],
    "network_interfaces": list(psutil.net_if_addrs().keys()),
    "boot_time": datetime.datetime.fromtimestamp(psutil.boot_time()).strftime("%Y-%m-%d %H:%M:%S"),
}

# Add disk usage for each partition
for partition in stats["disk_partitions"]:
    try:
        usage = psutil.disk_usage(partition["mountpoint"])
        partition["total_size"] = usage.total
        partition["used"] = usage.used
        partition["free"] = usage.free
        partition["percent"] = usage.percent
    except OSError:
        # Covers permission errors and drives that are not ready;
        # a bare except would also swallow KeyboardInterrupt
        partition["error"] = "Could not get disk usage"

# Print as JSON for PowerShell to capture
print(json.dumps(stats))
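cross_platform_stats.py prints one snapshot per run, while the dashboard code in the next section reads entries that carry timestamp and uptime keys. One possible way to bridge the two is to append each snapshot to a JSON history file. This is a sketch under stated assumptions: the history is taken to be a JSON list, the names append_snapshot and max_entries are hypothetical, and the uptime string is derived from a boot-time epoch value such as the one psutil.boot_time() returns.

```python
import json
import os
import tempfile
import time


def append_snapshot(history_path, snapshot, boot_time=None, max_entries=288):
    """Append one stats dict to a JSON list on disk and return the list.

    Adds a 'timestamp' key, plus an 'uptime' string when boot_time
    (seconds since the epoch) is given. Only the newest max_entries
    entries are kept; 288 is one day of five-minute samples.
    """
    now = time.time()
    entry = dict(snapshot)
    entry["timestamp"] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    if boot_time is not None:
        hours, remainder = divmod(int(now - boot_time), 3600)
        entry["uptime"] = f"{hours}h {remainder // 60}m"

    history = []
    if os.path.exists(history_path):
        with open(history_path, "r", encoding="utf-8") as f:
            history = json.load(f)

    history.append(entry)
    history = history[-max_entries:]

    # Write to a temporary file first, then swap it in, so an interrupted
    # run cannot leave a half-written history behind
    directory = os.path.dirname(os.path.abspath(history_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(history, f)
    os.replace(tmp_path, history_path)
    return history
```

At the end of cross_platform_stats.py this could be called as append_snapshot("monitoring_data.json", stats, boot_time=psutil.boot_time()), in addition to or instead of the print call.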
Building a Dashboard
With these scripts collecting data, the next step is visualizing it. A simple approach is to generate HTML reports:
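# generate_dashboard.py
import json
import os
from datetime import datetime
import matplotlib
matplotlib.use('Agg')  # Render off-screen so the script also runs on headless servers
import matplotlib.pyplot as plt
import base64
from io import BytesIO

# Load monitoring data. The chart code expects a list of samples, each a
# dict with 'timestamp', 'cpu_percent' and 'memory_percent' keys.
with open('monitoring_data.json', 'r') as f:
    data = json.load(f)

def create_usage_chart(data, key, title, ylabel):
    """Plot one percentage metric over time and return it as a base64 PNG"""
    timestamps = [entry['timestamp'] for entry in data]
    values = [entry[key] for entry in data]
    plt.figure(figsize=(10, 4))
    plt.plot(timestamps, values)
    plt.title(title)
    plt.ylabel(ylabel)
    plt.ylim(0, 100)
    plt.grid(True)
    # Convert plot to base64 for HTML embedding
    buffer = BytesIO()
    plt.savefig(buffer, format='png')
    plt.close()  # Release the figure so repeated calls do not accumulate memory
    buffer.seek(0)
    image_png = buffer.getvalue()
    buffer.close()
    return base64.b64encode(image_png).decode('utf-8')

# Generate CPU usage chart
def create_cpu_chart(data):
    return create_usage_chart(data, 'cpu_percent', 'CPU Usage Over Time', 'CPU %')

# Generate memory usage chart (same approach, reading 'memory_percent')
def create_memory_chart(data):
    return create_usage_chart(data, 'memory_percent', 'Memory Usage Over Time', 'Memory %')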
# Create HTML dashboard
def generate_html_dashboard(data, cpu_chart, memory_chart):
    html = f"""
<!DOCTYPE html>
<html>
<head>
    <title>System Monitoring Dashboard</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 0; padding: 20px; }}
        .dashboard {{ max-width: 1200px; margin: 0 auto; }}
        .card {{ background-color: #fff; border-radius: 5px; box-shadow: 0 2px 5px rgba(0,0,0,0.1);
                 padding: 20px; margin-bottom: 20px; }}
        .chart {{ margin-top: 20px; }}
        .status-ok {{ color: green; }}
        .status-warning {{ color: orange; }}
        .status-critical {{ color: red; }}
        table {{ width: 100%; border-collapse: collapse; }}
        th, td {{ padding: 8px; text-align: left; border-bottom: 1px solid #ddd; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <div class="dashboard">
        <h1>System Monitoring Dashboard</h1>
        <p>Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
        <div class="card">
            <h2>System Overview</h2>
            <table>
                <tr>
                    <th>Hostname</th>
                    <td>{data['hostname']}</td>
                </tr>
                <tr>
                    <th>Platform</th>
                    <td>{data['platform']} {data['platform_version']}</td>
                </tr>
                <tr>
                    <th>Uptime</th>
                    <td>{data['uptime']}</td>
                </tr>
            </table>
        </div>
        <div class="card">
            <h2>CPU Usage</h2>
            <div class="chart">
                <img src="data:image/png;base64,{cpu_chart}" width="100%">
            </div>
        </div>
        <div class="card">
            <h2>Memory Usage</h2>
            <div class="chart">
                <img src="data:image/png;base64,{memory_chart}" width="100%">
            </div>
        </div>
        <!-- Additional sections for disk usage, services, etc. -->
    </div>
</body>
</html>
"""
    with open('monitoring_dashboard.html', 'w') as f:
        f.write(html)
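# Main execution
if __name__ == "__main__":
    cpu_chart = create_cpu_chart(data)
    memory_chart = create_memory_chart(data)
    # The chart functions iterate over every sample, while the overview table
    # reads single values, so pass the most recent sample. This assumes data
    # is a non-empty list whose entries include the hostname, platform,
    # platform_version and uptime keys used by the template.
    generate_html_dashboard(data[-1], cpu_chart, memory_chart)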
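The stylesheet in the template defines status-ok, status-warning and status-critical classes, but none of the generated markup uses them yet. A small helper could map a usage percentage to one of those class names. This is a sketch only: the function names and the 70/90 cut-offs are assumptions chosen for illustration, not values from the scripts above.

```python
import html


def status_class(percent, warning=70.0, critical=90.0):
    """Map a usage percentage to one of the dashboard's CSS class names."""
    if percent >= critical:
        return "status-critical"
    if percent >= warning:
        return "status-warning"
    return "status-ok"


def status_row(label, percent):
    """Render one table row whose value cell is colored by severity."""
    css = status_class(percent)
    return f'<tr><th>{html.escape(label)}</th><td class="{css}">{percent:.1f}%</td></tr>'


if __name__ == "__main__":
    print(status_row("CPU", 93.2))
    print(status_row("Memory", 41.0))
```

Rows produced this way could be placed inside the System Overview table so that the latest CPU and memory readings are color-coded alongside the charts.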
Scheduling the Monitoring Scripts
For Windows systems, you can use Task Scheduler to run PowerShell scripts regularly:
# Create a scheduled task
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-ExecutionPolicy Bypass -File C:\Scripts\ServiceMonitor.ps1"
$Trigger = New-ScheduledTaskTrigger -Daily -At 8am
$Settings = New-ScheduledTaskSettingsSet -ExecutionTimeLimit (New-TimeSpan -Hours 1) -RestartCount 3 -RestartInterval (New-TimeSpan -Minutes 5)
Register-ScheduledTask -TaskName "Service Monitoring" -Action $Action -Trigger $Trigger -Settings $Settings -RunLevel Highest -User "SYSTEM"
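For Linux systems, cron is the traditional choice. Because system_monitor.py as written loops indefinitely, a five-minute schedule would start an additional long-running copy on every run; use the entry below with a variant that calls check_system() once and exits, or start the looping version a single time (for example with cron's @reboot):
# Add to crontab
# Run system monitoring every 5 minutes
*/5 * * * * /usr/bin/python3 /path/to/system_monitor.py >> /var/log/system_monitor_cron.log 2>&1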
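Since system_monitor.py loops on its own, one way to make the same script usable both as a long-running process and under cron is a command-line switch that runs a single check and exits. The sketch below is an assumption-laden illustration: the --once and --interval flags and the run and parse_args helpers are hypothetical names, not part of the original script.

```python
import argparse
import time


def run(check, interval, once, sleep=time.sleep):
    """Call check() a single time, or repeatedly every `interval` seconds.

    The sleep function is injectable so the loop can be tested quickly.
    """
    while True:
        check()
        if once:
            return
        sleep(interval)


def parse_args(argv=None):
    """Parse the hypothetical --once / --interval options."""
    parser = argparse.ArgumentParser(description="System monitor (sketch)")
    parser.add_argument("--once", action="store_true",
                        help="run a single check and exit (suitable for cron)")
    parser.add_argument("--interval", type=int, default=300,
                        help="seconds between checks when looping")
    return parser.parse_args(argv)
```

In main(), the while loop would then become run(check_system, args.interval, args.once) with args = parse_args(), and the crontab entry would pass --once so that each scheduled run performs one check and terminates.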
Benefits of This Approach
Creating custom monitoring with PowerShell and Python offers several advantages:
- Complete Flexibility: You control exactly what gets monitored and how
- No Vendor Lock-in: Your monitoring solution isn't tied to a specific vendor
- Extensibility: Easy to add new checks or customizations
- Learning Value: Developing these scripts enhances your systems knowledge
- Cost Efficiency: No licensing costs for smaller environments
Conclusion
While enterprise monitoring solutions have their place, complementing them with custom PowerShell and Python scripts provides deeper insights and more flexibility, especially in heterogeneous environments. The approach I've outlined allows you to:
- Use PowerShell for Windows-specific monitoring
- Leverage Python for cross-platform capabilities
- Integrate both for comprehensive coverage
- Visualize the data for easy consumption
By combining these powerful scripting languages, you can build a monitoring solution tailored to your specific environment, providing exactly the visibility you need into your systems' health and performance.