Health checking, tracking, and management for embedded Linux services

0xkelvin/systemd-doctor

Overview

Systemd-doctor is a health monitoring service designed to track and manage the health of various services on an embedded Linux device.

It integrates with systemd to automatically restart services when abnormalities are detected, so your custom services keep running.

Additionally, Systemd-doctor stores metrics in a time-series database, so users can view them as charts. This data supports system analysis by giving a comprehensive basis for evaluating custom services and resource usage, and it is also useful for debugging.

The Systemd-doctor service itself is supervised by the systemd watchdog, so systemd restarts it if it stops responding.
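
For readers unfamiliar with the systemd watchdog, here is a minimal Python sketch of the keep-alive side of that protocol. It is not taken from the Systemd-doctor source. The helper names `sd_notify` and `watchdog_interval_seconds` are hypothetical. It relies only on the documented `NOTIFY_SOCKET` and `WATCHDOG_USEC` environment variables that systemd sets for a unit with `WatchdogSec=` configured.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process is not
    running under systemd with notifications enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_interval_seconds(default: float = 5.0) -> float:
    """Ping at half the watchdog timeout (WATCHDOG_USEC is in microseconds)."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else default
```

A monitoring loop would call `sd_notify("WATCHDOG=1")` once per `watchdog_interval_seconds()`. If the pings stop for longer than `WatchdogSec`, systemd treats the service as hung and applies its `Restart=` policy.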

Features

  • Monitors CPU load, memory usage, disk space, and status for each tracked service...
  • Tracks global metrics like CPU temperature, board temperature, and network bandwidth...
  • Journal logging for each tracked service and for the kernel log.
  • Automatically restarts services if thresholds are breached.
  • Validates that the services specified for tracking are valid systemd services.
  • Stores metrics in a time-series database for visualization in Grafana.
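
The threshold-and-restart behaviour listed above could look roughly like the following Python sketch. This is an illustration under stated assumptions, not the project's actual implementation. The function names `breached` and `restart_if_unhealthy` are hypothetical. The metric keys (`cpu`, `memory`, `disk`) mirror the configuration keys used in this README. The restart is assumed to go through the standard `systemctl restart` command.

```python
import subprocess


def breached(sample: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return the metric names whose sampled value exceeds its configured limit."""
    return [name for name, limit in limits.items()
            if sample.get(name, 0.0) > limit]


def restart_if_unhealthy(service, sample, limits, run=subprocess.run):
    """Restart `service` through systemctl when any limit is exceeded.

    `run` is injectable so the decision logic can be exercised without
    touching a real systemd instance.
    """
    over = breached(sample, limits)
    if over:
        run(["systemctl", "restart", f"{service}.service"], check=False)
    return over
```

For example, a sample of `{"cpu": 91.5, "memory": 40.0, "disk": 55.0}` checked against limits of `cpu = 80.0`, `memory = 70.0`, `disk = 90` breaches only `cpu`, so the service would be restarted once.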

Configuration

Tracking Services Registration

The configuration file (config.toml) lets users specify the services to monitor and their respective thresholds. Example:

[services]
list = ["ota", "mqtt-client", "can-parser", "logging"]

[thresholds.ota]
cpu = 80.0
memory = 70.0
disk = 90

[thresholds.mqtt-client]
cpu = 60.0
memory = 50.0
disk = 85

[thresholds.can-parser]
cpu = 75.0
memory = 65.0
disk = 88

[thresholds.logging]
cpu = 70.0
memory = 60.0
disk = 85

[global_thresholds]
cpu_temperature = 80.0
board_temperature = 70.0
network_bandwidth = 1000.0 

Service file for Systemd-doctor

[Unit]
Description=Doctor Viet - Health Monitoring Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/systemd-doctor --config=/path/to/config.toml
WatchdogSec=10
Restart=always

[Install]
WantedBy=multi-user.target
