Serverless Monitoring Development Framework

A framework for creating monitoring systems leveraging Amazon AWS services.

  • Configuration as Code
  • Easy Deployment
  • Cost Effective

import us.aharon.smdf.core.*

class App : Application() {

    override val checks = listOf(
            checks("example") {
                // Client check: runs on clients subscribed to the "linux" tag.
                check("CPU") {
                    command     = "/nagios/plugins/check_cpu --warning 75 --critical 85"
                    interval    = 5
                    tags        = listOf("linux")
                    contacts    = listOf("devops")
                    handlers    = listOf(DefaultHandler::class)
                }
                // Serverless check: runs in AWS Lambda.
                serverlessCheck("Database cluster health") {
                    executor = CheckDatabaseClusterHealth::class
                    interval = 15
                }
            }
    )
}

fun main(vararg args: String) = App().run(args)

  • The monitoring application, App, runs on AWS Lambda.
  • It triggers two checks:
    • The CPU check is run every 5 minutes and will be run on clients subscribed to the linux tag.
    • The Database cluster health check is run every 15 minutes and is run in AWS Lambda.


Configuration as Code

Because the configuration is a DSL written in Kotlin, the compiler can check its validity. No more pushing broken configuration files to a monitoring system configured by flat files.
The IDE can tell you which configuration options are available, which types of values they require, and show you their documentation. No more searching the internet for parameter names or configuration options.

Easy Deployment

The monitoring application compiles to a single, executable JAR file.
Run the compiled application to deploy it with your desired command-line parameters.
Because the application uses AWS services such as Lambda, SNS, SQS, and DynamoDB, there are no servers or containers to manage.

Cost Effective

For small to medium-sized infrastructures, usage will fall within the AWS free tier.


Architecture Diagram



The Application class is the entry point for both the CLI and Serverless parts of the framework. Checks are registered or defined in a subclass of Application.

An example Application implementation:

import us.aharon.smdf.core.api.Application

class MyMonitoringApp : Application() {

    // Register the check groups that this application schedules.
    override val checks = listOf(
            LINUX_CHECKS,
            METRIC_CHECKS
    )
}

fun main(vararg args: String) = MyMonitoringApp().run(args)

In the example above, LINUX_CHECKS and METRIC_CHECKS are groups of checks defined elsewhere in the project.


The framework defines two types of checks.

Client Checks

Client checks are run on a server instance or within a Docker container.
The LINUX_CHECKS from the Application example might be defined like so:

import us.aharon.smdf.core.api.checks
import us.aharon.smdf.core.api.check

val LINUX_CHECKS = checks("Linux System") {

    check("CPU") {
        command      = "/usr/lib64/nagios/plugins/check_cpu --warning 75 --critical 85"
        tags         = listOf("linux")
        interval     = 5 // Minutes
        handlers     = listOf(DefaultHandler::class)
        notification = "CPU usage is high"
        contacts     = listOf("[email protected]")
    }
    check("Load") {
        command      = "/usr/lib64/nagios/plugins/check_load --warning 5,5,5 --critical 10,10,10"
        tags         = listOf("linux")
        interval     = 5 // Minutes
        handlers     = listOf(DefaultHandler::class)
        notification = "Load is high"
        contacts     = listOf("[email protected]")
    }
}

Two checks are defined here within the Linux System check group.
They both use Nagios checks which are commonly found on Linux systems and run at five minute intervals.
On state changes (OK to WARNING, CRITICAL to OK, etc.) the DefaultHandler class will be triggered to run (covered in the Notification Handlers section).
The contacts here are email addresses, but that is not a requirement.

Serverless Checks

Serverless checks are run within AWS Lambda.
The METRIC_CHECKS from the Application example might perform queries on a Graphite instance:

import us.aharon.smdf.core.api.checks
import us.aharon.smdf.core.api.serverlessCheck

val METRIC_CHECKS = checks("Graphite") {

    serverlessCheck("Web Server Latency") {
        executor   = GraphiteCheckWebServerLatency::class
        interval   = 3 // Minutes
        // Extra metadata passed through to the executor.
        additional = mapOf(
            "query"    to "avg(percentileOfSeries(webserver.*.latency, 95))",
            "from"     to "-10minutes",
            "warning"  to "100",
            "critical" to "300")
    }
}

This check will run in AWS Lambda using the GraphiteCheckWebServerLatency class.
Serverless checks are implemented by extending the ServerlessExecutor abstract class.
In this example the serverless check executor expects additional metadata which is provided via the additional property.
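
As a rough illustration of how an executor might consume the additional metadata, the sketch below reads the warning and critical thresholds from a string map and classifies a measured value. The names Status and classify, and the sample value, are hypothetical and not part of the framework; a real executor would extend ServerlessExecutor and query Graphite itself.

```kotlin
// Hypothetical sketch: none of these names belong to the framework's API.
enum class Status { OK, WARNING, CRITICAL }

/**
 * Classifies a measured value against the "warning" and "critical"
 * thresholds, which arrive as strings in the metadata map.
 */
fun classify(value: Double, additional: Map<String, String>): Status {
    val warning = additional.getValue("warning").toDouble()
    val critical = additional.getValue("critical").toDouble()
    return when {
        value >= critical -> Status.CRITICAL
        value >= warning -> Status.WARNING
        else -> Status.OK
    }
}

fun main() {
    val additional = mapOf("warning" to "100", "critical" to "300")
    // 150.0 is a made-up measurement used only for illustration.
    println(classify(150.0, additional)) // WARNING
}
```

Keeping the thresholds in the check definition rather than in the executor lets the same executor class be reused with different limits.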

The core concepts of creating checks consist of:

  • Command/Executor
    The shell command to run (if running on a server), or the user-defined class which will execute in the cloud.
  • Tags
    Clients running server-side subscribe to checks via a tagging system.
    For instance, if a client subscribes to the tags linux, nginx, and mysql, then it will only be instructed to perform checks that correspond to those tags.
  • Interval
    Intervals are specified in minutes, with one minute being the shortest interval available.
  • Handlers
    Handlers are triggered to run when a check undergoes a state change.
    For example, if the client running on the server reports that CPU usage is at a critical level, then that check's handler will be run to send a notification to the configured contacts.
  • Contacts
    Configured via the contacts property, these can be whatever is appropriate for the situation.
    In the example above we used an email address.
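
The tag subscription described above can be pictured with a small sketch. It assumes that a check is assigned to a client when at least one of the check's tags appears in the client's subscriptions; that matching rule, along with the CheckDef type and the checksFor function, is an assumption made for illustration and may not reflect how the framework matches tags internally.

```kotlin
// Hypothetical sketch: these types are illustrative, not the framework's own.
data class CheckDef(val name: String, val tags: List<String>)

/** Returns the checks that share at least one tag with the client's subscriptions. */
fun checksFor(subscriptions: Set<String>, checks: List<CheckDef>): List<CheckDef> =
    checks.filter { check -> check.tags.any { it in subscriptions } }

fun main() {
    val checks = listOf(
        CheckDef("CPU", listOf("linux")),
        CheckDef("Replication lag", listOf("mysql")),
        CheckDef("IIS status", listOf("windows"))
    )
    // A client subscribed to linux, nginx, and mysql skips the windows check.
    val assigned = checksFor(setOf("linux", "nginx", "mysql"), checks)
    println(assigned.map { it.name }) // [CPU, Replication lag]
}
```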

Default Check Properties

Many checks will share the same properties, like contacts, interval, handlers, and tags.
To avoid repeating these values, Check Templates can be created.
Here are two example check templates: one for client/server checks and one for serverless checks.

import us.aharon.smdf.core.api.clientCheckTemplate
import us.aharon.smdf.core.api.serverlessCheckTemplate

// Defaults shared by client checks.
val defaultClientCheck = clientCheckTemplate {
    interval = 5
    handlers = listOf(DefaultHandler::class)
    tags     = listOf("linux")
    contacts = listOf("[email protected]")
}

// Defaults shared by serverless checks.
val defaultServerlessCheck = serverlessCheckTemplate {
    interval = 15
    contacts = listOf("[email protected]")
}

These two check templates can now be used elsewhere and will inherit the configuration properties from the templates defined above:

val EXAMPLE_TEMPLATE_CHECKS = checks("Example Template Checks") {

    defaultClientCheck("RSyslog is running") {
        command = "/usr/lib64/nagios/plugins/check_procs --critical 1:1 --command rsyslog"
    }
    defaultServerlessCheck("Running in us-east-1") {
        executor = CheckRunningInUSEast1::class
    }
}
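
To show how such a template could work in plain Kotlin, here is a minimal, self-contained sketch. It is not the framework's implementation: ClientCheck and clientCheckTemplateSketch are hypothetical stand-ins, and the sketch assumes that per-check values are applied after the template defaults so that they win on conflict.

```kotlin
// Hypothetical sketch of the template idea; the framework's real types differ.
class ClientCheck(val name: String) {
    var command: String = ""
    var interval: Int = 1
    var tags: List<String> = emptyList()
    var contacts: List<String> = emptyList()
}

/**
 * Captures a block of defaults and returns a function that creates a check,
 * applies the defaults first, and then applies the per-check overrides.
 */
fun clientCheckTemplateSketch(
    defaults: ClientCheck.() -> Unit
): (String, ClientCheck.() -> Unit) -> ClientCheck =
    { name, overrides -> ClientCheck(name).apply(defaults).apply(overrides) }

fun main() {
    val defaultClientCheck = clientCheckTemplateSketch {
        interval = 5
        tags = listOf("linux")
    }
    // Only the command is given here; interval and tags come from the template.
    val check = defaultClientCheck("RSyslog is running") {
        command = "/usr/lib64/nagios/plugins/check_procs --critical 1:1 --command rsyslog"
    }
    println("${check.name}: every ${check.interval} min, tags=${check.tags}")
}
```

Returning a function from the template is what allows the template value to be called with a name and a trailing lambda, just like a regular check definition.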

Serverless Check Executors

The ServerlessExecutor is an abstract class which can be implemented to perform any desired check.
Instead of the check running on a server or container, it will run within AWS Lambda.
A contrived example:

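import com.amazonaws.auth.AWSCredentialsProvider
import com.amazonaws.services.identitymanagement.AmazonIdentityManagementClientBuilder
import com.amazonaws.services.lambda.runtime.Context
import us.aharon.smdf.core.checks.*

class CheckIamUsers : ServerlessExecutor() {

    // IAM permissions that this check needs at run time.
    override val permissions: List<Permission> = listOf(
            Permission(
                    actions = listOf("iam:ListUsers"),
                    resources = listOf("*")))

    override fun run(check: ServerlessCheck, ctx: Context, credentials: AWSCredentialsProvider): Result {
        // Build the IAM client with the credentials of the assumed role.
        val iamClient = AmazonIdentityManagementClientBuilder.standard()
                .withCredentials(credentials)
                .build()
        val iamUsers = iamClient.listUsers().users
        ctx.logger.log("Found the following users:  $iamUsers")
        if (iamUsers.size > 10) {
            return Critical("CRITICAL - ${iamUsers.size} users is too many!")
        }
        return Ok("OK - ${iamUsers.map { it.userName }}")
    }
}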

This check will run in AWS Lambda.
The permissions property defines which IAM permissions the check requires to perform its actions.
During deployment of the monitoring application an IAM Role will be created with those permissions.
When the check is executed, it will perform AssumeRole and retrieve IAM credentials.
The credentials for that assumed role are provided to the run function.

Notification Handlers

Notification handlers are triggered by state changes for a specific client-check pair.
If client1 runs check CPU and reports a state change from OK to CRITICAL, then the notification handler defined on the CPU check will be triggered to run.
Notification handlers are created by extending the NotificationHandler abstract class.
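
The idea of a state change per client-check pair can be sketched as follows. This is only an illustration under stated assumptions: StateTracker and State are hypothetical names, the last known state is held in memory rather than in whatever store the framework uses, and the first result for a pair is compared against OK.

```kotlin
// Hypothetical sketch: names and in-memory storage are illustrative only.
enum class State { OK, WARNING, CRITICAL }

class StateTracker {
    // Last known state for each (client, check) pair.
    private val lastStates = mutableMapOf<Pair<String, String>, State>()

    /**
     * Records the new state and returns true when it differs from the
     * previously recorded one. A pair with no history is treated as OK,
     * which is an assumption made for this sketch.
     */
    fun record(client: String, check: String, newState: State): Boolean {
        val previous = lastStates.put(client to check, newState) ?: State.OK
        return previous != newState
    }
}

fun main() {
    val tracker = StateTracker()
    println(tracker.record("client1", "CPU", State.OK))       // false
    println(tracker.record("client1", "CPU", State.CRITICAL)) // true
    println(tracker.record("client1", "CPU", State.CRITICAL)) // false
    println(tracker.record("client2", "CPU", State.CRITICAL)) // true
}
```

A handler would run only when record returns true, so a check that stays CRITICAL does not notify on every interval.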

Here is an example which emails the check's configured contacts:

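import com.amazonaws.auth.AWSCredentialsProvider
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClientBuilder
import com.amazonaws.services.simpleemail.model.*
import us.aharon.smdf.core.checks.*
import us.aharon.smdf.core.db.CheckResultRecord
import us.aharon.smdf.core.handlers.NotificationHandler

class EmailNotificationHandler : NotificationHandler() {

    // IAM permissions that this handler needs at run time.
    override val permissions: List<Permission> = listOf(
            Permission(
                    actions = listOf("ses:SendEmail"),
                    resources = listOf("arn:aws:ses:*:*:identity/")))

    override fun run(check: Check, checkResult: CheckResultRecord, ctx: Context, credentials: AWSCredentialsProvider) {
        // Assumes check.contacts holds the recipients' email addresses.
        val request = SendEmailRequest()
                .withSource("[email protected]")
                .withDestination(Destination(check.contacts))
                .withMessage(Message(
                        Content("${checkResult.status} - ${checkResult.source} - ${check.notification}"),
                        Body(Content("""
                            Message:    ${check.notification}
                            Source:     ${checkResult.source}
                            Status:     ${checkResult.status}
                            Timestamp:  ${checkResult.completedAt}
                            """.trimIndent()))))

        // Send the email with the credentials of the assumed role.
        val client = AmazonSimpleEmailServiceClientBuilder.standard()
                .withCredentials(credentials)
                .build()
        client.sendEmail(request)
        ctx.logger.log("Sent email.")
    }
}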


The monitoring application is designed to compile to a single fat JAR which can deploy itself.

Usage: java -jar app.jar deploy [--security-group-ids=SECURITY_GROUP[,SECURITY_GROUP...]] [--subnet-ids=SUBNET[,SUBNET...]] [-hV] [--dry-run] -d=DEST -e=ENV [-l=LEVEL] -n=NAME -r=REGION
Deploy application to the cloud
      --dry-run           Generate and validate the CloudFormation template without
                            installing the application
  -d, --s3-dest=DEST      S3 Bucket and path as upload destination. eg.
  -e, --environment=ENV   A name given to the environment for this application (prd,
                            dev, ...)
  -h, --help              Show this help message and exit.
  -l, --log-level=LEVEL   Log level (TRACE, DEBUG, ERROR, WARN, INFO)
  -n, --stack-name=NAME   CloudFormation Stack name
  -r, --region=REGION     AWS region
  -V, --version           Print version information and exit.
      --security-group-ids=SECURITY_GROUP[,SECURITY_GROUP...]
                          List of security group IDs for notification and serverless
                            check functions.
      --subnet-ids=SUBNET[,SUBNET...]
                          List of subnet IDs for notification and serverless check
                            functions.
An example invocation of the deploy CLI command would look like this:

$ java -jar ./target/app-1.x.x.jar deploy --environment dev --s3-dest my-monitoring-bucket/dev/ --stack-name mon-dev --region us-east-1
CloudFormation template is valid.
Uploading ./target/app-1.x.x.jar
Uploading dev/cfn-template-dev-1549716815865.yaml
The `mon-dev` stack does not exist.
Template URL:
Creating the stack named `mon-dev`...
Stack `mon-dev` status:  CREATE_IN_PROGRESS
Stack `mon-dev` status:  CREATE_COMPLETE



