Status Page Reference

This is a reference implementation for generating status pages from Opsgenie alerts. The solution uses the Opsgenie Webhook Integration on alert actions (Create, Close, AddTags, RemoveTags, etc.) to trigger code that calculates and updates the state of your services according to the severity of the alerts. Implementing such a solution requires the following components:

  • An environment to execute code
  • A mechanism to trigger that code execution
  • Storage to keep the service data
  • A webserver to host the status page
  • Webhook Integration in Opsgenie to pass alert data

The reference implementation described here leverages AWS API Gateway, Lambda and S3 services.


Alert Design

The solution assumes that alerts used to create and update status pages carry the following tags:

  • “statuspage”: the identifier tag. Only alerts that have this tag are processed.
  • The name of the service: the impacted service is named in a tag with the “servicename:” prefix, for example “servicename:website”.
  • The severity of the incident: each alert should have a tag indicating its severity. The supported severity levels are “green”, “major” and “critical”.
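As a minimal sketch of this tag convention, the helper below pulls the service name and severity out of an alert's tag list. The function name parseStatusPageTags and its return shape are hypothetical and not taken from the reference source code.

```javascript
// Hypothetical helper illustrating the tag convention; not the reference code.
var SEVERITIES = ["green", "major", "critical"];

function parseStatusPageTags(tags, statusPageTag, serviceNameTagPrefix) {
    // Alerts without the identifier tag are not status page incidents.
    if (tags.indexOf(statusPageTag) === -1) {
        return null;
    }
    var result = { serviceName: null, severity: null };
    tags.forEach(function (tag) {
        if (tag.indexOf(serviceNameTagPrefix) === 0) {
            // "servicename:website" -> "website"
            result.serviceName = tag.substring(serviceNameTagPrefix.length);
        } else if (SEVERITIES.indexOf(tag) !== -1) {
            result.severity = tag;
        }
    });
    return result;
}

var parsed = parseStatusPageTags(
    ["statuspage", "servicename:website", "major"],
    "statuspage",
    "servicename:"
);
console.log(parsed.serviceName, parsed.severity);
```

Returning null for alerts without the identifier tag lets the caller skip them early.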

Service State Calculator Lambda Function

AWS Lambda is a compute service from AWS that lets you run code without provisioning or managing servers, with practically zero administration. Because you pay only for the compute time you consume, it is practically free for this type of application. Just upload your code and Lambda takes care of everything required to run and scale it with high availability.

The Lambda service executes units of code, referred to as “functions”, in response to events. A function runs each time it is triggered, and it can be triggered by different sources such as scheduled events, API Gateway, etc.

In this implementation, Opsgenie triggers the Lambda function through AWS API Gateway whenever an alert is created or updated. The function receives Opsgenie alert updates (Create, Close, AddTags, RemoveTags, etc.) via the Opsgenie Webhook Integration and calculates the state of the service using the logic described below.

If the alert action executed in Opsgenie is “Create” or “AddTags”:

  1. get the current state of the service
  2. compare it with the alert's severity
  3. if the alert severity is higher than the current service state, update the service state
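The comparison in these steps could look roughly like the sketch below. It assumes the severities are ordered green < major < critical, which the text implies but does not state outright. The name escalateState is hypothetical.

```javascript
// Assumed ordering, lowest to highest severity.
var SEVERITY_ORDER = ["green", "major", "critical"];

// Returns the state the service should have after a Create or AddTags action.
function escalateState(currentState, alertSeverity) {
    var alertRank = SEVERITY_ORDER.indexOf(alertSeverity);
    if (alertRank === -1) {
        return currentState; // unknown severity tag: leave the state untouched
    }
    var currentRank = SEVERITY_ORDER.indexOf(currentState);
    // Only move the state upwards; a lower-severity alert never downgrades it.
    return alertRank > currentRank ? alertSeverity : currentState;
}

console.log(escalateState("major", "critical"));
console.log(escalateState("critical", "major"));
```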

If the alert action executed in Opsgenie is “Close” or “RemoveTags”:

  1. retrieve the remaining alerts of the service through the Alert API
  2. check every alert's severity tag and take the highest severity as the service's state
  3. update the service state
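A minimal sketch of step 2 follows. It assumes each remaining alert is an object with a tags array, and that a service with no remaining alerts falls back to “green”. Both are assumptions for illustration, and highestSeverity is a hypothetical name.

```javascript
// Assumed ordering, lowest to highest severity.
var SEVERITY_ORDER = ["green", "major", "critical"];

// Returns the highest severity found among the remaining open alerts.
function highestSeverity(alerts) {
    var highest = 0; // assumption: no open alerts means "green"
    alerts.forEach(function (alert) {
        (alert.tags || []).forEach(function (tag) {
            var rank = SEVERITY_ORDER.indexOf(tag);
            if (rank > highest) {
                highest = rank;
            }
        });
    });
    return SEVERITY_ORDER[highest];
}

console.log(highestSeverity([
    { tags: ["statuspage", "servicename:API", "major"] },
    { tags: ["statuspage", "servicename:API", "critical"] }
]));
```

Recomputing from the remaining alerts, instead of simply lowering the state, keeps the service at the right level when several incidents overlap.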

The Lambda function is configured in config.js:

var config = {
    "opsgenieApiKey": "", // the API key of the API Integration in Opsgenie, which will be used to retrieve service's incident alerts.
    "serviceObjectBucket" : "", // name of the S3 bucket that stores the service data
    "statusPageTag": "statuspage", // identifier tag for the statuspage incidents
    "accessKeyId": "", // accessKeyId of the user credentials which has fullAccessRight to S3
    "secretAccessKey": "", // secretAccessKey of the user credentials which has fullAccessRight to S3
    "serviceNameTagPrefix": "servicename:", // tag prefix for service names in statuspage alerts.
    // (ex: servicename:service1 -> service name will be extracted as service1)
    "serviceDataFolderNameInS3": "services" //folder of the service objects in S3 bucket
};

AWS API Gateway to trigger Lambda Functions

AWS API Gateway is a service that lets you create, publish and maintain APIs, and it can provide a URL endpoint to trigger Lambda functions. This solution uses three Lambda functions: one for calculating service state, one for retrieving incident alerts and alert notes from Opsgenie, and one for retrieving service state data from S3. After creating the Lambda functions, create an API endpoint for each of them. Make sure that CORS is enabled for each API resource.

To secure these Lambda functions, an API Key should be used in API Gateway. Once an API Gateway endpoint is configured to require the API Key, every HTTP/HTTPS call made to that endpoint must include the key in the X-Api-Key header of the request.
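As an illustration, a Node.js caller could build its request options as sketched below. The endpoint URL and key are placeholders, and buildRequestOptions is a hypothetical helper. The resulting object has the shape accepted by Node's https.request.

```javascript
// Hypothetical helper: builds options for require("https").request(...).
function buildRequestOptions(endpointUrl, apiKey) {
    var parsed = new URL(endpointUrl);
    return {
        hostname: parsed.hostname,
        path: parsed.pathname + parsed.search,
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            // API Gateway reads the API Key from this header.
            "X-Api-Key": apiKey
        }
    };
}

// Placeholder endpoint and key, for illustration only.
var options = buildRequestOptions("https://example.invalid/prod/state", "my-api-key");
console.log(options.headers["X-Api-Key"]);
```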

Storing service data in S3

AWS S3 is a secure and scalable object storage service. When the Service State Calculator Lambda function is triggered through API Gateway, it calculates the new service state and writes the service data to an AWS S3 bucket.

Service data is a JSON file that contains service information in the following format:

{
    "state": "critical",
    "message": " API Outage."
}
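The sketch below shows one way the write parameters for such an object might be assembled from the config.js values. The object key layout (folder, service name, then a .json extension) is an assumption and is not confirmed by the text. buildPutObjectParams is a hypothetical name. The returned object uses the Bucket, Key, Body and ContentType fields that the AWS SDK's S3 putObject call accepts.

```javascript
// Hypothetical helper: assembles parameters for an S3 putObject call.
function buildPutObjectParams(config, serviceName, state, message) {
    return {
        Bucket: config.serviceObjectBucket,
        // Assumed key layout: <folder>/<serviceName>.json
        Key: config.serviceDataFolderNameInS3 + "/" + serviceName + ".json",
        Body: JSON.stringify({ state: state, message: message }),
        ContentType: "application/json"
    };
}

var params = buildPutObjectParams(
    { serviceObjectBucket: "my-status-bucket", serviceDataFolderNameInS3: "services" },
    "API",
    "critical",
    "API Outage."
);
console.log(params.Key, params.Body);
```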

Service Data Getter from S3 Lambda function

This Lambda function retrieves the data for a specific service, or for all services, from S3. It is triggered through its API Gateway endpoint by the Status Page web application. It is configured in config.js:

var config = {
    "serviceObjectBucket" : "", // name of the S3 bucket that stores the service data
    "accessKeyId": "", // accessKeyId of the user credentials which has fullAccessRight to S3
    "secretAccessKey": "", // secretAccessKey of the user credentials which has fullAccessRight to S3
    "serviceDataFolderNameInS3": "services" //folder of the service objects in S3 bucket
};

Lambda Function for Incident Alert and Alert Note retrieval

This Lambda function retrieves a given service's alerts and alert notes from Opsgenie through the Alert API. It is triggered through its API Gateway endpoint by the Status Page web application. It is configured in config.js:

var config = {
    "opsgenieApiKey": "", // the API key of the API Integration in Opsgenie, which will be used to retrieve service's incident alerts.
    "statusPageTag": "statuspage", // identifier tag for the statuspage incidents
    "serviceNameTagPrefix": "servicename:" // tag prefix for service names in statuspage alerts.
    //(ex: servicename:service1 -> service name will be extracted as service1)
};

Configuring Webhook Integrations in Opsgenie

Opsgenie sends alert data to custom URL endpoints via the Webhook Integration. We use the API Gateway URL of the Service State Calculator Lambda function as the Webhook URL in the integration. As before, the API Key of the endpoint should be defined in the custom headers. To avoid sending irrelevant alert data (non-incident alerts) to the Lambda function, the Opsgenie Webhook Integrations should contain filters. In this implementation, we send incident alert data to the Service State Calculator Lambda function on the Create, Close, AddTags and RemoveTags actions. To do this, we need two Webhook Integrations with the following conditions (both of them use the same URL):

[Images: filter conditions of the two Webhook Integrations]
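In addition to the integration filters, the Lambda function could repeat the same check defensively. The sketch below assumes the webhook payload exposes the action name as payload.action and the alert's tags as payload.alert.tags. Treat that shape, and the name shouldProcess, as assumptions to verify against the actual payload.

```javascript
// Actions the Service State Calculator reacts to, as listed in the text.
var HANDLED_ACTIONS = ["Create", "Close", "AddTags", "RemoveTags"];

// Returns true only for handled actions on alerts carrying the identifier tag.
function shouldProcess(payload, statusPageTag) {
    if (!payload || HANDLED_ACTIONS.indexOf(payload.action) === -1) {
        return false;
    }
    var tags = (payload.alert && payload.alert.tags) || [];
    return tags.indexOf(statusPageTag) !== -1;
}

console.log(shouldProcess(
    { action: "Create", alert: { tags: ["statuspage", "servicename:API", "critical"] } },
    "statuspage"
));
```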

Hosting Status Page in S3

AWS S3 has public website hosting capabilities, so the status pages for the services can be hosted on S3. The Status Page web application has a dashboard that lists service names and their states. From the dashboard, you can open a service's status page and view its incidents with updates (alerts and alert notes). Status pages highlight the service state and the service incidents (alerts). The web application retrieves the data by making AJAX calls to two separate Lambda functions through their API Gateway endpoints. The first Lambda function retrieves service data from S3, and the second retrieves service alerts and alert notes from Opsgenie through the Alert API. The following configuration should be made in app.js:

.constant('cfg', {
    'apiLambdaFunctionUrl': 'api_lambda_url', // API Gateway URL of the Lambda function that retrieves alerts and alert notes from Opsgenie.
    'serviceLambdaFunctionUrl': 'service_lambda_url', // API Gateway URL of the Lambda function that retrieves service data from S3.
    'apiKey': 'apiKey' // API Key to make secure calls to API Gateway (both API Gateway endpoints can use the same API Key).
})

The images below show example incident alerts in Opsgenie and the corresponding status page.

In the example below, there are two incident alerts for the API service. Both contain the incident alert identifier tag “statuspage”, a severity tag, and a tag that carries the name of the service with the “servicename:” prefix, i.e. “servicename:API”.

[Image: the two incident alerts for the API service in Opsgenie]

Since there’s only one service in this example, the dashboard contains only the API service. The severity of one of the alerts in Opsgenie is critical, so the state of the service is now critical and is shown in red.

[Image: status page dashboard showing the API service in the critical state]

The status page of the API service shows the service state at the top of the page and lists all incident alerts; closed alerts are shown as resolved.

[Image: status page of the API service]

Source Code

The source code of all three Lambda functions and the web application is available on GitHub.