Alerting Principles

We manage how we get alerted based on factors such as the customer's contractual SLA and the urgency of their request or incident. An alert or notification is something that requires a human to perform an action. Based on the severity of the issue (service request or incident), we prioritize accordingly in DoIT.

Major Priority Alerts

Anything that wakes up a human in the middle of the night should be immediately human actionable. If it is not, we need to adjust the alert so that it does not page at those times.

Priority  Alerts                           Response
Major     Major-Priority Spearhead Alert   24/7/365. Requires immediate human action.
Normal    Normal-Priority Spearhead Alert  During business hours only. Requires human action that same working day.
Minor     Minor-Priority Spearhead Alert   24/7/365. Requires human action at some point.

Incidents (IN) and service requests (SR) share the same priorities. The actual response and resolution times vary and are based on contractual agreements with the customer. These SLA details are available in DoIT on the organization page of the respective customer.

If you're setting up a new alert or notification, consider the chart above when deciding how to alert people. For example, avoid creating new high-priority alerts for issues that don't require an immediate response.
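The priority chart can be sketched in code as a small lookup that decides whether an alert should go out at a given moment. This is only an illustrative sketch, not how DoIT implements it: the names `Priority`, `AlertPolicy`, `POLICIES`, `in_business_hours`, and `should_alert_now` are hypothetical, and the business-hours window (Monday to Friday, 09:00 to 17:00 local time) is an assumption, since this page does not define it.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Priority(Enum):
    MAJOR = "Major"
    NORMAL = "Normal"
    MINOR = "Minor"


@dataclass(frozen=True)
class AlertPolicy:
    around_the_clock: bool  # True: alert 24/7/365; False: business hours only
    response: str           # expected human response, taken from the chart


# Hypothetical encoding of the priority chart.
POLICIES = {
    Priority.MAJOR: AlertPolicy(True, "immediate human action"),
    Priority.NORMAL: AlertPolicy(False, "human action that same working day"),
    Priority.MINOR: AlertPolicy(True, "human action at some point"),
}

# Assumption: business hours are Monday-Friday, 09:00-17:00 local time.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 17


def in_business_hours(when: datetime) -> bool:
    """Return True if `when` falls inside the assumed business-hours window."""
    is_weekday = when.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START_HOUR <= when.hour < BUSINESS_END_HOUR


def should_alert_now(priority: Priority, when: datetime) -> bool:
    """Decide whether an alert of this priority should be sent at `when`."""
    policy = POLICIES[priority]
    return policy.around_the_clock or in_business_hours(when)
```

Under these assumptions, a Normal-priority alert raised at 03:00 on a Saturday would be held until business hours, while a Major-priority alert would go out immediately.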

Alert Channels

Email is currently our only notification method, so keeping an eye on your inbox is essential. SMS and push notifications are in the pipeline for DoIT.

Examples

"Production service is failing for 75% of requests, automation is unable to resolve."

This would be a Major priority IN, requiring immediate human action to resolve.

Major Urgency

A customer sends an email stating: "Production server disk space is filling, expected to be full in 48 hours. Log rotation is insufficient to resolve."

This would be a Normal priority SR, requiring human action soon, but not immediately.

Normal Urgency

"An SSL certificate is due to expire in one week."

This would be a Minor priority SR, requiring human action some time soon.

Minor Urgency