We manage how we get alerted based on many factors, such as the customer's contractual SLA and the urgency of their request or incident. **An alert or notification is something which requires a human to perform an action**. Based on the severity of the issue (service request or incident), we prioritize accordingly in [DoIT](http://doit.sphs.ro).

!!! warning "Major Priority Alerts"
    Anything that wakes up a human in the middle of the night should be **immediately human actionable**. If it is not, then we need to adjust the alert to not bother us at those times.

| Priority | Alerts | Response |
| -------- | ------ | -------- |
| Major | Major-Priority Spearhead Alert 24/7/365. | Requires **immediate human action**. |
| Normal | Normal-Priority Alert during **business hours only**. | Requires human action that same working day. |
| Minor | Minor-Priority Alert 24/7/365. | Requires human action at some point. |
| Notification | Suppressed Events. No response required. | Informational only. We do not need these to clutter our ticketing or inboxes. If they are enabled, they should be sent only to the specific people who need them, not to groups. |

Both incidents (IN) and service requests (SR) share the same priorities. The actual response and resolution times vary and are based on contractual agreements with the customer. These SLA details are available in DoIT on the organization page.

If you're setting up a new alert/notification, consider the table above for how you want to alert people. For example, be mindful of not creating new high-priority alerts if they don't require an immediate response.

!!! info "Alert Channels"
    Primarily, we use email as the notification/alert method, and all of our customers are encouraged to use it. Secondly, there is the DoIT customer portal, which sends alerts to the on-call person(s) and escalates based on SLA/contractual agreements. Thirdly, we use our centralized support telephone number and individual phones. This means keeping an eye on your email is essential!

SMS and Push notifications are in the pipeline for DoIT.
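
When setting up a new alert, it can help to see the priority table expressed as data. The following is a minimal sketch, not how DoIT is actually configured: the names `Priority`, `Policy`, `POLICIES`, and `should_alert_now` are hypothetical, and the business-hours window (Monday to Friday, 09:00 to 18:00) is an assumption, since this page does not define it.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Priority(Enum):
    MAJOR = "Major"
    NORMAL = "Normal"
    MINOR = "Minor"
    NOTIFICATION = "Notification"


@dataclass(frozen=True)
class Policy:
    alerts_humans: bool        # False means the event is suppressed
    business_hours_only: bool  # True means alerts wait for working hours
    response: str              # expected response, taken from the table


# One entry per row of the priority table.
POLICIES = {
    Priority.MAJOR: Policy(True, False, "Immediate human action"),
    Priority.NORMAL: Policy(True, True, "Human action that same working day"),
    Priority.MINOR: Policy(True, False, "Human action at some point"),
    Priority.NOTIFICATION: Policy(False, False, "Informational only"),
}

# Assumption: business hours are Monday-Friday, 09:00-18:00 local time.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 18


def is_business_hours(when: datetime) -> bool:
    """Return True if `when` falls inside the assumed working window."""
    is_weekday = when.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START_HOUR <= when.hour < BUSINESS_END_HOUR


def should_alert_now(priority: Priority, when: datetime) -> bool:
    """Decide whether an alert of this priority should reach a human at `when`."""
    policy = POLICIES[priority]
    if not policy.alerts_humans:
        return False
    if policy.business_hours_only:
        return is_business_hours(when)
    return True


# Usage: a Monday at 03:00 only lets round-the-clock priorities through.
monday_night = datetime(2017, 8, 14, 3, 0)
night_alerts = [p.value for p in Priority if should_alert_now(p, monday_night)]
# night_alerts == ["Major", "Minor"]
```

Keeping the table as a single mapping means a new alert only has to pick a `Priority`; the delivery rules stay in one place.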
## Examples
#### "Production service is failing for 75% of requests, automation is unable to resolve."
This would be a **Major** priority IN, requiring immediate human action to resolve.
![Major Urgency](../assets/img/screenshots/prio-high.png)
#### A customer sends an email stating that "Production server disk space is filling, expected to be full in 48 hours. Log rotation is insufficient to resolve."
This would be a **Normal** priority SR, requiring human action that same working day, but not immediately.
![Normal Urgency](../assets/img/screenshots/prio-norm.png)
#### "An SSL certificate is due to expire in one week."
This would be a **Minor** priority SR, requiring human action some time soon.
![Minor Urgency](../assets/img/screenshots/prio-low.png)
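
The three examples above differ mainly in how soon the impact lands. As a rough illustration only, the sketch below suggests a priority from that lead time. The function `suggest_priority` and the 72-hour cut-off are hypothetical and do not come from this document or from DoIT; the cut-off was chosen only so that the 48-hour and one-week examples fall on the sides shown above.

```python
from typing import Optional

# Hypothetical cut-off: impact expected within this many hours is treated
# as Normal, anything further out as Minor. Not a value from this document.
NORMAL_WINDOW_HOURS = 72.0


def suggest_priority(impacting_now: bool, hours_until_impact: Optional[float]) -> str:
    """Suggest a priority name from how soon the impact is expected."""
    if impacting_now:
        return "Major"         # already failing, needs immediate action
    if hours_until_impact is None:
        return "Notification"  # no expected impact, informational only
    if hours_until_impact <= NORMAL_WINDOW_HOURS:
        return "Normal"        # impact is close, act that working day
    return "Minor"             # impact is far off, act at some point


# The three examples above, in order.
failing_service = suggest_priority(True, None)           # "Major"
disk_filling = suggest_priority(False, 48.0)             # "Normal"
certificate_expiring = suggest_priority(False, 7 * 24.0)  # "Minor"
```

A suggestion like this is only a starting point; the contractual SLA for the customer still decides the actual response and resolution times.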