This documentation covers parts of the Spearhead Systems response process for technical support service requests and incidents. It is based on PagerDuty's documentation and is a cut-down version of our own internal documentation, used to prepare new employees to handle customer requests and incidents. It covers not only how to prepare for an incident, but also what to do during and after one. It is intended for on-call practitioners and those involved in an operational incident response process (or those wishing to enact a formal incident response process). See the about page for more information on what this documentation is and why it exists. This documentation is complementary to what is available in our existing wiki and may not yet be public.

!!! note "Issue, Incident and Service Request"
    At Spearhead we use the term "issue" for any request from our customers. Issues fall into two categories: "Service Requests (SR)" and "Incidents (IN)". For brevity we will use SR and IN throughout this documentation.

A "service request" is usually initiated by a human and is generally not critical to the normal functioning of the business, while an "incident" is an issue that causes or can cause an interruption to normal business functions.

Issue Response at Spearhead Systems

Being On-Call

If you've never been on-call before, you might be wondering what it's all about. These pages describe what the expectations of being on-call are, along with some resources to help you.

  • Being On-Call - A guide to being on-call, both what your responsibilities are, and what they are not.
  • Alerting Principles - The principles we use to determine what things page an engineer, and what time of day they page.

Before an Incident

Reading material for things you probably want to know before an incident occurs. You likely don't want to be reading these during an actual incident.

  • Severity Levels - Information on our severity level classification: what constitutes a Low issue, what counts as a "Major Incident", and so on.
  • Different Roles for Incidents - Information on the roles during an incident: Incident Commander, Scribe, etc.
  • Incident Call Etiquette - Our etiquette guidelines for incident calls, before you find yourself in one.

During an Incident

Information and processes during an incident.

After an Incident

Our follow-up processes: how we make sure we don't repeat mistakes and are always improving.

  • Post-Mortem Process - Information on our post-mortem process: what's involved and how to write or run a post-mortem.
  • Post-Mortem Template - The template we use for writing our post-mortems for major incidents.

Training

So, you want to learn about incident response? You've come to the right place.

Additional Reading

Useful material and resources from external parties that are relevant to incident response.