spearhead-issue-response/docs/index.md


2017-01-13 20:18:18 +02:00
This documentation covers parts of the Spearhead Systems Issue Response process. It is a copy of [PagerDuty's](https://github.com/PagerDuty/incident-response-docs/) documentation and also a cut-down version of our own internal documentation, used at Spearhead Systems for any issue (incident or service request) and to prepare new employees for on-call responsibilities. It provides information not only on preparing for an incident, but also on what to do during and after one. It is intended for on-call practitioners and those involved in an operational incident response process (or those wishing to enact a formal incident response process). See the [about page](about.md) for more information on what this documentation is and why it exists. This documentation is complementary to what is available in our [existing wiki](https://sphsys.sharepoint.com) and may not yet be open sourced.
!!! note "Issue, Incident and Service Request"
    At Spearhead we use the term *issue* to define any request from our customers. Issues fall into two categories: "Service Requests (SR)" and "Incidents (IN)". Note that we use the term Incident to describe both service requests and incidents. For brevity we will use SR and IN throughout this documentation.

    A "service request" is usually initiated by a human and is generally not critical for the normal functioning of the business, while an "incident" is an issue that is causing or can cause an interruption to normal business functions.
![Issue Response at Spearhead Systems](./assets/img/headers/sph_ir.jpg)
## Being On-Call
If you've never been on-call before, you might be wondering what it's all about. These pages describe what the expectations of being on-call are, along with some resources to help you.
* [Being On-Call](oncall/being_oncall.md) - _A guide to being on-call, both what your responsibilities are, and what they are not._
* [Alerting Principles](oncall/alerting_principles.md) - _The principles we use to determine what things page an engineer, and what time of day they page._
## Before an Incident
Reading material for things you probably want to know before an incident occurs. You likely don't want to be reading these during an actual incident.
* [Severity Levels](before/severity_levels.md) - _Information on our severity level classification. What constitutes a Low issue? What's a "Major Incident"?_
* [Different Roles for Incidents](before/different_roles.md) - _Information on the roles during an incident: Incident Commander, Scribe, etc._
* [Incident Call Etiquette](before/call_etiquette.md) - _Our etiquette guidelines for incident calls, before you find yourself in one._
## During an Incident
Information and processes to follow during an incident.
* [During an Incident](during/during_an_incident.md) - _Information on what to do during an incident, and how to constructively contribute._
* [Security Incident Response](during/security_incident_response.md) - _Security incidents are handled differently to normal operational incidents._
## After an Incident
Our follow-up processes: how we make sure we don't repeat mistakes and are always improving.
* [Post-Mortem Process](after/post_mortem_process.md) - _Information on our post-mortem process: what's involved and how to write or run a post-mortem._
* [Post-Mortem Template](after/post_mortem_template.md) - _The template we use for writing our post-mortems for major incidents._
## Training
So, you want to learn about incident response? You've come to the right place.
* [Training Overview](training/overview.md) - _An overview of our training guides and additional training material from third parties._
* [Incident Commander Training](training/incident_commander.md) - _A guide to becoming our next Incident Commander._
* [Deputy Training](training/deputy.md) - _How to be a deputy and back up the Incident Commander._
* [Scribe Training](training/scribe.md) - _A guide to scribing._
* [Subject Matter Expert Training](training/subject_matter_expert.md) - _A guide on responsibilities and behavior for all participants in a major incident._
* [Glossary of Incident Response Terms](training/glossary.md) - _A collection of terms that you may hear being used, along with their definitions._
## Additional Reading
Useful material and resources from external parties that are relevant to incident response.
* [Incident Management for Operations](http://shop.oreilly.com/product/0636920036159.do) (O'Reilly)
* [Incident Response](http://shop.oreilly.com/product/9780596001308.do) (O'Reilly)
* [Debriefing Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) (Etsy)
* [US National Incident Management System (NIMS)](https://www.fema.gov/national-incident-management-system) (FEMA)
* [Every Minute Counts: Leading Heroku's Incident Response](https://www.heavybit.com/library/video/every-minute-counts-coordinating-herokus-incident-response/) (Blake Gentry)