This documentation covers parts of the Spearhead Systems Incident Response process. It is a copy of [PagerDuty's](https://github.com/PagerDuty/incident-response-docs/) documentation, and also a cut-down version of our own internal documentation, which we use at Spearhead Systems for any issue (incident or service request) and to prepare new employees for on-call responsibilities. It covers not only how to prepare for an incident or service request, but also what to do during and after one.

It is intended for on-call practitioners and those involved in our operational technical support response process (or those wishing to become part of our support team). See the [about page](about.md) for more information on what this documentation is and why it exists. This documentation complements our [existing wiki](https://sphsys.sharepoint.com), which may not yet be open sourced.

!!! note "Issue, Incident and Service Request"
    At Spearhead we use the term *issue* to describe any request from our customers. Issues fall into two categories: "Service Requests (SR)" and "Incidents (IN)". An IN is generally an issue that has an impact on the normal functioning of the business, while an SR generally does not.

![Incident Response at Spearhead Systems](./assets/img/headers/sph_ir.jpg)

## Being On-Call

If you've never been on-call before or part of a support delivery team, you might be wondering what it's all about. These pages describe what the expectations are, along with some resources to help you.

* [Being On-Call](oncall/being_oncall.md) - _A guide to being on-call, both what your responsibilities are, and what they are not._
* [Alerting Principles](oncall/alerting_principles.md) - _The principles we use to determine what things notify an engineer, and what time of day they do so._

## Before an Incident

Reading material for things you probably want to know before an incident occurs. You likely don't want to be reading these during an actual incident.

* [Severity Levels](before/severity_levels.md) - _Information on our severity level classification. What constitutes a Low issue? What is a "Major Incident"?_
* [Different Roles for Incidents](before/different_roles.md) - _Information on the roles during an incident: Incident Commander, Scribe, etc._
* [Incident Call Etiquette](before/call_etiquette.md) - _Our etiquette guidelines for incident calls, before you find yourself in one._

## During an Incident

Information and processes to follow during an incident.

* [During an Incident](during/during_an_incident.md) - _Information on what to do during an incident, and how to constructively contribute._
* [Security Incident Response](during/security_incident_response.md) - _Security incidents are handled differently to normal operational incidents._

## After an Incident

Our follow-up processes: how we make sure we don't repeat mistakes and keep improving.

* [Post-Mortem Process](after/post_mortem_process.md) - _Information on our post-mortem process: what's involved and how to write or run a post-mortem._
* [Post-Mortem Template](after/post_mortem_template.md) - _The template we use for writing our post-mortems for major incidents._

## Training

So, you want to learn about incident response? You've come to the right place.

* [Training Overview](training/overview.md) - _An overview of our training guides and additional training material from third parties._
* [Incident Commander Training](training/incident_commander.md) - _A guide to becoming our next Incident Commander._
* [Deputy Training](training/deputy.md) - _How to be a deputy and back up the Incident Commander._
* [Scribe Training](training/scribe.md) - _A guide to scribing._
* [Subject Matter Expert Training](training/subject_matter_expert.md) - _A guide on responsibilities and behavior for all participants in a major incident._
* [Glossary of Incident Response Terms](training/glossary.md) - _A collection of terms that you may hear being used, along with their definitions._

## Additional Reading

Useful material and resources from external parties that are relevant to incident response.

* [Incident Management for Operations](http://shop.oreilly.com/product/0636920036159.do) (O'Reilly)
* [Incident Response](http://shop.oreilly.com/product/9780596001308.do) (O'Reilly)
* [Debriefing Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) (Etsy)
* [US National Incident Management System (NIMS)](https://www.fema.gov/national-incident-management-system) (FEMA)
* [Every Minute Counts: Leading Heroku's Incident Response](https://www.heavybit.com/library/video/every-minute-counts-coordinating-herokus-incident-response/) (Blake Gentry)