This documentation covers parts of the Spearhead Systems Issue Response process. It is derived from [PagerDuty's](https://github.com/PagerDuty/incident-response-docs/) incident response documentation and is also a cut-down version of our own internal documentation, which is used at Spearhead Systems for any issue (incident or service request) and to prepare new employees for on-call responsibilities. It covers not only how to prepare for an incident, but also what to do during and after one. It is intended for on-call practitioners and anyone involved in an operational incident response process (or anyone wishing to enact a formal incident response process). See the [about page](about.md) for more information on what this documentation is and why it exists.

This documentation complements the material in our [existing wiki](https://sphsys.sharepoint.com), some of which may not yet be open sourced.

!!! note "Issue, Incident and Service Request"
    At Spearhead we use the term *issue* for any request from our customers. Issues fall into two categories: "Service Requests (SR)" and "Incidents (IN)". For brevity, we use SR and IN throughout this documentation. Note that this documentation sometimes uses the term *incident* to describe both service requests and incidents.

    A *service request* is usually initiated by a human and is generally not critical to the normal functioning of the business, while an *incident* is an issue that is causing, or could cause, an interruption to normal business functions.

![Issue Response at Spearhead Systems](./assets/img/headers/sph_ir.jpg)

## Being On-Call
If you've never been on-call before, you might be wondering what it's all about. These pages describe what is expected of you while on-call, along with some resources to help you.
* [Being On-Call](oncall/being_oncall.md) - _A guide to being on-call: both what your responsibilities are, and what they are not._
* [Alerting Principles](oncall/alerting_principles.md) - _The principles we use to determine what things page an engineer, and at what time of day they page._

## Before an Incident
Reading material for things you probably want to know before an incident occurs. You likely don't want to be reading these during an actual incident.

* [Severity Levels](before/severity_levels.md) - _Information on our severity level classification: what constitutes a Low issue, what counts as a "Major Incident", etc._
* [Different Roles for Incidents](before/different_roles.md) - _Information on the roles during an incident: Incident Commander, Scribe, etc._
* [Incident Call Etiquette](before/call_etiquette.md) - _Our etiquette guidelines for incident calls, before you find yourself in one._

## During an Incident
Information and processes to follow during an incident.

* [During an Incident](during/during_an_incident.md) - _Information on what to do during an incident, and how to contribute constructively._
* [Security Incident Response](during/security_incident_response.md) - _Security incidents are handled differently from normal operational incidents._

## After an Incident
Our follow-up processes: how we make sure we don't repeat mistakes and keep improving.

* [Post-Mortem Process](after/post_mortem_process.md) - _Information on our post-mortem process: what's involved and how to write or run a post-mortem._
* [Post-Mortem Template](after/post_mortem_template.md) - _The template we use for writing post-mortems for major incidents._

## Training
So, you want to learn about incident response? You've come to the right place.
* [Training Overview](training/overview.md) - _An overview of our training guides and additional training material from third parties._
* [Incident Commander Training](training/incident_commander.md) - _A guide to becoming our next Incident Commander._
* [Deputy Training](training/deputy.md) - _How to be a deputy and back up the Incident Commander._
* [Scribe Training](training/scribe.md) - _A guide to scribing._
* [Subject Matter Expert Training](training/subject_matter_expert.md) - _A guide to the responsibilities and expected behavior of all participants in a major incident._
* [Glossary of Incident Response Terms](training/glossary.md) - _A collection of terms you may hear being used, along with their definitions._

## Additional Reading
Useful material and resources from external parties that are relevant to incident response.

* [Incident Management for Operations](http://shop.oreilly.com/product/0636920036159.do) (O'Reilly)
* [Incident Response](http://shop.oreilly.com/product/9780596001308.do) (O'Reilly)
* [Debriefing Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) (Etsy)
* [US National Incident Management System (NIMS)](https://www.fema.gov/national-incident-management-system) (FEMA)
* [Every Minute Counts: Leading Heroku's Incident Response](https://www.heavybit.com/library/video/every-minute-counts-coordinating-herokus-incident-response/) (Blake Gentry)