<!DOCTYPE html>
<!--[if lt IE 7 ]><html class="no-js ie6"><![endif]-->
<!--[if IE 7 ]><html class="no-js ie7"><![endif]-->
<!--[if IE 8 ]><html class="no-js ie8"><![endif]-->
<!--[if IE 9 ]><html class="no-js ie9"><![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<title>Post-Mortem Template - Spearhead Systems Incident Response Documentation</title>
<!-- Author and License -->
<meta name="author" content="Spearhead Systems, Inc." />
<meta name="dcterms.license" content="http://www.apache.org/licenses/LICENSE-2.0" />
<!-- Page Description -->
<meta name="keywords" content="spearhead, incident, response" />
<meta name="robots" content="index, follow, noarchive" />
<!-- Mobile -->
<meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0" />
<meta name="theme-color" content="#1f293a" />
<!-- Canonical Link -->
<link rel="canonical" href="https://response.spearhead.systems/after/post_mortem_template/">
<!-- Favicon -->
<link rel="shortcut icon" type="image/x-icon" href="../../assets/img/icon.png" />
<link rel="icon" type="image/x-icon" href="../../assets/img/icon.png" />
<!-- Apple -->
<meta name="apple-mobile-web-app-title" content="Spearhead Systems Incident Response Documentation" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<link rel="apple-touch-icon" href="../../assets/img/icon.png">
<!-- Open Graph -->
<meta property="og:url" content="https://response.spearhead.systems/after/post_mortem_template/" />
<meta property="og:title" content="Post-Mortem Template - Spearhead Systems Incident Response Documentation" />
<meta property="og:site_name" content="Spearhead Systems Incident Response Documentation" />
<meta property="og:description" content="A collection of information about the Spearhead Systems incident response process. Not only how to prepare new employees for on-call responsibilities, but also how to handle major incidents, both in preparation and after-work." />
<meta property="og:image" content="https://response.spearhead.systems/assets/img/cover.png" />
<meta property="og:locale" content="en_US" />
<meta property="og:type" content="website" />
<!-- Twitter -->
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Post-Mortem Template - Spearhead Systems Incident Response Documentation" />
<meta name="twitter:description" content="A collection of information about the Spearhead Systems incident response process. Not only how to prepare new employees for on-call responsibilities, but also how to handle major incidents, both in preparation and after-work." />
<meta name="twitter:image" content="https://response.spearhead.systems/assets/img/cover.png" />
<!-- Style -->
<style>
@font-face {
font-family: 'Icon';
src: url('../../assets/fonts/icon.eot?52m981');
src: url('../../assets/fonts/icon.eot?#iefix52m981')
format('embedded-opentype'),
url('../../assets/fonts/icon.woff?52m981')
format('woff'),
url('../../assets/fonts/icon.ttf?52m981')
format('truetype'),
url('../../assets/fonts/icon.svg?52m981#icon')
format('svg');
font-weight: normal;
font-style: normal;
}
</style>
<link rel="stylesheet" href="../../assets/stylesheets/application-a422ff04cc.css">
<link rel="stylesheet" href="../../assets/stylesheets/palettes-05ab2406df.css">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Colfax+Regular:400,700|Roboto+Mono">
<style>
body, input {
font-family: 'Colfax Regular', Helvetica, Arial, sans-serif;
}
pre, code {
font-family: 'Roboto Mono', 'Courier New', 'Courier', monospace;
}
</style>
<link rel="stylesheet" href="../../assets/css/extra.css">
<!-- Scripts -->
<script src="../../assets/javascripts/modernizr-4ab42b99fd.js"></script>
</head>
<body class="palette-primary-green palette-accent-blue-grey">
<div class="backdrop">
<div class="backdrop-paper"></div>
</div>
<input class="toggle" type="checkbox" id="toggle-drawer">
<input class="toggle" type="checkbox" id="toggle-search">
<label class="toggle-button overlay" for="toggle-drawer"></label>
<header class="header">
<nav aria-label="Header">
<div class="bar default">
<div class="button button-menu" role="button" aria-label="Menu">
<label class="toggle-button icon icon-menu" for="toggle-drawer">
<span></span>
</label>
</div>
<div class="stretch">
<div class="mainlogo">
<a href="/" title="Go to homepage.">
<img src="../../assets/img/logo.png" title="PagerDuty" />
</a>
</div>
<div class="title">
<span class="path">
Incident Response
<i class="icon icon-link"></i>
</span>
<span class="path">
After an Incident <i class="icon icon-link"></i>
</span>
Post-Mortem Template
</div>
</div>
<div class="button button-twitter" role="button" aria-label="Twitter">
<a href="https://twitter.com/spearhead_sys" title="@spearhead_sys on Twitter" target="_blank" class="toggle-button icon icon-twitter"></a>
</div>
<div class="button button-github" role="button" aria-label="GitHub">
<a href="https://github.com/spearheadsys" title="@spearheadsys on GitHub" target="_blank" class="toggle-button icon icon-github"></a>
</div>
<div class="button button-search" role="button" aria-label="Search">
<label class="toggle-button icon icon-search" title="Search" for="toggle-search"></label>
</div>
</div>
<div class="bar search">
<div class="button button-close" role="button" aria-label="Close">
<label class="toggle-button icon icon-back" for="toggle-search"></label>
</div>
<div class="stretch">
<div class="field">
<input class="query" type="text" placeholder="Search" autocapitalize="off" autocorrect="off" autocomplete="off" spellcheck="false">
</div>
</div>
<div class="button button-reset" role="button" aria-label="Search">
<button class="toggle-button icon icon-close" id="reset-search"></button>
</div>
</div>
</nav>
</header>
<main class="main">
<div class="drawer">
<nav aria-label="Navigation">
<a href="https://github.com/spearheadsys/issue-response-docs" class="project">
<div class="banner">
<div class="logo">
<img src="../../assets/img/icon.png">
</div>
<div class="name">
<strong>
Spearhead Systems Incident Response Documentation
<span class="version">
</span>
</strong>
<br>
spearheadsys/issue-response-docs
</div>
</div>
</a>
<div class="scrollable">
<div class="wrapper">
<ul class="repo">
<li class="repo-download">
<a href="https://github.com/spearheadsys/issue-response-docs/archive/master.zip" target="_blank" title="Download" data-action="download">
<i class="icon icon-download"></i> Download
</a>
</li>
<li class="repo-stars">
<a href="https://github.com/spearheadsys/issue-response-docs/stargazers" target="_blank" title="Stargazers" data-action="star">
<i class="icon icon-star"></i> Stars
<span class="count">&ndash;</span>
</a>
</li>
</ul>
<hr/>
<div class="toc">
<ul>
<li>
<a class="" title="Home" href="../..">
Home
</a>
</li>
<li>
<span class="section">On-Call</span>
<ul>
<li>
<a class="" title="Being On-Call" href="../../oncall/being_oncall/">
Being On-Call
</a>
</li>
<li>
<a class="" title="Alerting Principles" href="../../oncall/alerting_principles/">
Alerting Principles
</a>
</li>
</ul>
</li>
<li>
<span class="section">Before an Incident</span>
<ul>
<li>
<a class="" title="Severity Levels" href="../../before/severity_levels/">
Severity Levels
</a>
</li>
<li>
<a class="" title="Different Roles" href="../../before/different_roles/">
Different Roles
</a>
</li>
<li>
<a class="" title="Call Etiquette" href="../../before/call_etiquette/">
Call Etiquette
</a>
</li>
</ul>
</li>
<li>
<span class="section">During an Incident</span>
<ul>
<li>
<a class="" title="During An Incident" href="../../during/during_an_incident/">
During An Incident
</a>
</li>
<li>
<a class="" title="Security Incident" href="../../during/security_incident_response/">
Security Incident
</a>
</li>
</ul>
</li>
<li>
<span class="section">After an Incident</span>
<ul>
<li>
<a class="" title="Post-Mortem Process" href="../post_mortem_process/">
Post-Mortem Process
</a>
</li>
<li>
<a class="current" title="Post-Mortem Template" href="./">
Post-Mortem Template
</a>
<ul>
<li class="anchor">
<a title="Overview" href="#overview">
Overview
</a>
</li>
<li class="anchor">
<a title="What Happened" href="#what-happened">
What Happened
</a>
</li>
<li class="anchor">
<a title="Root Cause" href="#root-cause">
Root Cause
</a>
</li>
<li class="anchor">
<a title="Resolution" href="#resolution">
Resolution
</a>
</li>
<li class="anchor">
<a title="Impact" href="#impact">
Impact
</a>
</li>
<li class="anchor">
<a title="Responders" href="#responders">
Responders
</a>
</li>
<li class="anchor">
<a title="Timeline" href="#timeline">
Timeline
</a>
</li>
<li class="anchor">
<a title="How'd We Do?" href="#howd-we-do">
How'd We Do?
</a>
</li>
<li class="anchor">
<a title="Action Items" href="#action-items">
Action Items
</a>
</li>
<li class="anchor">
<a title="Messaging" href="#messaging">
Messaging
</a>
</li>
</ul>
</li>
</ul>
</li>
<li>
<span class="section">Training</span>
<ul>
<li>
<a class="" title="Overview" href="../../training/overview/">
Overview
</a>
</li>
<li>
<a class="" title="Incident Commander" href="../../training/incident_commander/">
Incident Commander
</a>
</li>
<li>
<a class="" title="Deputy" href="../../training/deputy/">
Deputy
</a>
</li>
<li>
<a class="" title="Scribe" href="../../training/scribe/">
Scribe
</a>
</li>
<li>
<a class="" title="Subject Matter Expert" href="../../training/subject_matter_expert/">
Subject Matter Expert
</a>
</li>
<li>
<a class="" title="Glossary" href="../../training/glossary/">
Glossary
</a>
</li>
</ul>
</li>
<li>
<a class="" title="About" href="../../about/">
About
</a>
</li>
</ul>
</div>
</div>
</div>
</nav>
</div>
<article class="article">
<div class="wrapper">
<h1>Post-Mortem Template</h1>
<p>This is a standard template we use for post-mortems at PagerDuty. Each section describes the type of information you will want to put in that section.</p>
<hr />
<div class="admonition note">
<p class="admonition-title">Guidelines</p>
<p>This page is intended to be reviewed during a post-mortem meeting, which should be scheduled within 5 business days of any incident.
Your first step should be to schedule that meeting in the shared calendar.
Don't wait until you've filled in the info to schedule the meeting; however, make sure the page is completed by the meeting.</p>
</div>
<p><strong>Post-Mortem Owner:</strong> <em>Your name goes here.</em></p>
<p><strong>Meeting Scheduled For:</strong> <em>Schedule the meeting on the "Incident Post-Mortem Meetings" shared calendar, for within 5 business days after the incident. Put the date/time here.</em></p>
<p><strong>Call Recording:</strong> <em>Link to the incident call recording.</em></p>
<h2 id="overview">Overview<a class="headerlink" href="#overview" title="Permanent link">#</a></h2>
<p><em>Include a <strong>short</strong> sentence or two summarizing the root cause, the timeline, and the impact. E.g. "On the morning of August 99th, we suffered a 1-minute SEV-1 due to a runaway process on our primary database machine. The resulting slowness caused roughly 0.024% of alerts that had begun during this time to be delivered out of SLA."</em></p>
<h2 id="what-happened">What Happened<a class="headerlink" href="#what-happened" title="Permanent link">#</a></h2>
<p><em>Include a short description of what happened.</em></p>
<h2 id="root-cause">Root Cause<a class="headerlink" href="#root-cause" title="Permanent link">#</a></h2>
<p><em>Include a description of the root cause. If there were any actions taken that exacerbated the issue, also include them here with the intention of learning from any mistakes made during the resolution process.</em></p>
<h2 id="resolution">Resolution<a class="headerlink" href="#resolution" title="Permanent link">#</a></h2>
<p><em>Include a description of what solved the problem. If there was a temporary fix in place, describe that along with the long-term solution.</em></p>
<h2 id="impact">Impact<a class="headerlink" href="#impact" title="Permanent link">#</a></h2>
<p><em>Be very specific here and include exact numbers.</em></p>
<table>
<thead>
<tr>
<th>Time in SEV-1</th>
<th>?mins</th>
</tr>
</thead>
<tbody>
<tr>
<td>Notifications Delivered out of SLA</td>
<td>??% (?? of ??)</td>
</tr>
<tr>
<td>Events Dropped / Not Accepted</td>
<td>??% (?? of ??) <em>Should usually be 0, but always check</em></td>
</tr>
<tr>
<td>Accounts Affected</td>
<td>??</td>
</tr>
<tr>
<td>Users Affected</td>
<td>??</td>
</tr>
<tr>
<td>Support Requests Raised</td>
<td>?? <em>Include any relevant links to tickets</em></td>
</tr>
</tbody>
</table>
<h2 id="responders">Responders<a class="headerlink" href="#responders" title="Permanent link">#</a></h2>
<ul>
<li><em>Who was the IC?</em></li>
<li><em>Who was the scribe?</em></li>
<li><em>Who else was involved?</em></li>
<li><em>Who else was involved?</em></li>
</ul>
<h2 id="timeline">Timeline<a class="headerlink" href="#timeline" title="Permanent link">#</a></h2>
<p><em>Some important times to include: (1) time the root cause began, (2) time of the page, (3) time that the status page was updated (i.e. when the incident became public), (4) time of any significant actions, (5) time the SEV-2/1 ended, (6) links to tools/logs that show how the timestamp was arrived at.</em></p>
<table>
<thead>
<tr>
<th>Time (UTC)</th>
<th>Event</th>
<th>Data Link</th>
</tr>
</thead>
<tbody></tbody>
</table>
<h2 id="howd-we-do">How'd We Do?<a class="headerlink" href="#howd-we-do" title="Permanent link">#</a></h2>
<h3 id="what-went-well">What Went Well?<a class="headerlink" href="#what-went-well" title="Permanent link">#</a></h3>
<ul>
<li><em>List anything you did well and want to call out. It's OK to not list anything.</em></li>
</ul>
<h3 id="what-didnt-go-so-well">What Didn't Go So Well?<a class="headerlink" href="#what-didnt-go-so-well" title="Permanent link">#</a></h3>
<ul>
<li><em>List anything you think we didn't do very well. The intent is that we should follow up on all points here to improve our processes.</em></li>
</ul>
<h2 id="action-items">Action Items<a class="headerlink" href="#action-items" title="Permanent link">#</a></h2>
<p><em>Each action item should be in the form of a JIRA ticket, and each ticket should have the same set of two tags: “sev1_YYYYMMDD” (such as sev1_20150911) and simply “sev1”. Include action items such as: (1) any fixes required to prevent the root cause in the future, (2) any preparedness tasks that could help mitigate the problem if it came up again, (3) remaining post-mortem steps, such as the internal email, as well as the status-page public post, (4) any improvements to our incident response process.</em></p>
<h2 id="messaging">Messaging<a class="headerlink" href="#messaging" title="Permanent link">#</a></h2>
<h3 id="internal-email">Internal Email<a class="headerlink" href="#internal-email" title="Permanent link">#</a></h3>
<p><em>This is a follow-up for employees. It should be sent out right after the post-mortem meeting is over. It only needs a short paragraph summarizing the incident and a link to this wiki page.</em></p>
<blockquote>
<p>Briefly summarize what happened and where the post-mortem page (this page) can be found.</p>
</blockquote>
<h3 id="external-message">External Message<a class="headerlink" href="#external-message" title="Permanent link">#</a></h3>
<p><em>This is what will be included on the status.pagerduty.com website regarding this incident. What are we telling customers? Include an apology. (The apology should be genuine, not rote.)</em></p>
<blockquote>
<p>Summary</p>
<p>What Happened?</p>
<p>What Are We Doing About This?</p>
</blockquote>
<aside class="copyright" role="note">
Copyright &copy; Spearhead Systems, Inc. &ndash;
Documentation built with
<a href="http://www.mkdocs.org" target="_blank">MkDocs</a>
using the
<a href="http://squidfunk.github.io/mkdocs-material/" target="_blank">
Material
</a>
theme.
</aside>
<footer class="footer">
<nav class="pagination" aria-label="Footer">
<div class="previous">
<a href="../post_mortem_process/" title="Post-Mortem Process">
<span class="direction">
Previous
</span>
<div class="page">
<div class="button button-previous" role="button" aria-label="Previous">
<i class="icon icon-back"></i>
</div>
<div class="stretch">
<div class="title">
Post-Mortem Process
</div>
</div>
</div>
</a>
</div>
<div class="next">
<a href="../../training/overview/" title="Overview">
<span class="direction">
Next
</span>
<div class="page">
<div class="stretch">
<div class="title">
Overview
</div>
</div>
<div class="button button-next" role="button" aria-label="Next">
<i class="icon icon-forward"></i>
</div>
</div>
</a>
</div>
</nav>
</footer>
</div>
</article>
<div class="results" role="status" aria-live="polite">
<div class="scrollable">
<div class="wrapper">
<div class="meta"></div>
<div class="list"></div>
</div>
</div>
</div>
</main>
<script>
var base_url = '../..';
var repo_id = 'spearheadsys/issue-response-docs';
</script>
<script src="../../assets/javascripts/application-997097ee0c.js"></script>
</body>
</html>