spearhead-issue-response/docs/training/scribe.md

So you want to be a scribe? You've come to the right place! You don't need to be a senior team member to become a deputy or scribe; anyone can do it, provided you have the requisite knowledge!

Typewriter Credit: Holly Chaffin

Purpose

The purpose of the Scribe is to maintain a timeline of key events during an incident, documenting actions and keeping track of any followup items that will need to be addressed.

It's important for the rest of the command staff to be able to focus on the problem at hand, rather than worrying about documenting the steps.

Your job as Scribe is to listen to the call and to watch the incident Slack room and DoIT card(s), keeping track of context and actions that need to be performed, and documenting these as you go. You should not be performing any remediations, checking graphs, or investigating logs. Those tasks will be delegated to the subject matter experts (SMEs) by the Team Leader (TL).

Prerequisites

Before you can be a Scribe, you are expected to meet the following criteria. Don't worry if you don't meet them all yet; you can still continue with training!

  • Excellent verbal and written communication skills.
  • Knowledge of obscure PagerDuty terms.

Responsibilities

Read up on our Different Roles for Incidents to see what is expected from a Scribe, as well as what we expect from the other roles you'll be interacting with.

Training Process

There is no formal training process for this role; reading this page should be sufficient for most tasks. That said, here's a list of things you can do to train:

  • Read the rest of this page, particularly the sections below.

  • Participate in Friday DoD (DoD).

    • Shadow a DoD to see how it's run.
    • Be the scribe for multiple DoDs.

Scribing

Scribing is more art than science. The objective is to keep an accurate record of important events that occurred on the call, so that we can look back at the timeline to see what happened. But what exactly is important? There's no single right answer, and it really comes down to judgement and experience. But here are some general things you most definitely want to capture as scribe.

  • The result of any polling decisions.
    • This is not "9 people voted yay, 3 voted nay".
    • It is "Polled for if we should do rolling restart. <USER_A> is proceeding with restart."
  • Any followup items that are called out as "We should do this..", "Why didn't this?..", etc.
    • This is not "Why isn't the Support representative on the call?"
    • This is "TODO: Why didn't we get paged for this earlier?"

Incident Call Procedures and Lingo

The Steps for Scribe provide a detailed description of what you should be doing during an incident.

Here are some examples of phrases and patterns you should use during incident calls.

Status Stalking

At the start of any major incident call, you should start our status stalking bot, so that it automatically posts status updates to the room.

!status stalk

This will keep the room updated and allow the TL to see the status without having to keep asking.

Note Important Actions

During a call, you will hear lots of discussion happening. You should not be documenting all of this in the chat room. You only want to document things which will be important for the final timeline. It's not always obvious what this might be, and it's usually a matter of judgement. You generally want to note any actions the TL has asked someone to perform, along with the result of any polling decisions.

Polled for decision on whether to perform rolling restart. We are proceeding with restart. [USER_A] to execute.

Some actions might seem important at the time, but end up not being. That's OK. It's better to have more info than not enough, but don't go overboard.
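One way to picture a timeline note is as a short, timestamped entry. The sketch below is purely illustrative: the `TimelineEntry` class, its fields, the output format, and the sample date are all assumptions made for this example, not something our Slack or DoIT tooling provides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TimelineEntry:
    """A single scribe note (hypothetical structure, for illustration only)."""

    timestamp: datetime
    note: str

    def render(self) -> str:
        # Render as "HH:MM UTC - note" so entries read chronologically.
        utc_time = self.timestamp.astimezone(timezone.utc)
        return f"{utc_time:%H:%M} UTC - {self.note}"


# Arbitrary sample timestamp; the note text mirrors the example above.
entry = TimelineEntry(
    timestamp=datetime(2024, 1, 5, 14, 32, tzinfo=timezone.utc),
    note=(
        "Polled for decision on whether to perform rolling restart. "
        "We are proceeding with restart. [USER_A] to execute."
    ),
)
rendered = entry.render()
print(rendered)
```

In practice the chat room already timestamps each message, so the point is only that each note should stand on its own: what was decided, and who is acting on it.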

Note Followup Actions

Sometimes during the call, someone will either mention something we "should fix", or the TL will specifically ask you to note a followup item. You can do this in Slack and DoIT by simply prefixing the note with "TODO". This will make it easier to search for later.

TODO: Why did we not get paged for the fall in traffic on [X] cluster?

The post-mortem owner will find these afterwards and raise tasks for them.
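To illustrate why a consistent "TODO" prefix makes followup items easy to find, here is a minimal sketch that filters them out of a list of chat messages. It assumes the messages have already been exported as plain strings; the `extract_todos` helper and the sample messages are hypothetical and not part of any Slack or DoIT tooling.

```python
import re

# Matches messages that begin with "TODO" (optionally followed by a colon),
# ignoring case and leading whitespace. The capture group is the followup text.
TODO_PATTERN = re.compile(r"^\s*TODO\b:?\s*(.*)$", re.IGNORECASE | re.DOTALL)


def extract_todos(messages):
    """Return the followup text of every message prefixed with TODO."""
    todos = []
    for message in messages:
        match = TODO_PATTERN.match(message)
        if match:
            todos.append(match.group(1).strip())
    return todos


# Hypothetical sample of messages, as they might appear in an exported log.
sample_messages = [
    "Polled for decision on whether to perform rolling restart. "
    "We are proceeding with restart. [USER_A] to execute.",
    "TODO: Why did we not get paged for the fall in traffic on [X] cluster?",
    "Call is over, thanks everyone. Follow up in Slack.",
]

followups = extract_todos(sample_messages)
for item in followups:
    print(item)
```

Only messages that start with the prefix are picked up, which is why it helps to put "TODO" at the very beginning of the note rather than somewhere in the middle.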

End of Call Notification

When the TL ends the call, you should post a message into Slack to let everyone know the call is over and that they should continue discussion elsewhere. You should also notify customers directly via their preferred communication channel.

Call is over, thanks everyone. Follow up in Slack.