sph specifics

Marius Pana 2017-01-21 15:16:34 +02:00
parent 306ffc4a94
commit ff15a17843
1 changed files with 57 additions and 0 deletions

docs/training/sysadmin.md Normal file

@@ -0,0 +1,57 @@
So you want to be a Sysadmin? You've come to the right place!
![Deputy](../assets/img/headers/incident_command_support.jpg)
*Credit: [oregondot @ Flickr](https://www.flickr.com/photos/oregondot/8743801731/in/album-72157633494644719/)*
## Purpose
The purpose of the Sysadmin is to support the TL by keeping track of timers, notifying the TL of important information, and paging other people as directed by the TL.
It's important for the TL to focus on the problem at hand, rather than worrying about monitoring timers. The Sysadmin is there to help support the TL and keep them focussed on the incident.
As a Sysadmin, you will be expected to take over command from the TL if they request it.
**You should be performing any remediations, checking graphs, or investigating logs** unless otherwise delegated by the TL.
## Prerequisites
Before you can be a Sysadmin, it is expected that you meet the following criteria. Don't worry if you don't meet them all yet; you can still continue with training!
* Be trained as a [Team Leader](/training/team_leader.md).
## Responsibilities
Read up on our [Different Roles for Incidents](/before/different_roles.md) to see what is expected from a Sysadmin, as well as what we expect from the other roles you'll be interacting with.
## Training Process
The training process for a Sysadmin is quite simple.
* Follow our [Team Leader Training](/training/team_leader.md).
* Read this page.
## Incident Call Procedures and Lingo
The [Steps for Sysadmin](/during/during_an_incident.md) provide a detailed description of what you should be doing during an incident.
Here are some examples of phrases and patterns you should use during incident calls.
### Keep Track of Responders
As you listen to the call, you should keep track of the responders to the call as you hear them speak. Make a note on a piece of paper and add them to the Watchers in DoIT. The TL may ask you who is on-call for a particular issue, and you should know the answer, and be able to page them.
> Do we have a representative from [X] on the call?
> (pause)
> Sysadmin, can you go ahead and page the [X] on-call please.
You can page them however you see fit (phone call, etc.).
### Provide Executive Status Updates
Provide regular status updates on Slack (roughly every 30 minutes), giving an executive summary of the current status during IN-3 incidents. Keep it short and to the point, and use @here. Mention the current state, the actions in progress, customer impact, and expected time remaining. It's OK to miss out some of those if the information isn't known.
> @here: We are in IN-3 due to X. Current actions in progress are to do Y. Expecting 3 mins to complete that action. Once action is complete, system should recover on its own within 5 minutes.
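If you prefer to assemble the update from a template rather than typing it fresh each time, the following is a minimal sketch in Python. The function name `format_status_update` and its parameters are hypothetical and not part of any existing tooling; it only builds the text, and posting it to Slack is left to you. Fields that are not yet known are simply skipped.

```python
from typing import Optional


def format_status_update(
    severity: str,
    cause: Optional[str] = None,
    actions: Optional[str] = None,
    impact: Optional[str] = None,
    eta: Optional[str] = None,
) -> str:
    """Build a short executive status update (hypothetical helper).

    Any field left as None is omitted, since it is OK to leave out
    information that isn't known yet.
    """
    if cause:
        parts = [f"We are in {severity} due to {cause}."]
    else:
        parts = [f"We are in {severity}."]
    if actions:
        parts.append(f"Current actions in progress are to {actions}.")
    if impact:
        parts.append(f"Customer impact: {impact}.")
    if eta:
        parts.append(f"Expecting {eta} to complete that action.")
    # Prefix with @here so the channel is notified.
    return "@here: " + " ".join(parts)


if __name__ == "__main__":
    print(format_status_update("IN-3", cause="X", actions="do Y", eta="3 mins"))
```

Keeping every field optional mirrors the guidance above: send what you know now rather than delaying the update.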
### Alert TL to Timers
You are expected to keep track of how long the incident has been running for, and provide callouts to the TL every 10 minutes so they can take actions such as increasing the severity, or asking Support to Tweet out. This is as simple as telling the TL on the call:
> TL, be advised the incident is now at the 10 minute mark.
Similarly, when the TL asks for someone to get back to them in X minutes, you are expected to keep track of that. You should remind the TL when that time has been reached.
> TL, be advised the timer for [TEAM]'s investigation is up.
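A stopwatch and a notepad are enough for this, but if you would rather let a script do the arithmetic, here is a minimal sketch in Python. The helper names (`due_callouts`, `callout_message`, `timer_expired`) are hypothetical, and the 10-minute interval is taken from the guidance above; nothing here is part of any existing tool.

```python
from datetime import datetime, timedelta
from typing import List


def due_callouts(started_at: datetime, now: datetime,
                 interval_minutes: int = 10) -> List[int]:
    """Return every minute mark (10, 20, ...) the incident has reached."""
    elapsed = int((now - started_at).total_seconds() // 60)
    return list(range(interval_minutes, elapsed + 1, interval_minutes))


def callout_message(mark: int) -> str:
    """Phrase the callout the way it is said on the call."""
    return f"TL, be advised the incident is now at the {mark} minute mark."


def timer_expired(set_at: datetime, minutes: int, now: datetime) -> bool:
    """True once a 'get back to me in X minutes' timer is up."""
    return now >= set_at + timedelta(minutes=minutes)


if __name__ == "__main__":
    start = datetime(2017, 1, 21, 15, 0)
    now = start + timedelta(minutes=25)
    for mark in due_callouts(start, now):
        print(callout_message(mark))
    if timer_expired(start, 15, now):
        print("TL, be advised the timer for [TEAM]'s investigation is up.")
```

Passing `now` in explicitly, rather than reading the clock inside each function, keeps the helpers easy to check by hand.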