Sysadmin
So you want to be a Sysadmin? You've come to the right place!
Credit: oregondot @ Flickr
Purpose
The purpose of the Sysadmin is to support the Team Leader (TL) by keeping track of timers, notifying the TL of important information, and paging other people as the TL directs.
It's important for the TL to focus on the problem at hand rather than on monitoring timers. The Sysadmin is there to support the TL and keep them focused on the incident.
As a Sysadmin, you will be expected to take over command from the TL if they request it.
You should not perform any remediations, check graphs, or investigate logs unless the TL delegates that work to you.
Prerequisites
Before you can be a Sysadmin, you are expected to meet the following criteria. Don't worry if you don't meet them all yet; you can still continue with training!
- Be trained as a Team Leader.
Responsibilities
Read up on our Different Roles for Incidents to see what is expected from a Sysadmin, as well as what we expect from the other roles you'll be interacting with.
Training Process
The training process for a Sysadmin is quite simple.
- Follow our Team Leader Training.
- Read this page.
Incident Call Procedures and Lingo
The Steps for Sysadmin provide a detailed description of what you should be doing during an incident.
Here are some examples of phrases and patterns you should use during incident calls.
Keep Track of Responders
As you listen to the call, keep track of each responder as you hear them speak. Note them down on a piece of paper and add them to the Watchers in DoIT. The TL may ask you who is on call for a particular issue; you should know the answer and be able to page them.
Do we have a representative from [X] on the call?
(pause)
Sysadmin, can you go ahead and page the [X] on-call please.
You can page them however you see fit, for example by phone call.
Provide Executive Status Updates
During IN-3 incidents, post regular status updates on Slack (roughly every 30 minutes) that give an executive summary of the current status. Keep them short and to the point, and use @here. Mention the current state, the actions in progress, customer impact, and expected time remaining. It's OK to omit some of these if the information isn't known.
@here: We are in IN-3 due to X. Current actions in progress are to do Y. Expecting 3 mins to complete that action. Once action is complete, system should recover on its own within 5 minutes.
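If you find yourself writing these updates repeatedly, a small template helps keep them consistent. The sketch below is a minimal illustration of that idea, assuming you only need to assemble the message text. The function `format_status_update` and its parameters are hypothetical and are not part of Slack, DoIT, or any existing tool. Fields you don't know yet are simply left out, which matches the guidance above.

```python
from typing import Optional


def format_status_update(
    severity: str,
    cause: str,
    actions: Optional[str] = None,
    eta_minutes: Optional[int] = None,
    recovery_note: Optional[str] = None,
    customer_impact: Optional[str] = None,
) -> str:
    """Build a short @here executive summary (hypothetical helper).

    Any field passed as None is omitted from the message.
    """
    parts = [f"@here: We are in {severity} due to {cause}."]
    if customer_impact:
        parts.append(f"Customer impact: {customer_impact}.")
    if actions:
        parts.append(f"Current actions in progress are to {actions}.")
    if eta_minutes is not None:
        parts.append(f"Expecting {eta_minutes} mins to complete that action.")
    if recovery_note:
        parts.append(recovery_note)
    return " ".join(parts)


# Reproduces the example message above.
print(format_status_update(
    "IN-3",
    "X",
    actions="do Y",
    eta_minutes=3,
    recovery_note="Once action is complete, system should recover on its own within 5 minutes.",
))
```

Posting the resulting text to Slack is deliberately left out. You would paste it into the incident channel yourself.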
Alert TL to Timers
You are expected to keep track of how long the incident has been running and to give the TL a callout every 10 minutes, so they can take actions such as increasing the severity or asking Support to tweet about the incident. This is as simple as telling the TL on the call:
TL, be advised the incident is now at the 10 minute mark.
Similarly, when the TL asks someone to get back to them in X minutes, you are expected to track that timer and remind the TL when the time is up.
TL, be advised the timer for [TEAM]'s investigation is up.
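Both duties come down to bookkeeping: one running clock for the incident and a set of per-team deadlines. The sketch below is a minimal illustration of that bookkeeping, assuming timestamps in seconds. By default these come from Python's `time.monotonic()`, but they can also be passed in explicitly. The class `IncidentTimers` and its methods are hypothetical and are not a feature of DoIT or any other tool mentioned here. The callout wording is taken from the phrases above.

```python
import time
from typing import Dict, List, Optional


class IncidentTimers:
    """Hypothetical tracker for the incident clock and per-team timers."""

    CALLOUT_INTERVAL_MIN = 10  # remind the TL at every 10-minute mark

    def __init__(self, start: Optional[float] = None) -> None:
        # Start time in seconds; defaults to the monotonic clock.
        self.start = time.monotonic() if start is None else start
        self.last_callout = 0  # last minute mark already announced
        self.team_timers: Dict[str, float] = {}  # team name -> deadline (seconds)

    def start_team_timer(self, team: str, minutes: float, now: Optional[float] = None) -> None:
        """Record that the TL asked `team` to report back in `minutes`."""
        now = time.monotonic() if now is None else now
        self.team_timers[team] = now + minutes * 60

    def due_callouts(self, now: Optional[float] = None) -> List[str]:
        """Return the reminders the Sysadmin should read out right now."""
        now = time.monotonic() if now is None else now
        messages: List[str] = []
        elapsed_min = int((now - self.start) // 60)
        # Round down to the most recent 10-minute mark; if several marks were
        # missed between checks, only the latest one is announced.
        mark = (elapsed_min // self.CALLOUT_INTERVAL_MIN) * self.CALLOUT_INTERVAL_MIN
        if mark > self.last_callout:
            self.last_callout = mark
            messages.append(f"TL, be advised the incident is now at the {mark} minute mark.")
        # Announce and discard any team timers whose deadline has passed.
        for team, deadline in list(self.team_timers.items()):
            if now >= deadline:
                del self.team_timers[team]
                messages.append(f"TL, be advised the timer for {team}'s investigation is up.")
        return messages


# Usage with explicit timestamps (seconds since the incident started).
timers = IncidentTimers(start=0.0)
timers.start_team_timer("Database", minutes=5, now=60.0)  # due at 360 s
print(timers.due_callouts(now=300.0))  # []
print(timers.due_callouts(now=400.0))  # ["TL, be advised the timer for Database's investigation is up."]
print(timers.due_callouts(now=600.0))  # ['TL, be advised the incident is now at the 10 minute mark.']
```

In practice you would call `due_callouts()` periodically, for example every 30 seconds. Passing `now` explicitly, as in the usage lines, keeps the behaviour easy to check without waiting in real time.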