updated different roles; restructured for SPH structure
parent
9afc5a5168
commit
63474d1235
@ -1,8 +1,9 @@
|
||||
There are several roles for our incident response teams at Spearhead Systems. Certain roles only have one person per incident (e.g. support engineer), whereas other roles can have multiple people (e.g. Sysadmins, Solution Architects, etc.). It's all about coming together as a team, working the problem, and getting a solution quickly.
|
||||
Our support services are delivered via a flat organizational structure. The same people who deliver projects also deliver ongoing support and maintenance services.
|
||||
There are several roles in our support team at Spearhead Systems. Certain roles only have one person per incident (e.g. sysadmin), whereas other roles can have multiple people (e.g. Sysadmins, Solution Architects, etc.). It's all about coming together as a team, working the problem, and getting a solution quickly.
|
||||
|
||||
Here is a rough outline of our role hierarchy, with each role discussed in more detail on the rest of this page.
|
||||
|
||||
![Incident Response Structure](../assets/img/misc/incident_response_roles.png)
|
||||
![Incident Response Structure](../assets/img/misc/incident_roles.png)
|
||||
|
||||
---
|
||||
|
||||
@ -12,7 +13,7 @@ Here is a rough outline of our role hierarchy, with each role discussed in more
|
||||
A Team Leader acts as the single source of truth for what is currently happening and what is going to happen during a major incident. They come in all shapes, sizes, and colors. TLs are also the key elements in a project (boards in DoIT).
|
||||
|
||||
### Why have one?
|
||||
As any system grows in size and complexity, things break and cause incidents. The TL is needed to help drive major incidents to resolution by organizing his team towards a common goal.
|
||||
As any system grows in size and complexity, things break and cause incidents. The TL is needed to help drive major incidents to resolution by organizing his team towards a common goal. A TL's skillset includes project and resource management, both of which are essential in driving projects and incidents to a smooth resolution.
|
||||
|
||||
### What are the responsibilities?
|
||||
1. Help prepare for projects and incidents,
|
||||
@ -65,33 +66,29 @@ Any Team Leader can act as a Sysadmin. Sysadmins need to be trained as an Team L
|
||||
Take a look at our [Sysadmin training guide](/training/deputy.md). Sysadmins also need to be [trained as Team Leaders](/training/incident_commander.md).
|
||||
|
||||
---
|
||||
|
||||
TODO:::move scribe responsibilities to TL and Sysadmin
|
||||
::: or assign this to our juniors?
|
||||
## Scribe
|
||||
|
||||
### What is it?
|
||||
A Scribe documents the timeline of an incident as it progresses, and makes sure all important decisions and data are captured for later review.
|
||||
A Scribe documents the timeline of an incident as it progresses, and makes sure all important decisions and data are captured for later review. We will not have a dedicated Scribe in all situations, so a junior will take on this role. This is an essential role, as all Juniors are expected to grow into other areas and take on more responsibilities as they evolve.
|
||||
|
||||
|
||||
### Why have one?
|
||||
The incident commander will need to focus on the problem at hand, and the subject matter experts will need to focus on resolving the incident. It is important to capture a timeline of events as they happen so that they can be reviewed during the post-mortem to determine how well we performed, and so we can accurate determine any additional impact that we might not have noticed at the time.
|
||||
The Team Leader will need to focus on the problem at hand, and the sysadmins and subject matter experts will need to focus on resolving the incident. It is important to capture a timeline of events as they happen so that they can be reviewed during the post-mortem to determine how well we performed, and so we can accurately determine any additional impact that we might not have noticed at the time.
|
||||
|
||||
### What are the responsibilities?
|
||||
The Scribe is expected to:
|
||||
|
||||
1. Ensure the incident call is being recorded.
|
||||
1. Note in Slack important data, events, and actions, as they happen. Specifically:
|
||||
1. Note important data, events, and actions in DoIT, Slack, etc. as they happen. Specifically:
|
||||
* Key actions as they are taken (Example: "prod-server-387723 is being restarted to attempt to remove the stuck lock")
|
||||
* Status reports when one is provided by the IC (Example: "We are in SEV-1, service A is currently not processing events due to a stuck lock, X is restarting the app stack, next checkin in 3 minutes")
|
||||
* Status reports when one is provided by the TL (Example: "We are in IN-Major, service A is currently not processing events due to a stuck lock, X is restarting the app stack, next checkin in 3 minutes")
|
||||
* Any key callouts either during the call or at the ending review (Example: "Note: (Bob B) We should have a better way to determine stuck locks.")
|
||||
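As an illustration of the three kinds of notes listed above, here is a minimal sketch of how a Scribe's timeline entries could be structured before being pasted into DoIT or Slack. The `TimelineEntry` class, the `KINDS` tuple, and the output format are hypothetical and not part of any Spearhead Systems tooling; the timestamp is illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical entry kinds, mirroring the three bullet points above:
# key actions, status reports from the TL, and key callouts.
KINDS = ("action", "status", "callout")


@dataclass
class TimelineEntry:
    when: datetime      # when the event happened (UTC assumed)
    kind: str           # one of KINDS
    text: str           # what the Scribe wrote down
    author: str = ""    # optional: who raised the point

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown entry kind: {self.kind!r}")

    def render(self) -> str:
        """Format the entry as a single line suitable for a chat channel."""
        stamp = self.when.strftime("%H:%M:%SZ")
        who = f" ({self.author})" if self.author else ""
        return f"[{stamp}] {self.kind.upper()}{who}: {self.text}"


def render_timeline(entries):
    """Return all entries as text, ordered by time, one entry per line."""
    ordered = sorted(entries, key=lambda e: e.when)
    return "\n".join(e.render() for e in ordered)


if __name__ == "__main__":
    # Illustrative timestamp; the message is the example from the list above.
    entry = TimelineEntry(
        when=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
        kind="action",
        text="prod-server-387723 is being restarted to attempt to remove the stuck lock",
    )
    print(entry.render())
```

Keeping the timestamp and the kind of entry explicit makes the timeline easier to reconstruct during the post-mortem, whichever tool the notes end up in.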
|
||||
### Who are they?
|
||||
Anyone can act as a scribe during an incident, and are chosen by the Incident Commander at the start of the call. Typically the Deputy will act as the Scribe, but that doesn't necessarily need to happen, and for larger incidents may not be possible.
|
||||
Anyone can act as a Scribe during an incident, and is chosen by the Team Leader at the start of the call. Typically the Sysadmin will act as the Scribe, but that doesn't necessarily need to happen, and for larger incidents may not be possible.
|
||||
|
||||
### How can I become one?
|
||||
Follow our [Scribe training guide](/training/scribe.md), and then notify the Incident Commanders that you would like to be considered for scribing for the next incident.
|
||||
|
||||
TODO::: END move scribe responsibilities to TL and Sysadmin
|
||||
Follow our [Scribe training guide](/training/scribe.md), and then notify the Team Leaders that you would like to be considered for scribing for the next incident.
|
||||
|
||||
---
|
||||
|
||||
@ -122,17 +119,17 @@ Take a look at our [Subject Matter Expert training guide](/training/subject_matt
|
||||
## Customer Liaison
|
||||
|
||||
### What is it?
|
||||
A person responsible for interacting with customers, either directly, or via our public communication channels. Typically a member of the Customer Support team.
|
||||
A person responsible for interacting with customers, either directly or via our public communication channels. This is typically the TL, although in some situations another member of the Support Team or even Management may step in and relay vital information to the customer.
|
||||
|
||||
### Why have one?
|
||||
All of the other roles will be actively working on identifying the cause and resolving the issue, so we need a role which is focused purely on the customer interaction side of things so that it can be done properly, with the due care and attention it needs.
|
||||
|
||||
### What are the responsibilities?
|
||||
1. Post any publicly facing messages regarding the incident (DoIT, Twitter, StatusPage, etc).
|
||||
1. Post any publicly facing messages regarding the incident (DoIT, Twitter, etc).
|
||||
1. Notify the TL of any customers reporting that they are affected by the incident.
|
||||
|
||||
### Who are they?
|
||||
Any member of the Support Team can act as a customer liaison.
|
||||
Any member of the Support Team or Management (provided they have undergone training) can act as a customer liaison.
|
||||
|
||||
### How can I become one?
|
||||
Discuss with the Support Team about becoming our next customer liaison.
|
||||
|