Operations

Site Reliability Engineering at Google – Christof Leng

The rule:

Fixes:

  1. Common staffing pool: one more SRE = one fewer developer
  2. SRE hires only coders
  3. 50% cap on Ops work (toil)
  4. Keep dev in the rotation
  5. Excess operations load (tickets, on-call, etc.) always gets assigned to the dev team
  6. SRE Portability and the nuclear option
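Fixes 3 and 5 work together as simple arithmetic: ops work up to the cap stays with SRE, and anything above it goes to the dev team. A minimal sketch of that split, assuming hours as the unit of measure; the names `Period` and `split_ops_load` and the sample figures are hypothetical and not from the talk:

```python
from dataclasses import dataclass

TOIL_CAP = 0.5  # the 50% cap on ops work from fix 3


@dataclass
class Period:
    total_hours: float  # hypothetical: all SRE hours available in the period
    ops_hours: float    # hypothetical: hours of tickets, on-call, etc.


def split_ops_load(period: Period, cap: float = TOIL_CAP):
    """Return (ops hours SRE keeps, ops hours handed to the dev team)."""
    budget = period.total_hours * cap        # most ops work SRE may absorb
    kept = min(period.ops_hours, budget)     # SRE never exceeds the cap
    overflow = period.ops_hours - kept       # excess load goes to dev (fix 5)
    return kept, overflow


# Illustrative numbers only: 260 ops hours against a 400-hour period.
kept, overflow = split_ops_load(Period(total_hours=400, ops_hours=260))
print(kept, overflow)  # 200.0 60.0
```

With these made-up figures the cap is 200 hours, so 60 hours of ops load would be redirected to the dev team.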