Ultimate lesson learned

Incident overview

  1. Impact summary

  2. Outage start time / Outage end time

  3. Timeline of the issue

  4. Root cause

    1. Software component:

    2. Team:

  5. Contact person

  6. Manager of contact person

  7. Detection

Incident response

Could the change be rolled back easily?

Monitoring

Was the issue detected automatically?

Prevention and Improvement

What should be done to prevent the issue from recurring?

Others

How many issues did your team have per quarter?

How do you measure the impact of the issue?
