A Tale of Postmortems
(The following is a copy-paste of a blog post published August 10th, 2014 on the www.box.com blog, which has been permanently archived.)

Site issues are a part of life for most web application shops. Database errors, buggy code, vendor failures, growing pains, etc. rear their heads and keep engineers up at night. At Box, we're no exception, and over the years we've done our fair share of triaging and solving site issues. This is a story about the evolution of site outages at Box, a grassroots campaign to scorch our tech debt, mold our postmortems, and ultimately reclaim operational confidence.

In The Beginning

After years in business, our tech debt had added up, and we were paying the price. We knew it was bad, but we weren't measuring the right things and consequently didn't have a clear picture of the damage. We realized that the logical first step was to keep track of site outages, so we created a new JIRA project and workflow. Looking back, this seem...
