Dig deep enough on the corporate website for many software-based companies and you’ll find seemingly innocuous little documents called “post-mortems,” in which some technically-oriented employee explains in detail the nature of a recent downtime or failure. These explanations serve various purposes, depending on who’s reading: some are simply interested in the product, and want to be aware of its status; others want to know why something they depend on wasn’t working, and whether it will continue working reliably in the future. A third group of readers, and the most critical, consists of other technical people who take from the post-mortem a sliver of knowledge to bring to their own operations.
A post-mortem could elaborate on a method of thinking about software development, or on a very nuanced flaw in a particular piece of software (a “bug,” as the common phrase goes). Post-mortems are generally written in the interest of disclosing information rather than burying it away for fear of bad press, and they’re generally beneficial to the community. A good one spurs interest at the very least, and informs others in the best of situations.
For example, Dropbox, the fairly popular service for synchronizing files across computers and sharing them with others, just underwent a 48-hour downtime/slowdown. The company then released a post-mortem explaining what went wrong and how they fixed it, and even teased the future release of a tool that would help others avoid getting into a similar bind.
Manufacturing has no real equivalent, and I wonder if that isn’t a problem, or, at least, an opportunity. The industry is tight-lipped about internal operations, and most plant managers would be more apt to think that a major production disruption is something to be silenced as much as possible, told only to a few key business partners and perhaps the vendor of the malfunctioning equipment. They certainly wouldn’t get right on writing up a blog post about it. But what if they did? What if they thought others could find value in what went wrong?
Some will be eager to say, “The malfunction of one specific system wouldn’t be of use to anyone else; everything is too specialized.” Others will ask, “What about intellectual property?” And even more will argue that it’s a useless exercise, and that managers and engineers would be better off spending their time making sure the problem doesn’t happen again, or working to prevent some impending-but-unidentified fault. And finally: “Why should I help my competition?”
Those are all valid points, and no one should be penning a post-mortem until the problem is fixed and safeguards are in place to prevent a repeat. But they alone aren’t good enough reasons to dismiss the post-mortem completely. A lengthy piece about a particular malfunction might not have a direct connection to another company’s operations (it’s true that even common pieces of equipment are configured in a million different ways), but it could offer some new way of thinking that a reader hadn’t considered. That reader could then turn around and implement a new preventative maintenance program, or monitor a process in a new way to get better visibility.
The software world isn’t all that different, anyway — most services are cobbled together with common frameworks, which means that they’re all exposed to similar faults, even if each uses that base in a slightly different way.
And it’s easy enough to scrub intellectual property from a post-mortem. It’s no secret that an automaker uses stamping machines or that a gearmaker uses CNC centers to machine each part — we can stop pretending that every admission of the process is also offering some of the “secret sauce.”
This means that there is some other, and bigger, issue that is keeping manufacturers from being more open about downtime. I think it comes down to a kind of shame, or the feeling that admitting to downtime equates to a loss of reputation. But that shouldn’t necessarily be the case. Why can’t it be a teaching moment, not only for the company at hand, but for all others who care about its existence?
In the software world, an in-depth post-mortem can actually increase confidence in a business and encourage more trust. Everyone and every business makes mistakes, and often, what separates a successful one from a failed one is the way in which they respond to the unexpected. Proving that your company can do just that should be a matter of pride, not shame. Downtime happens to everyone, but recoveries are what differentiate the most successful.
Oftentimes, manufacturing feels far too siloed, every company struggling within its own four walls. There’s a lot that could be learned if companies began to not only fix downtime, but take advantage of its instructional moments, and then share those with others. I think we can all agree, whether we are direct competitors, in similar industries or even on opposite sides of the spectrum, that it’s in everyone’s best interest that companies stay online and ready to produce. The industry certainly can use every positive movement it can get.
One company that is doing this well is Tesla. In the aftermath of the spate of vehicles that struck metal road debris and caught fire, the company posted numerous articles on its corporate site explaining what had happened, what the vehicle’s design had already done well (preventing the fire from entering the passenger cabin) and what they were planning on doing to keep the same from happening again: strengthening the battery compartment even more and raising the suspension at highway speeds.
I’d wager that for many who read about the incident, and the subsequent letters, their level of confidence in Tesla only increased. I have no doubt the company converted a few new potential buyers with each new post. And while they haven’t talked much, or at all, about the processes that led to these flaws, they’re already miles ahead of their competitors, and most manufacturers as a whole, and that’s gaining them not only market share, but also improved public perception and trust.
Perhaps the rest of the industry will learn a few things from these post-mortems. Perhaps they will learn about how to better talk with customers and business partners, and perhaps they will learn about how to address problems in a nimble, poised fashion.
But each would be better off doing their own instead. I think the whole industry should think of Tesla’s efforts as a threat, much like that mission-critical machine that’s on the verge of breaking down — the best way to deal with a threat is to address it head-on, and to work far ahead, and to keep working until the problem is solved.
And then write about it.