By KIT MERKER, Microsoft
During a crisis, there is almost by definition a shortage of accessible information. Because of the time pressure a disaster creates, anything considered noise gets filtered out and ignored. However, if you could create a plan to track the right information and make it available during difficult times, it could mean the difference between tragedy and a close call.
Creating and maintaining information repositories that can be used in a crisis is not necessarily cheap or easy, and you have to make a call about the return on investment (ROI) of this type of tool. If you're responsible for disaster and continuity planning, you will need to make the case for why the organization should take on the burden of additional systems. That case is usually easier to make in retrospect, after an incident has shown what the missing information cost.
Consider the following questions and how having the answers on hand might help you be more effective in your next crisis.
How Did We Handle This Last Time?
If you've encountered the same or a similar problem before, it would be great to know what you did and who did it. Understanding how long it took might also help you set expectations with customers. Taking the extra time to record a few key pieces of information whenever you solve a production problem will help ensure that you can resolve similar issues in the future with speed and ease.
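As a rough illustration, the "few key pieces of information" could be captured in a small record and searched later. This is only a sketch: the IncidentRecord type, its field names and the find_similar helper are hypothetical, and a real team might keep the same data in a wiki or ticketing system instead.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class IncidentRecord:
    """One resolved production problem (all field names are illustrative)."""
    symptom: str                # what was observed
    resolution: str             # what was done to fix it
    resolved_by: str            # who did it
    time_to_resolve: timedelta  # how long it took, useful for setting expectations
    tags: set[str] = field(default_factory=set)


def find_similar(log: list[IncidentRecord], keywords: set[str]) -> list[IncidentRecord]:
    """Return past incidents that share at least one tag with the keywords.

    Incidents with more matching tags come first; matching ignores case.
    """
    wanted = {k.lower() for k in keywords}
    scored = [(len(wanted & {t.lower() for t in rec.tags}), rec) for rec in log]
    matches = [(score, rec) for score, rec in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in matches]
```

During a new incident, calling find_similar(log, {"dns", "outage"}) would surface earlier records tagged with either term, along with who resolved them and how long it took.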
Whom Else Does This Affect?
Understanding the scope of impact of a particular symptom is a key to determining root cause. If your website is suddenly unreachable, you need to know whether the problem is limited to your site or affects some larger portion of the Internet. Sometimes you can determine this with a simple test: Go to a high-availability website like www.yahoo.com. If that site loads and yours does not, the problem is probably on your side; if neither loads, suspect your own connection or a wider network issue. Other systems will likely need something more sophisticated. Ideally you can build this into your health and monitoring systems so your alerts can be intelligent.
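The simple test described above could be automated along these lines. This is a minimal sketch, not a production monitor: the function names and the returned labels are made up for illustration, and a single probe from one location says nothing about what users elsewhere are seeing.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` answers at all, even with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded with an error status, so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar problems.
        return False


def classify_scope(own_ok: bool, reference_ok: bool) -> str:
    """Turn the two probe results into a rough scope label (labels are illustrative)."""
    if own_ok:
        return "ok"
    if reference_ok:
        return "our-site-only"
    return "network-or-wider"


def check_scope(own_url: str, reference_url: str = "https://www.yahoo.com") -> str:
    """Probe our own site and a high-availability reference site from this machine."""
    return classify_scope(is_reachable(own_url), is_reachable(reference_url))
```

Keeping classify_scope separate from the network probes makes the decision logic easy to test and easy to extend, for example with several reference sites or probes from more than one location.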
What Are People Saying about Us?
Listening to what customers are saying is critical during a crisis. Ideally, you will know about problems before they do, but that might not be the case depending on what the issue is and where you are in the crisis lifecycle. Creating a place for people to raise issues easily is a great way to learn about issues early and respond to them quickly.
You can also set up alerts for news feeds, blog posts and Twitter terms, so you can see if people are talking about you, which might indicate that something is out of the ordinary.
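However the posts are collected, the matching step behind such an alert can be quite small. The sketch below assumes the text of each post has already been fetched as a plain string; the function names, the sample terms and the "three times the typical volume" threshold are all illustrative assumptions rather than a recommendation.

```python
import re


def mentions(posts: list[str], terms: list[str]) -> list[str]:
    """Return the posts that contain any watched term as a whole word, ignoring case."""
    if not terms:
        return []
    alternatives = "|".join(re.escape(term) for term in terms)
    # The lookarounds stop "contoso" from matching inside "contosoware".
    matcher = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    return [post for post in posts if matcher.search(post)]


def is_unusual(current_count: int, typical_count: int, factor: float = 3.0) -> bool:
    """Flag a spike when mentions exceed `factor` times the typical volume.

    The factor of 3 is an arbitrary starting point; tune it against real data.
    """
    return current_count > factor * max(typical_count, 1)
```

For example, a company watching the hypothetical terms "Contoso" and "#contosodown" would count the posts returned by mentions() for the last hour and raise an alert when is_unusual() says the count is well above the normal level.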
There are a variety of complaint sites out there (RipoffReport.com and Complaints.com, for example) where people can go and talk about how terrible you are. Monitoring these should give you insight into whether you have systemic service issues, some of which may be caused by systems instability or disaster response.
And, of course, never just delete negative comments!
What Should I Tell People?
When you suddenly realize that you caused a problem for your customers and you're not sure how to fix it, this is not the time to figure out how to tell them. At the same time, you can't predict what problems will be encountered and create prepared messages for all of them ahead of time.
Communication policies will guide your team to a clear message. Set some principles and goals about the tone, level of detail, channels and frequency of updates, both for internal and external audiences. Providing customers and employees with clear updates is key to gaining respect and a positive reputation during a crisis.
A simple way to think about this communication is “regret, reason, remedy.” For a great example of this, look at how WordPress.com handles its downtime. There are other great examples and ideas on Mashable.
What Data Have We Lost?
Figuring out what data has been disclosed or is unrecoverable is very tricky. In the case of a disclosure attack, it may be impossible to ever learn exactly what was lost. In that case, you have to take a pessimistic view and assume that everything the attacker could have reached was exposed.
How Can You Get This Information?
You have two key types of information that you need to track in order to be ready for a crisis: internal and external. Internal information tells you what you've done before, who can help with what kinds of problems, what options and rules exist internally for a crisis, and what is happening right now.
External data has to be collected from the outside world and made sense of in order to be useful. You can create direct channels of information: user communities, blogs, forums or support tickets. You can also look at indirect information: Twitter topics and hashtags, third-party review sites, blog posts and articles.
There are a variety of systems you could use to store your information, but once you start collecting it and making it available, you risk having so much that it's unwieldy. The important step is to get it into a well-known location, filtered to only the most critical and valuable information, and organized logically.
It's hard to have the right information to make fast and accurate decisions in the heat of a crisis. But if you build the right systems, you may be able to get better early warnings and responses in place to turn down the heat on your next disaster.
What information has helped you during a crisis? Have you ever collected information that proved to be too noisy to be useful? What else is missing from the list? Add your ideas using the comments section below.
Merker shares his obsession with preventing and preparing for software disasters on www.softwaredisastersblog.com. He's worked in software for over a decade and currently works as an Evangelist at Microsoft, helping communications and media software companies embrace cloud and mobile device technology. For more information, please visit www.microsoft.com.