Messin’ With Metrics


We gauge our business and process performance using metrics, yet metrics can deceive as well as they can reveal.  Build better, more robust metrics, and monitor them more wisely, by understanding how they can be manipulated.

The Rest of the Story:

I’ve stated many times that metrics are merely indicators of the truth; they are not the truth.  To see the truth we must get out of our chairs, put down our smart phones, and go seek it directly.  Even then, though, as soon as others perceive that we are watching, their behavior will change and the truth may become somewhat disguised.

Why does this happen?  As community creatures, prone to pride, we like to do well.  What’s more, when we believe we are doing well, we want others to also believe that we are doing well.  Therefore, we are prone to making the measurement of our performance as positive as possible, especially if there is reward or retribution at stake.

Let me offer a real example.  In my earliest entry into the world of Lean and Six Sigma, I was part of a spearhead team introducing the Lean Six Sigma methodology to a business unit within a global corporation.  When we first began reporting functional performances in terms of Defects Per Million Opportunities (DPMO) and Sigma Scores, the technical documentation function reported Six Sigma scores right out of the gate.

Knowing that a Sigma score of 6 (about 3.4 defects per million opportunities) is extremely difficult to achieve, the spearhead team took a look.  We observed that they had produced three or four technical documents that month.  Well, if all three or four were defect-free, then it is easy to see how they could claim such a high score.  We let it go.

Within a couple of months, however, the technical documentation group began reporting higher than 8-sigma performance scores.  Now we knew that something was fishy.  How could they possibly report fewer than one defect in several million opportunities when they only produce a handful of deliverables a month?

The answer turned out to be very simple and, to an amateur in Lean Six Sigma practices, perfectly reasonable.  The group had concluded that every single word was an opportunity for a defect, typically in the form of a typographical error.  When a single document might contain 300-500 pages at 400 words per page, it is easy to see where their numbers came from. 
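To see how the choice of "opportunity" drives the score, here is a minimal sketch of the conventional short-term Sigma-score calculation (the z-value of the yield plus the customary 1.5-sigma long-term shift).  The document, page, word, and defect counts below are hypothetical, not the group's actual numbers.

```python
from statistics import NormalDist

def sigma_score(defects: int, opportunities: int) -> float:
    """Short-term Sigma score: the z-value of the observed yield
    plus the conventional 1.5-sigma long-term shift."""
    dpmo = defects / opportunities * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

# Counting every word as an opportunity (hypothetical counts):
# 4 documents x 400 pages x 400 words = 640,000 opportunities.
print(sigma_score(defects=2, opportunities=640_000))  # ~6.0: "world class"

# Counting each delivered document as one opportunity:
print(sigma_score(defects=1, opportunities=4))        # ~2.2: a different story
```

Same process, same month; only the definition of "opportunity" changed.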

It sounds like a simple mistake on the learning curve.  Fair enough.  The manipulation showed up when we discovered that the method of measurement did not account for the actual number of words changed.  The group might have updated a few words on several pages and incremented the revision code, but the metric counted every word in the entire document.  I won’t accuse the group of intentionally inflating the opportunity count, but the system made it easy.

They sure looked good, but to the well-informed, they looked too good to be believed.  The day a document was rejected by a customer, we had a long chat with that group about how their method of calculating the metric was not really telling them what they needed to know, or what really mattered from a customer and business perspective.

Unfortunately for our metrics, human behavior tends to mess with them, and it does so in many, many ways.  Sometimes people stretch the truth, and sometimes they underplay it.

When we understand how human behavior and metrics interact, and how metrics can be manipulated without technically breaking the rules, we can look for behaviors that might alter metric perceptions, ask smart questions to challenge what the metrics would have us perceive, and build better guidelines, rules, or metrics to mitigate the phenomenon.

For the sake of discussion and example, let’s look at a single, popular metric and let’s “mess” with it a little.  Let’s talk Yield.  We’ll take a look at all the ways that we can manipulate the Yield metric and turn that into some ideas for how we can protect any variety of other metrics with some simple tactics and wisdom.

Since there are many different ways of calculating Yield, for the sake of discussion, let’s keep it simple and define Yield as the ratio of outputs to inputs, subtracting reworked or defective pieces from the numerator.  Let us say that the Yield equation is as follows.

Y = outputs (not including reworked or defective pieces)/inputs

Of course, “inputs” would be a count of every piece that started the process.  As we said, “outputs” is a count of every piece that exited the process without being defective or reworked in order to successfully meet specifications.  Even simple metrics like this can be messed with.  Let’s see what we can do.
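As a minimal sketch, the simple Yield defined above can be computed like this; the piece counts are made up for illustration.

```python
def process_yield(inputs: int, defective: int, reworked: int) -> float:
    """Y = outputs / inputs, where outputs excludes every piece
    that was defective or had to be reworked to meet spec."""
    outputs = inputs - defective - reworked
    return outputs / inputs

# Hypothetical month: 1,000 pieces started, 12 defective, 38 reworked.
print(process_yield(inputs=1000, defective=12, reworked=38))  # 0.95
```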

1. Making data collection easier:

Collecting data is a drag.  Sometimes the best data to collect is the most difficult.  Look at the technical documentation group example above.  It can be very difficult to look at a 300-page document and decide if it is defective or if it is good.  It is very easy with modern word-processors to count words and typographical errors.

Unfortunately, one could type, “My dog has fleas,” over and over for 300 pages without making any typographical errors, but the document absolutely wouldn’t satisfy the customer’s expectations.  It would still be defective.

Sometimes collecting data on every piece isn’t practical and we must resort to sampling.   Doing so can be very efficient and effective, or it can be very dangerous.  Statistical power and sample size must be taken into account, and samples must cut across noise factors such as shift, season, and material batch.
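One way to make a sample cut across a noise factor is to stratify it, drawing equally from every level of the factor.  A minimal sketch, where the pieces and their "shift" tags are hypothetical:

```python
import random

# Hypothetical lot of 300 pieces, each tagged with the shift that made it.
pieces = [{"id": i, "shift": ("A", "B", "C")[i % 3]} for i in range(300)]

def stratified_sample(pieces, by, n_per_stratum, seed=0):
    """Draw the same number of pieces from every level of a noise
    factor, so no shift/batch/season is missed by the sample."""
    rng = random.Random(seed)
    strata = {}
    for p in pieces:
        strata.setdefault(p[by], []).append(p)
    sample = []
    for level, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

sample = stratified_sample(pieces, by="shift", n_per_stratum=10)
# Every shift contributes exactly 10 pieces to the inspection sample.
```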

Automated data collection systems can be more reliable and accurate than human observation, or they can be fooled easily.  If we use automated data collection, we must examine how the system can be fooled and then make sure that it isn’t.

To keep the drive for easier data collection from costing our metrics their integrity, consider three basic tactics.

a) Make sure that what your system measures as a defect is also what the customer perceives as a defect, and that your system can definitively recognize something the customer would consider defective.

b) Be sure that noise factors that could contribute to defects are not missed or ignored because of sampling frequencies or locations.

c) Routinely inspect and audit automated data collection systems to be sure that they are correctly identifying defects and are calibrated correctly.

These may sound like obvious tactics, but when was the last time you challenged any metric with a, b, or c above?  Once we start using a metric, we quickly forget to challenge it periodically.

2. Making judgments easier:

Just as data collection is a drag, measuring and deciding whether something is good or bad can be very tedious.  The same three tactics for ensuring that data is collected appropriately also apply to ensuring that measurements and judgments are taking place.

The additional complication that judgment decisions bring is the phenomenon that decisions are easiest when they use the least amount of information.  Therefore, the easiest way to decide might not be the most informed way to decide.

An obvious example of this is a challenge for most organizations utilizing Statistical Process Control (SPC) or other statistical methods such as Six Sigma.  It is much easier to establish a “go/no-go” or “pass/fail” gauge than it is to measure actual dimensions, time, or other performance elements.  Unfortunately, a binary pass/fail result does very little to support statistical methods, which look for trends or shifts in process performance in an effort to predict and prevent a piece falling out of specification.
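A small illustration of what the binary gauge throws away, using hypothetical gap measurements drifting toward an upper spec limit:

```python
# Hypothetical gap measurements (mm), in production order; spec limit 5.0.
measurements = [4.1, 4.2, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9]
USL = 5.0

# Go/no-go view: every piece passes, so the binary metric sees nothing.
pass_fail = [m <= USL for m in measurements]
print(all(pass_fail))  # True -- 100% "good"

# Continuous view: the trend predicts the next pieces will fail.
drift_per_piece = (measurements[-1] - measurements[0]) / (len(measurements) - 1)
pieces_until_failure = (USL - measurements[-1]) / drift_per_piece
print(pieces_until_failure)  # roughly one piece of margin left
```

The pass/fail gauge reports a perfect process right up until the first reject; the measurements themselves forewarn it.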

Be sure that your judgments are aligned with customer perceptions.  Also, be sure that how you determine good from bad supports your business and process practices and decision-making methods.  Finally, be sure that the system or individuals that make the decision are capable of doing so.  A technical writer might not know if the schematic diagram in the appendix is accurate, but the design engineer might.

3. Defects vs. Defectives:

This might be most specific to Yield-type metrics, but the phenomenon can translate to others.  The problem lies in defining what the metric should look for; how we define our metrics can often get us into trouble.

In Yield-type metrics the chase is to identify defects that result from the process and try to minimize them.  Unfortunately, the distinction between a defect and what makes a product defective is often confused or neglected.  Let’s look again at the documentation example.

Typographical errors are one form of defect that was easily screened.  Unfortunately, a document could be free of typos and still be defective if the schematic in the appendix was inaccurate or if it was organized poorly, or if it just didn’t make sense.

Here’s another example.  The gap between the glove box lid and the dashboard of your car might be out of specified tolerance, but that probably doesn’t make your car defective.  You might not even notice.  A defect does not necessarily make your product defective.

However, sometimes a single defect can make a product defective.  A poor solder joint on the motherboard of your personal computer could certainly ruin your day. 

Alternatively, a product built completely within specification can still be defective.  If the customer returns the product because the customer was not satisfied, the product was defective.  (Here I have made an assertion that has started many arguments.)

Many argue that if the product is built perfectly, according to specifications, and performs correctly, it is not defective.  However, if a dissatisfied customer returns it, it has the same business impact as a product that was built out of specification and did not perform correctly.  The defect may be in the advertising that set the customer’s performance-value expectations, or it may be in the specification itself.  Bottom line, it hurts the business the same way as a traditionally classified defective and should be counted as such, in my opinion.

Regardless, confusing “defects” with “defectives” can really cause a great deal of trouble when using the metric to make business decisions.  An inflated perception of poor quality because of defects that may or may not mean anything to a customer can drive business decisions detrimental to a product line.  Similarly, believing that Yield is excellent, but not being aware of challenges in production quality can lead to decisions based on false confidence.

Yield should be counted, not in terms of “defects”, but in terms of “defectives.”  Since Yield is a measure of products that are “good for customers” it should be based on a judgment of the final, complete product as good or bad.  However, for a metric used to gauge the performance of a process, simply looking at “defectives” without considering “defects” paints an incomplete picture.

The solution is to measure in-process defect metrics like DPMO, where opportunities might be associated with specific critical parameters, and also to count “defective” or reworked outputs.  Measuring both allows us to see what the customer sees, and also to address challenges within the process that we hope the customer never does.
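A sketch of such a metric pair, with hypothetical unit and defect counts: DPMO over a few critical parameters gives the in-process view, while the defective rate gives the customer-facing view.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """In-process metric: defects per million opportunities,
    where each unit is checked against a set of critical parameters."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def defective_rate(defective_units: int, units: int) -> float:
    """Customer-facing metric: fraction of finished units judged bad."""
    return defective_units / units

# Hypothetical month: 500 units, 4 critical parameters checked per unit.
print(dpmo(defects=6, units=500, opportunities_per_unit=4))  # ~3000 DPMO
print(defective_rate(defective_units=2, units=500))          # ~0.4% defective
```

Here the process threw six defects, but only two units were bad enough to reject; tracking only one number would hide half of that story.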

When translating this to other metrics, consider what it is you want the metric to show you.  Try to devise metric pairs that indicate both the customer’s perspective and the inner workings of the process, of which we hope the customer never becomes aware.  For example, we hope that the gap between the glove box lid and dashboard never becomes so gross that a customer refuses to buy the car.

There may be relationships between the metric pairs where one can act as forewarning or a predictor of the other.  If so, this can be very powerful.

4. Who Cares?

The “who cares” phenomenon causes a great many problems for metrics.  Fundamentally, it can cause all of the mistakes or manipulations mentioned above.  If we perceive that no one is watching, then the importance of collecting data and generating metrics falls to the bottom of the priority list.

None of us wants to waste time and energy doing work that no one will appreciate.  Therefore, efforts to do data collection and metrics reporting in the best possible way fall off.

The best defense against the “who cares” phenomenon is presence, followed by communication.  If metrics and progress are communicated back to the metrics gatherers and reporters, with kudos or expectations of improvement, then people know that the metrics are valued.

Presence is even more powerful.  Put down the phone, get out of the chair, and go to the process owners.  Talk to them about the metrics.  Ask the following questions.

a) How do you collect or calculate the metric?

b) What other ways could it be done?

c) Why is this way the right or best way?

d) What do you think would be a better way?

Not only will you learn something about the metrics used to make decisions that impact the business, but also, as a leader, you will empower and entitle the process owner to own the metric too.  This is probably the most powerful tool in the arsenal for metrics integrity defense.

There are a great many things that can go wrong with our metrics, but we need them to help us make decisions and to assess our business and our processes.  By understanding how metrics can be deliberately or inadvertently manipulated, we can build better, more robust metrics systems and defend against mistaken perceptions.

Don’t let easier data collection or simpler judgment calls cause valuable information to be lost or neglected.  Carefully define what the metric is looking for and measuring, to ensure that you are truly gauging what is important.  Above all, get out and investigate the metrics systems personally.  Challenge them, and get a good look at the truth, which metrics can only indicate, never wholly reveal.

Try these tactics with your existing metrics.  Improve your metrics and metrics systems and enable your team to make more informed decisions as a result.

Stay wise, friends.
