Don’t just move the average, understand the spread

Picture the scene: someone in your organisation comes up with a cost-saving idea. If we move the process mean to the lower limit, we can save £’000s and still be in specification. The technical team doesn’t like it but can’t come up with a reason beyond “it’ll cause problems”, the finance director loves the idea, and the production manager, with one eye on costs, says: well, if we can save money and stay in spec, what’s the problem?

Let me help you. 
In this scenario, the technical team may be right. If we assume that your process is in control and produces items with a normal distribution (remember, that is the best-case scenario!), logic dictates that half of your data is below the average value and half is above. That being the case, what you really want to know is how far from the average the distribution spreads. If the spread is large and you change the process to the extreme where the average value sits right on the customer specification limit, half of everything you make will be out of spec. Can you afford a 50% failure rate? What will the impact be on your customers, your reputation and your workload in dealing with complaints?

To work out how much we can move the process, we must first understand how much it varies, and we use a statistical value called the standard deviation to help us. The standard deviation measures the typical variation from the mean in a sample data set. To estimate it, take 20 samples, measure each 5 times, then use a spreadsheet to work out the mean and standard deviation. If that is too much, take 10 samples and measure each 3 times, but keep in mind that a smaller sample gives a less reliable estimate of the standard deviation. Now add 3 times the standard deviation to the mean; this is the upper limit of your process spread. Subtract 3 times the standard deviation from the mean to find the lower limit of your process spread. The difference between these two numbers is the spread of your process, and it will contain 99.7% of the results measured from the process output IF the process is in control and nothing changes.
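
As a minimal sketch of that calculation (the readings below are invented for illustration; substitute your own measurements):

```python
import statistics

# Invented example readings; replace with your own measurements
readings = [10.2, 10.4, 9.9, 10.1, 10.3, 10.0, 10.2, 10.5, 9.8, 10.1,
            10.3, 10.0, 10.2, 10.4, 9.9, 10.1, 10.2, 10.0, 10.3, 10.1]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)   # sample standard deviation

upper_spread = mean + 3 * sd      # upper limit of the process spread
lower_spread = mean - 3 * sd      # lower limit of the process spread

print(f"Mean: {mean:.2f}")
print(f"Standard deviation: {sd:.3f}")
print(f"Process spread: {lower_spread:.2f} to {upper_spread:.2f}")
```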

If moving the mean takes the 3 standard deviation limits of your process outside of the specification, you will get complaints. It could be that the limits are already outside of the specification, in which case moving the average will make a bad situation worse.

It is possible to calculate the likely proportion of failures from a change of average; this is done using a z-score calculation. I’m not aiming to teach maths here, so the important message is simply that the failure rate can be calculated.
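
For the curious, here is a hedged sketch of that calculation in Python, assuming a normal, in-control process and a lower specification limit (all figures invented):

```python
from statistics import NormalDist

# Invented figures for illustration
lower_spec = 9.5   # customer lower specification limit
new_mean = 9.8     # proposed (lowered) process mean
sd = 0.2           # process standard deviation

# z-score: how many standard deviations the spec limit sits from the mean
z = (lower_spec - new_mean) / sd

# Proportion of output expected to fall below the lower specification limit
failure_rate = NormalDist().cdf(z)
print(f"z = {z:.2f}, expected failure rate = {failure_rate:.1%}")
```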

This is the tip of the iceberg when it comes to understanding your process. If you don’t know that your process is stable and in control, the spread won’t help you, because the process can jump erratically. To improve your process:

1. Gain control: make sure the process is stable.

2. Eliminate errors and waste

3. Reduce variation

4. Monitor the process to make sure it stays that way.

The most significant and profitable gains often come from process stability, not from cost-cutting. All cost-cutting does is reduce the pain; think of cost-cutting as a painkiller when you have an infection. It makes it hurt less, but it doesn’t stop the infection. You need to stop the infection to feel better.

Now do you want to hurt less or do you want to get better?

Why does the type of variation matter?

Everything varies. We know it happens, and if you can’t see it, the variation may not be that significant to your process. However, it may be that your measurement systems are incapable of detecting variation that is important to your process; more about that in another post. Variation leads to production problems, waste and, ultimately, quality and delivery problems. Control the variation and you control the waste and costs. If waste and costs are a problem in your business, you may be interested in reading on.

There are two types of variation: common cause and special cause. Common cause variation is natural, characteristic of the process and, most importantly, predictable. Special cause variation is caused by external factors acting on the process and is not predictable. This is an important distinction, because the methodologies for investigating special and common cause variation are different, and if you investigate the wrong sort of variation it can waste a huge amount of time and cause frustration.

[Figure: time series plot of the process readings]

Take the process shown above. Just creating a graph of the data isn’t really useful, since it is unclear what should be investigated, or how to proceed. Typically a manager will look at a trend line to see if the process data is trending up or down. If the process is in control and a manager observes an undesirable deviation from target (as often happens), it is common to ask for that deviation to be investigated. The investigation will usually focus on special cause variation, since the investigator is likely to assume that something is “wrong” and that there must therefore be a root cause. In businesses that do not use process control charts, there is no objective assessment of process performance before launching into the search for the root cause. The problem this creates is that there may not be a root cause. If common cause variation is at work, it is a fruitless exercise.

Where a root cause analysis finds nothing, managers can assume that the investigation is flawed and demand more work to identify the root cause. At this point willing workers are perplexed; nothing they look at can explain what they have seen. Eventually, the pressure leads to the willing worker picking the most likely “cause” and ascribing the failure to it. Success! The manager is happy and “corrective action” is taken. The problem is that this system tampering will increase the variability in the system, making failures more likely.

The danger is then clear: if we investigate common cause variation using special cause techniques, we can increase variation through system tampering.

What, then, of the reverse: chasing common cause corrections for special cause variation? The basic performance of the process is unlikely to change, and every apparent “breakthrough” in performance is undone as soon as the special cause occurs again and the process exhibits more variation. The underlying variation of the process does not actually increase, but neither does it improve.

[Figure: control chart of the same process data]

The only way to determine if the process is in control, or if a significant process change has occurred, is to look at the data in a control chart. Using a control chart we can see which variation should be investigated as a special cause, and where we should seek variation reduction. In this example, the only result that should be investigated is result 8. This is a special cause and will have a specific reason; eliminate the root cause of that and the process returns to normal control. Everything else appears to be in control. Analysing the process data in this way leads to a focused investigation. If, after removal of the special cause, the process limits are inconsistent with the customer specification, variation reduction efforts should focus on common cause variation.
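
A minimal sketch of the idea in Python (data invented; a real individuals chart would estimate the limits from the average moving range rather than the plain standard deviation used here, which a special cause can inflate):

```python
import statistics

# Invented process readings; result 8 is the out-of-control value
readings = [10.1, 10.3, 9.9, 10.2, 10.0, 10.2, 9.8, 11.4, 10.1, 10.0,
            10.2, 9.9, 10.3, 10.1, 10.0]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)
ucl = mean + 3 * sd   # upper control limit
lcl = mean - 3 * sd   # lower control limit

for i, x in enumerate(readings, start=1):
    if x > ucl or x < lcl:
        print(f"Result {i} ({x}) is outside the control limits: investigate as a special cause")
```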

If you are interested in understanding more about variation and how it affects your process, please get in touch or visit me on stand C23 at the E3 Business Expo on 3rd April. Details can be found at https://www.1eventsmedia.co.uk/e3businessexpo/blog/2019/01/13/visitor-registrations-now-open-for-e3-business-expo-2019/

Do I have numbers, data, or information?

Nearly everything we do generates numbers. Everyone feels comfortable with numbers because they give a perceived absolute measure of what is “right” and what is “wrong”. But what do the numbers really mean?

Numbers generated electronically also seem to attract automatic trust: if it comes from a calculator or computer, it must be right. We must learn to question numbers and information regardless of source; the computer or calculator is only as good as its programming.

Caveat emptor (buyer beware) has never been more relevant than when dealing with electronically generated numbers.

There is a difference between having numbers, having data, and having information.

Numbers can feel secure but may not be useful. For example, if the number used is either not related to the core process or is measured inaccurately, it may be misleading.

Numbers are generated from a process, perhaps a test. It is important to understand how the sample was selected and if the sample size is appropriate. If a test method is used, it is vital that we measure what is relevant and that the results don’t depend on the person doing the test or when it is done. Finally, we must ensure that we understand how much variability is acceptable. Without knowing this, we can’t judge the risk we are taking when using the number to make a decision.

One example experienced in the printing industry is coefficient of friction measurements from a supplier. The numbers presented appeared to be very consistent and in good process control; however, when the method was examined using Gage R&R techniques, it was found that the measurements could not even produce a consistent number for one batch of material. The confidence interval was so wide that the measurement system could only resolve a single category, with category limits wider than the specification tolerance.

The result was random system tampering based on whether the number generated was inside or outside of the specification limits. There was no recognition that the test was inadequate for the specification limits applied.

It is critical to question the basis of numbers; failing to do this can result in taking the wrong action for what seem like the right reasons. All numbers we deal with should be checked to ensure that they are valid and reliable:

  • Valid means that the test measures a relevant parameter.
    If we are interested in liquid viscosity, taking a measurement set against an arbitrary reference point won’t work. It is necessary to measure the flow rate of the liquid, but liquids can have different reactions to how they are moved, usually referred to as shear. Most liquids get thinner, some don’t change, but a few liquids get thicker when you try to move them. One example of a viscosity measurement is the use of Zahn cups. If you have never heard of this, it is a small cup with a hole in the bottom and a handle on top. The time taken for liquid to flow out of the hole is related to liquid viscosity. They give an indication of flow, but absolute figures are hard to specify, since the tolerance of a cup is ±5%, and the method may not reflect the shear behaviour of the process. The viscosity also varies with temperature, so their use is more difficult than it may appear.
  • Reliable means that the test delivers the same number regardless of the operator or time of day.
    Taking the Zahn cup example again, different operators will time the stream differently: one will stop at the first appearance of drops in the stream, another will wait until there are drops from the cup itself.

Failing to adequately understand the methodology generating the numbers and the sources of variance (error) can lead to poor decisions and mistakes. For example, am I measuring real changes that I am interested in, or changes in operator or the test method, which are more likely to mislead me?

Once the basis of the numbers is understood, the stream of numbers becomes reliable. The next step is to create data. This can be dangerous ground if numbers are used out of context. Distilling the numbers down to a single figure, for example a summary statistic such as the average, and making judgements on that figure alone can lead to problems if the degree of variation is not understood.

William Scherkenbach has observed that

“We live in a world filled with variation – and yet there is very little recognition or understanding of variation”

Another way to look at it is this: if my head is in the oven and my feet are in the freezer, my average temperature may well be 20°C, but am I happy?

This is what makes single numbers dangerous. A cosmetic product recently advertised 80% approval based on a sample of 51 users. Statistical analysis puts the true approval rate somewhere between 67% and 90%. That’s not quite the same message, is it?
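
As a hedged sketch of where figures like these come from, here is one common choice, the 95% Wilson score interval for a proportion (the exact limits depend on the method used):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score confidence interval for a proportion."""
    p = successes / n
    centre = p + z**2 / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (centre - spread) / denom, (centre + spread) / denom

# 80% approval from 51 users is roughly 41 approvals
low, high = wilson_interval(41, 51)
print(f"True approval rate plausibly between {low:.0%} and {high:.0%}")
```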

If there is no concept of variation, it is possible either to make an incorrect decision based on inadequate data or to waste massive amounts of time and resource trying to address perceived “good” or “bad” numbers by looking for one-off causes that are just normal variation. One way of understanding whether the variation we observe is common cause or special cause is to use a control chart. There are several statistical signals that alert a knowledgeable operator that something is changing in the process.

There is a significant risk in chasing special cause variation with common cause approaches and vice versa. Most often a number will be deemed too high or too low arbitrarily, because someone “knows what good looks like”. The danger when using common cause solutions for special cause variation is that the system will continue to vary randomly and without warning. This creates confusion, and time is lost trying to recreate the conditions as a controlled part of the system. Over time, the conditions thought to give rise to the observed variation will conflict with one another, because the parameters being adjusted do not actually control the variation.

Using special cause solutions for common cause variation will result in arbitrary attribution of cause because no rational cause can be found. Therefore, the first change that shows any sign of improvement becomes the root cause.

What needs to be done to give numbers and data meaning?

Numbers only have meaning in context. This means that it is necessary to consider two features of any data set: location and dispersion. The most commonly used representations of these two measures are the average and the standard deviation. Even here it is important to have clarity about what the average is intended to demonstrate and why we are interested in it.

We all know what the average is, don’t we? Most people will add all the numbers up and divide by the count of the numbers. This is called the mean. There are, however, two other versions of the average: the median (the middle number of a series) and the mode (the most common number in a series).

Why does it matter if I use mean instead of median for example? What difference does it make?

The mean is fine if the data is normally distributed; however, if the data is not normally distributed there can be a significant difference between the arithmetic mean and the median, or middle number. Extreme values can drag the mean away from the bulk of the data and mislead the user about where the typical value really lies. It is for this reason that one must understand the data distribution before deciding whether to use the mean or the median. If one is looking for the most common occurrence, then clearly the mode is relevant.
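
A small illustration of how one extreme value drags the mean away from the median (the salary figures are invented):

```python
import statistics

# Invented salaries: one extreme value skews the distribution
salaries = [22_000, 24_000, 25_000, 25_000, 26_000, 28_000, 30_000, 250_000]

print(f"Mean:   {statistics.mean(salaries):,.0f}")    # dragged upwards by the outlier
print(f"Median: {statistics.median(salaries):,.0f}")  # the 'typical' salary
print(f"Mode:   {statistics.mode(salaries):,.0f}")    # the most common salary
```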

How much variation is there?

The standard deviation is, loosely, the typical difference between the data points and the average of all the data; in other words, it tells us how close each data point is to the average. The bigger the standard deviation, the more uncertain we are about the average. The standard deviation itself does not shrink as you collect more data, but the uncertainty in the estimated average does: the standard error of the mean is the standard deviation divided by the square root of the sample size. If your estimate of the average is too variable, take a larger sample.
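
A quick sketch of that distinction, using simulated data (so the exact numbers will vary from run to run):

```python
import random
import statistics

random.seed(1)

# Simulated process with a true mean of 10 and a true standard deviation of 0.5
for n in (10, 100, 1000):
    sample = [random.gauss(10, 0.5) for _ in range(n)]
    sd = statistics.stdev(sample)     # stays near 0.5 regardless of sample size
    se = sd / n ** 0.5                # standard error of the mean shrinks with n
    print(f"n={n:4d}  standard deviation={sd:.3f}  standard error of mean={se:.3f}")
```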

Using only the mean, it could be tempting to move the mean of a process to the specification limit, perhaps to maximise profit. This can be fine: if the distribution is narrow and tall, moving closer to the limit will not result in failures. However, if the distribution is broader and flatter, it may be that the distribution only just fits in the tolerance window. If this is the case, moving the mean to the limit will result in a significant proportion of failures. It is possible that more than half of the product could fail, particularly if the sample size means the uncertainty window for the true mean is large. If the true mean could fall outside the specification, you are taking a huge risk!

This is because the average is not absolute. The only way to have an absolute average is to have all possible data points – the population. This is often impossible, for example if the testing is destructive, so we take a sample. We need to understand how close our sample standard deviation is likely to be to the population standard deviation. From the mean and standard deviation, we can calculate a third parameter: the confidence interval for the average. The confidence interval expresses the range within which we would expect the true population mean to lie. Most statistics use a 95% confidence level; loosely, this means that if we repeated the sampling many times under the same conditions, 95% of the intervals calculated would contain the true population mean.
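
As a hedged sketch with invented data, the 95% confidence interval for the mean can be calculated like this, using the t-distribution to allow for a small sample:

```python
import statistics
from scipy.stats import t

# Invented sample measurements
sample = [10.2, 10.4, 9.9, 10.1, 10.3, 10.0, 10.2, 10.5, 9.8, 10.1]

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5   # standard error of the mean
t_crit = t.ppf(0.975, df=n - 1)            # two-sided 95% critical value

low, high = mean - t_crit * se, mean + t_crit * se
print(f"Sample mean {mean:.2f}, 95% confidence interval {low:.2f} to {high:.2f}")
```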

These three pieces of information convert the numbers into information by considering them in the context of all the data.

How does this help us to understand?

Now it is possible to see:

  • The location of the average
  • How closely we can predict the population average from the sample average
  • The location of each data point
    • how close is each number to the average?
  • The dispersion of each data point
    • how variable are the numbers that make up the data point?

With this context, it is possible to turn the data into information that can be used to make decisions.

For example, it is possible to determine whether a number is within the normal spread of values for a measurement parameter. If it is, why search for some special meaning? If it is not, we can ask what happened.

In conclusion, then, using numerically based information to make decisions is a good thing to do. However, before we use the numbers we must be certain that they are valid and reliable. Then we can provide a framework of context, with limits inferred from the data itself.

The final step is to apply logic and reason to the patterns revealed, to create actionable information from the data. This involves application of the scientific method. Management theory teaches that data-driven decision making is the only rational way to make decisions.

A quote from W. Edwards Deming seems appropriate at this point

“If you do not know how to ask the right question, you discover nothing.”

It doesn’t matter how good your numbers are, or how well those numbers have been converted to data; you will not gain information if you don’t know what question to ask.

Hypothesis testing

Demonstrating with 95% confidence that a hypothesis is supported or rejected is not perfect, but it does provide a better mechanism for decision making than superstitious belief or gut feel.
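
As a minimal sketch (invented measurements, testing whether a process mean has drifted from a nominal target of 10.0):

```python
from scipy.stats import ttest_1samp

# Invented measurements: has the process mean drifted from the 10.0 target?
sample = [10.2, 10.4, 9.9, 10.1, 10.3, 10.0, 10.2, 10.5, 9.8, 10.1]

result = ttest_1samp(sample, popmean=10.0)
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.3f}")

# At the 95% confidence level, reject the 'no drift' hypothesis if p < 0.05
if result.pvalue < 0.05:
    print("Evidence that the mean has shifted from target")
else:
    print("No convincing evidence of a shift; the difference could be normal variation")
```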

W.E.B. Du Bois has probably stated the truth of data based decision making most eloquently

“When you have mastered numbers, you will in fact no longer be reading numbers, any more than you read words when reading books. You will be reading meanings.”

Next time you are looking at information to decide a course of action, make sure it demonstrates what you intended. Is your decision-making process based on numbers, data, or information? If it uses numerical information, do you know what questions to ask? Ask how the data was sampled and what the sample size was. Is it appropriate?

There is an interaction here with one of my previous blog posts, Three simple questions.

Next time you have a decision to make, define your objective clearly, question wisely, obtain relevant, valid and reliable information, and use this as the basis for making sound decisions.