Author: Tim Akerman
A behaviour I suspect many lean six sigma mentors have seen with new belts is paralysis by analysis. Newly qualified, on their own for the first time and with access to powerful statistical analysis software, they react by conducting every statistical test they can think of, on the principle that this lets them look at the data from every perspective. What they are actually doing is partly showing off their newfound skills and partly revealing their unconscious incompetence. That is not to say they are incapable, only that they lack experience.
When mentoring new belts I always start them out with three simple questions.
The reason for these questions is to make sure that they learn what I believe is one of the most important skills in lean six sigma: focus. I have noticed over many years that belts who are new to statistical analysis tend to lose that focus when they gain access to a powerful statistical analysis package.
So back to the questions. The first question is this:
1. Can you write a simple statement of what you want to know?
It may seem obvious, but often people forget the first discipline of six sigma – DEFINE.
Starting analysis without actually stating what you want to know leads to confusion. It is all too easy to conduct a series of statistical tests and then, when the results are available, find you can’t remember what you originally set out to discover. Let’s be honest: most of us have done it and had to start again. That’s how we learned not to do it.
It is for this reason that I tell anyone starting to do statistical analysis: do nothing until you can write a simple statement of what you are trying to discover. If you don’t know what difference or correlation you are trying to discover, how can you possibly choose a suitable test? We all suffer from a cognitive bias that makes it easy to believe we know what is required. However, if we can’t write it down simply, and in plain language, do we really know what we are trying to discover?
Having written down our question in plain language, we need a way to answer it. This leads to the second question:
2. How will this test answer the question posed above?
There must be a direct link between the analysis undertaken and the purpose of the test. For example, if we want to know whether changing a pigment gives the same colour for a particular application, we should consider how the testing has been done. If the tests are done side by side in a laboratory on the same piece of substrate and the results are normally distributed without outliers, a paired t-test would be appropriate. However, if the testing occurs in different factories on different batches of substrate and the results are not normally distributed and contain outliers, Mood’s median test should be used.
The context of the data has to be considered when deciding which test to use. Again, the test selection and the logic for selecting the test should be written down in plain language. If you cannot do that, you have not adequately considered your test selection and should revisit your thought process.
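One way to force yourself to write the selection logic down is to express it as a small rule. The sketch below is only an illustration of the two pigment cases described above; the names `DataContext` and `choose_test` are invented for this example, and any situation the example does not cover is deliberately left undecided rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContext:
    """Plain-language facts about how the data were collected (hypothetical names)."""
    paired: bool        # measured side by side on the same piece of substrate
    normal: bool        # results are approximately normally distributed
    has_outliers: bool  # results contain outliers


def choose_test(ctx: DataContext) -> str:
    """Return the test suggested by the two pigment cases in the text."""
    # Case 1: same substrate, normal data, no outliers.
    if ctx.paired and ctx.normal and not ctx.has_outliers:
        return "paired t-test"
    # Case 2: different factories and batches, non-normal data with outliers.
    if not ctx.paired and not ctx.normal and ctx.has_outliers:
        return "Mood's median test"
    # Anything else is outside the example and needs its own written reasoning.
    return "not covered by the example: reason it through and write it down"


print(choose_test(DataContext(paired=True, normal=True, has_outliers=False)))
print(choose_test(DataContext(paired=False, normal=False, has_outliers=True)))
```

The value is not in the code itself but in being made to state each condition explicitly before any test is run.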
So now we have a clear picture of what we want to know and what test should be done to answer that question. What else is required?
3. Write down the rules for interpreting the test.
It is vital that the rules for interpreting results are written down before the analysis is done. If an alpha level of 0.05 is selected and the p-value from the test is 0.93, the p-value exceeds alpha, so the test fails to reject the null hypothesis.
Remember the cognitive bias from the problem definition? It appears here again. If we choose the alpha level after conducting the test, we may decide that an alpha level of 0.1 is adequate. The interpretation of the test is then different: because our decision making has been influenced by the results, the test acceptance threshold is no longer objective. Going back to the pigment example, suppose failure would result in an expensive process change in a business with limited finance. If the new pigment is lower cost, it would be easier to accept a larger difference to push the change through. That may not satisfy the customers’ needs and may result in more complaints and potentially higher costs in the long term. The combination of a desire to save money and an apparently small difference in performance will have seduced the operator into unconsciously compromising their standards.
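As a rough sketch of what "written down before the analysis" can look like in practice, the example below records the question, the test and the alpha level in an object that cannot be altered afterwards. The class name `AnalysisPlan` and its fields are assumptions made for illustration, and the p-value of 0.93 is simply the figure used above, not a real result.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class AnalysisPlan:
    """The three answers, fixed before any data are analysed (hypothetical structure)."""
    question: str  # 1. what we want to know, in plain language
    test: str      # 2. the test chosen to answer it
    alpha: float   # 3. the decision threshold, set in advance

    def interpret(self, p_value: float) -> str:
        # The rule is applied mechanically, so the result cannot influence it.
        if p_value <= self.alpha:
            return "reject the null hypothesis"
        return "fail to reject the null hypothesis"


plan = AnalysisPlan(
    question="Does the new pigment give the same colour as the current one?",
    test="paired t-test",
    alpha=0.05,
)

print(plan.interpret(0.93))

# Trying to relax the threshold after seeing the result is refused.
try:
    plan.alpha = 0.1
except FrozenInstanceError:
    print("alpha was fixed before the analysis and cannot be changed")
```

A signed page in a lab notebook does the same job; the point is only that the threshold exists, in writing, before the p-value does.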
How should the decision criteria be documented? That is the whole point and purpose of the null and alternative hypotheses, but that is for another time.
Following these three simple rules will ensure clarity of purpose, a rational link between the desired information and the technique applied, and pass / fail criteria that are set objectively. It will also save a lot of time and help practitioners become confident and productive sooner.