Friday, May 03, 2013

Dimensionless Numbers - The Original Designed Experiment

I've ranted and raved in the past here and elsewhere about my dislike for "Designed Experiments", which are often abbreviated as DOE's (Design of Experiments).

First off, the name is meaningless, as just about all experiments are "designed" as the word is commonly defined, that is, they are "create[d], fashion[ed], execute[d], or construct[ed] according to plan". There are accidental experiments, but most are designed, so to call this method of experimental design by the name "Designed Experiments" is not helpful in the least. (This is analogous to "Good Laboratory Practices", which every lab worker thinks they are following until they find out that it is a code word from the FDA for a whole slew of regulations.)

But more importantly, the lessons from a designed experiment are incapable of being transferred to another setting. As I've previously said,
"They give you little or no insight as to what underlying principles could be learned and used elsewhere - except to run another DOE. For instance, in a pressure-sensitive adhesive formulation DOE, you may see that adding more tackifier increases the tack. So marketing wants more tack? Then add more tackifier...until you suddenly see a decrease in tack. Now what?

The problem is that the underlying physics drive the results, not an artificial concept like tack. When you see that the tackifier is lowering the plateau modulus (hence more tack) but also increasing the Tg, you realize that you can overdo it. Raise the Tg too much and you have a tack-free material, plateau modulus be damned.

But a DOE will never educate you about this. It will only give directions based on what the inputs were. Garbage in, garbage out. There is a good reason you never see a DOE in Nature, JACS, or pretty much any scientific journal. You don't learn anything fundamental from them. And as soon as we give up our focus on the fundamentals, we are all done."

The overwhelming appeal of designed experiments is that they allow you to change multiple variables with each run and hence reduce the overall number of runs. But engineers long ago learned how to get around that - use dimensionless numbers.

As an example, for fluid flow through a pipe you have to worry about the viscosity, the diameter of the pipe, the density of the fluid and its velocity. That's four variables and quite a bit of work in front of you. But a bright engineer named Reynolds found that if you multiply together the diameter, density and velocity, and then divide by the viscosity, the number is dimensionless. The Reynolds number was born. The beauty of it is that you only need to worry about your results at various Reynolds numbers, and now have the freedom to change the input variables as you desire and in whatever way is easiest for you. You are in control of the experiments, not the "design" spit out by a computer.
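
To make that freedom concrete, here is a minimal sketch in Python. The function name and the fluid properties are illustrative values I picked for the example, not measurements: the point is only that two quite different combinations of inputs land on the same Reynolds number.

```python
import math


def reynolds_number(density, velocity, diameter, viscosity):
    """Re = density * velocity * diameter / viscosity (consistent SI units assumed)."""
    return density * velocity * diameter / viscosity


# Illustrative, made-up inputs: a water-like fluid moving slowly...
re_a = reynolds_number(density=1000.0, velocity=0.1, diameter=0.05, viscosity=1.0e-3)

# ...and a fluid ten times more viscous moving ten times faster in the same pipe.
re_b = reynolds_number(density=1000.0, velocity=1.0, diameter=0.05, viscosity=1.0e-2)

# Both runs sit at the same point on the Reynolds-number axis.
print(re_a, re_b, math.isclose(re_a, re_b))
```

If the pump can't slow the flow down any further, you can thicken the fluid or swap the pipe instead; the experiment is the same as long as the Reynolds number is.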

Reynolds had insight into the fundamentals of his system through the Navier-Stokes equations, but such fundamental understanding is not needed. The Buckingham Π-Theorem allows you to derive dimensionless numbers from any of the potential variables for a system. There are dozens of similar dimensionless numbers, but the unfortunate part is that they are mostly used by engineers and seldom used by scientists, and hence, DOE's will continue to haunt us.
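
As a rough sketch of how the Π-Theorem works in practice, write down the exponents of mass, length and time for each variable and look for combinations of the variables whose exponents all cancel. That is the null space of the "dimension matrix". The helper below (`null_space` is my own name, and the column order viscosity, density, velocity, diameter is just a choice I made so the answer comes out the familiar way up) does this with exact fractions for the pipe-flow example:

```python
from fractions import Fraction


def null_space(matrix):
    """Return a basis of the null space of `matrix` (a list of rows), using exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    # Gauss-Jordan elimination to reduced row echelon form.
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # Each non-pivot (free) column gives one independent dimensionless group.
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Columns: viscosity, density, velocity, diameter. Rows: mass, length, time.
dimension_matrix = [
    [1, 1, 0, 0],     # mass:   viscosity kg/(m s), density kg/m^3
    [-1, -3, 1, 1],   # length
    [-1, 0, -1, 0],   # time
]

pi_groups = null_space(dimension_matrix)
for group in pi_groups:
    print([str(exponent) for exponent in group])
```

Four variables and three base dimensions leave exactly one group, with exponents -1, 1, 1, 1: density times velocity times diameter, divided by viscosity. No Navier-Stokes required.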

1 comment:

DigitalPig said...


I used to ignore DoE completely until one day I learned about it in a Chem E class. Even though I am shameless enough to say I am a chemist/scientist, I have a totally different point of view on DoE from yours. I think DoE is a really good supplement for scientists, especially industrial scientists.

I totally agree with you that the ultimate guidance for your experiment should be the scientific principles behind it. DoE *only* gives you an *empirical* relationship, which most of the time is strictly limited to the experimental space you have explored. Any value outside the explored region (that is, extrapolation) is not reliable, and you should always treat it as a "prediction" and try to verify it with the underlying science or with just one more experiment.

Yes. You are right that almost nobody in the university (in chemistry) is using DoE. That is because their systems are much cleaner than the ones we have as industrial chemists. Instead of a reaction system containing only reactants and catalyst, industrial chemistry involves a lot of other stuff that may interact strongly enough to completely screw up your performance. This is very common if you are doing formulation work. You are right that most scientific journals, like JACS/Angew Chemie, do not have a lot of papers that use DoE. But that does not mean it is not useful. That may simply mean DoE is not needed in their systems. To do fundamental research, you definitely want a clean system. I wish I had a similarly clean system to work with in my normal work. But unfortunately, most of the time I am dealing with a whole bunch of mixtures where nobody knows whether any of the components interact with anything else.

I do not agree with you that the best thing DoE can do is to adjust many variables at the same time. That is only one benefit. From my point of view, the most powerful parts of DoE are: 1) You can detect any possible interactions between your factors. 2) You can include many uncontrollable factors in your experiment as block factors (for example, if humidity is a huge concern). 3) You get a confidence interval, as well as a significance level, for your final result.

I don't want to talk about 1) and 2) a lot, but I really want to emphasize 3). You think DoE is not useful because no good scientific journal is using it. I think that is actually the problem with the current scientific field. Statistics is a really powerful, quantitative subject that has existed for decades. Sadly, natural science fields, like chemistry and physics, are still slow to pick it up. I constantly see papers publishing data without error bars. Even papers with error bars don't provide basic statistical analysis or discuss whether their data is significant compared to the baseline. Biological papers are better at this (you can seldom find a paper without significance analysis). But they still have a whole bunch of statistical issues, like inadequate replicates, biased experimental data and inappropriate analysis methods. And one common reason for this is that they are poor at DoE. When they don't use proper DoE, it is like you said, garbage in, garbage out. Without a proper DoE and good randomization, all your data is meaningless, even if you throw it into whatever fancy statistical analysis program.

You cannot ignore DoE. But you cannot rely only on DoE either. It is only a supplement. But it is a powerful supplement. My point of view is that scientists should learn and use some DoE and statistics, and statisticians should learn some chemistry too (well, if they want to, definitely). Then we can communicate better about how to make our scientific exploration more effective and more reliable by introducing powerful statistical tools.