Wednesday, May 26, 2010

How to Not Plot Your Data

I only saw two talks while at ANTEC last week, one of which appalled me. In this study, the speaker measured different variables, which I'll call 'X' and 'Y'. Without any theory or rationale behind his analysis, he prepared a number of plots that showed various correlations, using semi-log or log-log plots as needed to get strong correlations. Do this long enough with any data set and you will eventually find something that looks good but is useless. You're not learning anything, you're not disproving anything, you're just doing busy work. However, one plot in particular left me aghast. The plot had the ratio X/Y on the y-axis, and X on the x-axis. How can this be? Divide both sides by 'X' and you then have a plot of 1/Y vs. 1, the equivalent of a bar chart of the various values of 'Y' that were measured. I walked out at that point.
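To see why a shared variable makes such a plot look better than it is, here is a minimal sketch using made-up numbers (not the speaker's data, which I don't have). It assumes X and Y are drawn independently from uniform ranges I picked arbitrarily, then compares the correlation of X with Y against the correlation of X/Y with X. The helper name `pearson` and the ranges are my own choices for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data: Y is generated with no dependence on X at all.
x = [rng.uniform(1.0, 10.0) for _ in range(n)]
y = [rng.uniform(1.0, 2.0) for _ in range(n)]
ratio = [xi / yi for xi, yi in zip(x, y)]

r_xy = pearson(x, y)            # X vs. Y: should be close to zero
r_ratio_x = pearson(ratio, x)   # X/Y vs. X: strong, because X is on both axes

print(f"corr(X, Y)   = {r_xy:.3f}")
print(f"corr(X/Y, X) = {r_ratio_x:.3f}")
```

With these assumed ranges, the second correlation comes out strongly positive even though Y carries no information about X. The trend in the plot comes from X being plotted against itself, which is the sense in which it looks good but is useless.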

It certainly would not be out of place to blame the reviewer of the paper, but I hold the author to an even higher standard: the bio of the speaker stated that he had earned a Ph.D. from one of the finer schools in the land, proof that the degree only shows what you once did, not what you are still doing.

4 comments:

Materialist said...

A friction coefficient vs. load plot would appear to be just as useless (Friction Force/Load vs. Load) but is generally recognized as a valid form of data presentation.
Not to say that the presenter in question wasn't grasping for relevance via plot-roulette, but sometimes a form of data analysis can seem silly but still be useful.

John said...

"Plot roulette." I'll have to remember that term.

Anonymous said...

Most people would agree a complex viscosity (G*/w) vs. frequency (w) plot is perfectly legitimate. Similarly, a plot of complex viscosity (G*/w) vs. complex modulus (G*) is not meaningless, but in fact can be used to estimate the flow curve.

John said...

Both the plots of "friction force/load vs. load" and "G*/w vs. w" are different from what I was commenting on: "X/Y vs. X". That "X" appears in both denominators means it is redundant.