Nate Silver is America’s favorite statistical guru of the past – well, maybe ever. He has been devilishly accurate in predicting electoral outcomes. Before that, he joined the small but influential fraternity of statheads who work with data in sports, particularly baseball. He’s written an excellent book called The Signal and the Noise, which is essentially about the folly of prediction.
In his interview on Freakonomics Radio, he offered some very good insights into data analysis and into many of the misunderstandings people have about it today.
… Big Data … Oftentimes it’s not about the amount of data you have but how much you vetted that data. If a data set is virginal, as they call it, no one’s looked at it before really. You’re gonna have a lot of problems and one problem with a really large data set is that if you’re running some algorithms, some quick and dirty way to find the most influential data points, a lot of times those are bugs and outliers, right? And the reason why you have that anomaly is because someone coded it in wrong. Or you made some mistake in the analysis.
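To make Silver's point about "quick and dirty" scans concrete, here is a minimal sketch. It assumes a simple z-score ranking as the screening method, which Silver does not name. The data set and the function name `rank_by_zscore` are hypothetical. One height was keyed in millimetres instead of centimetres, and that bug is exactly what floats to the top of the "most extreme" list.

```python
import statistics

def rank_by_zscore(values):
    """Return (index, z) pairs sorted from most to least extreme.

    A quick-and-dirty screen (an illustrative assumption): points far from
    the mean in standard-deviation units rise to the top, whether they are
    genuine signal or data-entry bugs.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    scored = [(i, (v - mean) / sd) for i, v in enumerate(values)]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical adult heights in centimetres; row 5 was keyed in millimetres.
heights_cm = [172, 165, 180, 158, 177, 1690, 169, 174, 161, 183]

for index, z in rank_by_zscore(heights_cm)[:3]:
    print(f"row {index}: value={heights_cm[index]}, z={z:+.2f}")
```

The screen does its job in the narrow sense: it finds the most unusual row. But that row is a coding error, not a discovery, which is why the data has to be vetted before anyone trusts what such a scan surfaces.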
… I think people love new technology but they overestimate how much of the kind of human factor gets in the way. I’m not trying to be cute about that, I just mean that people need to learn how to use these tools, what they can do, what they can’t do, you know, no amount of data is a substitute for scientific inference and hypothesis testing, and kind of structured analysis of a system. I think one of the false promises that was made early on is that, well if you have a billion data points or a trillion data points, you’re going to find lots and lots of correlations through brute force. And you will, but the problem is that a high percentage of those, maybe the vast majority, are false correlations, are false positives. Where there could be significance, but you have so many lottery tickets when you can run an analysis on a trillion data points, that you’re going to have some one-in-a-million coincidences just by chance alone. If you bet all your money on them, you might wind up looking very foolish in the end.
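The "lottery tickets" argument can be reproduced on a small scale. The sketch below generates variables that are pure independent noise, tests every pair for correlation, and counts how many pairs look "significant." Several choices here are illustrative assumptions, not anything Silver describes: the Fisher z approximation for the significance test, the Bonferroni correction as one standard remedy for multiple comparisons, and the hypothetical helper names and parameters (`brute_force_search`, 60 variables, 100 observations).

```python
import math
import random
import statistics
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, n, alpha):
    """Two-sided test of r != 0 using the Fisher z approximation."""
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit

def brute_force_search(num_vars=60, n=100, alpha=0.05, seed=1):
    """Correlate every pair of noise variables; return (tests, naive, corrected)."""
    rng = random.Random(seed)
    # Every variable is independent noise, so every "hit" is a false positive.
    data = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(num_vars)]
    rs = [pearson(data[i], data[j]) for i, j in combinations(range(num_vars), 2)]
    naive = sum(is_significant(r, n, alpha) for r in rs)
    # Bonferroni: divide alpha by the number of tests performed.
    corrected = sum(is_significant(r, n, alpha / len(rs)) for r in rs)
    return len(rs), naive, corrected

num_tests, naive_hits, bonferroni_hits = brute_force_search()
print(f"{num_tests} pairs tested on pure noise")
print(f"'significant' at p < 0.05: {naive_hits} (about {0.05 * num_tests:.0f} expected by chance)")
print(f"still significant after Bonferroni correction: {bonferroni_hits}")
```

With 1,770 pairwise tests at a 5% threshold, roughly 88 spurious "findings" are expected even though no real relationship exists. Scaling the number of variables up toward big-data sizes makes the count of coincidences grow accordingly. Correcting for the number of tests removes almost all of them, but no correction substitutes for the hypothesis-driven analysis Silver calls for.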