Winner and Loser of the 2016 US Presidential Election

Well, this morning, everyone should already know the winner and loser of the 2016 US presidential election. So, in the spirit of this blog, let’s look at the winner and loser in infographics and predictive analytics as a result of the 2016 US presidential election.

Winner: Google Election

Google made it easy to understand the election status and results on one simple page, with tabs for the overview, President, Senate, House, and so on. The bar between the two presidential candidates makes it clear who is winning, who is catching up, and how many electoral votes are needed to win.

[Image: Google election dashboard]

I especially like the semi-transparent red and blue used to indicate which candidate leads in each state, and the status of the swing states in the lower part of the dashboard. In contrast to what Google is doing, other media outlets do not use semi-transparent colors for the remaining states, so it is less clear who is likely to win the rest of the electoral votes, and the trend is difficult to see.

[Image: NBC election dashboard]

On the other hand, the bar between the candidates in the Google dashboard counteracts the bias introduced by the US map (whose large areas suggest more electoral votes than they actually carry).

The only thing I would add to the Google election dashboard is to apply the same semi-transparent colors to the bar between the candidates as well. That would make the dashboard perfect.
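To make that suggestion concrete, here is a minimal matplotlib sketch of such a bar, with solid colors for electoral votes already called and semi-transparent colors for votes where a candidate merely leads; all of the vote counts are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical electoral-vote counts, for illustration only.
called = {"Clinton": 218, "Trump": 244}   # states already called
leaning = {"Clinton": 44, "Trump": 32}    # uncalled states, by current leader

fig, ax = plt.subplots(figsize=(8, 1.5))
# Solid segments: electoral votes already won.
ax.barh(0, called["Clinton"], color="#1f77b4", left=0)
ax.barh(0, called["Trump"], color="#d62728", left=538 - called["Trump"])
# Semi-transparent segments: votes the candidate leads but is not assured of.
ax.barh(0, leaning["Clinton"], color="#1f77b4", alpha=0.4,
        left=called["Clinton"])
ax.barh(0, leaning["Trump"], color="#d62728", alpha=0.4,
        left=538 - called["Trump"] - leaning["Trump"])
ax.axvline(270, color="black", linestyle="--")  # 270 votes needed to win
ax.set_xlim(0, 538)
ax.set_yticks([])
ax.set_xlabel("Electoral votes (270 to win)")
plt.tight_layout()
plt.show()
```

Using the same two hues for both segment types keeps the bar consistent with the map, while the alpha channel alone signals how certain each block of votes is.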

In summary, the Google election dashboard does an excellent job for the US presidential election. It brings clarity to both the status and the trend of the election results in a very precise manner. It deserves to be the winner of BI dashboard design for this election.

Loser: Predictive Analytics

Predictive analytics did a very poor job in this presidential election. All the predictive models consistently said Mrs. Clinton would win the election over Mr. Trump. As we all learned this morning, that prediction was a total failure.

FiveThirtyEight (538) is a pretty popular predictive-analytics site. The following image shows its forecast over the course of election night. The forecast favored Mrs. Clinton until about 10 PM, when the curve began to swing toward Mr. Trump, and the switch did not become evident until about 11:30 PM (the big gap where the red line climbs above the blue line in the lower part of the chart), by which point some viewers could already tell where the night was heading before the forecast caught up.

[Image: FiveThirtyEight election-night forecast]

However, we probably should not blame predictive analytics for such a big failure. The strength of predictive analytics lies in predicting a major trend, not a single outcome, and most of our predictive analytics today relies solely on “data” and nothing else.

As I pointed out in my recent post on this site and on KDnuggets, if predictive analytics is based purely on data, without an understanding of the underlying process, its forecast is subject to the noise and bias in that data and can be very inaccurate. This became evident in this presidential election: because all the data were biased toward Mrs. Clinton, the models predicted Mrs. Clinton would win.
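A toy simulation makes the point concrete (all numbers are invented): if every poll over-samples one candidate’s supporters by the same two points, averaging a thousand polls still produces the wrong winner, because more data only shrinks noise, never a systematic bias:

```python
import random

random.seed(0)

TRUE_SUPPORT = 0.49   # candidate A's true vote share (she actually loses)
BIAS = 0.02           # systematic sampling bias toward candidate A
N_POLLS = 1000
POLL_SIZE = 1000

polls = []
for _ in range(N_POLLS):
    # Each respondent backs A with probability true support + bias.
    backers = sum(random.random() < TRUE_SUPPORT + BIAS
                  for _ in range(POLL_SIZE))
    polls.append(backers / POLL_SIZE)

avg = sum(polls) / len(polls)
print(f"average of {N_POLLS} polls: {avg:.3f}")  # ~0.51, wrongly above 0.5
print("predicted winner:", "A" if avg > 0.5 else "B")
print("actual winner:   ", "A" if TRUE_SUPPORT > 0.5 else "B")
```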

In addition, since the presidential election result is a single outcome and involves many human factors that cannot be quantified analytically, predictive analytics may not be the right tool for the prediction at all! The total failure of the prediction makes predictive analytics the loser of this election.

Predictive analytics still works well in a controlled context, but it may not be the right tool for election prediction unless we (1) quantify human factors and their correlations accurately, (2) stop depending solely on data, and (3) fully disclose the prediction errors.
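On point (1), here is a rough sketch of what quantifying correlations would mean for an election model: a Monte Carlo electoral-college simulation in which part of each state’s polling error is a single national error shared by every state. The states, margins, and error sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical swing states: (electoral votes, polled margin for candidate A).
states = {"FL": (29, 0.01), "PA": (20, 0.02), "OH": (18, -0.01),
          "MI": (16, 0.02), "WI": (10, 0.03), "NC": (15, -0.01)}
SAFE_A, SAFE_B = 215, 215   # electoral votes assumed already decided
SIGMA = 0.03                # total polling-error std per state
N_SIM = 20_000

def win_probability(rho):
    """P(candidate A wins) when a fraction rho of the error variance
    is a national error shared by every state."""
    wins = 0
    for _ in range(N_SIM):
        national = rng.normal(0, SIGMA * np.sqrt(rho))
        ev = SAFE_A
        for votes, margin in states.values():
            local = rng.normal(0, SIGMA * np.sqrt(1 - rho))
            if margin + national + local > 0:
                ev += votes
        wins += ev >= 270
    return wins / N_SIM

print("independent state errors:", win_probability(rho=0.0))
print("correlated state errors :", win_probability(rho=0.8))
```

With independent errors, the leader’s win probability looks comfortable; once errors are correlated, one national polling miss can flip many states at once, which is roughly what happened in 2016.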

 


US court allows Google Earth image as evidence

An appeals court has ruled that Google Earth images, like photographs, can be used as evidence in court.

… The Ninth Circuit said that in admitting the evidence it was joining other circuit courts that have held that machine statements aren’t hearsay. A machine could, however, malfunction, produce inconsistent results or have been tampered with. “But such concerns are addressed by the rules of authentication, not hearsay,” according to the court.

… In such authentication, the proponent of the evidence must show that a machine is reliable and correctly calibrated, and that the data put into the machine, in this case the GPS coordinates, is accurate.

Reference: US court allows Google Earth image as evidence

Google’s self-driving cars have been in 11 accidents because humans are dumb

Throughout all those miles – mostly on city streets – the vehicles have been involved in 11 ‘minor’ accidents, with no injuries. Make what you want of that number, but the key point is that none of the accidents so far have been the fault of the self-driving car, according to Google.

Instead, they’ve been a result of reckless driving. City streets are far more likely to be the scene of collisions than highways, and Google saw all sorts of distracted driving – ranging from the unfortunately common cellphone use, to one person even playing the trumpet.

Reference: Google’s self-driving cars have been in 11 accidents because humans are dumb

Google is funding “an artificial intelligence for data science”

As I have been stressing in the past, we cannot rule the human out of the equation of data science. This Google idea seems to be an attempt to automate another science by eliminating the human; it is very idealistic and could just be a waste of money. I agree that, as an initial trial, some automation is achievable, especially the kind the article brings up: selecting the optimal parameters of a machine learning model by minimizing its output errors. But, again, there remains the risk that the resulting “optimal” model is not optimal at all, especially due to data bias, which a machine cannot detect if the bias is not already known.
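As a minimal sketch of that kind of automation (shown here with scikit-learn’s grid search, not with whatever system Google is actually funding), parameter selection by minimizing cross-validated error looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data stands in for a real problem; any bias baked into this
# sample is invisible to the search below.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Automated model selection: try each parameter combination and keep the
# one with the best cross-validated score.
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]},
    cv=5,
)
search.fit(X, y)

print("best parameters :", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
# "Optimal" here means optimal on this data; if the data is biased,
# the selected model inherits the bias and no search can detect it.
```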

Reference: Google is funding “an artificial intelligence for data science”

The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google

Map-Reduce is on its way out. But we shouldn’t measure its importance by the number of bytes it crunches; we should measure it by the fundamental shift in data processing architectures it helped popularise.
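For readers who never used it, the shift in question is the map/shuffle/reduce pattern itself, which a toy word count in plain Python (no Hadoop involved) captures end to end:

```python
from collections import defaultdict

documents = ["the elephant was a trojan horse",
             "the death of map reduce at google"]

# Map: emit (key, value) pairs from each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine each key's values into a single result.
counts = {word: sum(values) for word, values in groups.items()}
print(counts["the"])  # 2
```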

Reference: The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google

World’s largest event dataset now publicly available in BigQuery

The post comes from Kalev H. Leetaru, a fellow and adjunct faculty member in the Edmund A. Walsh School of Foreign Service at Georgetown University in Washington, DC. His award-winning work centers on the application of high performance computing and “big data” to grand challenge problems. The entire quarter-billion-record GDELT Event Database is now available as a public dataset in Google BigQuery.

The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
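As a rough sketch, querying it from Python might look like the following. The table path gdelt-bq.full.events comes from the announcement, and the Year column is an assumption based on the GDELT codebook, so verify both against the live schema:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials/project

# Table path and column name assumed from GDELT's announcement and
# codebook; check the current schema before relying on this.
query = """
    SELECT Year, COUNT(*) AS n_events
    FROM `gdelt-bq.full.events`
    GROUP BY Year
    ORDER BY Year
"""
for row in client.query(query).result():
    print(row.Year, row.n_events)
```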

Reference: World’s largest event dataset now publicly available in BigQuery 

Why Google Flu is a failure: the hubris of big data

People with the flu (the influenza virus, that is) will probably go online to find out how to treat it, or to search for other information about the flu. So Google decided to track such behavior, hoping it might be able to predict flu outbreaks even faster than traditional health authorities such as the Centers for Disease Control (CDC).

Instead, as the authors of a new article in Science explain, we got “big data hubris.” David Lazer and colleagues define it as follows:

“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.

Ironically, just a few months after announcing Google Flu, the world was hit with the 2009 swine flu pandemic, caused by a novel strain of H1N1 influenza. Google Flu missed it.

The failures have continued. As Lazer et al. show in their Science study, Google Flu was wrong for 100 out of 108 weeks since August 2011.

One problem is that Google’s scientists have never revealed what search terms they actually use to track the flu. A paper they published in 2011 declares that Google Flu does a great job. The official Google blog last October makes it appear that they do an almost perfect job predicting the flu for previous years.

Haven’t these guys been paying attention? It’s easy to predict the past.
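The point generalizes beyond Google Flu: give a model enough candidate search terms and it can “retrodict” past flu seasons perfectly while learning nothing. Here is a toy sketch, with pure noise standing in for both the flu series and the search-term counts:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

n_weeks, n_terms = 108, 200
flu = rng.normal(size=n_weeks)                  # stand-in for CDC flu levels
searches = rng.normal(size=(n_weeks, n_terms))  # stand-in for search volumes
# None of these "terms" has any real relation to the flu series.

train, test = slice(0, 54), slice(54, 108)
model = LinearRegression().fit(searches[train], flu[train])

print("in-sample R^2 :", round(model.score(searches[train], flu[train]), 3))
print("out-of-sample :", round(model.score(searches[test], flu[test]), 3))
```

With more terms than training weeks, the in-sample fit is perfect while the out-of-sample score is worse than guessing the mean, which is exactly the signature of a model tuned to predict the past.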

Reference: Why Google Flu is a failure: the hubris of big data