World’s largest event dataset now publicly available in BigQuery

This blog post comes from Kalev H. Leetaru, a fellow and adjunct faculty member in the Edmund A. Walsh School of Foreign Service at Georgetown University in Washington, DC. His award-winning work centers on the application of high-performance computing and “big data” to grand challenge problems. The entire quarter-billion-record GDELT Event Database is now available as a public dataset in Google BigQuery.

The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.

Reference: World’s largest event dataset now publicly available in BigQuery 


100+ Interesting Data Sets for Statistics

Looking for interesting data sets? Here’s a list of more than 100 of the best stuff, from dolphin relationships to political campaign donations to death row prisoners.

Reference: 100+ Interesting Data Sets for Statistics

Sharing and dissemination of research data : Some useful methods and tools

As the research projects under ‘Exploring the Impacts of Open Data in Developing Countries’ (ODDC) near completion, a monthly web meeting hosted by the Web Foundation with the research partners on 15 April 2014 reflected on some interesting questions about sharing research data. Melanie Brunet (Research Librarian at IDRC) and Barbara Porrett (IM Systems Analyst, IDRC) talked about sharing and archiving research findings and reports in the IDRC system through the IDRC Digital Library (IDL). Lars Holm Nielsen (CERN) gave a very useful presentation on Zenodo, a research data archive for storing and sharing research data. This post shares some of what was discussed.

Reference: Sharing and dissemination of research data : Some useful methods and tools

The best optical illusions of 2014 are truly mind-bending stuff

Each year, the Neural Correlate Society holds a contest among experts on perception and visual illusions. The top ten entries get selected by a panel, and the three best are then picked in an international gathering that I imagine as some sort of wizards’ convention. Here are the winners.

Reference: The best optical illusions of 2014 are truly mind-bending stuff

Three visualization mistakes to avoid

… let’s take a look at three examples that illustrate how visualizations can affect our perceptions of data.

  • Bar charts with erroneous scale
  • Pie charts with perplexing slices
  • Missing legends and mystery colors

Reference: Three visualization mistakes to avoid

Another good line chart from the White House

Here is another good infographic from the White House. The choice of colors and layout is very good.

Reference: A Look Back: Bringing Our Troops Home

Why Most People’s Charts & Graphs Look Like Crap

… if data isn’t properly visualized, it can do more damage than good. The wrong presentation can diminish the data’s message or, worse, misrepresent it entirely.

… It’s about presenting information in a way that is easy to understand and intuitive to navigate, making the viewer do as little legwork as possible. Of course, not all designers are data visualization experts, which is why much of the visual content we see is, well, less than stellar. Here are 10 data visualization mistakes you’re probably making and the quick fixes to remedy them.

  • Misordering Pie Segments … Segments should be ordered intuitively (and a pie chart shouldn’t include more than five of them).
  • Using Non-Solid Lines in a Line Chart
  • Arranging Data Non-Intuitively
  • Obscuring Your Data
  • Making the Reader Do More Work
  • Misrepresenting Data
  • Using Different Colors on a Heat Map … use a single color with varying shades or a spectrum between two analogous colors to show intensity.
  • Making Bars Too Wide or Too Thin … The space between bars in a bar chart should be ½ bar width.
  • Making it Hard to Compare Data
  • Using 3D Charts … 3D shapes can distort perception and therefore skew data. Stick with 2D shapes to ensure data is presented accurately.
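
Two of the fixes above are concrete enough to sketch in code: the half-bar-width gap between bars and the single-color heat map. The snippet below is only an illustration under my own assumptions, not something from the referenced article. The function names `bar_positions` and `single_hue_shades` and the default blue base color are hypothetical, and the shading is a plain linear blend from white to the base color.

```python
def bar_positions(n_bars, bar_width=1.0):
    """Return the left edge of each bar, leaving a gap of half a bar width."""
    gap = bar_width / 2          # rule of thumb from the list: gap = 1/2 bar width
    step = bar_width + gap       # distance between the left edges of adjacent bars
    return [i * step for i in range(n_bars)]


def single_hue_shades(values, base_rgb=(8, 81, 156)):
    """Map values to shades of one color (assumed default: a dark blue).

    The smallest value maps to white and the largest to base_rgb,
    with a linear blend in between.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1        # avoid division by zero when all values are equal
    shades = []
    for v in values:
        t = (v - lo) / span      # 0.0 for the minimum, 1.0 for the maximum
        shades.append(tuple(round(255 + (c - 255) * t) for c in base_rgb))
    return shades


if __name__ == "__main__":
    print(bar_positions(3, bar_width=2.0))
    print(single_hue_shades([0, 5, 10]))
```

The left edges and RGB tuples can be handed to whatever plotting library you use; keeping the helpers free of any charting dependency makes the spacing and shading rules easy to check on their own.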

Reference: Why Most People’s Charts & Graphs Look Like Crap