Wednesday, February 23, 2011

Data quality and information perception

One of my mantras for ensuring data quality is to look at the data: not at profiles, analyses or graphics, but at the raw data. Browse through it and the best data quality tool there is, your brain, will highlight data quality issues in seconds.

Information is rooted in, and derived from, data. When information is based on poor-quality data, that information will mislead.

This was highlighted by a graphic on The Guardian website that attempted to bring together opinions about the current tensions in North Africa and the Middle East from blogs, feeds, social media and so on.



We see Muammar Gaddafi there, nice and large, and ... but hang on a second. There's Moammar Gadhafi too. And Muammar Qaddafi. And then just plain old Gaddafi. The same person is represented in four different places on the graphic because of transliteration issues. Had these four entries been brought together into one, the graphic would look different.
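
To make the effect concrete, here is a minimal sketch in Python of what bringing the four entries together does to the numbers behind such a graphic. The mention counts, the "Another Person" entry and the alias table are all invented for illustration; they are not taken from the Guardian graphic, and a real alias table would have to be built by someone who has looked at the data.

```python
from collections import Counter

# Hypothetical mention counts, invented purely for illustration.
mentions = Counter({
    "Muammar Gaddafi": 40,
    "Moammar Gadhafi": 25,
    "Muammar Qaddafi": 20,
    "Gaddafi": 15,
    "Another Person": 60,
})

# Hand-maintained alias table (an assumption of this sketch):
# each variant spelling maps to one canonical spelling.
ALIASES = {
    "Moammar Gadhafi": "Muammar Gaddafi",
    "Muammar Qaddafi": "Muammar Gaddafi",
    "Gaddafi": "Muammar Gaddafi",
}


def merge_variants(counts, aliases):
    """Add up the counts of all spellings that share a canonical name."""
    merged = Counter()
    for name, count in counts.items():
        merged[aliases.get(name, name)] += count
    return merged


merged = merge_variants(mentions, ALIASES)

print("Before:", mentions.most_common(1))
print("After: ", merged.most_common(1))
```

With these made-up numbers the most-mentioned name changes once the variants are merged, which is exactly the kind of difference a reader of the graphic would never see.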

As I mentioned, this issue is mainly caused by different transliterations of a person's name from Arabic, but this sort of variance within data is very common. Place names, for example, are often written in many different ways within the same database. Basing decisions on such variant data would be unwise. Yet decisions are based on data like this, and on profiles and graphics built from it, every minute of every day.
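
Browsing the raw data remains the best way to find such variants, but a crude helper can point you at where to look first. The sketch below, again in Python, groups values that become identical once case, spacing and punctuation are ignored. The function names and the sample place names are hypothetical, and this approach only catches formatting variance; transliteration differences like Gaddafi and Qaddafi would slip straight through it.

```python
from collections import defaultdict


def normalise(value):
    """Crude comparison key: case-folded, letters and digits only."""
    return "".join(ch for ch in value.casefold() if ch.isalnum())


def find_variants(values):
    """Return groups of distinct raw spellings that share a normalised key."""
    groups = defaultdict(set)
    for value in values:
        groups[normalise(value)].add(value)
    # Keep only the keys that were written in more than one way.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical column of place names, invented for illustration.
places = ["St. Albans", "St Albans", "st albans", "Leeds", "Leeds", "York"]
variants = find_variants(places)

for key, spellings in variants.items():
    print(key, "is written as:", spellings)
```

Anything this reports is a starting point for a closer look at the data, not a list of confirmed duplicates.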

Go on, grab a coffee and go and look at your data. You'll be amazed.
