John Owens, whom I much admire, made this comment on a Henrik Liliendahl Sørensen blog post about the difference between data quality and information quality. Taken completely out of context, he said:
“…it is the quality of the information (i.e. data in a context) that is the real issue, rather than the items of data themselves.
The data items might well be correct, but in the wrong context, thus negating the overall quality of the information, which is what the enterprises uses. It will be interesting to see how long it is before data quality industry arrives at this conclusion. But, if they ever do, who will be courageous enough to say so?”
I agree entirely, yet disagree profoundly. Data and information are not the same thing yet are inextricably linked – one without the other isn’t possible but they still must not be confused.
Data and information are as different as chickens and eggs, but are equally dependent upon each other.
Basically, data is stored information whilst information is perceived data.
Let me give you an example from a recent episode of the BBC’s science program Bang Goes the Theory. A presenter went to a shopping centre and prepared two plates of bacon sandwiches. One was accompanied by the message that regularly eating processed meats increases the chances of getting bowel cancer by 20%. The other was accompanied by the message that regularly eating processed meats increases the chances of getting bowel cancer from 5% to 6%. Though the data underlying both messages is identical, as is the information provided, the audience were understandably worried when seeing the first message but happy to tuck in after seeing the second. The first message would be fit for the purposes of the health authority, the second for those of the bacon marketing board, but in neither case is the fitness for purpose related to the data; it is related to how the information is presented.
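To see why both messages rest on the same data, here is a minimal sketch in Python of the arithmetic. The function names and the wording of the generated messages are my own illustrations, not anything from the programme; the only figures taken from the example are the 5% and 6% risks.

```python
def relative_increase_pct(baseline_pct, exposed_pct):
    """Relative increase in risk, as a percentage of the baseline risk."""
    return (exposed_pct - baseline_pct) * 100 / baseline_pct


def relative_message(baseline_pct, exposed_pct):
    """Present the data as a relative increase (the 'health authority' framing)."""
    rel = relative_increase_pct(baseline_pct, exposed_pct)
    return f"increases the chances of bowel cancer by {rel:.0f}%"


def absolute_message(baseline_pct, exposed_pct):
    """Present the same data as absolute risks (the 'marketing board' framing)."""
    return f"increases the chances of bowel cancer from {baseline_pct:g}% to {exposed_pct:g}%"


# The stored data: the two risk figures from the example, in percent.
baseline, exposed = 5, 6

print(relative_message(baseline, exposed))
print(absolute_message(baseline, exposed))
```

The two stored values never change between the calls; only the presentation function does, which is where the difference in perception comes from.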
It is at the points where information becomes data and data becomes information that the potential for corruption and misunderstanding is at its highest. We also know that once data is stored, inert though it may appear to be, it cannot be ignored: the real-world entities to which the data refers may change, and each change needs to be processed to update the data.
Data quality ensures that the data represents its real-world information entity accurately, completely and consistently. Information quality ensures that the context in which the data is presented provides a realistic picture of the original information that the data represents.
There’s a general feeling that only data which has a purpose should be stored. I would not agree, as purpose, like so much else, depends on context and our viewpoint. Data which has no purpose now may be required to fulfil an information requirement in the future, or may relate to occurrences in the past; whilst anyone who is paid to manage data, whether it is used or not, finds his or her salary very meaningful!
Ultimately data is used to source information, and information quality is important. But we should not confuse data quality with information quality. Both are essential, and they are separate disciplines.