Tuesday, April 3, 2012

Valid vs Accurate

If I were to tell you to guess my electric bill from the following two choices, $146.78 or $150.00, which would you chose? I'd bet the a majority of readers would chose the former rather than the latter because it wasn't a 'round' number, and what are the odds that an electric bill would be a round number like 150. Of course the truth is my electric bill is just as likely to any number with a reasonable range as any other, but that 150 doesn't look right, why, because it looks inaccurate. Why does it look inaccurate, well because it doesn't appear valid. But in reality you cant assume a value is accurate because it appears valid.  In fact, you you need to separate the two concepts from each other when you look at data.

Validity in terms of data, means that it fits fits the definition for a particular type of data, for instance, in the case of US dollars it is a decimal number with two places of precision after the decimal point, an US phone number with area code is minimally a 10 digit integer. The other notable  property of data validity is that it can be checked with a self contained rule, there is no need to compare it to outside data.

Accuracy, on the other hand, means the data is correct. For instance, the actual value of my electric bill. Accuracy has to be checked by comparison with an outside value, so accuracy checks cannot be self contained. While accurate data is usually valid it does not have to be. If my actual electric bill was 150, it could conceivably presented  as in integer instead of a decimal thus making it invalid.

When dealing with data, make sure that you deal with validity first, followed by accuracy. This is especially true with weakly typed languages and variant data types.

No comments:

Post a Comment