Sunday, April 29, 2012

6 Reasons to use a SQL Database

There is a lot of hype about NoSQL databases. These databases provide solutions to problems that traditional SQL RDBMS's can't, and they are 'new and shiny'. While these new tools are drawing a lot of attention, I'd like to review the case for the traditional SQL RDBMS.

  1. They abstract data the way that we would store them without  a database. A ledger is like a table, an invoice is  a one-to-many relationship. They are easy to understand and apply to real life situations.
  2. They are structured. Since each row in a table has the same structure, you only have to write one 'piece' of code to deal with it. Less code means faster development times, and less bugs.
  3. Static Data Types. Each column has a static data type (Or should anyway) so programs don't have to worry about data validation when retrieving data. This means less code too.
  4. SQL. They all run a dialect of SQL which cuts down on the programming learning curve.
  5. They are reliable. ACID transaction handling means that there is less chance of data corruption. And we all know corruption is bad.
  6. They are mature. Oracle and DB2 have been around for over 30 years, SQL Server for more than 20, PosgressSQL more than 15. Most of the bugs are out of the system, and there is a lot of documentation available. 

Don't jump into the NoSQL pool, unless you need too. SQL RDBMS's still provide the an excellent solution to most data storage and retrieval issues.

Tuesday, April 3, 2012

Valid vs Accurate

If I were to tell you to guess my electric bill from the following two choices, $146.78 or $150.00, which would you chose? I'd bet the a majority of readers would chose the former rather than the latter because it wasn't a 'round' number, and what are the odds that an electric bill would be a round number like 150. Of course the truth is my electric bill is just as likely to any number with a reasonable range as any other, but that 150 doesn't look right, why, because it looks inaccurate. Why does it look inaccurate, well because it doesn't appear valid. But in reality you cant assume a value is accurate because it appears valid.  In fact, you you need to separate the two concepts from each other when you look at data.

Validity in terms of data, means that it fits fits the definition for a particular type of data, for instance, in the case of US dollars it is a decimal number with two places of precision after the decimal point, an US phone number with area code is minimally a 10 digit integer. The other notable  property of data validity is that it can be checked with a self contained rule, there is no need to compare it to outside data.

Accuracy, on the other hand, means the data is correct. For instance, the actual value of my electric bill. Accuracy has to be checked by comparison with an outside value, so accuracy checks cannot be self contained. While accurate data is usually valid it does not have to be. If my actual electric bill was 150, it could conceivably presented  as in integer instead of a decimal thus making it invalid.

When dealing with data, make sure that you deal with validity first, followed by accuracy. This is especially true with weakly typed languages and variant data types.