Enjoying Database Administration

20.1.09

So you think you're normal... Part 1 Normalism or Anomalous

Normality or Anomaly

Database normalization is an important aspect of database design, something that is generally done by a data architect, and not as often by a database administrator, much to my delight and chagrin. The rules of normalization are part of the package of database minutiae that I found so intriguing and delightful (pardon the hyperbole, but I find it apropos to style) as to cogitate upon that I decided to become a database administrator. They are of course, rules appropriate to database design, and therefore, not necessarily a business imperative. To be certain data anomalies cost wage hours to repair, and can complicate querying, as well add resource overhead to your database operation. Having justified the time and expense in normalizing your database (or more likely some section of it, perhaps in a new schema as your design has in some areas outgrown its table schema) you are ready to start making somewhat dramatic changes to your database for the better.

In my opinion one of the most important issues to be mindful of when beginning to normalize your design, besides backing up your database (which you should be doing anyway) and design documentation for the existing design, and of course that you should solidify and verify your designs before making any changes to the database, is that normalization should be considered in terms of tables first. Deciding whether or not your entire database, or simply an entire schema for that matter is in any particular normal form, is kind of like deciding whether the sky is a particular shade of cerulean or azure, the answer can, ad will often be, in some places yes, and in others no.

Normalization can be oversimplified by saying that it is breaking tables into more tables. When I look at normalization I usually look for the opportunity to break relations off of tables. But of course this is more of a high level scan for a particular table needing a more formal inquiry into its normality. At this stage I might say I suspect that this table is not fully normalized.

An important aspect of normalization is the minimization and elimination of the potential for data modification anomalies. I tend to think of normalizing a table in terms of these anomalies, as well as in terms of many-to-many relations that can be formed of the type Table1:Key1Key2:Table2, or Subject:Predicate:Object (A great metaphor for the real world relation represented).

The three most important modification anomalies are insert, update, and deletion anomalies. Each of these anomalies corresponds roughly to a particular normal form. The first normal form tends to eliminate update anomalies, the second normal form pertaining to insertion anomalies, and the third normal form deals with deletion anomalies. Of course, there is overlap between the normal forms and relevant anomalies.

An insertion anomaly can be avoided by allowing NULLS, this is to be discouraged, as doing so circumvents first form normalization and consequently prevents all other normal forms from being realized as each is dependent on the previous in a cascading fashion. That is to say that a table in the second normal form is in the first, and a table in the third is in the second, and so on.

In later posts I will expand on the themes introduced here.

No comments:

Post a Comment