Archival Environment, Part 5: Archival Relationships

Originally published March 15, 2007

For many good reasons, archival data is best placed in a series of time vaults as it passes from the active environment to the archival environment. When archival data is placed in a time vault:

  • It is freed from its base technology and from the successive releases of that technology.
  • It can be accessed simply in the future.
  • It carries its metadata with it, so everything needed to interpret the archival data is kept in the same physical unit.
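To make the idea of a self-contained time vault concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a vault is a plain JSON document holding a period label, the records, and a schema describing their fields. The layout and the names `make_time_vault`, `serialize_vault` and `deserialize_vault` are hypothetical, not a format the article prescribes.

```python
import json

def make_time_vault(period, schema, records):
    """Bundle records with the metadata that describes them (hypothetical layout)."""
    return {
        "period": period,                # the time span this vault covers
        "metadata": {"schema": schema},  # field descriptions travel with the data
        "records": records,              # plain values, no DBMS-specific types
    }

def serialize_vault(vault):
    # Plain-text JSON can be read later without the original database software.
    return json.dumps(vault, indent=2, sort_keys=True)

def deserialize_vault(text):
    return json.loads(text)

vault = make_time_vault(
    period="2006-Q4",
    schema={
        "sale_id": "integer; unique identifier of the sale",
        "amount": "decimal; sale amount in US dollars",
    },
    records=[{"sale_id": 1001, "amount": 250.0}],
)
restored = deserialize_vault(serialize_vault(vault))
print(restored["metadata"]["schema"]["amount"])  # decimal; sale amount in US dollars
```

Because the schema is stored inside the same document as the records, a reader decades later needs only a JSON parser, not the system that produced the data.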

But something important is lost when archival data is placed in a time vault as suggested: data relationships.

The simplest data relationship, in which one attribute relates to another attribute in the same record, probably needn’t be lost as archival data is created. The entire record can be placed in the time vault, and the relationship remains intact.

But there are many other relationships between data that go well beyond this simple case. One record can relate to another record in the same table. One record can relate to a record in a different table. Data may relate to itself recursively, and so forth.

Unfortunately, these more complex relationships are not handled as easily as the simple, intra-record relationships when data is placed in the archival environment.

The first reason complex relationships do not fit well in the archival environment is that archival data is stripped of its technology shell. In other words, as data enters the archival environment, the technology that holds it is greatly simplified, and in that simplification the structures and conventions for holding complex relationships are often lost.

A second reason archival relationships are lost is that archival data is placed in self-contained data vaults. A data relationship can be built and maintained across multiple data vaults, but doing so is not advised. Over time, it is a good bet that two or more data vaults will become separated from each other, and when that happens, the data relationship is lost.
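The second reason can be illustrated with a small sketch. Here, purely as an assumption, the sale vault refers to its customer only through a hypothetical `customer_no` key, and the customer details live in a separate vault. Once the two vaults are separated, the reference can no longer be resolved.

```python
def resolve_customer(sale, customer_vault):
    """Follow a cross-vault reference from a sale to its customer.

    Returns None when the customer vault is unavailable or lacks the key.
    """
    if customer_vault is None:
        return None  # vaults have been separated; the relationship cannot be followed
    return customer_vault["records"].get(sale["customer_no"])

sales_vault = {"records": [{"sale_id": 1001, "customer_no": "C-17", "amount": 250.0}]}
customer_vault = {"records": {"C-17": {"name": "Acme Corp", "region": "West"}}}

sale = sales_vault["records"][0]
print(resolve_customer(sale, customer_vault))  # {'name': 'Acme Corp', 'region': 'West'}
print(resolve_customer(sale, None))            # None: the sale no longer says who bought it
```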

A third reason complex data relationships are difficult to maintain in the archival environment is the sheer volume of archival data. With huge amounts of data, reconciling broken relationships becomes very difficult.

The only realistic way to preserve complex data relationships in archival data is to bury each relationship inside a single data vault. When a complex relationship is buried inside a data vault, the particulars of the relationship are placed in the data vault along with the other data. Suppose there is a relationship between a sale and a customer. The data vault for the sale would then contain the customer number, customer name and other customer information. This design technique denormalizes the data, but it makes the data vault self-contained and self-reliant.
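A minimal Python sketch of the sale/customer example follows. The field names (`customer_no`, `name`, `region`) and the function `denormalize_sale` are hypothetical. The point is only that the customer details are copied into the sale record, so the archived record can be read without the customer table.

```python
def denormalize_sale(sale, customers):
    """Return a copy of the sale with the related customer's details embedded."""
    customer = customers[sale["customer_no"]]
    return {
        **sale,                                # original sale fields, including the key
        "customer_name": customer["name"],     # copied, not referenced
        "customer_region": customer["region"],
    }

customers = {"C-17": {"name": "Acme Corp", "region": "West"}}
sale = {"sale_id": 1001, "customer_no": "C-17", "amount": 250.0}

archived_sale = denormalize_sale(sale, customers)
del customers  # the customer table is no longer needed to interpret the sale
print(archived_sale)
# {'sale_id': 1001, 'customer_no': 'C-17', 'amount': 250.0, 'customer_name': 'Acme Corp', 'customer_region': 'West'}
```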

In addition to the related data itself, the metadata describing the relationship should also be placed inside the data vault.
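As a sketch of what such relationship metadata might look like, the Python below stores a hypothetical description of the sale-to-customer relationship (`SALE_TO_CUSTOMER`) alongside the records and uses it to check that every record actually carries the embedded fields. The metadata keys and the consistency check are illustrative assumptions, not part of the article.

```python
# Hypothetical description of the sale-to-customer relationship, stored in the vault.
SALE_TO_CUSTOMER = {
    "name": "sale_to_customer",
    "from_entity": "sale",
    "to_entity": "customer",
    "cardinality": "many sales to one customer",
    "key": "customer_no",
    # Maps each embedded field in the sale record to its source field in the customer record.
    "embedded_fields": {"customer_name": "name", "customer_region": "region"},
}

def build_sales_vault(records, relationships):
    """Place records and the metadata about their relationships in one unit."""
    return {"metadata": {"relationships": relationships}, "records": records}

def missing_relationship_fields(vault):
    """List (sale_id, field) pairs where a sale record lacks a field the metadata promises."""
    problems = []
    for rel in vault["metadata"]["relationships"]:
        expected = [rel["key"], *rel["embedded_fields"]]  # key plus embedded field names
        for record in vault["records"]:
            for field in expected:
                if field not in record:
                    problems.append((record["sale_id"], field))
    return problems

vault = build_sales_vault(
    records=[
        {"sale_id": 1001, "customer_no": "C-17", "customer_name": "Acme Corp",
         "customer_region": "West", "amount": 250.0},
        {"sale_id": 1002, "customer_no": "C-42", "amount": 75.0},
    ],
    relationships=[SALE_TO_CUSTOMER],
)
print(missing_relationship_fields(vault))
# [(1002, 'customer_name'), (1002, 'customer_region')]
```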

Of course, relationship data can be placed in more than one data vault. In the example above, customer data is placed inside the sales data vault, but there is no reason why sales data cannot also be placed in the data vault for customer data.
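The reverse direction can be sketched the same way. In this hypothetical layout, each customer record in the customer vault carries a list of that customer's sales; the function name `build_customer_vault` and the field names are assumptions for illustration only.

```python
from collections import defaultdict

def build_customer_vault(customers, sales):
    """Embed each customer's sales in that customer's record (hypothetical layout)."""
    sales_by_customer = defaultdict(list)
    for sale in sales:
        sales_by_customer[sale["customer_no"]].append(
            {"sale_id": sale["sale_id"], "amount": sale["amount"]}
        )
    # Customers with no sales get an empty list rather than a missing key.
    return {
        customer_no: {**details, "sales": sales_by_customer.get(customer_no, [])}
        for customer_no, details in customers.items()
    }

customers = {
    "C-17": {"name": "Acme Corp", "region": "West"},
    "C-42": {"name": "Birch Ltd", "region": "East"},
}
sales = [
    {"sale_id": 1001, "customer_no": "C-17", "amount": 250.0},
    {"sale_id": 1003, "customer_no": "C-17", "amount": 40.0},
]
customer_vault = build_customer_vault(customers, sales)
print(customer_vault["C-17"]["sales"])
# [{'sale_id': 1001, 'amount': 250.0}, {'sale_id': 1003, 'amount': 40.0}]
print(customer_vault["C-42"]["sales"])
# []
```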

In such a fashion, data can be archived with a complex relationship intact. There is, of course, some degree of redundancy. But the data can survive intact over a long period of time, thus setting the stage for long-term usability of the data.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!
