Monday, April 14, 2008

Successful Unversal Data Models?

A friend passed on a link to this Universal Data Models article, and I have to admit this is an issue about which I feel some conflict.

On the one hand, I respect the notion that, in the world of standardized relational databases and object-oriented programming, it'd be nice to be able to pick up standard models that represent clear and common items like people, companies, and the relationships between them.

On the other, I've wandered partially down this path before a few times and emerged with scars. While it's certainly true that many organizations have software that needs to track information about people, organizations, and the ways they're related, it's also true that there innumerable pieces of information one might want to record about people and organizations, and countless ways in which they can be related. In order to be truly universal, you'd have to capture all of those ways and distill it into a common data model.

Practically speaking, I find that universal data models fail in three important ways, and most models attempting to be universal suffer from at least one if not all three of these:

  • Too Generic: Because the model attempts to be flexible, to be useful in all kinds of scenarios, what you're left with is often too generic to be as useful as a data model tailored to your needs. For instance, if your company has one and only one account manager assigned to a particular client, the simplest way to represent that is something like a foreign key from a client to an account manager. In a universal model, this is might take the form of, say, a PERSON_ORGANIZATION relationship between an 'employee' person and a 'client' organization and where the PERSON_ORGANIZATION.ROLE is 'account manager', and you'll need application-level uniqueness checking to ensure there are no duplicates. You'll find yourself having to do extra work for the simplest things -- if a billing address is a requirement for any client, you can enforce that in your custom data model, but there's no way that your generic model can do that for you.
  • Too Much: The model has more information than you need. You don't have email addresses for your field employees, or you don't care about fax numbers, or perhaps your clients don't have departments and regions, or you don't care to track them.
  • Not Enough:
Often, each instance of those problems is pretty minor:
  • The universal model has a field you don't need? Ignore it, filter it with a view or don't map it to your domain objects.
  • The universal model doesn't have a field you need? They've kindly supplied a key-value table or custom_field_12 where you can put that sort of thing.
  • The model's too generic? You can simplify that by using a domain object that hides the complexity.
All those things are true, but they increase the pain and effort of using the model. Quickly, the cost of using a generic model exceeds its benefits, particularly if you're doing so for purely philosophical reasons like, "Wouldn't it be nice if we could all re-use a common idea of what a Company is?"

Worse still, these can introduce cognitive dissonance when you force-fit your own business needs into someone else's generic model - you can make it that much harder for your business members and development team to share a universal language about your domain and communicate effectively. When the business owner asks "Validate that when an account manager leaves the company, another account manager takes his place with that client", you have to do mental gymnastics to imagine what they mean, and those gymnastics can introduce errors, almost like transcription errors: "When a PERSON with PERSON.type='employee' and PERSON_ORGANIZATION.role='account manager' becomes PERSON.active=false, then add a new PERSON_ORGANIZATION record that associates another PERSON.type='employee' with PERSON_ORGANZATION.role='account manager' ... "

I'm curious how many people find themselves using universal models and are happy doing so.

No comments: