Friday 19 March 2010

XML for the Long Haul

There will be a one-day symposium on the theme XML for the Long Haul, right before the Balisage conference in Montréal this year. I've been thinking about this lately.

First of all, isn't this what XML is about? The ability for information to outlive the proprietary software used to store it? The means to keep it accessible, whatever happens to your software? I've preached this for a long time to my customers, my listeners, and anyone who just couldn't get away. My point has always been that if disaster struck your software, if it was somehow wiped out despite your best efforts, it would take only a few days to build something that could parse most of the information in an XML file. Producing output from it might take another few days, but provided that you could read the language the content was written in, and the structure had been designed by someone with at least a basic idea of what XML (and SGML; this isn't new) is about, it would take no more than a few days to work out what the lost information was.

Second, my points above pretty much summarise my views, but I really mean it: this is what XML is about.

But is it really that simple? Is markup really that descriptive? Well, not always. There's plenty of markup out there that is obscure and hard to read. For example, is a namespace going to make your leftover instances easier to read? Are your element type names descriptive? What about your attributes? Do you include comments or annotations with your schema? Do you use wrapper elements to group element types in a semantically meaningful way? Does each group include everything it needs to be complete? Look at one of your instances with fresh eyes and see if it makes sense. Does one type of information relate to another? How would you format this lost instance if you had just come across it? If a thousand years had passed and you could understand the language but not the culture, would you grasp the meaning of the information? Could you print it and explain what had gone on?

Don't laugh. Pretend that you really are viewing your structures from the outside. Pretend that you don't have the schema at hand. Pretend that you don't know the semantics, even though you can understand the contents. Pretend that you really are studying the information as an outsider. Does it all make sense?
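One way to try this exercise is to strip an instance down to what an outsider with no schema would actually see: element names, attributes, and text. The sketch below is a hypothetical helper (the name `outline` is my own) built on Python's standard `xml.etree.ElementTree`. It is deliberately rough: it ignores tail text in mixed content, and the default parser drops comments. Note how namespaced names come out as `{uri}local`, which is roughly how a future reader without your documentation would meet them.

```python
import xml.etree.ElementTree as ET


def outline(xml_text: str) -> list[str]:
    """Return one line per element: indented tag name, attributes, and trimmed text.

    Hypothetical helper for reading an instance "from the outside",
    with no schema at hand. Tail text (mixed content) is ignored.
    """
    root = ET.fromstring(xml_text)  # default parser skips comments
    lines: list[str] = []

    def walk(elem: ET.Element, depth: int) -> None:
        # ElementTree reports namespaced names as "{uri}local"; keep them
        # as-is, since that is what a reader without the schema would see.
        attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        text = (elem.text or "").strip()
        line = "  " * depth + elem.tag
        if attrs:
            line += f" [{attrs}]"
        if text:
            line += f": {text}"
        lines.append(line)
        for child in elem:  # child elements only, in document order
            walk(child, depth + 1)

    walk(root, 0)
    return lines


if __name__ == "__main__":
    sample = '<order id="42"><customer>Ann</customer><item qty="2">Widget</item></order>'
    print("\n".join(outline(sample)))
```

If the outline of one of your own instances reads like nonsense, a reader a thousand years from now will probably think so too.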

I think this is a worthwhile reality check, and one we should all apply to the schemas we create, every time we do an information analysis. Are our schemas understandable? Are they legible?

I would really like to be in Montréal in August this year. I think it's important.
