Friday, 30 December 2011

Scientific Publishing in XML, Repost

I was pointed to a blog post that, in turn, referred to a TEDx talk in which Steven Bachrach said:

"Scientific Publishing is essentially unchanged in 250 years"
"The way we publish today is destroying data"
This really struck a chord with me. And it applies to just about anyone handling their information in an unstructured format, not only to scientific publishers.

Thursday, 29 December 2011

Semantic Profiles

Following my earlier post on semantic documents, I've given the subject some more thought. In fact, I wrote a paper on a related subject and submitted it to XML Prague for next year's conference. The paper wasn't accepted (in all fairness, it was off-topic for the event's themes), but I think the concept is both important and useful.

Briefly, the paper is about profiling XML content. The basics are well known and very frequently used: you profile a node by placing a condition on it. That condition, expressed using an attribute, is then compared to a publishing context defined using a similar condition on the root. If met, the node is included; if not, the node is discarded.
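To make the basic mechanism concrete, here is a minimal Python sketch using the standard library's ElementTree. It assumes a hypothetical `condition` attribute holding a single string value, with the publishing context set in the same attribute on the root; nodes without a condition are always kept.

```python
import xml.etree.ElementTree as ET

# "condition" is a hypothetical attribute name chosen for this sketch.
COND = "condition"

def profile(root: ET.Element) -> ET.Element:
    """Remove every node whose condition differs from the root's publishing context."""
    context = root.get(COND)

    def prune(parent: ET.Element) -> None:
        for child in list(parent):       # copy, since we remove while iterating
            cond = child.get(COND)
            if cond is not None and cond != context:
                parent.remove(child)     # condition not met: discard node and subtree
            else:
                prune(child)             # unconditioned or matching: keep, look deeper

    prune(root)
    return root

doc = ET.fromstring(
    '<manual condition="A">'
    '<para>Common text.</para>'
    '<para condition="A">Only for A.</para>'
    '<para condition="B">Only for B.</para>'
    '</manual>'
)
print([p.text for p in profile(doc)])  # ['Common text.', 'Only for A.']
```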

The matching is done with a simple string comparison, but the mechanism can be made a lot more advanced by, say, imposing Boolean logic on the condition: a node might have to match something like A AND B AND NOT(C), or it is discarded.
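The Boolean variant needs a small expression evaluator. The sketch below assumes a made-up syntax (AND, OR, NOT and parentheses, with NOT binding tightest and OR loosest) in which a bare name is true if it is one of the active values in the publishing context. Real profiling vocabularies define their own syntax, so treat this as an illustration only.

```python
import re
from typing import List, Set

# Hypothetical condition syntax for this sketch: names combined with
# AND, OR, NOT and parentheses. NOT binds tighter than AND, AND tighter than OR.
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9_.:-]+)")

def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def matches(expr: str, context: Set[str]) -> bool:
    """True if the condition expression holds for the set of active context values."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def or_expr() -> bool:
        value = and_expr()
        while peek() == "OR":
            take("OR")
            rhs = and_expr()          # always parse the right side, then combine
            value = value or rhs
        return value

    def and_expr() -> bool:
        value = not_expr()
        while peek() == "AND":
            take("AND")
            rhs = not_expr()
            value = value and rhs
        return value

    def not_expr() -> bool:
        tok = take()
        if tok == "NOT":
            return not not_expr()
        if tok == "(":
            value = or_expr()
            take(")")
            return value
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in context         # a bare name is true if it is active

    result = or_expr()
    if peek() is not None:
        raise ValueError(f"trailing input: {tokens[pos:]}")
    return result

print(matches("A AND B AND NOT(C)", {"A", "B"}))       # True
print(matches("A AND B AND NOT(C)", {"A", "B", "C"}))  # False
```

A node would then be kept when `matches(node_condition, context_values)` is true, with the simple string comparison above becoming the special case of a single bare name.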

The problem is that in the real world, the conditions, the string values, usually represent actual product names or variants, or perhaps an intended reader category. They can be used not only for string matching but for including content inline by using the condition attribute contents as variable text: a product variant, expressed as a string in an attribute in an EMPTY element, can easily be expanded in the resulting publication to provide specific content to personalise the document.
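The inline expansion can be sketched the same way. Here `<variant name="..."/>` is a hypothetical EMPTY element; publishing replaces it with its attribute value while keeping the surrounding mixed content intact. Note that the very same string would also serve as the filtering condition, which is the coupling the rest of this post is about.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <variant name="..."/> is an EMPTY element whose
# attribute value is printed inline in the published output.
def expand_variants(root: ET.Element, tag: str = "variant", attr: str = "name") -> None:
    for parent in list(root.iter()):     # snapshot, since we modify the tree
        prev = None                      # last sibling that stays in the tree
        for child in list(parent):
            if child.tag != tag:
                prev = child
                continue
            # Splice the attribute value plus the element's tail text into
            # the text that precedes the element, then drop the element.
            text = (child.get(attr) or "") + (child.tail or "")
            if prev is None:
                parent.text = (parent.text or "") + text
            else:
                prev.tail = (prev.tail or "") + text
            parent.remove(child)

doc = ET.fromstring('<para>Install <variant name="Widget Pro"/> before use.</para>')
expand_variants(doc)
print(ET.tostring(doc, encoding="unicode"))  # <para>Install Widget Pro before use.</para>
```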

Which is all well and good, until the product variant label or the product itself changes and the documents need to be updated to reflect this. All kinds of annoyances result, from having to convert values in legacy documents to not being able to do so at all (because the change is not compatible with the existing documents). Think about it:

If you have a condition "A" and a number of legacy documents using that condition, and need to update the name of the product variant to "B", you need to update those existing documents accordingly, changing "A" to "B" everywhere. Problem is, someone owning the old product variant "A" now needs to accept documentation for a renamed product "B". It's done all the time but still causes confusion.

Or worse, if the change to "B" affects functionality and not just the name, you'll have to add "B" to the list of conditions instead of renaming "A". That in turn means that even if most of the existing documentation could be reused for both "A" and "B", it can't be, because there is no way to know which parts apply. You'll have to add "B" to every node that needs to be included, old or new.

This, in my considered opinion, happens because of the following:
  • The name, the condition, is used directly, both as a condition and as a value.
  • Conditions are not version handled. If "B" is a new version of "A", then say so.
My solution? Use an abstraction layer. Define a semantic profile, a basic meaning for the condition, and version-handle that profile, updating it whenever the condition changes. The change could be a simple name change for the corresponding product, but it could just as well be a change to the product's functionality. It doesn't really matter: a significant change will always require a new version. Then, represent that semantic profile with a value used when publishing.
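A rough sketch of the abstraction in Python, under my own assumptions: a profile has a stable key that is never printed, a version number, and a label per version, and each new version may declare the version it builds on. The `applies` rule (content written for an earlier version in the chain carries over to later ones) is only one possible policy, chosen here to show how saying "B is a new version of A" lets existing content be reused.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Profile:
    """One version of a semantic profile (hypothetical model)."""
    key: str                           # stable meaning, e.g. "widget"; never printed
    version: int
    label: str                         # display value used when publishing
    predecessor: Optional[int] = None  # version this one declares it builds on

class ProfileRegistry:
    def __init__(self) -> None:
        self._profiles: Dict[Tuple[str, int], Profile] = {}

    def add(self, profile: Profile) -> None:
        self._profiles[(profile.key, profile.version)] = profile

    def label(self, key: str, version: int) -> str:
        return self._profiles[(key, version)].label

    def lineage(self, key: str, version: int) -> List[int]:
        """The given version followed by every version it builds on, newest first."""
        chain: List[int] = []
        current: Optional[int] = version
        while current is not None:
            if current in chain:
                raise ValueError(f"version cycle in profile {key!r}")
            chain.append(current)
            current = self._profiles[(key, current)].predecessor
        return chain

    def applies(self, node: Tuple[str, int], context: Tuple[str, int]) -> bool:
        """Assumed policy: content for an earlier version in the chain carries over."""
        return node[0] == context[0] and node[1] in self.lineage(*context)

registry = ProfileRegistry()
registry.add(Profile("widget", 1, "A"))
registry.add(Profile("widget", 2, "B", predecessor=1))  # "B" is a new version of "A"

print(registry.applies(("widget", 1), ("widget", 2)))  # True: old content reused for B
print(registry.applies(("widget", 2), ("widget", 1)))  # False: B-only content stays out of A
print(registry.label("widget", 2))                     # B
```

Documents would only ever reference the key and version; the label "A" or "B" is looked up at publishing time.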

I like URNs, and I think they are a terrific way to go. It's easy to define a suitable URN scheme that includes versioning, and to use the URN string as the condition when filtering but the URN's corresponding value as expanded content. In the paper, I suggest some simple ways to do this, including an out-of-line profiling mechanism that is pretty much what the XLink spec included years ago.
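Here is one way the URN idea might look, again only a sketch. The URN layout `urn:example:profile:<name>:<version>` is hypothetical (`example` is the URN namespace set aside for documentation), as are the `profile-ref` element and the label table. Filtering compares URN strings only, while expansion looks each URN up in a table kept outside the documents, loosely in the spirit of an out-of-line linkbase.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical URN layout: urn:example:profile:<name>:<version>.
# "example" is the URN namespace reserved for documentation (RFC 6963).
PROFILE_URN = re.compile(r"urn:example:profile:([a-z0-9-]+):([0-9]+)")

def parse_profile(urn: str) -> Tuple[str, int]:
    """Split a profile URN into its name and version; reject anything else."""
    m = PROFILE_URN.fullmatch(urn)
    if m is None:
        raise ValueError(f"not a profile URN: {urn!r}")
    return m.group(1), int(m.group(2))

# Out-of-line table, kept outside the documents: each profile URN maps
# to the text printed for it.
LABELS: Dict[str, str] = {
    "urn:example:profile:widget:1": "Widget",
    "urn:example:profile:widget:2": "Widget Pro",
}

def publish(root: ET.Element, labels: Dict[str, str] = LABELS) -> ET.Element:
    context = root.get("condition")
    if context is None:
        raise ValueError("root has no publishing context")
    parse_profile(context)  # fail early on a malformed context
    # Filtering compares URN strings only; the printed labels play no part.
    for parent in list(root.iter()):
        for child in list(parent):
            cond = child.get("condition")
            if cond is not None and cond != context:
                parent.remove(child)
    # Expansion turns each remaining <profile-ref urn="..."/> into its label.
    for ref in list(root.iter("profile-ref")):
        urn = ref.attrib.pop("urn")
        parse_profile(urn)
        ref.tag = "phrase"
        ref.text = labels[urn]
    return root

doc = ET.fromstring(
    '<manual condition="urn:example:profile:widget:2">'
    '<para>Welcome to <profile-ref urn="urn:example:profile:widget:2"/>.</para>'
    '<para condition="urn:example:profile:widget:1">Notes for the old model.</para>'
    '</manual>'
)
print(ET.tostring(publish(doc), encoding="unicode"))
# <manual condition="urn:example:profile:widget:2"><para>Welcome to <phrase>Widget Pro</phrase>.</para></manual>
```

Because the documents only ever contain URNs, the printed product name lives in one place.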

Using abstraction layers in profiling is hardly a new approach, then, but as far as I know it's not being used, and I think it should be. I fully intend to use it.

Evolution 3.2

Evolution 3.2 solved my Groupwise problems by eliminating Groupwise support altogether. It's an odd way to do it, considering that both originate from the same company, Novell. I am now left without a groupware solution for Linux.

In all fairness, mine is the unstable ("Sid") branch of Debian Linux, which means that the Groupwise library will likely be updated and re-included at some point. It's just that the functionality used to be one of the core advantages of Evolution and what brought me to it in the first place.

Every time I start to think that Linux is finally ready for the desktop, something happens.

Friday, 16 December 2011

I Spoke Too Soon

Turns out that Evolution can misbehave in Gnome 3.x, too. It just takes a little longer. Had a look at my calendar, just now, and noticed that the stupid thing had crashed.

Damn.

XML Prague 2012

There's going to be an XML Prague in 2012, and I'm going to be there, again. Already looking forward to it. Not enough XML geekery for me lately.

Evolution/KDE/Gnome Rant

I've been running Evolution as my email/calendar/groupware/etc solution in Debian and KDE 4.6 at work ever since I gave up on Windows for anything beyond PowerPoint presentations and such. In spite of the Novell Groupwise server misery that we are forced to live with at Condesign, Evolution does the job. I've actually managed to synch my mail and appointments with both my trusty N900 and an Android thingy that the company wants to be my primary work phone, and have been if not pleased then at least content with the situation.

I should add that using a KDE solution (KMail/Kontact) has never worked for me. I can't get Kontact to log in to the Groupwise server, no matter what.

Anyway, unfortunately a recent apt-get update did... something. I'm still able to read my email in Evolution, but the calendar and address book both crash with a DBus error whenever I try to view or use them. The usual suspects, from deleting caches to looking for non-UTF-8 characters in calendar ICS files, do not seem to apply, and upgrading or downgrading Evolution doesn't help either. The problem seems to be more fundamental.

Yesterday, however, I booted into Gnome rather than KDE, mostly because I was bored and wanted to see what Gnome 3.x is like. Thing is, for some inexplicable reason Evolution now runs without a hitch. Calendars, address lists, everything. No crashes, no DBus errors.

Now, I've used KDE for years, preferring it over Gnome because the latter always feels a bit patronising to me. Gnome is like a Linux equivalent to OSX, built on the assumption that users are all idiots and that the inner workings of a computer should always be kept hidden, so that the user is never confused by anything even remotely technical.

Yet, OSX, for the most part, does the job. It just works, which I discovered recently when setting up a MacBook Pro for my daughter. It had no problem finding and configuring our home network HD and printer (tricky subjects for our Windows and Linux boxes, for some reason), and even displayed a nice image of the exact printer model to help me install it. Pretty cool, actually.

And this is what Gnome 3.x seems to focus on too: just working. Yes, it feels a bit dumbed down, but it really does seem to just work. I even think I could learn to live with the 3.x GUI.

And I got my calendar back.