Friday 30 December 2011

Scientific Publishing in XML, Repost

I was pointed to this blog post that, in turn, referred to this TEDx talk where Steven Bachrach said this:

"Scientific Publishing is essentially unchanged in 250 years"
"The way we publish today is destroying data"
This really struck a chord with me. And essentially, it applies to just about everyone handling their information in an unstructured format.

Thursday 29 December 2011

Semantic Profiles

Following my earlier post on semantic documents, I've given the subject some thought. In fact, I wrote a paper on a related subject and submitted it to XML Prague for next year's conference. The paper wasn't accepted (in all fairness, the paper was off-topic for the themes for the event), but I think the concept is both important and useful.

Briefly, the paper is about profiling XML content. The basics are well known and very frequently used: you profile a node by placing a condition on it. That condition, expressed using an attribute, is then compared to a publishing context defined using a similar condition on the root. If met, the node is included; if not, the node is discarded.

The matching is done with a simple string comparison, but the mechanism can be made considerably more advanced by, say, imposing Boolean logic on the condition: the context must match something like A AND B AND NOT(C), or the node is discarded.
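As a minimal sketch (not from the paper; the names are mine), the filtering step amounts to evaluating a node's condition against the set of profiles active in the publishing context. Expressing the condition as a callable makes Boolean logic such as A AND B AND NOT(C) straightforward:

```python
# Hypothetical sketch of profile-based filtering: a node carries a
# condition, the publishing context is a set of active profile names.

def include_node(condition, context):
    """Return True if the node's condition is met by the context.

    `condition` is a callable over the context set, so Boolean logic
    can be expressed directly in it.
    """
    return condition(context)

# A AND B AND NOT(C)
cond = lambda ctx: "A" in ctx and "B" in ctx and "C" not in ctx

print(include_node(cond, {"A", "B"}))       # included
print(include_node(cond, {"A", "B", "C"}))  # discarded
```

In a real pipeline the condition would of course be parsed from an attribute value rather than written as a lambda, but the matching logic is the same.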

The problem is that in the real world, the conditions, the string values, usually represent actual product names or variants, or perhaps an intended reader category. They can be used not only for string matching but also for including content inline, using the condition attribute's contents as variable text: a product variant, expressed as a string in an attribute on an EMPTY element, can easily be expanded in the resulting publication to personalise the document with variant-specific content.

This is all well and good, until the product variant label, or the product itself, is changed and the documents need to be updated to reflect this. All kinds of annoyances result, from having to convert values in legacy documents to not being able to do so at all (because the change is not compatible with the existing documents). Think about it:

If you have a condition "A" and a number of legacy documents using that condition, and need to update the name of the product variant to "B", you need to update those existing documents accordingly, changing "A" to "B" everywhere. Problem is, someone owning the old product variant "A" now needs to accept documentation for a renamed product "B". It's done all the time but still causes confusion.

Or worse, if the change to "B" affects functionality and not just the name itself, you'll have to add "B" to the list of conditions instead of renaming "A", which in turn means that even if most of the existing documentation could be reused for both "A" and "B", it can't because there is no way to know. You'll have to add "B" whenever you need to include a node, old or new.

This, in my considered opinion, happens because of the following:
  • The name, the condition, is used directly, both as a condition and as a value.
  • Conditions are not version handled. If "B" is a new version of "A", then say so.
My solution? Use an abstraction layer. Define a semantic profile, a basic meaning for the condition, and version handle that profile, updating it when there is a change to the condition. The change could be a simple name change for the corresponding product, but it could just as well be a change to the product's functionality. It doesn't really matter; a significant change will always require a new version. Then, represent that semantic profile with a value used when publishing.

Since I like URNs, I think URNs are a terrific way to go. It's easy to define a suitable URN schema that includes versioning and use the URN string as the condition when filtering, but the URN's corresponding value as expanded content. In the paper, I suggest some simple ways to do this, including an out-of-line profiling mechanism that is pretty much what the XLink spec included years ago.
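A rough sketch of the idea (the URN scheme and names here are invented for illustration, not taken from the paper): the versioned URN string is the stable condition stored in the documents, while the value it resolves to can change between profile versions without touching the documents at all.

```python
# Hypothetical sketch: semantic profiles identified by versioned URNs.
# The URN is the condition used when filtering; the display value it
# resolves to is used when expanding inline content.

profiles = {
    "urn:example:profile:product-a:1": "Product A",
    "urn:example:profile:product-a:2": "Product A Mark II",  # renamed in v2
}

def expand(urn):
    """Resolve a profile URN to its current published value."""
    return profiles[urn]

def matches(node_urn, context_urns):
    """Include the node if its profile URN is active in the context."""
    return node_urn in context_urns

context = {"urn:example:profile:product-a:2"}
print(matches("urn:example:profile:product-a:2", context))  # included
print(expand("urn:example:profile:product-a:2"))            # variant text
```

The point is that a rename only touches the profile table; legacy documents carrying the old versioned URN remain valid and unambiguous.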

Using abstraction layers in profiling is hardly a new approach, then, but to my knowledge it's not actually being used, and I think it should be. I fully intend to.

Evolution 3.2

Evolution 3.2 solved my Groupwise problems by eliminating Groupwise support altogether. It's an odd way to do it, considering that both originate from the same company, Novell. I am now left without a groupware solution for Linux.

In all fairness, mine is the unstable ("Sid") branch of Debian Linux, which means that the Groupwise library will likely be updated and re-included at some point. It's just that the functionality used to be one of the core advantages of Evolution and what brought me to it in the first place.

Every time I start to think that Linux is finally ready for the desktop, something happens.

Friday 16 December 2011

I Spoke Too Soon

Turns out that Evolution can misbehave in Gnome 3.x, too. It just takes a little longer. Had a look at my calendar, just now, and noticed that the stupid thing had crashed.

Damn.

XML Prague 2012

There's going to be an XML Prague in 2012, and I'm going to be there, again. Already looking forward to it. Not enough XML geekery for me lately.

Evolution/KDE/Gnome Rant

I've been running Evolution as my email/calendar/groupware/etc solution in Debian and KDE 4.6 at work ever since I gave up on Windows for anything beyond PowerPoint presentations and such. In spite of the Novell Groupwise server misery that we are forced to live with at Condesign, Evolution does the job. I've actually managed to synch my mail and appointments with both my trusty N900 and an Android thingy that the company wants to be my primary work phone, and have been if not pleased then at least content with the situation.

I should add that using a KDE solution (KMail/Kontact) has never worked for me. I can't get Kontact to log in to the Groupwise server, no matter what.

Anyway, unfortunately a recent apt-get update did... something. I'm still able to read my email in Evolution but the calendar and address book both crash with a DBus error whenever I try to view or use them. The usual suspects, from deleting caches to looking for non-UTF-8 characters in calendar ICS files, do not seem to apply and upgrading or downgrading Evolution doesn't help either. The problem seems to be more fundamental.

Yesterday, however, I booted into Gnome rather than KDE, mostly because I was bored and wanted to see what Gnome 3.x is like. Thing is, for some inexplicable reason Evolution now runs without a hitch. Calendars, address lists, everything. No crashes, no DBus errors.

Now, I've used KDE for years, preferring it over Gnome because the latter has always felt a bit patronising to me. Gnome is like a Linux equivalent to OSX, built on the assumption that users are all idiots and that the inner workings of a computer should always be kept hidden, so the user is never confused with anything even remotely technical.

Yet, OSX, for the most part, does the job. It just works, which I discovered recently when setting up a MacBook Pro for my daughter. It had no problem finding and configuring our home network HD and printer (tricky subjects for our Windows and Linux boxes, for some reason), and even displayed a nice image of the exact printer model to help me install it. Pretty cool, actually.

And this is what Gnome 3.x seems to focus on also, on just working. Yes, it feels a bit dumbed down, but it really seems to just work. I even think that I could learn to live with the 3.x GUI.

And I got my calendar back.

Tuesday 15 November 2011

Semantic Documents

I'm back from XML Finland, where I held a presentation on how to use the concept of semantic documents in content management systems. Not everyone was convinced, but I wasn't thrown out, either.

A semantic document is the core information carrier, before a language, or any other means of presentation to an audience, is added. It's an abstraction; obviously, there can be no such thing in the real world, but as a concept, the semantic document is useful.

For example, a translation of a document can, using the concept, be defined as a rendition of the original, just as a JPG image can be rendered in, say, PNG without the contents of the image changing. It is strictly a matter of definition--the rendition is not necessarily identical in every detail to the original; it is simply defined to be a matching rendition for a target audience.

Of course, for a semantic document and its rendition in a given language to be meaningful in a CMS, none of those varying details can be significant to the semantics of the basic information carrier, only to make a necessary clarification of the core information to the target audience. In other words, a translation may differ from the original for, say, cultural reasons (if the original language's details in question are bound to the original language and readership), but the basic meaning cannot be allowed to change.

To the concept I also added version handling, that is, a formal description of the evolution of the basic information's contents over time. When a new version is required is, of course, also a matter of definition; I'd go with "a significant and (in some way) completed change". What's important is that two matching or equivalent renditions of the semantic document must always use matching versions.

Expressed using a pseudo-URN schema, if the core semantic document in some well-defined version (say "1") is defined as URN:1, the Swedish and Finnish renditions would be defined as URN:1:sv and URN:1:fi, respectively. They would be defined to be different renditions of each other but identical in basic information. It follows that if a URN:2:sv was made, a new Finnish translation would have to be created, because the old translation would now differ in some way, according to the definition.
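The version-matching rule in the pseudo-URN schema is mechanical enough to sketch in a few lines (a toy illustration of the definition, nothing more):

```python
# Hypothetical sketch of the pseudo-URN schema "URN:version:lang".
# Two renditions are equivalent iff their semantic document versions match.

def parse(urn):
    """Split a pseudo-URN into (version, language)."""
    _, version, lang = urn.split(":")
    return int(version), lang

def matching_renditions(urn_a, urn_b):
    """Renditions match only when they share the same version."""
    return parse(urn_a)[0] == parse(urn_b)[0]

print(matching_renditions("URN:1:sv", "URN:1:fi"))  # equivalent renditions
print(matching_renditions("URN:2:sv", "URN:1:fi"))  # fi needs retranslation
```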

This, of course, is largely a philosophical question. In practice, all kinds of questions arise. I had several objections from the floor, of which most seemed to have to do with the evolution of the translation independently from the original. In my basic definition, of course, this is not a problem since the whole schema is a matter of definition, but in the real world, an independent evolution of a translation is often a very real problem.

It could well be that a translation is worked on rather than the original, for example in a multi-national environment where different teams manage different parts of the content. While this is theoretically manageable simply by bumping the versions of that particular translation, keeping track of, say, 40+ active target languages quickly becomes a practical problem.

I don't think the problem is unsolvable if there is a system in place to keep track of all those different URNs, but only if the basic principles are strictly adhered to. For example, content in different languages can never be allowed to develop independently from each other at the same time, because that leads to what the software development world knows as "forking", that is, developing differing content from the same basic version. While also solvable, the benefits of such an approach in documentation are doubtful.

Far easier and probably better is to define a "master language" as the only language allowed to drive content change. In the above pseudo-URNs, Swedish could be defined as a master language, meaning that any new content would have to be added to it first and then translated to the other languages.
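The master-language rule can be stated as a simple invariant (again a toy sketch using the pseudo-URNs above; the function names are mine): only the master language may introduce a new version, and every other language may only translate a version the master already has.

```python
# Hypothetical sketch of the master-language rule over "URN:version:lang".

MASTER = "sv"  # Swedish as the only language allowed to drive change

def can_create(urn, existing):
    """May this rendition be created, given the existing set of URNs?"""
    version, lang = urn.split(":")[1:]
    if lang == MASTER:
        return True                      # the master may add new versions
    master_urn = f"URN:{version}:{MASTER}"
    return master_urn in existing        # others must follow the master

existing = {"URN:1:sv", "URN:1:fi", "URN:2:sv"}
print(can_create("URN:2:fi", existing))  # allowed: sv v2 exists to translate
print(can_create("URN:3:fi", existing))  # refused: no sv v3 yet
```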

This is the basic principle behind the CMS, Cassis, that we develop at Condesign. It works, in that the information remains consistent and traceable, regardless of language, and allows for freely modularising documents for maximum reuse.

I would be interested in hearing opposing views. Some I addressed during my talk in Finland, but I'm sure there is more. Is there a reason you can think of that would break the principle of the semantic document?

Monday 10 October 2011

XML Finland, Not On A Boat After All

XML Finland will not be held on board a boat, after all. The Radisson Blu Seaside hotel in Helsinki is the new venue, and the seminar is limited to one day, November 10. The organisers say logistics are to blame. I have to say I'm disappointed; an XML boat would have been fun. Also, I'll miss the evening snacks and sauna, as I'll have to catch a plane in the evening.

I wonder if XML Prague can be persuaded to relocate to a boat instead.

Thursday 22 September 2011

XML Finland, Pt 2

I'll be presenting at XML Finland on November 10. Looking forward to it already.

Google Plus

Yesterday, I started my browser and found that Google had added You+ to the far left on www.google.com. Being the geek I am, naturally I joined this initiative. Google Plus immediately reorganised my Google settings and all of a sudden, my Blogger page is nowhere to be found. I had to go back to the previous Settings version to find it.

I'm all for change, but I don't like this type of change.

Monday 12 September 2011

Me and XML in Stockholm... Again

I'll be talking about XML, modularised documentation and such in Stockholm on December 7. The event is a one-day course organised by Dokumentinfo (link in Swedish).

XML Prague 2012

Speaking of XML conferences, XML Prague 2012 has been announced and will take place a month earlier than the last few times, on February 10-12. The venue is also new, a good thing since the last two events were sold out.

Looking forward to this one already.

Friday 9 September 2011

XML Finland

XML Finland 2011 will be held on board a cruise ship on November 9-10. Looks like fun, so I've submitted a presentation. Wish me luck.

Sunday 12 June 2011

Roland D-50 Key Servicing

I serviced my old Roland D-50 today after noticing problems with the aftertouch of two keys. It seems the key contacts need yearly mending to work--I took the whole thing apart about a year ago, cleaning all 61 keys and their contacts.

Here's a useful web page for those of you who have a D-50 in need of servicing.

Thursday 9 June 2011

Flight Sims

There is a terrific open source flight simulator called FlightGear. It's freely available for my platform of choice, Debian Linux (and a number of others, including Windows and Mac OS X), and it's quite mature these days, so naturally it's what I run when I want to fly a plane. When I still had a working Windows partition, I have to admit I quite liked Microsoft's classic Flight Simulator, but my Vista partition doesn't work all that well, and anyway, Microsoft killed off the sim a year or two ago. FlightGear is a more than adequate replacement.

Today I learned that somebody is marketing an older FlightGear version under a different name (Pro Flight Simulator), charging around $50 for a DVD or download and promising free lifetime updates. Of course, there is no (easily found) mention of FlightGear anywhere on their site, and I doubt the source code is easily available, either.

It has to be somewhere, though. See, FlightGear is GPL software, which basically means that you can do whatever you want with it (including selling copies) as long as you also make the source code available. I think the GPL lists a few other conditions as well, but the idea is that software should be free (as in speech).

So what these people do when ripping off free software is most likely not illegal, merely unethical. To further firmly establish themselves in the gutter, they have produced a number of blogs and fake reviews to market the product, seemingly without any shame; do a Google search if you are interested, but I won't help their cause by giving you a direct link.

Read all about the scam at http://www.flightgear.org/flightprosim.html, and download a FREE copy of the latest version if you are interested in flight sims. Or just spread the word.

Saturday 4 June 2011

Me and XML in Stockholm

I'll be talking about XML in Stockholm on June 16th. The event is a one-day tutorial for technical writers, managers and other interested parties, organised by Dokumentinfo. They organise tutorials on various subjects related to document management and archiving, and a yearly conference where I was invited to speak last year.

So far I have few details but I'm pretty sure I'll manage to include XLink, somehow.

Finally, KDE 4.6 on Debian

Again, title says it all. I'm only a few days into running KDE 4.6 on my desktop but so far it's superior to any previous 4.x. It feels like, well, it just works. It's also beautiful; Plasma is finally mature enough to do all those things I read about two years ago.

What doesn't work all that well is Amarok. It still won't play CDs (it can now list the CD contents - hooray), and while I do understand that some of these things take time, 1.4 didn't have any problems in that respect. I still haven't found an alternative for my every music need but out of spite I'm now running Clementine, an Amarok fork that also doesn't grasp CDs.

Monday 4 April 2011

Sponsored Links Rock

Found this while reading about fake pilots on AOL Travel. Don't you just love sponsored links?

An Even-Simpler Markup Language?

In his blog, Norman Walsh writes about an even-simpler-than-MicroXML markup language, inspired in part by John Cowan's XML Prague poster and by James Clark's MicroXML ideas. His ideas are well worth serious consideration--Norm's ideas always are--but the purist in me cringes at the idea of allowing more than one root element. I find the idea attractive, but I'm not really big on change, so maybe that is why I hesitate.

The pragmatist in me, on the other hand, also cringes at Norm's not doing away with namespaces when he has the chance. In my experience they always create more problems than they solve, but then again, my experience tends to be with strictly controlled environments where the issues one usually wishes to solve using namespaces can be dealt with by other means.

Friday 1 April 2011

Until Next Year, XML Prague

This year's XML Prague is over and I miss it already. For a markup geek, XML Prague is heaven. There is always so much to learn, so many great minds and cool new ideas, not to mention Czech beer and the friendly atmosphere of a smaller conference. This was my third consecutive year attending and I very much look forward to the fourth.

Some notes of interest:
  • XML Prague is a great success. The conference sold out before the sessions were announced so next year, it will move to a larger venue.
  • HTML5, last year's hot topic, was pronounced dead more than once.
  • Michael Kay announced (and demo'd) Saxon Client Edition, which allows you to run XSLT 2 in the browser. Very cool. Saxon CE is in alpha but available for testing at www.saxonica.com.
  • JSON seems to be hot this year. I should probably spend some time learning it, especially since I am planning to use it in the CMS we develop at Condesign.
  • George Bina from SyncRO Soft Ltd, the company that makes Oxygen, presented some ideas regarding advanced XML development. While Oxygen is at the centre of many of these, his point was that there should be a standardised way to do it all. Via Twitter, Dave Pawson suggested expanding XML catalog files for the job, an idea I find plausible.
  • Murata Makoto, a personal hero of mine thanks to his work with Relax NG, presented EPUB3. What those of us who were there will remember, however, is his introduction, expressing his grief over the on-going catastrophe in Japan.
See www.xmlprague.cz for more.

Monday 14 February 2011

Squeeze Is Out

The new Debian, Squeeze, is out. This means that we can finally expect new things added to the unstable version, Sid, such as KDE 4.6.

Thursday 27 January 2011

XML Prague 2011, Part Two

This year, my paper wasn't accepted for XML Prague. I guess I'll just have to go there anyway.