Friday, January 18, 2013

XML, the extra morose language


First there was HTML; it was a "quick and dirty" way of producing information and making it available. The difficulty of making web pages look good and the overhead it caused on the network were outweighed by the simplicity and the small amount of time required to produce information in the form of standardised documents. With all its shortcomings, coupled with HTTP (also quite limited at the time), it was enough to change the world.

Those limitations were gradually removed with new versions of HTML together with different scripting technologies, both on the client and the server side. Later, compression support was added to reduce the amount of data on the wire.

Meanwhile XML came along and it too took over the world. Again many optimisations emerged, and also new ways of encoding binary data.

These are typical examples of IT solving problems that were created by IT. Looking at it now, what sense does it make to encode data in a human readable format when it is meant to be transferred only between computer programs? Why do applications need to parse tags? Why do messaging components need to verify if end tags match start tags? Why do applications need to convert numbers to strings and then back on the other side? Let's face it: markup languages are nonsense. There's no real advantage in using XML over ASN.1. It's slower, less scalable, overall far less efficient. And yet it has become "the" standard. But has it?
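To make the round-trip complaint concrete, here is a minimal sketch (plain Python, with a made-up `<price>` element) of what a markup format forces on a single number versus a straight binary encoding:

```python
import struct
import xml.etree.ElementTree as ET

value = 1234567.89

# Markup route: number -> string -> wrapped in tags -> parsed -> string -> number
xml_doc = "<price>%s</price>" % value
parsed = float(ET.fromstring(xml_doc).text)

# Binary route: eight bytes on the wire, no tag matching, no string conversion
packed = struct.pack("<d", value)
unpacked = struct.unpack("<d", packed)[0]

print(len(xml_doc.encode("utf-8")), len(packed))  # 25 vs 8 bytes
assert parsed == unpacked == value
```

Three times the bytes and two string conversions, and that is before adding namespaces, attributes or schema validation.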

With the introduction of Ajax, JSON (another nonsense standard) started to emerge because it's much more efficient for moving data in and out of Javascript, whose object-literal syntax it borrows. But meanwhile Google decided that GMail should be fast, so they just made up their own binary format and made it open source (Protocol Buffers). BTW, Google also produced a programming language (Go) with... pointers!
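Even JSON's edge over XML is easy to see with a toy record (the field names here are invented for the comparison):

```python
import json

record = {"id": 42, "name": "Ada", "active": True}

json_text = json.dumps(record, separators=(",", ":"))
xml_text = ('<record><id>42</id><name>Ada</name>'
            '<active>true</active></record>')

# JSON carries each field name once; XML carries it twice (start and end tag)
print(len(json_text), len(xml_text))
```

The gap only widens with nesting, since every extra level of XML costs a matching pair of tags. A binary format like Protocol Buffers drops the field names from the wire altogether, replacing them with small numeric tags agreed in a schema.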

When I think of the petabytes of data that are transferred every day unnecessarily (just think of Base64), and the processing power that is wasted just to parse XML I get dizzy.
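The Base64 waste is not hyperbole: the encoding maps every 3 input bytes to 4 output characters, a fixed ~33% overhead before any compression. A two-line demonstration:

```python
import base64

payload = bytes(300)  # any 300 bytes of binary data
encoded = base64.b64encode(payload)

# 3 bytes in -> 4 characters out: 300 bytes become 400
print(len(payload), len(encoded))  # 300 400
```

So every binary attachment shipped inside an XML or MIME envelope grows by a third, and then something on the other side has to decode it again.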

This is, however, how the Internet works nowadays. And it's not only the Internet. SIP, the signalling protocol for voice communications that was supposed to take over the world any time now (wasn't it?), is often discarded because good old H.323 still proves much more efficient.

Many companies start out producing their own proprietary protocols, usually heavily optimised. Their motivation is not to share but to gain market share. When standards eventually become important, i.e. when customers demand them because they don't want to be locked in to specific vendors, these companies agree on a standard that is worse than their own solutions, probably hoping that those customers will eventually give in and use the proprietary versions anyway. So the world surrenders to the lowest common denominator and then spends a lot of time and effort solving the problems it creates.

In 2011 the W3C adopted EXI as a standard for "efficient XML interchange". It provides over 100x performance increases over plain XML (which is not hard to do). Let's hope it finally takes off and gets real adoption from vendors...
