Friday, January 18, 2013

XML, the extra morose language


First there was HTML; it was a "quick and dirty" way of producing information and making it available. The difficulty of making web pages look good and the overhead that HTML caused on the network were outweighed by the simplicity and the short time required to produce information in the form of standardised documents. With all its shortcomings, coupled with HTTP (itself quite limited at the time), it was enough to change the world.

Those limitations were gradually removed with new versions of HTML together with different scripting technologies on both the client and server sides. Later, compression became possible, reducing the amount of data on the wire.

Meanwhile XML came along and it too took over the world. Again many optimisations emerged, and also new ways of encoding binary data.

These are typical examples of IT solving problems that were created by IT. Looking at it now, what sense does it make to encode data in a human-readable format when it is meant to be transferred only between computer programs? Why do applications need to parse tags? Why do messaging components need to verify that end tags match start tags? Why do applications need to convert numbers to strings and then back on the other side? Let's face it: markup languages are nonsense. There is no real advantage in using XML over ASN.1. It is slower, less scalable, overall far less efficient. And yet it has become "the" standard. But has it?
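The string/parse round-trip is easy to quantify. A minimal sketch (the element names and the 4-byte integer layout are just illustrative assumptions; ASN.1 or Protocol Buffers use their own binary framing):

```python
import struct
import xml.etree.ElementTree as ET

value = 1234567890

# XML: the number travels as text inside tags and must be parsed back out.
xml_doc = "<reading><value>%d</value></reading>" % value
parsed = int(ET.fromstring(xml_doc).find("value").text)
assert parsed == value

# Binary: the same number fits in 4 bytes, with no tag scanning at all.
binary = struct.pack(">i", value)
decoded, = struct.unpack(">i", binary)
assert decoded == value

print(len(xml_doc.encode()), len(binary))  # 44 vs 4 bytes: over 10x larger
```

The gap only widens with real documents, where namespaces, attributes and whitespace add more bytes that every receiver must tokenise and validate.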

With the introduction of Ajax, JSON (another nonsense standard) started to emerge because it is far more compact than XML and maps directly onto JavaScript data structures. But meanwhile Google decided that GMail should be fast, so they simply made up their own binary format and made it open source (Protocol Buffers). BTW, Google also produced a programming language (Go) with... pointers!
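The compactness argument is easy to see with a toy record (the field names here are made up for illustration):

```python
import json

record = {"user": "alice", "id": 42}

# JSON: no end tags, and it deserialises straight into native types.
as_json = json.dumps(record, separators=(",", ":"))
as_xml = "<record><user>alice</user><id>42</id></record>"

print(len(as_json), len(as_xml))  # 24 vs 46 bytes for the same data
assert json.loads(as_json) == record
```

Every field name in XML is written twice (open and close tag), so the overhead grows with the number of fields, not just their size.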

When I think of the petabytes of data that are transferred unnecessarily every day (just think of Base64), and of the processing power that is wasted just to parse XML, I get dizzy.
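The Base64 tax is fixed and easy to demonstrate: every 3 bytes of binary data become 4 ASCII characters, a 33% inflation before the data even touches the wire.

```python
import base64

payload = bytes(300)           # 300 bytes of arbitrary binary data
encoded = base64.b64encode(payload)

print(len(payload), len(encoded))  # 300 -> 400: a fixed 33% overhead
```

And that is before counting the CPU cycles spent encoding on one side and decoding on the other, for no benefit to either program.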

This is, however, how the Internet works nowadays. And it is not only the Internet. SIP, the signalling protocol used in voice communications that should be taking over the world anytime soon (isn't it?), is often discarded because good old H.323 still proves much more efficient.

Many companies produce their own proprietary protocols, usually highly optimised. Their motivation is not to share but to gain market share. When standards finally become important, i.e. when customers demand them because they do not want to be locked in to specific vendors, these companies agree on a standard that is worse than their own solutions, probably hoping that customers will eventually give in and use the proprietary versions. The world thus surrenders to the lowest common denominator and then spends a lot of time and effort solving the problems it creates.

In 2011 the W3C adopted EXI as a standard for "Efficient XML Interchange". It can provide over 100x performance increases over plain XML (which is not hard to do). Let's hope it finally takes off and gets real adoption from vendors...

Saturday, January 5, 2013

The uncertainty of forecasting clouds


As I wrote in a previous post the adoption of large-scale cloud computing services suffers from many ills, for which I can only envisage one medicine: regulation.

It is tempting to think that cloud is just a matter of time; that companies giving up their data centres and most of their IT staff is something about to happen; that so-called "mega-vendors" will dominate the market, offering all the application, platform and infrastructure services that any non-IT company might ever desire. I, for one, am a believer. I think that most companies will have extremely reduced information departments compared to today's. They will no longer have IT departments as such, but rather a few roles such as Information Architects, Security Experts, Data Scientists (high in the hype cycle now) and possibly a handful of developers, depending on the type of business.

Big corporations and institutions may keep a "private cloud", either for cost reasons, to keep highly confidential data, or both. Cost is indeed an issue: nobody has yet proved that it is cheaper to rent a large data centre than to own one. In fact it may never be; that is the case with office buildings, where it is far cheaper for a corporation to own than to rent. On the other end of the spectrum, small to mid-sized companies may just outsource everything to cloud providers and not have an IT department at all anymore. I am sure such cases exist already.

But for most companies cloud is still a challenge, and will remain one until there is enough regulation and enforcement thereof. Technically, moving to the cloud is not difficult; the services have been accessible for a few years already. But there are a few reasons why businesses may not be prepared yet:

1- The cloud provider may go down, possibly even because of some other cloud provider with whom the prime contractor had an underpinning contract. A fresh example: Netflix going down, possibly because of an outage at Amazon.

2- Data leakage: data access is almost impossible to protect from at least a few system administrators. The only effective means of protecting data is encryption. But encryption is a tricky control, extremely hard to manage. First, there is ultimately a key that is a single point of failure: encrypting is pointless if the system administrators or anybody else has access to the key. Second, if the key is lost then all the data is lost with it. Third, keys should be changed regularly, but then they need to be backed up somewhere safe - of course not with the data itself. And finally, depending on what is encrypted, different qualities may be affected. Database encryption usually has an impact on performance, may restrict available features and requires licensing of additional modules. File-system encryption has more or less the same issues, and volume encryption is useless while a system is running and the administrator has access to it. Obviously, encrypting data that needs to be accessed by multiple people is complex, as many keys need to be managed; and if applications need to access the data, then the keys need to be stored somewhere in the system, which makes them accessible to system administrators (and hackers).

3- Undefined boundaries: who else is using the infrastructure? What if your system is running in a virtual machine and one of your "neighbours", hosted in another VM on the same physical server, breaks out and gains control of that server? Then your machine is compromised as well.

4- Legal/compliance: who knows where your systems are actually running? There may be constraints on location - e.g. your data may not legally leave its country of origin. Moreover, if there is a police investigation into one of your "neighbours" (i.e. someone using the same infrastructure as you) and the police need access to your files to follow a trace, you will have to deal with that too.
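The key-management difficulties in point 2 are usually tackled with envelope encryption: data is encrypted once with a data key (DEK), and only that small DEK is "wrapped" with a master key (KEK), so rotating the master key never means re-encrypting the whole dataset. A minimal sketch - the XOR routine below is a deliberately toy stand-in for a real cipher such as AES, and all key names are illustrative:

```python
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR stand-in for a real symmetric cipher - illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

kek = os.urandom(32)                  # master key, held in a key store
dek = os.urandom(32)                  # per-dataset data key
ciphertext = toy_cipher(dek, b"confidential records")
wrapped_dek = toy_cipher(kek, dek)    # only this is stored with the data

# Rotating the master key re-wraps the 32-byte DEK, not the whole dataset.
new_kek = os.urandom(32)
rewrapped = toy_cipher(new_kek, toy_cipher(kek, wrapped_dek))

# Decryption path: unwrap the DEK with the current master key, then decrypt.
recovered = toy_cipher(toy_cipher(new_kek, rewrapped), ciphertext)
assert recovered == b"confidential records"
```

Note that this only narrows the problem: the KEK itself still has to live somewhere, which is exactly the single point of failure described above.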

The main problem with solving these issues may be a lack of motivation from those "mega-vendors". Lack of regulation benefits the established large vendors. First, they are happy to host systems from anyone on the planet within their own legal framework and on their own terms. Second, without regulation only the big vendors have enough credibility in the customers' eyes. And then there are the technology issues: regulation leads to standardisation and accreditation, and consequently to more competitors in the market and reduced profit margins. Besides, there is no doubt that in the cloud world open source is king. A provider will always prefer to use and develop open source rather than pay licence fees to someone else - after all, they are not in the business of selling code but services. So the large vendors that live on selling licences are not really interested in standardisation, but rather in keeping a competitive advantage by offering more and better-integrated services than their competitors.

Anyway, it is certainly interesting to live through such a transition era.