Friday, December 12, 2014

SOA, Achilles and the tortoise

Sometimes I have the impression that SOA is the tortoise that Achilles can never reach (as in Zeno's paradox).

For each process that can finally be automated and orchestrated with the Enterprise Service Bus, there's another that requires a native, peer-to-peer connector; for each one built to use the centralised Identity and Access Management service, another pops up where some proprietary user management system must be used; for each workflow perfectly implemented with the enterprise BPM suite, someone comes up with an urgent need to schedule a sequence of batch jobs that are somehow assumed to run flawlessly; for each clean SOAP or REST interface, some ugly file transfer or shared drive must appear; and for each shared service that is set up, someone will claim that their application is too critical, needs too much flexibility, or is too time-constrained to use it.

Architecture is indeed an exercise in frustration: sometimes preaching in the desert, sometimes tilting at windmills, and other times being seen as a project risk.
More often than not, after all the negotiations and compromises that inevitably take place, one of two things happens:
- if the architect conceded too much and allowed exceptions, then something will go wrong. The architect will usually be in a position to say "I told you so", but that won't help at all by then;
- if instead the architect sticks to his guns, defending the best way to implement a system even against strong political forces, then over time everyone will start realising the advantages and using them to their benefit, which is the best reward the architect can get. He will take no credit for it, though - someone else will advertise how smart he was for reusing some component that the architect spent years getting agreement to deploy as a shared service.

Ultimately, being an architect is something one must do with passion, finding the reward in one's own work. After all, it is a great job. And after a few years a certain reputation does build up - for better or for worse.

But no would-be architect should ever think that it's all glamour and respect.

Sunday, April 27, 2014

Double kernel in Windows - Why not?

As a Windows user I am, as much as everyone else, regularly annoyed by the Windows Update process. Booting the computer to look at a screen saying "Configuring Windows Updates - don't turn off your computer" is just not fun.

So why on earth doesn't Microsoft do something about it?

Here's a thought: why not use a double kernel? You know, like double-buffering for rendering video.
The PC would keep two copies of the OS. While the user worked on the active image, the other one would be updated in the background. The user would then be automatically switched over to the new version, and the old version would be synchronised with the new one to get ready for the next update, and so on.
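Just to make the idea concrete, here's a minimal simulation of the scheme. The slot names and the sync step are my own illustrative choices, not anything Windows actually does:

```python
# A minimal simulation of the "double kernel" idea: two OS images (slots),
# updates always applied to the inactive slot, then an atomic switch.
# Everything here is an illustrative sketch, not a real updater.

class DualImageSystem:
    def __init__(self):
        self.slots = {"A": "v1.0", "B": "v1.0"}  # two full OS images
        self.active = "A"                        # the image the user runs on

    @property
    def inactive(self):
        return "B" if self.active == "A" else "A"

    def apply_update(self, new_version):
        # Patch the background image while the user keeps working on `active`.
        self.slots[self.inactive] = new_version

    def switch(self):
        # The "flip": the next boot (or handover) runs from the updated slot.
        self.active = self.inactive

    def resync(self):
        # Bring the now-inactive, older image up to date so it can receive
        # the next update - exactly like double-buffering in video rendering.
        self.slots[self.inactive] = self.slots[self.active]

system = DualImageSystem()
system.apply_update("v1.1")   # user never sees "Configuring Windows Updates"
system.switch()               # instant cut-over to v1.1
system.resync()               # old slot catches up, ready for v1.2
print(system.active, system.slots)  # B {'A': 'v1.1', 'B': 'v1.1'}
```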

It's actually quite simple. 

Another way would be to embed Hyper-V in the normal Windows distribution and turn every PC into a small virtual cluster. This might even be simpler.

Either way, anything would beat the darn update process.

Friday, March 28, 2014

A TOGAF 10 wish list

I do admire the TOGAF framework as a way of organising work.

It's logical and complete, and it can be flexible enough to adapt to different dimensions of scope, so in my opinion it provides the most value in supporting programme management. To be honest, I don't think it is realistic to use any framework (be it TOGAF or any other) at a full enterprise-wide scope. But for programmes, why not?

There are, however, some things in TOGAF 9 that have the retro look of a science fiction movie from the early '90s. In most cases they are optional, and I think they should not even be part of the framework: TOGAF should describe, not prescribe.

Here are some ideas that in my opinion would, so to speak, bring TOGAF 10 to the 21st century:

1. Get rid of the outdated (almost funny except that they aren't) reference models.
I mean, what is the meaning of the Technical Reference Model, shown below? Really... "Graphics & Image"? "Operating System Services"? Maybe (and I'm not even sure about that) this had some meaning in the distant past...

(Image: the TOGAF Technical Reference Model. Source: http://pubs.opengroup.org/architecture/togaf9-doc/arch/chap43.html)


And then there's the III-RM. Again...

(Image: the TOGAF Integrated Information Infrastructure Reference Model. Source: http://pubs.opengroup.org/architecture/togaf9-doc/arch/chap44.html)

It's not even worth starting a discussion, is it?

Instead of these simplistic and rather useless reference models, The Open Group should create a community that would itself create and update reference models for all sorts of purposes: reference models for business intelligence, document management, master data management, high-performance computing, big data, advanced analytics, cloud integration, you name it. From those reference models, architects could then derive reference architectures suited to the needs of their organisations.



2. Replace Architecture Building Blocks (ABBs) with Application Services. ABBs just don't make sense. Solution Building Blocks (SBBs) are real components, while ABBs are supposed to be logical components. But what does that actually mean? Ultimately the customer needs a service, not a logical component.
The service can be realised by an SBB or, for example, outsourced. And that's all there should be to it.


3. Introduce ArchiMate as the "lingua franca" of TOGAF at all of the logical layers. It is owned by The Open Group now anyway... and it's really good, too.


4. Require methods for designing and evaluating architectures. Here I am thinking in terms of ISO 20000 and ITIL: ISO 20000 requires processes and ITIL describes them. In this case I would recommend at least the SEI methods ADD (Attribute-Driven Design) and ATAM (Architecture Tradeoff Analysis Method).

Monday, September 2, 2013

Data Vault in LoB applications - why not?

The data vault modelling technique was created specifically for the data warehouse. It is a rather simple design pattern intended to address the shortcomings of 3NF and dimensional modelling in the stable area of the DWH. Here's the DV concept in a nutshell:

- There are three types of tables: hubs, satellites and links;
- Hubs contain the business keys (e.g. Customer ID) and, for each record, a surrogate key;
- Links are tables that represent relationships (in practice turning every relationship into its many-to-many variant);
- Satellites hold the actual entity data. There can be as many or as few satellites as you wish, and they can be attached to both hubs and links (but not to each other).

In addition to that, each table (hub, satellite or link) has a "load date/time stamp" (Load_DTS) and a "record source" column, so the loading of data can always be traced. This pattern easily allows historic data to be persisted and back-tracked. If an entity's schema changes (e.g. from a certain point in time a company wants to capture more customer information than before, creating new data fields for "customer"), then a new satellite can be created. For convenience, satellite tables have an "end date/time stamp" (End_DTS) column, so that a query can easily find the latest record for an entity. Furthermore, the primary key in a satellite is the combination of the foreign key from the hub (the surrogate key, not the business key) and the Load_DTS. Here's an example, directly from Dan Linstedt, of how DV implements the Northwind database sample that ships with Microsoft products (a typical e-commerce solution).

(Image: Dan Linstedt's data vault model of the Northwind sample database)
It's clear that if a change to a customer occurs (e.g. a new street address), the only thing that needs to be done is an insert into the satellite table - old data is always kept (see the sketch below). If a new field is added to a customer, then a new satellite table can be created. This is one of the main arguments for DV: "all the data, all the time".

But looking at the example again, something else strikes me: what if the company now wants to have specialised data on its customers? For example, it wants to distinguish between corporate and private customers. Obviously these have some fields in common and others that are only meaningful for the type of customer they are. This is the "is a" relation that is always a problem to map to the relational world, especially if one wants to keep historic data, i.e. to see the data as it was before the schema change. This is business as usual in the object-oriented world but always a pain when building the persistence layer. First, one needs to agree on what the relational model should look like, as there is no specialisation relation there. Of course, one can add a table for each specialised class and keep the existing Customer table (adding a field for the customer type), flatten the whole thing by adding the extra columns to the existing Customer table, or keep only the specialised tables (CorporateCustomer and PrivateCustomer) and drop the original Customer table. In any case it's a pain to change the model, to migrate the old data to the new model, and to rewrite the persistence layer (the O/RM, or object-relational mapping).
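To make the insert-only behaviour concrete, here's a minimal sketch using SQLite. The table and column names are my own illustrative choices, loosely following the conventions described above, not an authoritative DV implementation:

```python
# Minimal data vault sketch: a customer hub plus one satellite, with an
# address change handled by inserting a new satellite row (the attribute
# history is never updated; only the End_DTS bookkeeping column is closed).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hub_customer (
    customer_sqn  INTEGER PRIMARY KEY,      -- surrogate key
    customer_id   TEXT NOT NULL UNIQUE,     -- business key
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_sqn  INTEGER NOT NULL REFERENCES hub_customer,
    load_dts      TEXT NOT NULL,
    end_dts       TEXT,                     -- NULL marks the current record
    name          TEXT,
    street        TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_sqn, load_dts)    -- surrogate FK + Load_DTS
);
""")

db.execute("INSERT INTO hub_customer VALUES (1, 'ALFKI', '2013-09-01', 'CRM')")
db.execute("""INSERT INTO sat_customer
              VALUES (1, '2013-09-01', NULL, 'Alfreds', 'Obere Str. 57', 'CRM')""")

# The customer moves: close the old record, insert the new one - old data kept.
db.execute("""UPDATE sat_customer SET end_dts = '2013-09-02'
              WHERE customer_sqn = 1 AND end_dts IS NULL""")
db.execute("""INSERT INTO sat_customer
              VALUES (1, '2013-09-02', NULL, 'Alfreds', 'Neue Str. 1', 'CRM')""")

# The latest view of the customer is simply the open-ended record:
print(db.execute("""SELECT street FROM sat_customer
                    WHERE customer_sqn = 1 AND end_dts IS NULL""").fetchone())
```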

As already said, if one wants to keep historic data it's even more complicated, and maintaining consistency and the mapping to the object-oriented world gets ever more complex as new changes come along (e.g. the model can later be further specialised to handle VIP customers).

Enter the data vault: all of these shortcomings are suddenly addressed with very little effort and, what's more, in a standardised way. In this case, just create two new satellite tables for private and corporate customers; that's it. On the application side (the OO model), extend the customer class with the two subclasses and provide serialisation/deserialisation methods that write to and read from the respective satellite. Job done.
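Here's a rough sketch of what that OO side could look like; the class and satellite names are invented for the example:

```python
# Sketch of the OO mapping: one base class, two subclasses, each subclass
# persisting its extra fields to its own satellite. Names are illustrative.
from dataclasses import dataclass

@dataclass
class Customer:
    customer_sqn: int
    name: str

    def satellite_rows(self, load_dts: str) -> dict:
        # Common attributes go to the shared customer satellite.
        return {"sat_customer": (self.customer_sqn, load_dts, None, self.name)}

@dataclass
class PrivateCustomer(Customer):
    date_of_birth: str

    def satellite_rows(self, load_dts):
        rows = super().satellite_rows(load_dts)
        rows["sat_customer_private"] = (self.customer_sqn, load_dts, None,
                                        self.date_of_birth)
        return rows

@dataclass
class CorporateCustomer(Customer):
    vat_number: str

    def satellite_rows(self, load_dts):
        rows = super().satellite_rows(load_dts)
        rows["sat_customer_corporate"] = (self.customer_sqn, load_dts, None,
                                          self.vat_number)
        return rows

# Each subclass serialises to the base satellite plus its own - no migration
# of existing tables, no rewrite of the base class persistence:
print(CorporateCustomer(1, "Acme Ltd", "DE123456789").satellite_rows("2013-09-02"))
```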

So why not use the data vault for line-of-business (so-called transactional) applications? A deeper look into how transactions and locking would be supported is needed, and I intend to try it out sometime. I certainly think it's worth the time.

Sunday, April 21, 2013

The silent revolution in data architecture

While the data vault (http://danlinstedt.com/) is undoubtedly the best modelling technique for keeping historic data in a classic Enterprise Data Warehouse (no wonder, since it was invented for it in the first place), it is not obvious how, or whether, it will survive in the brave new world of in-memory analytics.

Now that widespread adoption is well under way, the data warehouse itself is changing. Major vendors are offering everything in-memory, with column-based storage and compression as the keys to ultra-efficient data retrieval. Ironically, one of the main arguments for the data vault is faster data loading (i.e. writing), not retrieval (i.e. reading): by breaking the dependencies between keys through the use of surrogate keys, tables can all be loaded in parallel regardless of referential integrity issues.
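To picture that parallel-loading argument, here's an illustrative sketch in which hub, link and satellite rows are staged independently and loaded concurrently, with no foreign-key checks at load time. All table names and rows are made up:

```python
# Each loader only touches its own table; nothing blocks on a key lookup
# in another table, so the three loads can run at the same time.
from concurrent.futures import ThreadPoolExecutor

tables = {"hub_customer": [], "link_order_customer": [], "sat_customer": []}

def load(table, rows):
    # No referential integrity enforced during the load itself.
    tables[table].extend(rows)
    return f"{table}: {len(rows)} rows loaded"

staged = {
    "hub_customer": [(1, "ALFKI")],
    "link_order_customer": [(10, 1, 10248)],
    "sat_customer": [(1, "2013-04-21", "Alfreds")],
}

with ThreadPoolExecutor() as pool:
    for result in pool.map(load, staged.keys(), staged.values()):
        print(result)
# Integrity is reconciled afterwards, not enforced while writing.
```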
But with the innovation currently seen in the data warehouse area, the data vault may well be a technique that dies before it has had the chance to prove its merits - bad luck; such is life in IT.

In summary: the cube/snowflake/star schema is dead (or will be any time soon), killed by column-based, in-memory analytical data stores; the relational database itself is more and more shaky (with SAP HANA as the basis of the ERP now making a big step in that direction, at least if it succeeds); and even disk storage is seemingly on its way down. As someone said, "disk is the new tape". Why do you need persistent storage if you have huge amounts of RAM in fault-tolerant data centres? For backups. Then again, soon enough all backups will end up encrypted in the cloud, right?

Now, what about end-user devices? Again, the hard disk is all but dead. Flash, SSD and cloud are already the standard.

I find it really surprising that there isn't more media hype around all these threats to the HD. After a couple of decades of suspiciously fast growth in HD storage capacity (is innovation really so fast that disks can double their capacity every year, or was it just market manipulation?), it seems that the tide is turning.

That's IT for you: without the physical constraints of hard disk storage, the database (whether relational or analytical) is not so optimal anymore; without the constraints of network speed, local storage is not that relevant anymore; without the constraints of local capacity, distributed systems are hardly necessary (Google Public DNS, anyone?); and so on. But then again, we still have mainframes around, right?

Tuesday, February 19, 2013

Waiting for TOGAF 10


TOGAF 9 is a great framework. It has a robust "chassis" on which one can really build a solid Enterprise Architecture function; the only problem is that some parts of it are outdated. One can see that all the parts work together quite well, and it's easy to see why it became so hot some time ago. It's like an old Rolls Royce: you admire its quality and engineering, but it is no longer the most practical means of transportation for everyday life.

Take, for example, the Architecture Building Blocks (ABBs): according to TOGAF's definition, ABBs "capture architecture requirements; e.g. business, data, application and technology requirements" and they "direct and guide the development of SBBs" (note: SBBs are Solution Building Blocks). ABBs are therefore "logical" components. Moreover, also according to TOGAF's definition, a building block (whether ABB or SBB) "has a defined boundary and is generally recognizable as 'a thing' by domain experts".

The intention is understandable. By abstracting the "thing" into a set of functions, one should in principle find ways to easily replace it with another equivalent "thing". The thing can be a web server, a database, an application server, a CRM system and so on.
The problem is that this kind of bottom-up abstraction doesn't last very long, since it is driven by implementation. It's like saying that in order to move items from point A to point B one needs a truck. "Truck" would then be the ABB and the actual choice of brand and model would be the SBB. But who says one needs a truck? What if the business finds that hiring a transportation company is cheaper than buying one?

Indeed, it is much more important to document that you need a transportation service with certain service requirements; this service should then show up in every other area of the enterprise where it is needed. The decision of, say, reusing a truck already owned by another department or hiring a transportation company instead can then be supported by real business arguments. Maybe the other department would benefit from a general contract with the transportation company and get rid of its truck; or the two would simply agree to share it. Either way, what matters is to look first at the application services needed to support a certain business process or function, and only then at possible solutions to provide those services. ABBs are an artificial construct in which I see no added value at all.

Furthermore, by extending the service concept downwards to the infrastructure level, a quite useful system map is easily obtained. Application components use infrastructure services, which are in turn provided by infrastructure components; by documenting all of that, you end up with a map of the business processes that depend on each infrastructure component via application services (a toy version is sketched below). This is a powerful tool for optimising the whole IT landscape.
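A toy version of such a map, with invented component and service names, could look like this:

```python
# Business processes use application services, realised by application
# components, which use infrastructure services provided by infrastructure
# components. Walking the "uses" graph answers questions like: which business
# processes ultimately depend on this piece of infrastructure?
uses = {
    "Order Handling (process)":      ["Order Capture (app service)"],
    "Order Capture (app service)":   ["CRM System (app component)"],
    "CRM System (app component)":    ["Relational DB (infra service)"],
    "Relational DB (infra service)": ["DB Cluster (infra component)"],
}

def reachable(node, seen=None):
    # Collect everything a node transitively depends on.
    seen = seen or set()
    for dep in uses.get(node, []):
        if dep not in seen:
            seen.add(dep)
            reachable(dep, seen)
    return seen

# Which business processes would be hit by taking the DB cluster down?
for process in [n for n in uses if "(process)" in n]:
    if "DB Cluster (infra component)" in reachable(process):
        print(process, "depends on the DB cluster")
```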

Coming back to TOGAF 9, there is something else that belongs in the 1990s museum: the reference models - the Technical Reference Model (TRM) and the Integrated Information Infrastructure Reference Model (III-RM). Here's a quick summary of these two animals:
  • the TRM says that applications are built on top of application platforms, which use different services requiring different qualities; these are provided by an operating system that uses network services built on a communications infrastructure;
  • the III-RM says that you should use a broker between consumer and provider applications.


These are quite generic and over-simplistic views. They hold for most (but not all) IT applications out there, but what value do they actually provide? The way I see it, each organisation must produce its own set of reference models, reference architectures and patterns based on its needs and constraints. What TOGAF should focus on is a method to help develop these artifacts. Building them is an incremental exercise that is essentially capability-driven. For example, when a new capability is identified as beneficial for the enterprise, it is usually best to turn it into a shared service. Then you need a reference model for it; from the model (by mapping it to the existing landscape) you can derive a reference architecture and, finally, a set of patterns to guide consumers of the service.

In summary, TOGAF is a great framework, as it really helps to manage the many aspects of enterprise architecture, but I think a new model of this Rolls Royce is long overdue. The fact that The Open Group took ArchiMate on board as a common language looks promising, since ArchiMate provides an excellent way of representing a top-down, service-oriented view of the enterprise. I can't wait to see the next major version of TOGAF.

Friday, January 18, 2013

XML, the extra morose language


First there was HTML; it was a "quick and dirty" way of producing information and making it available. The difficulty of making web pages look good and the overhead caused on the network were outweighed by the simplicity and speed of producing information in the form of standardised documents. With all its shortcomings, coupled with HTTP (also quite limited at the time), it was enough to change the world.

Those limitations were gradually removed with new versions of HTML together with different scripting technologies on both the client and server sides. Later, compression became possible in order to reduce the load on the network.

Meanwhile XML came along and it too took over the world. Again many optimisations emerged, and also new ways of encoding binary data.

These are typical examples of IT solving problems that were created by IT. Looking at it now, what sense does it make to encode data in a human-readable format when it is meant to be transferred only between computer programs? Why do applications need to parse tags? Why do messaging components need to verify that end tags match start tags? Why do applications need to convert numbers to strings and then back again on the other side? Let's face it: markup languages are nonsense. There's no real advantage in using XML over ASN.1; it's slower, less scalable, and overall far less efficient. And yet it has become "the" standard. But has it?

With the introduction of Ajax, JSON (another nonsense standard) started to emerge because it's much more efficient for representing portable code, more specifically JavaScript. But meanwhile Google decided that GMail should be fast, so they just made up their own binary standard and made it open source (Protocol Buffers). BTW, Google also produced a programming language (Go) with... pointers!

When I think of the petabytes of data that are transferred unnecessarily every day (just think of Base64), and of the processing power that is wasted just parsing XML, I get dizzy.
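To put a rough number on it, here's a small illustrative comparison (the record and tag names are made up for the example) of a binary-packed record, its XML equivalent, and the Base64 blow-up of the binary form:

```python
# Size of the same four-field record in three encodings.
import base64
import struct

# Four fields: customer id, year, month, day (e.g. an order date).
binary = struct.pack(">IHBB", 42, 2013, 1, 18)          # big-endian, 8 bytes
xml = (b"<order><customerId>42</customerId>"
       b"<year>2013</year><month>1</month><day>18</day></order>")
b64 = base64.b64encode(binary)                          # 4/3 expansion + padding

print(len(binary))   # 8   bytes on the wire
print(len(xml))      # 88  bytes for the very same data, plus parsing cost
print(len(b64))      # 12  bytes just to smuggle the binary through text
```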

This is, however, how the Internet works nowadays. And it's not only the Internet: SIP, the signalling protocol used in voice communications that should be taking over the world any time soon (isn't it?), is often discarded because good old H.323 still proves much more efficient.

Many companies start out producing their own proprietary protocols, usually heavily optimised. Their motivation is not to share but to gain market share. When standards eventually become important, i.e. when customers demand them because they don't want to be locked in to specific vendors, these companies finally agree on a standard that is worse than their own solutions, probably hoping that those customers will eventually give in and use the proprietary versions. In the end, the world surrenders to the lowest common denominator and then spends a lot of time and effort solving the problems it creates.

In 2011 the W3C adopted EXI as a standard for "efficient XML interchange". It provides over 100x performance increases over XML (which is not hard to do). Let's hope it finally takes off and gets real adoption from vendors...