Problems with Event Driven Architecture

Some of the architects here at DSI and I have been investigating the reconstruction of a major system as an Event Driven Architecture (EDA). The pattern is highly distributed and about as loosely coupled as a system can be. For a very large system this is very attractive; I won’t bore you with the benefits of a loosely coupled system.

The most notable attribute of the system we are proposing is that each event processing component locally persists the data necessary to perform its own operations. For example, an order processing component would locally persist a copy of all the relevant customer data. Data integrity is maintained by listening to events describing changes to customer data. Thus, while the order processing component is not responsible for customer data, it maintains a replicated copy of the customer data required for order processing. Events relating to changes in customer data, such as adding a new customer or changing an address, are generated and consumed by the component responsible for customer maintenance; the order processing component simply becomes another consumer of these events. Some of you will have noticed that this is very similar to a traditional data replication process; indeed, this pattern may supersede traditional data replication functions entirely.
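As a minimal sketch of the idea, assuming a single hypothetical event type and an in-process handler (the names `CustomerChanged`, `OrderProcessing` and `ship_order` are illustrative, not part of our design), an order processing component that keeps its own copy of customer data might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class CustomerChanged:
    """Hypothetical event published by the customer maintenance component."""
    customer_id: str
    address: str


@dataclass
class OrderProcessing:
    """Keeps only the customer data it needs, updated purely from events."""
    customers: dict = field(default_factory=dict)  # local replica

    def on_customer_changed(self, event: CustomerChanged) -> None:
        # A new customer and an address change look the same here:
        # the latest event for a customer overwrites the local copy.
        self.customers[event.customer_id] = event.address

    def ship_order(self, customer_id: str, item: str) -> str:
        # No call to the customer component: only the local replica is read.
        address = self.customers[customer_id]
        return f"{item} -> {address}"


orders = OrderProcessing()
orders.on_customer_changed(CustomerChanged("c1", "1 Old Road"))
orders.on_customer_changed(CustomerChanged("c1", "2 New Street"))
label = orders.ship_order("c1", "widget")
```

The order component never queries the customer component directly; it is only as current as the last event it consumed.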

This pattern achieves the massive levels of loose coupling we seek without obvious complexity. In fact, with the wise addition of persistent queuing for message handling, it’s perfectly possible to shut down entire components for significant periods of time without inconveniencing any of the system’s other functions. Extremely loose coupling is the Holy Grail for cheap and efficient scalability. Modern production systems and hardware are amenable to this architecture, and it remains very cost effective despite the data redundancy it introduces.
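To illustrate why queuing makes a shutdown harmless, here is a rough sketch in which an in-memory `DurableQueue` class stands in for a real persistent queue (the class, its methods and the tuple event format are all assumptions made for the example):

```python
from collections import deque


class DurableQueue:
    """Stand-in for a persistent queue; a real one would write to disk."""

    def __init__(self):
        self._pending = deque()

    def publish(self, event):
        # Producers never wait for the consumer to be running.
        self._pending.append(event)

    def drain(self, handler):
        # Deliver everything that accumulated, in publication order.
        count = 0
        while self._pending:
            handler(self._pending.popleft())
            count += 1
        return count


queue = DurableQueue()
replica = {}  # the consumer's local copy of customer data


def apply(event):
    customer_id, address = event
    replica[customer_id] = address


# The consumer is shut down; the producer keeps publishing regardless.
queue.publish(("c1", "1 Old Road"))
queue.publish(("c2", "5 High Street"))
queue.publish(("c1", "2 New Street"))

# The consumer restarts and catches up from the queue.
processed = queue.drain(apply)
```

Because the events wait in the queue, the consumer ends up in the same state it would have reached had it never been offline.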

We have, however, identified several potentially difficult issues. For example, since each component holds what amounts to replicated data, synchronization is a critical concern. Backups and good data management practices are actually very simple within the model, because each component can easily be isolated while undergoing maintenance without loss of overall system function. The problem occurs if one or more components encounter a catastrophic failure at some point after their last backup. By then the failed component would have both received and transmitted events that were not captured in that backup. Upon restarting from the last good backup, the component’s state would be out of sync with the other components of the system. And since each component might be backed up on a completely different schedule, it is not possible to resynchronise the system simply by restoring every component from its last backup.

Returning to our order processing example, suppose the order processing component were to fail and be restored from its last backup. The customer maintenance component may have made changes to key customer data since that backup, and the restored order processing component would be unaware of them. Orders received after the restart might go to the wrong address, for example, or orders for new customers might not process at all. This is a significant risk, and prior experience tells us it will eventually occur.
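The failure can be reproduced in a few lines. This is only a sketch of the problem, with plain dictionaries standing in for each component’s store and a hypothetical `publish_customer_change` function standing in for event delivery:

```python
import copy

source_of_record = {}  # customer maintenance component's data
order_replica = {}     # order processing component's local copy


def publish_customer_change(customer_id, address):
    # The owning component records the change, and the order
    # component consumes the resulting event straight away.
    source_of_record[customer_id] = address
    order_replica[customer_id] = address


publish_customer_change("c1", "1 Old Road")
backup = copy.deepcopy(order_replica)  # the last good backup

# Events consumed after the backup was taken.
publish_customer_change("c1", "2 New Street")
publish_customer_change("c2", "5 High Street")

# Catastrophic failure: the replica is restored from the backup.
order_replica.clear()
order_replica.update(backup)

# Customers whose replicated data no longer matches the source of record.
stale = sorted(
    cid for cid in source_of_record
    if order_replica.get(cid) != source_of_record[cid]
)
```

After the restore, `c1` still has its old address and `c2` does not exist at all in the order component, which are exactly the two symptoms described above.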

Another obvious issue with Event Driven Architecture relates to ad-hoc data corrections. With a monolithic, centralised data repository it’s relatively trivial to make universal ad-hoc changes to your system’s data. With a distributed data model, ad-hoc data modifications are a much more complex proposition, even with clearly defined sources of record for each data element. While ad-hoc data changes should be considered evil and avoided with vigour, they are often a necessary and commonplace aspect of business. Indeed, an enterprise may rely on ad-hoc modifications for the consistent day-to-day operation of its business. With replicated data that depends on events to trigger downstream behaviour, making ad-hoc adjustments can become a very complex undertaking.
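A small sketch of the contrast, assuming a misspelt address that someone patches by hand (the component names, including `billing`, are hypothetical and the dictionaries stand in for real data stores):

```python
# Centralised repository: one correction and every reader sees it.
central = {"c1": "1 Old Raod"}
central["c1"] = "1 Old Road"

# Distributed model: each component holds its own copy of the same value.
replicas = {
    "customer_maintenance": {"c1": "1 Old Raod"},  # source of record
    "order_processing": {"c1": "1 Old Raod"},
    "billing": {"c1": "1 Old Raod"},
}

# The same correction applied directly to the source of record,
# bypassing the event that would normally announce the change.
replicas["customer_maintenance"]["c1"] = "1 Old Road"

# Components still holding the uncorrected value.
still_wrong = sorted(
    name for name, data in replicas.items()
    if data["c1"] != "1 Old Road"
)
```

Every component left in `still_wrong` has to be found and corrected separately, or the correction has to be re-expressed as an event that each consumer will accept.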

I intend, in several succeeding blog entries, to discuss how these issues, and others like them, can be remediated within the context of developing an Event Driven Architecture.

