Monday, August 03, 2009

SOA Performance - Seeing through the complexity creep

Had an interesting chat with the Application Performance Management (APM) team at CA about performance with respect to Service Oriented Architecture (SOA). We were comparing notes on the way in which it is easy for SOA initiatives to descend into confusion as services proliferate and knowledge of the dependencies between them gradually degrades.

From a performance and troubleshooting perspective, this can be pretty bad news: when an application slows down or breaks, it can be difficult to pinpoint where the bottleneck or fault has occurred. This is one of those nitty-gritty practical issues that SOA advocates often neglect, but it is a real risk for those who are upping their commitment to this distributed computing model.

While this may not be news to more experienced adopters, knowing where to start is not so simple. In an ideal world, good SOA governance would minimise the degree to which such problems occur and, for example, prevent surprises from a service being hammered into failure because a developer decided to call it from a particularly demanding application without telling anyone.

Unfortunately, however, we don’t live in an ideal world, so generating some kind of visibility of what’s going on at execution time is a requirement. Inspecting raw logs from various components in the system, coupled with cleverly placed debug code, is one of the more common ways of troubleshooting, but it can be tedious and time consuming.

Against this background, I had read about CA’s extension of the APM capability it acquired with Wily into the SOA domain, but hadn’t had a chance to check it out properly. I got that chance recently at CA’s analyst conference in Ottawa.

For those who don’t know the Wily solution set, it grew out of the need for tools to monitor and troubleshoot complex Java applications in high-end application server environments, then evolved into more of an end-to-end APM system. The basic idea is to drop agents in at key points in your network to monitor transaction calls, e.g. between the web server and the application server, the application server and the database management system, and so on.

Data accumulated in this way can be used for real time monitoring and alerting, and for analysis of history for both troubleshooting and planning purposes. While a picture of end-to-end performance can be derived at an application or individual user level (something that can also be done with solutions that monitor response times ‘at the glass’), the approach adopted for the CA APM solution goes further by providing visibility into the performance of individual transaction steps behind the scenes.
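To make the difference between end-to-end and per-step visibility concrete, here is a minimal sketch of step-level alerting. It is not CA's implementation; the step names, sample timings and thresholds are all made up for illustration.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical response times (ms) reported by agents at each hop.
samples: Dict[str, List[float]] = {
    "web->app": [12.0, 15.0, 11.0],
    "app->db": [40.0, 420.0, 380.0],
}

# Hypothetical per-step alert thresholds (ms).
thresholds_ms: Dict[str, float] = {"web->app": 50.0, "app->db": 200.0}


def steps_over_threshold(
    samples: Dict[str, List[float]], thresholds: Dict[str, float]
) -> List[str]:
    """Return the steps whose mean response time exceeds their threshold."""
    return sorted(
        step for step, values in samples.items() if mean(values) > thresholds[step]
    )


print(steps_over_threshold(samples, thresholds_ms))
```

An 'at the glass' monitor would only report that the page was slow; with per-step data the alert can name the hop responsible.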

As the solution has become increasingly well proven, CA has enjoyed significant growth in demand for the Wily technology, even though it is still not that widely known in the mainstream. As the solution has evolved, however, it has moved on from the concept of ‘see to’ to ‘see through’ monitoring, and this is the key to helping unravel what’s going on in a complex SOA environment.

As an example, if the application being monitored makes a call to a service elsewhere on the network, it has always been possible to capture response times at that step along with diagnostic information when things go wrong. This is the ‘see to’ approach. But what if that service calls another one behind the scenes? This is where ‘see through’ visibility comes in, which can be achieved by distributing coordinated agents to equipment running relevant services, and/or by plugging an agent into the Enterprise Service Bus (ESB).
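The 'see to' versus 'see through' distinction can be sketched in a few lines. This is an illustration under stated assumptions, not how the CA agents work internally: it assumes each agent reports a record with a parent link, that downstream calls are made one after another rather than in parallel, and the service names and timings are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """One timed call as reported by a (hypothetical) agent."""
    span_id: str
    parent_id: Optional[str]  # None for the entry point
    service: str
    total_ms: float  # elapsed time, including downstream calls


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Time spent inside each service, excluding the calls it made downstream."""
    child_total: Dict[str, float] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.total_ms
    result: Dict[str, float] = {}
    for s in spans:
        own = s.total_ms - child_total.get(s.span_id, 0.0)
        result[s.service] = result.get(s.service, 0.0) + own
    return result


def slowest_service(spans: List[Span]) -> str:
    """Name the service with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


# Invented trace: the app calls an order service, which calls a pricing
# service, which in turn calls a legacy lookup behind the scenes.
trace = [
    Span("1", None, "web-app", 950.0),
    Span("2", "1", "order-service", 900.0),
    Span("3", "2", "pricing-service", 820.0),
    Span("4", "3", "legacy-lookup", 780.0),
]

print(slowest_service(trace))
```

With only the first two records ('see to'), the order service looks like the culprit at 900 ms. With all four ('see through'), its own share turns out to be 80 ms and the delay is traced to the legacy lookup.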

Of course CA is not the only game in town when it comes to performance management, whether in an SOA or traditional application environment, and anyone investigating this field should check out players like HP, IBM, Quest and Compuware too. I thought the insights I got from the CA guys were worth sharing, however, hopefully to stimulate some thought among IT shops who have taken a more tactical approach to SOA and have had visibility and performance issues sneak up on them over time.

It’s an interesting area that we will continue to investigate, so if you have any experience or insights yourself that you are willing to share, feel free to ping me with your thoughts.

2 comments:

Alois Reitbauer said...

Hello Dale,

we at dynaTrace are targeting the same space. There are a couple of problems coming along with the management of SOA applications.

If you are interested, just get in touch with me.

Raju said...

Thanks for the information shared here; it was interesting and informative. I had a good experience participating in the Cloud Computing and SOA Conference in 2009, an influential Business Technology Conference covering the latest innovations and trends in Cloud Computing, SOA and related technologies. I learnt a lot about new technologies in Cloud Computing, and I am planning to attend the 2010 edition as well. I found the information about the conference at http://www.btsummit.com