The benefits of Service-Oriented Architecture (SOA) are extremely compelling. SOA accelerates development cycles and lowers the cost of innovation by dramatically simplifying software reuse. It makes business automation flexible and business processes agile by dramatically simplifying integration of existing applications. Business analysts can redesign, simulate, and optimize processes using software services as if they were Lego pieces, while knowing that implementation will not take years to complete, or cause unintended changes to the rest of the environment.
SOA makes user portals come alive by making enterprise application mashups feasible (e.g., combining information across business units or technology silos, as you might do when comparing product defects with customer call center statistics).
It is also eroding the definition of what an application is. In a SOA world, an application’s features, composition, location, and communications are no longer predetermined or even well-known. Instead, an application becomes a changeable collection of dynamic software services, with transaction paths that are determined on the fly from business rules created by business managers.
To IT operations, the key benefit of SOA — flexibility — sounds like performance management chaos. But the old adage remains true: you cannot manage what you cannot understand. For IT operations to get real control over the performance of composite SOA applications, it is imperative that they understand everything about the application’s real-time behavior.
As Phil Fritz, program manager for SOA management in IBM Software Group, puts it, “You don’t want to be the one organization in the enterprise putting up roadblocks to the deployment of new, innovative, and flexible processes and services. Instead, IT operations should take advantage of those projects to implement solutions that help you achieve your goals — higher service levels, higher reliability, and greater productivity.”
How can this happen?
Performance monitoring tools must become smarter, IT analytics must become more sophisticated, and IT processes must be flexibly automated and intelligently integrated.
Smarter Tools
There are two parts to getting smarter performance monitoring: understanding what services exist, and understanding how transactions are flowing through and using those services. (See Fig. 1.)
Understanding what exists should be easy, according to most of the SOA marketing material — just look in the service directory. But this assumes that enterprises have taken a controlled approach to implementing SOA: i.e., starting with an architecture design phase and investing in service directories that are populated as the services get built with development tools.
However, many enterprises are instead taking an evolutionary approach to implementing SOA. It usually starts with individual business units creating and exposing Web services as part of small departmental projects. Then other developers in different business units discover each other’s services, usually through community wikis and other internal social networking tools, and begin reusing the services in their projects.
Soon the enterprise has several critical composite SOA applications and hundreds of Web services, few of which are registered consistently in any directory. This is when the vice president of IT operations, who has been blithely unaware of these projects, gets the job of keeping these services healthy. With no reliable service directory to consult, direct discovery from the production environment is typically the only way to learn what you have.
Understanding how transactions flow through the infrastructure and leverage the available services is also challenging. Although many of today’s SOA applications have fairly straightforward business logic, these composites will become more sophisticated, and when the business logic starts dynamically determining how specific transactions should proceed, then understanding the transaction path gets more complicated. Simply because two services are communicating doesn’t automatically mean that they are related to the same transaction.
“One of the most frequent requests we get from our customers is to determine how the application is wired together right now,” says Jeff Cobb, senior vice president of product strategy at CA Wily Technology Division. “To them it doesn’t matter how it was designed, or what the developer intended to do with a service, or what the service catalog says. What matters is how are the transactions actually flowing around the network right now, how are the messages moving, who is making the calls, and who is receiving them.”
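To illustrate why two communicating services are not necessarily part of the same transaction, here is a minimal sketch that rebuilds per-transaction paths from observed calls. It assumes every observed call carries a transaction identifier that a monitoring tool can read; the record layout, service names, and the paths_by_transaction helper are all hypothetical and are not taken from any vendor's product.

```python
from collections import defaultdict

# Hypothetical observed call records: (transaction_id, caller, callee).
# In practice these would come from whatever the monitoring tool captures.
observed_calls = [
    ("txn-1", "portal", "ordering"),
    ("txn-2", "hr-app", "personnel"),
    ("txn-1", "ordering", "inventory"),
    ("txn-2", "personnel", "directory"),
    ("txn-1", "inventory", "pricing"),
]


def paths_by_transaction(calls):
    """Group calls by transaction ID, keeping the order in which they were seen."""
    paths = defaultdict(list)
    for txn_id, caller, callee in calls:
        paths[txn_id].append((caller, callee))
    return dict(paths)


paths = paths_by_transaction(observed_calls)
for txn_id, hops in paths.items():
    print(txn_id, " -> ".join([hops[0][0]] + [callee for _, callee in hops]))
```

The calls are interleaved on the wire, yet grouping by identifier separates them into two unrelated paths. Without some correlating identifier, the same traffic would look like one tangled conversation.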
More Sophisticated IT Analytics
Many enterprises have taken the first step and applied some level of relationship mapping or modeling that shows how infrastructure events are related to the delivery of Web applications. However, composite SOA applications will need more advanced analytics. Rampant service reuse makes determining what to monitor somewhat tricky.
“We’ve found that SOA-centric applications have a lot of ‘dirty’ performance metrics,” says Rob Greer, VP of marketing and product management at ClearApp. “If you only monitor the response time for a shared service, it is polluted because you can’t break out the impact of a single composite application on that shared component. What would be best is if the monitoring tool can obtain accurate performance measurements in the context of the specific composite application.”
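The "dirty" metric problem can be sketched in a few lines. The example below assumes each response-time sample for a shared service is tagged with the composite application that made the call; the service name, application names, timings, and the mean_response_by_app helper are invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (shared_service, calling_application, response_ms).
samples = [
    ("credit-check", "ordering", 120),
    ("credit-check", "ordering", 140),
    ("credit-check", "billing", 900),
    ("credit-check", "billing", 1100),
]


def mean_response_by_app(samples, service):
    """Average response time of one shared service, split by calling application."""
    by_app = defaultdict(list)
    for svc, app, ms in samples:
        if svc == service:
            by_app[app].append(ms)
    return {app: mean(values) for app, values in by_app.items()}


# The blended figure hides which composite application is suffering.
blended = mean(ms for svc, _, ms in samples if svc == "credit-check")
per_app = mean_response_by_app(samples, "credit-check")

print("blended:", blended)
print("per application:", per_app)
```

With these made-up numbers the blended average sits between the two applications and describes neither of them, which is the pollution Greer describes.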
Making sense of what those metrics are saying is also a multi-faceted effort. Application performance slowdowns may still occur even when the transaction paths are working as they should. Slowdowns can be related to non-functional aspects of the request (e.g., how big the payload is, or how frequently you’re making the calls). For example, the developer who wrote the service may have expected it to be called 100 times a day, but with SOA-enabled reuse the service is being called 100 times a minute. Characterizing these types of issues requires observing real application traffic and analyzing it from multiple perspectives.
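The gap between 100 calls a day and 100 calls a minute is easy to underestimate, so a short worked calculation may help. The function and variable names below are hypothetical; only the two call rates come from the example above.

```python
MINUTES_PER_DAY = 24 * 60


def daily_rate(calls, window_minutes):
    """Extrapolate a call count observed over a window to a per-day rate."""
    return calls * MINUTES_PER_DAY / window_minutes


expected_per_day = 100                                       # what the developer planned for
observed_per_day = daily_rate(calls=100, window_minutes=1)   # 100 calls seen in one minute
overload_factor = observed_per_day / expected_per_day

print(f"observed: {observed_per_day:.0f} calls/day, {overload_factor:.0f}x the design assumption")
```

Sustained for a full day, 100 calls a minute is 144,000 calls, or 1,440 times the load the service was designed for, even though every individual call is functionally correct.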
Intelligent Collaboration and Processes
Often the missing pieces of the application performance management puzzle are collaboration and processes. Until recently, many IT organizations have been rigidly siloed and isolated. Isolated groups tend to develop “us against the world” mentalities that do not engender the team effort required to solve end-to-end problems that complex, composite applications will create. This must change.
In fact, this is what efforts around using industry best practices, such as IT Infrastructure Library (ITIL) v3 processes, help to drive. The key factor to remember with these efforts is that processes are not supposed to be straitjackets; they are supposed to streamline and improve collaboration, that is, how people work with each other. (See Fig. 2.) For example, it does you no good to design a Service Level Management (SLM) process that generates monthly violation reports when operations needs SLM information in real time to actively manage application performance.
This is why the most successful ITIL implementations are adaptations of best practices rather than direct adoptions. By talking to IT staff, ITIL implementers can get a good sense of how to adapt best practices into processes that people will actually want to follow. This also helps them consider the information their staff must have and manage to get their process tasks done efficiently and then determine the best way to provide that information.
In some cases, process integration and collaboration are as simple as sharing reports among the different IT experts. The caveat here is that the reports need to be understandable. Steve Harriman, vice president of marketing at NetQoS, has noted a dramatic increase in the amount of communication and collaboration between his traditional network operations clients and their application support teams.
“One of our customers has completely blended application and network groups together into a single application delivery organization,” he says. “We like to think that NetQoS was a catalyst because our solutions report network flow, response time, and anomalous performance behavior information in a format simple enough that non-network engineers can readily grasp the implications of the analysis.”
In other cases, management products must integrate more deeply for operations to gain the benefits. For example, both BMC and CA have pre-packaged integration between their respective management products that are involved in specific IT processes, such as connecting end-user monitoring, infrastructure monitoring, and service-level reporting tools into an automated process to identify service issues. The increasing popularity of single-architecture tools (such as those from Integrien and Nimsoft) to achieve similar results is also a telling indicator of IT’s needs.
Yet there is still one bridge to cross — communication between operations and development — and composite applications seem to be shaking that fragile bridge.
Call the Developer
In the “good old days,” operations dealing with a Web application performance problem may not have known exactly where in the infrastructure the problem was occurring, but at least they were certain which business function was being affected. For example, if the ordering application was affected, operations knew they had to contact the group that developed it. With loosely coupled applications, bringing in the developers is not that straightforward, because the ordering application is now a composite of multiple services developed by different groups in different business units.
Today, composite applications typically have only one or two connections that cross these departmental or corporate boundaries, so tracking down the different developer groups may not be too onerous. However, the benefits of SOA are so compelling that over time the number of connections per application will explode.
That single ordering application may become a composite of dozens of connected services, each developed and delivered by different business units and/or companies working completely independently. If you thought the finger-pointing was bad between IT silos, just wait until you schedule a troubleshooting bridge call with a dozen developer groups.
These issues will occur in both evolutionary and controlled approaches to adopting SOA (regardless of what the development lifecycle tool vendors preach). Why? Because there will always be a disconnect between how the system was designed to behave and how it actually behaves.
According to CA Wily’s Cobb, “We see the symptoms of this disconnect every time we pilot our solutions against a real application in production. Often the operations team will look at the discovered application transaction paths and say, ‘We had no idea that’s how the application actually works,’ which kicks off a discussion about managing real transactions.”
In the case of evolutionary SOA applications, those statements are based only on a gut feel for how the enterprise applications should work (e.g., “Our ordering systems should not be using information from the corporate personnel database”). However, there is no certainty behind those statements, because operations has no way of knowing whether the ordering system developer reused a service that interacts with the personnel database.
What happens with the controlled SOA case is that operations has already gotten the enterprise architecture team to whiteboard the general application structure and provide a list from the service directory of all the services used. So the operations team pulls out the diagram and says, “We have a picture of it here, and the ordering system doesn’t use any services that connect to the personnel database.”
In actuality, these unexpected connections are usually rogue dependencies being created by service negotiations at run-time. (See Fig. 3.) It is like water always taking the path of least resistance to the lowest point — and the water doesn’t care that it is cutting through your foundation and creating a lake under your house.
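One way to picture the gap between the whiteboard diagram and production is to compare two sets of service-to-service dependencies: the ones in the design and the ones actually observed. The sketch below is only an illustration under that assumption; the service names, both edge sets, and the reaches helper are hypothetical.

```python
# Hypothetical dependency edges: (caller, callee).
designed = {
    ("ordering", "inventory"),
    ("ordering", "pricing"),
}
observed = {
    ("ordering", "inventory"),
    ("ordering", "pricing"),
    ("pricing", "employee-discount"),       # reused service, not on the diagram
    ("employee-discount", "personnel-db"),
}

# Dependencies seen in production that the design does not mention.
unexpected = observed - designed


def reaches(edges, start, target):
    """Return True if target can be reached from start by following edges."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


print("unexpected dependencies:", sorted(unexpected))
print("design says ordering reaches personnel-db:", reaches(designed, "ordering", "personnel-db"))
print("production says ordering reaches personnel-db:", reaches(observed, "ordering", "personnel-db"))
```

In this invented example the diagram is accurate as far as it goes, yet the ordering system still reaches the personnel database through a reused service two hops away.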
However, trying to communicate how the application is actually behaving back to diverse and loosely coupled groups of developers, each with a detailed understanding of how one or two components of that app were designed to behave, will take some doing.
According to Ran Gishri, director of worldwide marketing of BMC Software’s Identify business unit, “Not only must the next generation of management tools be able to communicate complex operational information to widely dispersed and disconnected development teams, but they must also trace transactions with enough depth to facilitate code analysis.”
So this brings us back to where we started: for IT operations to get real control over the performance of composite SOA applications, they need to understand everything about the application’s real-time behavior.
Solutions that can discover the functional characteristics (what is being called by what) are necessary. Solutions that can analyze how services are negotiating with each other to route transactions in real time, as well as their non-functional characteristics, such as how frequently services are called, are necessary. And solutions and processes that can improve collaboration among multiple operational and development groups are necessary.
With these capabilities, application performance management in a SOA world is less daunting and more of an opportunity. SOA projects can be reused by IT operations as the change event necessary to implement robust processes and solutions to improve its organizational performance. After all, aren’t reuse and agile performance what SOA is all about?
Jasmine Noel, of Ptak, Noel & Associates, focuses on converging IT trends and how to leverage them. The company follows trends in ways that help IT directors translate executive strategies into action blueprints. Visit: www.ptaknoelassociates.com.