By Greg Law
Evolution is not an evenly paced process. Rather than happening continuously at the same speed, the earth’s ecosystem tends to be relatively stable over the eons, periodically disrupted by rapid change over a brief few million years. Sometimes this is triggered by outside events, such as a meteorite strike, while other shifts are less well understood.
It seems to me that cultural and technological changes tend to happen in brief bursts on human timescales too. Mainframes and minicomputers of the 1960s and ’70s suddenly gave way to desktop computers in the 1980s, which were in turn suddenly displaced by mobile platforms. The culture of software development exhibits the same pattern: during the 1970s there was a relatively sudden shift to higher-level programming languages, and in the late 1980s IDEs and interactive debuggers became widely used.
Between, say, 1990 and 2010, the way we created and deployed software didn’t evolve radically, despite the external transformation that the growth of the internet brought to IT. Of course languages and styles came into, and went out of, fashion, but a developer from 1990 could easily recognize the landscape 20 years later—the tools and processes he or she had been using would still be there, much the same, just newer and shinier. For example, Visual C++ 1.0 was released in 1993; 17 years later, the now-rebranded Visual Studio 2010 was a lot slicker and richer, with a much more polished user interface, but it was basically the same thing.
Development and Deployment Today
Move forward into the current decade and how we develop and deploy software is changing incredibly fast. In 2010 many teams had adopted practices such as Agile, Test-Driven Development (TDD), and even Continuous Integration (CI), but the vast majority of development teams were still doing things the old-fashioned way, with big product requirement documents and year- or multi-year-long development cycles ending in a multi-month code freeze and quality-assurance period.
A large number of software projects didn’t even have any kind of comprehensive, regular regression testing. Five short years later, almost everyone uses some form of agile with a decent test suite, and those releasing on a yearly or quarterly cadence are the laggards, firmly in the minority; most software projects release or deploy new versions multiple times per month, and some even several times per day. By moving from the old waterfall style to CI and Continuous Deployment (CD), and by embracing DevOps, teams introduce new features faster, delivering competitive advantage and responding more quickly to user needs. In its 2015 report, Market Trends: DevOps—Not a Market, but a Tool-Centric Philosophy That Supports a Continuous Delivery Value Chain, Gartner predicts that 25 percent of Global 2000 organizations will use DevOps by 2016. A software engineer from 1990 would feel right at home if transported to a typical development team in 2005. An engineer from 2005 transported to 2015 would recognize very little.
The wider digital economy has been undergoing huge changes at the same time. Software is ever more vital to the world we live in. Whether in cars, planes, the Internet of Things, wearables, or our smartphones, the shift to cloud and mobile as the dominant computing platforms is at least as big a transformation as that from mainframes to desktops. And all the while, software, as they say, is eating the world.
These modern practices—agile, TDD, CI, CD, and DevOps—have combined with new languages and platforms to allow software to be produced at an incredible rate. But more software means more complexity, and the result is daunting.
More Control, Added Complexity
As software controls more of the world around us, programs no longer operate in isolation but interact with other software and run on multiple devices in an increasingly intricate ecosystem. Modern practices didn’t make life simpler for long; rather, they allowed us to do more, and so complexity increased. Research from the Judge Business School at the University of Cambridge found that the annual cost of debugging software had risen to $312 billion, and that, on average, software developers spend 50 percent of their programming time finding and fixing bugs. It seems that solutions to problems lead to new problems. It’s like hanging wallpaper—you smooth out a trapped bubble of air in one area, only to see a bubble pop up elsewhere.
TDD and CI development tends to mean more testing, and that means more test failures.
TDD means—or at least should mean—more test cases. CI means these richer, fuller test suites are run more often. The cloud and elastic compute resources mean that the number of tests that can be run is effectively limitless. A software project of a given size can easily run two or three orders of magnitude more tests every day than an equivalent project would have run ten years ago. Which can only be a good thing, right? Except, of course, for all those test failures. It’s no good arguing that all tests should pass—after all, if tests never fail, what’s the point of having them in the first place? If many thousands of tests run every hour and 0.1 percent of them fail, triaging those failures can quickly become a nightmare.
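To put rough numbers on that triage load (every figure here is an illustrative assumption, not taken from the article), a quick back-of-the-envelope calculation shows how fast a 0.1 percent failure rate adds up:

```python
# Back-of-the-envelope triage load; every number below is an assumption.
tests_per_hour = 10_000   # a mid-size CI system running around the clock
failure_rate = 0.001      # 0.1 percent of test runs fail
hours_per_day = 24

failures_per_day = tests_per_hour * failure_rate * hours_per_day
print(failures_per_day)   # 240.0 failures to triage per day

minutes_per_triage = 30   # assumed time to properly understand one failure
engineer_hours = failures_per_day * minutes_per_triage / 60
print(engineer_hours)     # 120.0 engineer-hours of triage per day
```

Even with these modest assumptions, keeping up would take the equivalent of fifteen engineers doing nothing but investigating failures.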
In my experience, most teams have so many test failures that they don’t have time to investigate them all. Typically, the majority of the failures, let’s say nine out of ten, turn out to be benign—it’s a bug in the test itself as opposed to the code it’s testing, a problem with the test infrastructure or some esoteric combination of circumstances that you know will never happen in practice. But lurking somewhere in those ten failures is the one that does matter: the kind of bug that will eventually cause a serious production outage or a security breach. And the only way to know which is the one in ten that matters? Investigate until you properly understand all ten failures. In practice, very few of us have the time to investigate all of the failures, and so we essentially play Russian roulette with our code.
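The Russian-roulette odds can be made concrete. In this sketch (the numbers are illustrative assumptions, not from the article), a team has time to triage only half of its daily failures, and one in ten failures hides a real bug:

```python
# Expected number of real bugs that slip through untriaged.
# All numbers are illustrative assumptions.
daily_failures = 240    # test failures arriving per day
triage_capacity = 120   # failures the team has time to investigate
p_real_bug = 0.1        # one in ten failures is a genuine product bug

skipped = daily_failures - triage_capacity
expected_missed_bugs = skipped * p_real_bug
print(expected_missed_bugs)   # 12.0 real bugs left uninvestigated each day
```

Because the benign nine-in-ten look just like the critical one-in-ten until investigated, the uninvestigated pile silently accumulates real defects at this expected rate.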
In the context of the hundreds or thousands of test failures that a mid-size software company might experience every day, you can see the scale of the problem. All these test failures are in fact a sub-problem of the wider issue that all this software we’re creating is so fantastically complicated that no one really understands what it’s actually doing. Now, to be clear, I think it’s beyond question that Agile, TDD, CI, CD, and DevOps are all good things. Software engineering as a profession has made great advances over the past decade, of which we programmers should collectively be proud. But these advances have ushered in new problems—or at least exacerbated old ones—to which we must now turn our collective attention.
Software development teams need to take a step back and find ways to understand what their software actually did, rather than what they thought it would do, and then seamlessly feed that information across the team so that it can be evaluated and the failures fixed. The good news is that technology can help: a number of new technologies and tools are becoming available to help developers understand what their code is really doing, both under test and in production. I believe that bridging this understanding gap is one of the major challenges the industry needs to solve during the remainder of this decade, if the rising tide of test failures and the general complexity of software are not to slow or even halt the increase in the pace of development we have witnessed in recent years. No single technology or technique is going to make the problem go away, but I believe the next generation of tools can and will help.
Greg Law is the co-founder and CEO of Undo Software. He is a coder at heart, but likes to bridge the gap between the business and software worlds. Law has over 15 years of experience in the software industry and has coded for companies including the pioneering British computer firm Acorn, as well as the fast-growing start-ups NexWave and Solarflare.