By David Gingell
Aragon Research recently released a report that identifies an emerging Intelligent Content Analytics (ICA) market. The report is among the first, and perhaps the most comprehensive, evaluations of how the current class of Intelligent Content Analytics Platforms (ICAPs) will enable enterprises to leverage their critical content.
Aragon Research analysts Jim Lundy and Adrian Bowles identified ICA as a new business category that goes beyond the traditional store-and-secure approach to content management, enabling actionable insights through advances in artificial intelligence (AI), most notably machine learning and natural language processing algorithms.
The emerging category grows out of providers currently focused on using AI to extract intelligence from unstructured content, and Aragon predicts that the insights gained from this shift will allow enterprises to operate more efficiently and grow revenue faster with less risk. The firm forecasts that this new market will grow to more than $10 billion by 2025, with the combined adjacent markets of business intelligence (BI), enterprise content management (ECM), sales engagement, and cloud office suites accounting for $67 billion by 2025.
Moreover, the report identifies Contract Analytics as a seminal use case at the vanguard of forming the new ICA category, where AI technologies are already being used to extract key clauses, information, and normalizations to deliver detailed insight across tens of thousands of contracts. It concludes that ECM systems are not sophisticated enough to leverage and exploit unstructured content in the digital enterprise, and that the coming ICA era will usher in a new set of providers offering platforms that integrate with existing ECM systems such as those from Alfresco, Hyland, IBM, Microsoft, Nuxeo, and OpenText.
My colleagues and I have been providing discovery and analytics technology to many businesses across the world for more than seven years, helping them understand and leverage the information contained within their contracts. Many look to us to help them understand, codify, and derive insight from other types of unstructured content, and I believe ICA platforms will become the norm for enterprises and will provide the insight that ECM platforms have failed to deliver.
So, what exactly is ICA? According to Aragon’s Lundy, it refers to the use of analytics to derive insights from content, “where the text or a higher-level abstraction of meaning—called a concept—has been organized in a model that can be mechanically processed.”
Aragon sees ICA as the ‘third era’ for unstructured content. The focus in this era has shifted from management to analysis, and is now firmly on the extraction of actionable insights rather than the storing and tracking of content.
ECM systems have been around since the late 1990s, when web content management, document management, and digital asset management morphed into singular enterprise-wide platforms to handle nearly all types of unstructured content. Most had a bias: some focused on the regulatory nature of critical content, with capabilities such as auditing, workflow, and security as their strengths, while others focused on managing website content and the creation, workflow, and approval processes around it.
What all these systems had in common was the use of metadata to categorize the pieces of content. Metadata allowed the system to understand the relationships between documents, and to describe the main features of the content objects. However, none were really cognizant of the actual contents of the content object itself: what the words in the documents said, what the sentiment might be, and what those words actually meant for the owner of those objects. The contents could be described only to a certain level within the metadata. For example, the metadata might identify the document type, authorship, and review processes associated with a document. It might even include some economic data, but that would have had to be entered manually, either at the time of creation or during interaction with the asset.
Lundy points out in the research note that these ECM systems hold limited intelligence about what content objects actually contain, and that virtually no analysis is being done on them. Until now, that is, and that’s why Aragon has identified the rise of ICA.
With the development of AI and machine learning models, and the massive recent improvements in computer processing power, it is now feasible to do deep analytics on hundreds of thousands of pieces of content in parallel, and to extract the key information contained within them. Then, and this is key, it is possible to derive insight from that information, which leads to better decision making, risk mitigation, or opportunity taking. Lundy essentially says this is something ECM systems were never designed to do.
Legal documents, that is to say contracts, are the obvious place to start for this level of deep analytics. Contracts are the formal instantiation of a business relationship between two entities and, among other things, define the offer, the obligations, and the requirements placed on both parties. So, there is a lot at stake with these types of documents. They contain legally enforceable clauses that have been negotiated and agreed between the parties, and these clauses can carry both risk and opportunity for one or both contracting parties. They could contain revenue-generating opportunities, such as negotiated pricing agreements, or risk in the form of obligations associated with a data breach. So, unsurprisingly, contract analytics has been the first major use case for content analytics.
I believe that expertise is the perfect foundation for the even bigger market of content analytics, where other key unstructured content objects, such as insurance claim forms, medical records, marketing content, and financial documents, can be given the same deep analysis that contracts have been enjoying over the last five or six years. It is an emerging market, and the idea of a platform for analyzing unstructured content in real time is in its infancy, but the foundations are here.
In the same way that BI tools have evolved to manage structured data at massive scale, especially with technologies like SAP HANA and Hadoop, ICAPs will emerge to do the same for unstructured data or content. This third era is an exciting development in the evolution of content, moving it firmly from the ‘management era’ to the ‘analysis era.’
David Gingell is Seal Software’s CMO. Over the past 25 years, he has worked with well-known technology companies including Adobe, EMC, and Oracle, serving market sectors spanning finance and banking to pharmaceuticals. He holds an MBA from Henley Business School and a BSc in Psychology from Swansea University.