Digging Through Data
Within infrastructure, the data integration and data quality segment is slated to grow.
By Amber E. Watson
Enterprise infrastructure software is essential to every organization’s processes. The industry comprises many sectors, depending on the level of data access needed. In its Enterprise Infrastructure Forecast by Segment (3Q12 Update), Total Software Revenue Worldwide 2011 and 2016, Gartner identifies and outlines infrastructure growth rates, indicating that data integration (DI) and data quality tools have the highest projected growth.
The firm expects an overall five-year compound annual growth rate (CAGR) of 9.3 percent, increasing from a reported $3.5 billion in 2011 to a forecasted $5.4 billion in 2016.
To become adept at converting data into intelligent information, companies must put trusted infrastructure software tools in place to extract, understand, cleanse, integrate, and transform data into information that can inform important decisions. These steps typically require more than basic infrastructure and involve an investment in the right DI and data quality tools.
The role of DI and data quality tools is increasingly important as the volume and sources of data continue to grow. Here we discuss the core functions of such tools, identify reasons for the segment’s growth, and highlight the capabilities of leading DI and data quality tools.
DI tools are available to free up developer resources by speeding software development, testing, and deployment processes.
According to Carl Olofson, database management and data integration analyst, IDC, top vendors in the data integration and access software market, based on 2011 software revenue, include IBM, Informatica, SAS, SAP, and Oracle. Each offers bulk data movement—extract/transform/load (ETL) or extract/load/transform (ELT)—tools, composite data frameworks, real-time data collection and delivery software, dynamic data movement, and master data definition and control software—core software for implementing master data management solutions.
In addition, all but IBM offer data access infrastructure software—mainly data adapters and connectors—including Open Database Connectivity (ODBC)/Java Database Connectivity (JDBC) drivers. All but Oracle offer general data quality tools. These tools perform data quality functions on any data type, as opposed to domain-based matching and cleansing, which support specific domains such as customer or product, or formats, such as mailing addresses. IBM and Oracle also offer domain-based matching and cleansing software.
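The domain-based cleansing described above can be sketched in miniature. The snippet below standardizes mailing-address street suffixes with a small lookup table; the table itself and the function name are invented for illustration, and real cleansing engines use far richer, vendor-maintained rules.

```python
import re

# Hypothetical lookup table: common street-suffix variants mapped to
# USPS-style standard abbreviations (an illustrative subset only).
SUFFIXES = {"street": "St", "str": "St", "st": "St",
            "avenue": "Ave", "ave": "Ave",
            "road": "Rd", "rd": "Rd"}

def standardize_address(raw: str) -> str:
    """Normalize whitespace and case, and standardize the street suffix."""
    tokens = re.split(r"\s+", raw.strip())
    out = []
    for tok in tokens:
        key = tok.rstrip(".,").lower()
        out.append(SUFFIXES.get(key, tok.rstrip(".,").title()))
    return " ".join(out)

print(standardize_address("123  MAIN street"))  # → 123 Main St
print(standardize_address("45 oak avenue"))     # → 45 Oak Ave
```

The point of the domain focus is visible even here: the rules only make sense because the tool knows the data is an address, not a generic string.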
According to IDC’s 2012 Data Integration and Access Software Forecast, the market shows a five-year CAGR of 10 percent. In this forecast, the strongest elements are composite data frameworks, dynamic data movement, and master data definition and control.
"These are all critical elements to achieving a data environment that is fully integrated and consistent at all times, which serves real-time integration—delivering all necessary and relevant data at just the right time to applications that are process-integrated to enable dynamically manageable straight-through processing," explains Olofson.
This level of integration is essential for any enterprise that wants to leverage all of its IT resources to execute as efficiently as possible and to maximize business opportunities.
High Growth Segment
Todd Goldman, VP/GM, enterprise data integration, Informatica, believes that DI and data quality tools are pegged with a high growth rate compared to other segments for two main reasons. The first is due to the amount of data, which is growing rapidly thanks to the Internet and social media. Second is the realization by companies that data is a strategic asset that can be leveraged for competitive advantage. "They want to become more sophisticated in their use of it," he says.
As data continues to grow, organizations realize that establishing a program to address data issues is paramount to success. "Simply put, data and the proper management of that data is a driving force that determines which companies will thrive and which will struggle in the years to come," predicts Todd Wright, global product marketing manager, SAS DataFlux Data Quality.
"Data management that encompasses data quality, data integration, and master data management is no longer seen as an elective for organizations," he says. "The data that drives a business is critical for both day-to-day decisions, such as offering a customer an additional service, or strategic initiatives like expanding the customer base into Asia. This information must be managed as a corporate asset."
"With pressures to innovate, differentiate, and streamline operations, companies must leverage data to improve how business is run, and the ability to remove data silos across all systems and computing has become a ‘must have’ rather than a ‘nice to have,’" says Irem Radzik, director of product marketing, Oracle Fusion Middleware. "Data integration and data quality tools run in the center of such initiatives."
Data quality is also a requirement for businesses running online applications where data errors/duplicates have tangible negative effects on business. Trusted data helps achieve the best results in operational efficiencies as well as customer experience.
The advent of big data plays a large role in the growth of this particular segment. "Information integration capabilities, including data integration and data quality tools, help businesses filter through, add value to, and establish lineage and governance for the new data that is admitted into their business and used in their systems," explains Paula Wiles Sigmon, program director, IBM InfoSphere information integration product marketing.
"People understand that big data alone does not solve business problems. Thus, they look for the right information integration capabilities to help triumph over the proliferation of big data. With big data integration, people are able to make rapid and governed decisions, leveraging their expanded information assets while minimizing risks," she adds.
According to Info-Tech Research Group Inc’s Vendor Landscape Plus: Data Integration Tools, the integration space consists of data and process integration as well as middleware, but Info-Tech predicts that the capabilities of integration tools will merge across all types into single-product family offerings within the next three to five years.
Several vendors offer DI platforms that address multiple enterprise data integration needs.
IBM’s comprehensive information integration platform, InfoSphere Information Server, helps clients understand, cleanse, monitor, transform, and deliver data, as well as collaborate to bridge the gap between business and IT. "As a new era of computing unfolds with an explosion in the volume, variety, and velocity of data, integrating trusted information is more important than ever to enable critical projects and key analytics initiatives," asserts Wiles Sigmon.
InfoSphere Information Server provides the capabilities to help clients understand and discover data—often data they didn’t know they needed to consider—as well as to help IT and business teams collaborate and strengthen their governance over the data they use for strategic business decisions.
Informatica’s PowerCenter provides core functions such as data movement and transformation, which offers the ability to take and transform data from one or more systems, then load that data into a target system.
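The move-and-transform pattern at the heart of such tools can be reduced to a few lines. The sketch below, with invented table and column names, uses two in-memory SQLite databases to stand in for a source system and a target warehouse; it is not Informatica's implementation, just the generic ETL shape.

```python
import sqlite3

# Two in-memory databases stand in for a source system and a target
# warehouse; the table and column names are invented for illustration.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

src.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, region TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1250, "east"), (2, 990, "west"), (3, 4000, "east")])

tgt.execute("CREATE TABLE fact_orders (id INTEGER, amount_usd REAL, region TEXT)")

# Extract from the source...
rows = src.execute("SELECT id, amount_cents, region FROM orders").fetchall()

# ...transform in the integration layer (cents to dollars, uppercase region)...
transformed = [(i, cents / 100.0, region.upper()) for i, cents, region in rows]

# ...and load into the target system.
tgt.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", transformed)

print(tgt.execute("SELECT * FROM fact_orders").fetchall())
# → [(1, 12.5, 'EAST'), (2, 9.9, 'WEST'), (3, 40.0, 'EAST')]
```

Commercial platforms add what this sketch omits: connectors, error handling, lineage, scheduling, and scale.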
Connectivity and data profiling are also important functions. "The ability to connect to different systems that have data might seem easy, but in today’s world, companies expect ‘connectors’ to have some knowledge about the structure of the data to which they connect so that the pulled data makes sense," shares Informatica’s Goldman.
In addition, the ability to analyze a data source and perform statistical analysis provides a greater understanding about the kinds of data that are in that source. Profiling data helps data analysts who later extract that information and combine it with other data.
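A minimal profiling pass might compute statistics like these per column. The function and the sample data are invented; real profilers compute many more measures, but even this toy version surfaces a typical quality issue (a lowercase "ca" hiding among "CA" values).

```python
from collections import Counter

def profile_column(values):
    """Basic column profile: row count, null rate, distinct values,
    and the most common value. Purely illustrative statistics."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    non_null = [v for v in values if v is not None]
    common = Counter(non_null).most_common(1)
    return {
        "rows": total,
        "null_pct": round(100.0 * nulls / total, 1) if total else 0.0,
        "distinct": len(set(non_null)),
        "most_common": common[0][0] if common else None,
    }

states = ["CA", "NY", None, "CA", "ca", "TX", None]
print(profile_column(states))
# → {'rows': 7, 'null_pct': 28.6, 'distinct': 4, 'most_common': 'CA'}
```

A data analyst seeing four distinct values where three states were expected knows immediately that a standardization rule is needed before this source is combined with others.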
Lastly, DI quality assurance (QA) software presents a relatively new area. Developing a data integration process that takes raw data and combines it with other data, then transforms, cleans, and loads it into another application or data warehouse is a complicated software development task. A QA process helps ensure that the ETL code that is developed to move the data actually works as expected.
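In spirit, DI QA treats the transformation logic like any other software under test. The checks below exercise a hypothetical transform (its rules are assumed for illustration: trim whitespace, uppercase country codes, reject rows missing a key) before it would be promoted into a production data flow.

```python
# Hypothetical transform under test: assumed to trim whitespace,
# upper-case country codes, and reject rows with a missing key.
def transform(row):
    if row.get("id") is None:
        raise ValueError("missing key")
    return {"id": row["id"], "country": row["country"].strip().upper()}

# QA checks: assert the transform behaves as specified.
assert transform({"id": 7, "country": " us "}) == {"id": 7, "country": "US"}

try:
    transform({"id": None, "country": "de"})
except ValueError:
    pass  # rejecting the bad row is the expected behavior
else:
    raise AssertionError("row with missing key should be rejected")

print("all ETL QA checks passed")
```

In practice such checks also run against representative data volumes, since many ETL defects only appear at scale.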
The Oracle Data Integrator platform covers DI requirements from high-volume batch loads to event-driven, trickle-feed integration processes to service-oriented architecture (SOA)-enabled data services. Additionally, it uses an ELT rather than an ETL architecture to reduce total cost of ownership by exploiting the database engine’s power for transformations.
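The ELT idea is that the "T" runs inside the target database engine as set-based SQL, not row by row in a separate integration server. The sketch below illustrates this with SQLite and invented table names; it is a generic pattern, not Oracle's implementation.

```python
import sqlite3

# ELT sketch: raw data is loaded into the target database first, then
# transformed in place by the database engine via SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staging_sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO staging_sales VALUES (?, ?)",
               [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# The transform step is a single set-based SQL statement executed by
# the engine, which can exploit its own optimizer and parallelism.
db.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM staging_sales
    GROUP BY region
""")

print(db.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
# → [('east', 150.0), ('west', 75.0)]
```

Pushing the work into the engine is what lets an ELT architecture avoid a dedicated transformation tier, which is the cost-of-ownership argument made above.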
Oracle Data Integrator Enterprise Edition supports heterogeneous sources and targets out of the box, and the knowledge module framework allows tailoring the solution to a broader set of third-party technologies, applications, and best practices. It integrates with Oracle GoldenGate to enable real-time data integration, and with Oracle Enterprise Data Quality to ensure trusted data is used as part of the DI process.
"The most common data domains in data quality are customer—generally, party data including suppliers and employees—and product data," explains Oracle’s Radzik. "Data quality problems and required solutions are different for different data domains. Oracle Enterprise Data Quality products recognize these differences and provide purpose-built capabilities to address each."
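Matching in the customer domain often starts from a similarity score between candidate records. The toy example below uses Python's standard-library `SequenceMatcher` with invented records and an arbitrary threshold; production matching engines layer domain-aware rules (nicknames, legal-entity suffixes, addresses) on top of anything this simple.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two normalized names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Candidate duplicate customer records (illustrative data only).
records = ["Acme Corporation", "ACME Corp.", "Globex Inc"]
new_record = "acme corporation"

# Flag likely duplicates above an arbitrary threshold.
for existing in records:
    score = similarity(new_record, existing)
    if score > 0.8:
        print(f"possible duplicate of {existing!r} (score {score:.2f})")
```

Here only "Acme Corporation" clears the threshold; "ACME Corp." scores lower because abbreviation handling is exactly the kind of domain knowledge a purpose-built customer-matching tool supplies.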
SAS Data Management Advanced enables users to manage virtually all data sources, including big data; extract, cleanse, transform, aggregate, load, and govern data; support data warehousing, migration, synchronization, and federation initiatives; support both batch-oriented and real-time master data management solutions; and create real-time data integration services in support of SOAs.
"Data management is not a product to install and then wait for results," says SAS’s Wright. With this in mind, SAS begins by learning about a customer’s data management team, its objectives, and the business reasons for embarking on a data quality, DI, and master data management program.
By establishing a data governance framework, organizations are able to implement data quality processes within operational and analytic applications, helping transform disparate data, remove inaccuracies, standardize common values, and create a strategic, trustworthy, and valuable data asset that enhances decision-making power.
Perhaps the biggest challenge is learning how to manage the variety of data technology available.
This is one of the reasons companies like Informatica provide a single development environment that can be used to develop data integration across different transportation mechanisms—ETL, ELT, Hadoop, and virtualization—without requiring users to learn a brand-new technology for each.
Through platforms like Informatica PowerCenter, a developer is able to learn one environment but also gain the ability to deploy to any kind of DI environment. "We separate the development environment from the underlying transportation technology, which allows customers to learn one platform, and gives them the skills to use a variety of new technologies without having to become an expert in those technologies," says Informatica’s Goldman.
The Cloud Evolution
With goals to improve IT efficiency through optimized resource usage and to simplify IT through automated software lifecycle management, data management in the cloud is seen as a major growth area in the data management space.
With more applications running in the cloud, connectivity becomes a baseline requirement. DI tools must be able to connect to and integrate with cloud applications such as those from Salesforce.com or Eloqua, recently acquired by Oracle.
"A cloud strategy, however, should not be seen as a replacement for the necessary involvement of business departments and a formal data governance strategy," cautions SAS’s Wright. "Nor should a cloud strategy be seen as migrating all data management to the cloud."
Knowing the essential stakeholders and understanding their data management expectations—as well as the entire data management landscape—helps provide the basis for a well-planned approach of adopting data management cloud services for individual workloads.
Public and private clouds provide organizations with another possible deployment model for development, testing, and/or production environments. "While many organizations are still concerned with persisting customer information in public cloud environments, we see a general willingness to rely on the cloud for greater flexibility and the opportunity to reduce total cost of ownership," notes IBM’s Wiles Sigmon.
Organizations must think through what type of data is placed in the cloud, and make sure the cloud option is an appropriate deployment model for their needs.
The vendors mentioned above and others have cloud strategies that take two forms. "One is offering data integration services in the cloud, and also offering the integration of cloud-based software-as-a-service (SaaS) data residing in different services in separate clouds with each other and with on-premise data," notes IDC’s Olofson.
A company’s ability to run DI software in the cloud itself offers greater flexibility. Informatica, for instance, has a cloud-based offering in which customers do not have to install a full-blown development environment on their own premises. Users may log in to the cloud offering and program DI tasks from Informatica’s own cloud-based DI service.
Informatica has both on-premise and cloud-based offerings. "The major difference is that the cloud-based offering targets a more business-oriented user and is easier to use for relatively simple integration tasks. As time goes on, however, Informatica plans to add functionality to its cloud offerings to service more technical IT users as well," explains Goldman.
In addition to migrating data to the cloud, Oracle Data Integrator (ODI) supports any platform-as-a-service offering because ODI supports non-Oracle systems and is highly extensible for different sources and applications. The solution supports cloud deployments with data-layer application integration between on-premise and cloud environments of all kinds.
For private cloud architectures, consolidation of one’s databases and data stores is an important step to realizing the full benefits of cloud computing.
Both Oracle GoldenGate and Oracle Data Integrator provide the ability to consolidate data. Once data moves to the cloud, the products are able to connect the on-premise enterprise systems and the cloud environment by moving data in bulk or as real-time transactions across geographies.
For public cloud architectures, Oracle Cloud enables users to move data through Oracle SOA Suite using representational state transfer (REST) APIs to Oracle Messaging Cloud Service—a new service that lets applications deployed in Oracle Cloud communicate securely over Java Message Service. Oracle Data Integrator now supports a knowledge module for Salesforce.com available on AppExchange. Customers and partners are developing other third-party knowledge modules every day.
IBM InfoSphere Information Server provides on-premise installations in symmetric multiprocessor (SMP), massively parallel processing (MPP), grid, and private cloud configurations. InfoSphere Information Server supports traditional ETL deployment models and may also leverage the MPP capabilities of data warehouse appliances in ELT operations.
In addition, InfoSphere Information Server provides off-premise, cloud-hosted deployments on leased infrastructure supplied by Amazon Elastic Compute Cloud (EC2) and IBM SmartCloud Enterprise. IBM also provides WebSphere Cast Iron Cloud as a cloud-hosted SaaS offering, which can integrate with hundreds of SaaS applications and brings data on premises.
IBM recently announced IBM PureApplication System, as well as an accompanying InfoSphere Information Server pattern for private cloud operations.
Software that addresses key areas in security and system management is expected to benefit from growth in complexity of cloud environments.
Linking Departments with Data
DI and data quality were once viewed as departmental and standalone initiatives run by IT and seen as simply cleaning data for mass communications projects or data warehouse loading. Data quality is now a permanent enterprise program that links all business and IT departments and forms the framework for master data management.
Software vendors are working to simplify the complexity of technology so that customers are able to focus less on the process of data integration and more on making business decisions that lead to breakthrough results. SW
Apr2013, Software Magazine