The evolution of smart data pipelines

The potential of artificial intelligence (AI) and machine learning (ML) seems almost limitless in its ability to derive and drive new sources of customer, product, service, operational, environmental and societal value. If your organization is to compete in the economy of the future, AI must be at the center of your business operations.

The Kearney study “Analytics Impact in 2020” highlights untapped profitability and business impact for organizations seeking justification to accelerate their data science (AI/ML) and data management investments:

  • Explorers could increase profitability by 20% if they were as effective as the Leaders
  • Followers could increase profitability by 55% if they were as effective as the Leaders
  • Laggards could increase profitability by 81% if they were as effective as the Leaders

The business, operational, and societal impacts could be staggering, yet data remains the major organizational challenge. No less an authority than Andrew Ng, the godfather of AI, has pointed out the hurdle that data and data management pose in empowering organizations and society to realize the potential of AI and ML:

“For many applications, the model and code is basically a solved problem. Now that the models have reached a certain point, we need to get the data working.” — Andrew Ng

Data is at the heart of training AI and machine learning models. And high-quality, reliable data orchestrated through a highly efficient and scalable pipeline means AI can enable these compelling business and operational outcomes. Just as a healthy heart needs oxygen and reliable blood flow, regularly cleaned, accurate, enriched and reliable data flow is essential for AI/ML engines.

For example, one CIO has a team of 500 data engineers who manage more than 15,000 extract, transform, and load (ETL) jobs responsible for acquiring, moving, aggregating, standardizing, and aligning data across hundreds of special-purpose data repositories (data marts, data warehouses, data lakes, and data lakehouses). They perform these tasks under ridiculously stringent service level agreements (SLAs), across the organization’s operational and customer-facing systems, to support a growing variety of data consumers. It seems Rube Goldberg must surely have been a data architect (Figure 1).

Figure 1: Rube Goldberg data architecture

The resulting debilitating spaghetti architecture of one-off, special-purpose, static ETL programs to move, clean, align, and transform data greatly hampers the “time to insights” organizations need to take full advantage of data’s unique economic properties, as the “world’s most valuable resource” according to The Economist.

Emergence of smart data pipelines

The purpose of a data pipeline is to automate and scale common and repetitive data collection, transformation, migration, and integration tasks. A well-structured data pipeline strategy can accelerate and automate the processes involved in collecting, cleaning, transforming, enriching, and moving data to subsystems and applications. As the volume, variety, and velocity of data continues to increase, the need for data pipelines that can scale linearly in cloud and hybrid cloud environments is increasingly critical to a business’s operations.

A data pipeline refers to a set of data processing activities that integrate both operational and business logic to perform advanced sourcing, transformation, and data loading. A data pipeline can run on a scheduled basis, in real time (stream), or can be triggered by a predetermined set of rules or conditions.

Additionally, logic and algorithms can be embedded in a data pipeline to create a “smart” data pipeline. Smart pipelines are reusable and extensible economic assets that can be customized for source systems and perform the data transformations necessary to support unique data and analytics requirements for the target system or application.
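As a concrete illustration of that reusability, here is a minimal sketch in plain Python; names such as Pipeline and extract_orders are hypothetical and not tied to any particular product. It shows how extract, transform, and load stages can be supplied as interchangeable functions, so the same pipeline object can be reconfigured for different source systems, transformations, and target systems.

```python
# A minimal sketch of a reusable, configurable data pipeline. All names are
# illustrative; a production pipeline would add the scheduled, streaming, and
# rule-based triggers described above.
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

Record = dict[str, Any]

@dataclass
class Pipeline:
    extract: Callable[[], Iterable[Record]]                          # read from a source system
    transforms: list[Callable[[Record], Record]] = field(default_factory=list)
    load: Callable[[list[Record]], None] = print                     # write to a target system

    def run(self) -> None:
        records = []
        for record in self.extract():
            for step in self.transforms:                             # apply each transformation in order
                record = step(record)
            records.append(record)
        self.load(records)

# Example configuration: clean and standardize a toy "orders" feed.
def extract_orders() -> Iterable[Record]:
    yield {"order_id": 1, "amount": " 42.50 ", "region": "emea"}
    yield {"order_id": 2, "amount": "17.00", "region": "amer"}

def clean_amount(record: Record) -> Record:
    return {**record, "amount": float(str(record["amount"]).strip())}

def standardize_region(record: Record) -> Record:
    return {**record, "region": record["region"].upper()}

pipeline = Pipeline(extract=extract_orders,
                    transforms=[clean_amount, standardize_region])
pipeline.run()   # in practice this call would be scheduled, streamed, or rule-triggered
```

Because the stages are just functions, new sources, transformations, or targets can be swapped in without rewriting the pipeline itself, which is what makes the pipeline a reusable asset rather than a one-off ETL script.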

As machine learning and AutoML become more pervasive, data pipelines will become increasingly intelligent. Data pipelines can move data between advanced data enrichment and transformation modules, where neural network and machine learning algorithms can create more advanced data transformations and enrichments. This includes segmentation, regression analysis, clustering, and the creation of advanced indices and trend scores.
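To show what such an ML-driven enrichment stage might look like, here is a hedged sketch assuming scikit-learn is available: a k-means clustering model, trained on historical customer features, assigns a segment label to each record as it flows through the pipeline. The field names recency_days and total_spend are invented for the example.

```python
# A sketch of an ML enrichment step embedded in a pipeline, assuming scikit-learn.
# A clustering model is fitted once on historical features, then applied record
# by record so the segment label travels with the data to the target system.
import numpy as np
from sklearn.cluster import KMeans

def make_segmentation_step(training_features: np.ndarray, n_segments: int = 3):
    """Fit a k-means model, then return a per-record enrichment function."""
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
    model.fit(training_features)

    def enrich(record: dict) -> dict:
        features = np.array([[record["recency_days"], record["total_spend"]]])
        return {**record, "segment": int(model.predict(features)[0])}

    return enrich

# Toy history: recency in days and total spend per customer.
history = np.array([[5, 900.0], [7, 850.0], [40, 120.0], [45, 90.0], [200, 10.0]])
segment_step = make_segmentation_step(history)
print(segment_step({"customer_id": 17, "recency_days": 6, "total_spend": 880.0}))
```

The same pattern extends to regression scores, trend indices, or neural-network embeddings: the pipeline simply treats the trained model as another transformation step.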

Finally, AI can be integrated into source systems, necessary data transformations and enrichments, and data pipelines so that target systems and applications can continually learn and adapt based on evolving business and operational requirements.

For example, in healthcare, an intelligent data pipeline can analyze the grouping of diagnosis-related group (DRG) codes to ensure consistency and completeness of DRG submissions and to detect fraud as DRG data is carried by the data pipeline from source systems to analytical systems.
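A heavily simplified, rule-based version of that in-flight check is sketched below. The reference statistics, field names, and thresholds are invented for illustration; a production system would rely on trained models and real historical claims data rather than a hard-coded table.

```python
# A simplified sketch of an in-flight DRG quality and fraud check. Reference
# values and thresholds here are invented for illustration only.
REFERENCE = {
    # hypothetical mean and standard deviation of billed amount per DRG code
    "470": (12_000.0, 2_500.0),
    "871": (18_000.0, 4_000.0),
}

def check_drg(record: dict, z_threshold: float = 3.0) -> dict:
    """Flag claims with unknown DRG codes or implausible billed amounts."""
    flags = []
    code = record.get("drg_code")
    if code not in REFERENCE:
        flags.append("unknown_drg_code")
    else:
        mean, std = REFERENCE[code]
        z_score = abs(record["billed_amount"] - mean) / std
        if z_score > z_threshold:
            flags.append("billed_amount_outlier")
    return {**record, "quality_flags": flags}   # flags travel with the record downstream

print(check_drg({"claim_id": "C-1", "drg_code": "470", "billed_amount": 60_000.0}))
# {'claim_id': 'C-1', 'drg_code': '470', 'billed_amount': 60000.0,
#  'quality_flags': ['billed_amount_outlier']}
```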

Realizing business value

Chief data officers and chief data analytics officers are challenged to unlock the business value of their data, that is, to apply data to the business for measurable financial impact.

The ability to deliver high-quality, reliable data to the right data consumer at the right time to facilitate more timely and accurate decisions will be a key differentiator for today’s data-rich companies. A Rube Goldberg system of ELT scripts and disparate, dedicated analytics-centric repositories prevents an organization from achieving this goal.

Learn more about smart data pipelines in the eBook Modern Enterprise Data Pipelines from Dell Technologies.

This content is produced by Dell Technologies. It was not written by the editorial staff of MIT Technology Review.
