The era of the data lake is over: think hybrid data cloud

This is part of Solutions Review’s Premium Content Series, a collection of reviews written by industry experts in maturing software categories. In this presentation, Cloudera Field CTO and Chief Cybersecurity Officer, Carolyn Duby, explains why the era of the data lake is being replaced by the hybrid data cloud.

According to Mordor Intelligence, the data lake market, valued at $3.74 billion in 2020, is expected to reach $17.60 billion by 2026. However, companies that rely solely on a data lake strategy will eventually face critical limits on their agility and ability to innovate.

Although data lakes can be a simple and cost-effective way to aggregate data from multiple silos and make it accessible to analysts, the approach raises issues with data quality, lineage, governance, security, and compliance:

  • Datasets can quickly become outdated and lineage is difficult to track.
  • Data lakes are good for batch processing, not real-time analysis.
  • Managing access to sensitive and personally identifiable information (PII) is very difficult, especially across multiple clouds with different security and governance processes.
  • Data in a data lake cannot be easily moved from one cloud to another to optimize workload capacity or cost.
  • Data cannot be easily moved between public clouds and private clouds or between clouds and on-premises systems to meet compliance requirements.

As a result, data in a data lake essentially becomes a new type of silo, making it extremely difficult to create use cases involving data sets from multiple locations, which limits and slows down innovation.

Any data strategy designed for agility and innovation must be hybrid. According to IDC, hybrid is a key driver of the cloud market: “Hybrid cloud has become central to successful digital transformation efforts by defining an IT architectural approach, IT investment strategy, and staffing model that ensures the business can achieve the optimal balance across dimensions without sacrificing performance, reliability, or control.”

In an ideal data environment, we could simply describe a workload we need to run, and the data platform would automatically determine where to run it to optimize performance and cost while ensuring data security and compliance. Unfortunately, such a solution does not yet exist, but the basis of such a system is a “hybrid data cloud” that facilitates the movement of datasets and workloads between any locations (multiple public clouds, private clouds, and on-premises systems) and centralizes the management of all this data. A hybrid data cloud allows companies to start building the automated data infrastructure of the future today. Here are the key considerations for creating a hybrid data cloud environment.

Ensure security and governance everywhere

A hybrid data cloud enables a write-once/run-anywhere approach to data management. Public cloud security models vary, and data teams shouldn’t have to implement a different security model for each cloud. Instead, a hybrid data cloud centralizes security and governance across all environments and audits and monitors user activity and access to meet compliance requirements.

This holistic approach allows companies to move workloads where they want without compromising security and compliance. For example, a US company expanding into the EU faces daunting and costly data challenges related to the General Data Protection Regulation (GDPR). Some data and workloads need to be moved to the EU. Some must be stored in the cloud. Some have to stay put. A hybrid data cloud provides the flexibility to ensure GDPR compliance by locating data and workloads where they need to be.

Shift workloads to optimize costs

According to a 451 Research report, 57% of enterprises say hybrid is the organizing principle of their IT environments, and today the cost of running some workloads in the cloud is driving cloud repatriation, that is, moving some workloads back on-premises. Also, a particular workload may perform better in one cloud than another, and the cost of running a workload may vary from cloud to cloud and from region to region. A hybrid data cloud makes it easy to move workloads to optimize costs.
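To make the cost argument concrete, here is a minimal Python sketch of comparing the estimated monthly cost of one workload across several locations. The location names, the prices, and the two-part cost model (compute hours plus data egress) are all illustrative assumptions, not figures from any provider or from the reports cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str                # hypothetical label for a cloud region or data center
    compute_per_hour: float  # illustrative price per compute hour
    egress_per_gb: float     # illustrative price per GB of data moved out


def monthly_cost(loc: Location, compute_hours: float, egress_gb: float) -> float:
    """Estimate one month's cost of running a workload at a location."""
    return loc.compute_per_hour * compute_hours + loc.egress_per_gb * egress_gb


def cheapest(locations, compute_hours: float, egress_gb: float) -> Location:
    """Return the location with the lowest estimated monthly cost."""
    return min(locations, key=lambda loc: monthly_cost(loc, compute_hours, egress_gb))


# Made-up price table, for illustration only.
locations = [
    Location("cloud-a/us-east", 0.50, 0.09),
    Location("cloud-b/eu-west", 0.45, 0.12),
    Location("on-prem/dc-1", 0.60, 0.00),
]

# A data-heavy workload and a compute-heavy workload can favor different locations.
data_heavy = cheapest(locations, compute_hours=100, egress_gb=1000)
compute_heavy = cheapest(locations, compute_hours=1000, egress_gb=10)
print(data_heavy.name, compute_heavy.name)
```

With these made-up prices, the workload that moves a lot of data favors the on-premises site, while the compute-heavy one favors the cloud with the lower hourly rate. A real comparison would also need to account for storage, licensing, and the one-time cost of moving the data.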

Focus on customer experience

Today, experience is king, and performance, cost, security, and compliance must all be factored into the impact on customers, including internal users. Great experiences often depend on real-time data to impact customers “in the moment”. Retailers want to present customers with personalized offers while they’re still in a store, not three days later. This not only requires instant access to data, but also cross-functional analytics across multiple platforms – CRM, inventory, promotions, etc. A hybrid data cloud makes this possible.

Establish a framework

Building a hybrid data cloud starts with a framework of requirements. What are the security and compliance issues? What are the cost and performance issues? What workloads should stay on-premises? Which clouds are best for which types of workloads? A multinational bank, for example, may face very different privacy requirements in different parts of the world. These requirements, along with the cost and performance requirements, must be fully understood and codified, so that a supporting technology platform can begin to help or automate the balancing of all competing demands to ensure the desired customer experience.
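As one way to picture what "codified" requirements might look like, the following Python sketch expresses a few rules (data residency, PII handling, latency) as a function and picks the cheapest site that passes all of them. Every name, field, and rule here is a hypothetical simplification; it is not a description of any vendor's platform and not a statement of what GDPR or any other regulation actually requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    data_region: str     # region whose privacy rules govern the data, e.g. "EU"
    contains_pii: bool
    max_latency_ms: int  # performance requirement


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    pii_approved: bool   # whether the site is cleared to hold PII
    latency_ms: int
    cost_index: float    # relative cost; lower is cheaper


def allowed(w: Workload, s: Site) -> bool:
    """Apply the codified rules: PII approval, residency, then latency."""
    if w.contains_pii and not s.pii_approved:
        return False
    if w.contains_pii and w.data_region != s.region:
        return False
    return s.latency_ms <= w.max_latency_ms


def place(w: Workload, sites):
    """Pick the cheapest site that satisfies every rule, or None if none does."""
    candidates = [s for s in sites if allowed(w, s)]
    return min(candidates, key=lambda s: s.cost_index, default=None)


# Hypothetical sites, for illustration only.
sites = [
    Site("public-cloud-us", "US", True, 40, 1.0),
    Site("public-cloud-eu", "EU", True, 60, 1.3),
    Site("on-prem-eu", "EU", True, 20, 1.8),
    Site("budget-cloud-eu", "EU", False, 50, 0.8),
]

eu_customers = Workload("eu-customer-analytics", "EU", True, 100)
eu_realtime = Workload("eu-in-store-offers", "EU", True, 30)
public_stats = Workload("public-stats", "EU", False, 100)

for w in (eu_customers, eu_realtime, public_stats):
    site = place(w, sites)
    print(w.name, "->", site.name if site else "no compliant site")
```

Writing the rules down in one place means the same check can be applied to every workload, and a workload with no compliant site is surfaced explicitly rather than placed somewhere by default.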

A fully automated platform that optimizes where workloads need to run will be great whenever it arrives. In the meantime, you can’t continue to rely on legacy data lakes that limit your ability to innovate and accelerate your business. A hybrid data cloud lets you transform your business today by moving data and workloads where they need to be to optimize performance, cost, and customer experience while ensuring security and global compliance.

Carolyn Duby