DataOps: Pros and Cons | ITPro Today
DataOps describes the processes and technologies designed to cost-effectively deliver quality, timely data to the users and applications that need it. It aims to do this by replacing the often rigid and fragile custom links between disparate data sources and data consumers with well-defined, easily usable and automated processes that continuously update, integrate and transform data into the required format and share it.
DataOps uses the principles of Agile development and DevOps (which combines software development with operations to accelerate software delivery) to “reinvent data management as an automated, process-based IT service that can empower both data professionals and downstream consumers spanning business intelligence users, data analysts, data scientists and business users,” according to a June 2021 report from market researcher Omdia.
Among the objectives of DataOps, according to the DataOps Manifesto, are to “satisfy the customer through the early and continuous delivery of valuable analytical information from minutes to weeks,” to “accommodate changing customer needs, and … embrace them to generate competitive advantage,” and to deliver repeatable results through “versioning,” which tracks the state of everything from the data to the hardware and software used to deliver each data set or analytical result.
DataOps is driven by the growing need for predictive and real-time information, as well as by the growing use of artificial intelligence, which requires large amounts of data to identify trends and make predictions.
Some of the underlying functions of DataOps may be provided by legacy tools such as master data management (MDM) platforms (which ensure data accuracy and integrity), by newer tools from cloud providers for hybrid and multi-cloud deployments, and by emerging disciplines such as AIOps, which uses artificial intelligence to automate the deployment, monitoring and optimization of IT resources.
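As a rough illustration of what versioning for repeatable results can mean in practice, the Python sketch below records a fingerprint of a data set together with the code version and runtime that produced a result. The function name `fingerprint`, the sample rows and the `v1.4.0` label are hypothetical and do not come from the Manifesto or any particular product; real deployments typically rely on dedicated data and code version control tools.

```python
import hashlib
import json
import platform
import sys


def fingerprint(rows, code_version):
    """Return a record describing what produced an analytical result."""
    # Serialize rows deterministically so identical data always hashes the same.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(payload).hexdigest(),
        "code_version": code_version,       # hypothetical release label
        "python": sys.version.split()[0],   # runtime used for the run
        "machine": platform.machine(),      # hardware architecture
    }


# Hypothetical sample data, for illustration only.
rows = [{"order_id": 1, "amount": 19.99}, {"order_id": 2, "amount": 5.00}]
before = fingerprint(rows, code_version="v1.4.0")

rows[1]["amount"] = 5.50  # a silent upstream change to the data
after = fingerprint(rows, code_version="v1.4.0")

# The code version is unchanged, but the data fingerprint is not.
print(before["data_sha256"] != after["data_sha256"])  # True
```

Storing such a record alongside each result makes it possible to tell whether two runs differ because of the data, the code or the environment.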
How does DataOps work?
DataOps first requires visibility into the state of a company’s data assets, including those that are being used, or could be used, to generate business information. This visibility is often provided by data hubs or data catalogs that leverage metadata (data about data) to help data stewards and users understand and easily access large amounts of business data.
Armed with this knowledge, data stewards can create scripts and application programming interfaces (APIs) that automate the collection, validation, integration, and analysis of data from multiple sources. Ideally, these automated processes allow users to access data analysis “as a service,” much as they use other applications in the cloud. They can also automate the detection of, and response to, unexpected usage requests, configuration changes and errors, as well as accommodate new data sources such as sensors on the Internet of Things.
“When a consumer such as a business analyst says they need a new query in SQL or Tableau, it is not necessary for a data engineer to create a query in the system to, for example, join two tables together,” said Bradley Shimmin, chief analyst, AI platforms, data and analytics at Omdia.
DataOps also requires that data scientists work more closely with business users to understand their needs, and that business users describe those needs so that data scientists can create the right “data services” for them. Creating DataOps Centers of Excellence in business units helps non-technical data owners “understand the importance and value of the nuances of working with data so that they can eventually take ownership and manage the data with which they work closely,” explains Shimmin. This allows them to participate in important DataOps processes such as the design of data catalogs, he says.
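To make the idea of a metadata-driven catalog concrete, here is a minimal Python sketch of a catalog that stores descriptions of data sets (not the data itself) and lets users look them up by tag. The class names, fields, locations and tags are all hypothetical; commercial data catalogs offer far richer metadata, lineage and access controls.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data set; the data itself stays where it is."""
    name: str
    location: str   # where the data physically lives
    owner: str      # the data steward responsible for it
    tags: set = field(default_factory=set)


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find_by_tag(self, tag):
        """Return the names of all data sets carrying the given tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


# Hypothetical entries, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry("orders", "warehouse.sales.orders", "sales-ops", {"sales", "finance"}))
catalog.register(CatalogEntry("sensor_readings", "lake/iot/readings", "plant-eng", {"iot"}))
catalog.register(CatalogEntry("invoices", "erp.billing.invoices", "finance", {"finance"}))

print(catalog.find_by_tag("finance"))  # ['invoices', 'orders']
```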
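The automated validation and integration described above, including the kind of two-table join Shimmin mentions, might look something like the following Python sketch. It is a simplified illustration under assumed field names (`customer_id`, `order_id`, `amount`, `name`) and sample records; a production pipeline would normally run on a database or data integration tool instead of in-memory lists.

```python
def validate(record, required):
    """Return a list of problems found in one record (empty if it is clean)."""
    return [f"missing {key}" for key in required if record.get(key) in (None, "")]


def integrate(customers, orders):
    """Join orders to customers on customer_id, setting aside invalid rows."""
    # Only customers that pass validation are eligible for the join.
    by_id = {
        c["customer_id"]: c
        for c in customers
        if not validate(c, ["customer_id", "name"])
    }
    joined, rejected = [], []
    for order in orders:
        problems = validate(order, ["order_id", "customer_id", "amount"])
        if not problems and order["customer_id"] not in by_id:
            problems = ["unknown customer"]
        if problems:
            rejected.append((order, problems))
        else:
            joined.append({**order, "name": by_id[order["customer_id"]]["name"]})
    return joined, rejected


# Hypothetical records from two separate sources.
customers = [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": ""}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 75.0},
    {"order_id": 12, "customer_id": 1, "amount": None},
]

joined, rejected = integrate(customers, orders)
print(joined)
for order, problems in rejected:
    print(order["order_id"], problems)
```

Rejected rows are kept with the reason they failed, so a data steward can review them without the whole run stopping on a single bad record.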
What are the advantages of DataOps?
The biggest benefit of DataOps is faster access to the ever-changing variety of data and types of analysis needed to meet dynamic business challenges. “Using metadata to describe data assets and delivering data through application programming interfaces (APIs) can help enterprise practitioners unify disparate data stores across multiple cloud platforms without having to physically move, replicate or virtualize the data,” according to the Omdia report. “This approach will allow companies to get a more central and comprehensive picture of their business without having to disrupt existing infrastructure investments.”
DataOps can “democratize” data access for a wider range of business users and reduce the need for hard-to-find data scientists to develop custom queries or integrations to meet each new data requirement. By automating the testing and resolution of potential problems, it can also prevent seemingly minor changes to data sources, such as how a semantic model defines a date string, from compromising the results of a downstream system such as an AI application, explains Shimmin.
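The date-string scenario lends itself to a small automated test. The Python sketch below checks each incoming batch against an assumed date format before it reaches downstream consumers; the format `%Y-%m-%d`, the column name `order_date` and the sample batch are assumptions made for illustration, not details from the Omdia report.

```python
from datetime import datetime

EXPECTED_DATE_FORMAT = "%Y-%m-%d"  # assumed contract with downstream consumers


def check_date_format(rows, column, fmt=EXPECTED_DATE_FORMAT):
    """Return the indexes of rows whose date string does not match fmt."""
    bad = []
    for i, row in enumerate(rows):
        try:
            datetime.strptime(row[column], fmt)
        except (KeyError, TypeError, ValueError):
            # Missing column, non-string value, or a string in another format.
            bad.append(i)
    return bad


# Hypothetical batch in which one source has silently changed its format.
batch = [
    {"order_date": "2021-06-30"},
    {"order_date": "06/30/2021"},
    {"order_date": "2021-07-01"},
]

bad_rows = check_date_format(batch, "order_date")
if bad_rows:
    print(f"Blocking batch: rows {bad_rows} violate {EXPECTED_DATE_FORMAT}")
```

Run as a gate in the pipeline, a check like this stops a malformed batch before an AI model or dashboard consumes it.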
What are the disadvantages of DataOps?
Done poorly, DataOps can create hard-to-share data silos as business units or departments build their own data hubs in relatively inexpensive public cloud platforms without following company standards in areas such as security, compliance or data definitions.
It also requires purchasing, implementing, and supporting multiple tools to deliver everything from code and data version control to data integration, metadata management, data governance, security and compliance, among other needs. Tools that support the operationalization of analytics and AI pipelines for purposes like DataOps typically have overlapping capabilities, making it even more difficult to identify the right product and the right framework for implementation, says Gartner analyst Soyeb Barot in a January 2021 report.
DataOps implementations can also be hampered by, among other things, over-reliance on fragile extract, transform, and load (ETL) pipelines; reluctance or inability to invest in governance and data management; the continuous explosion of data to be managed; and the complexities of integration, according to the Omdia report. For these reasons, Shimmin estimates that fewer than one in five companies have successfully implemented DataOps.
Examples of DataOps
IBM Cloud Pak for Data provides tools for data integration, data preparation, replication, governance, cataloging and quality, as well as MDM. IBM says that the integration of Cloud Pak for Data with its IBM Watson Knowledge Catalog helps customers “enable business-ready data for AI and analytics with intelligent cataloging, supported by active management of metadata and policies.”
Informatica provides tools for data privacy management, data preparation, data cataloging, MDM, cloud-native data delivery services, and data governance, all of which leverage its AI augmentation and automation platform, CLAIRE. The company uses CLAIRE “very effectively to solve DataOps problems such as continuous operations through self-healing routines, auto-tuning, auto-scaling, and smart shutdown of services,” according to Omdia.
DataKitchen’s DataOps platform supports cloud and hybrid cloud deployments in areas such as data observability, automated testing, continuous deployment through orchestration and automation, a single management platform for multiple analytical pipelines, and self-service access to analytical data and information. DataKitchen, one of the main backers of the DataOps movement, is positioning itself as an orchestrator of the DataOps tools a company already has, according to the Omdia report.
GoodData’s data-as-a-service offering delivers metrics, analytics, and other assets to business customers through a rich API. This provides a single data service layer, explains Omdia, “from which companies can build and deploy their own analytics applications flexibly using a single source of truth that is secure and compliant with company data security and confidentiality mandates.”
Delphix’s Data Platform provides a programmable data infrastructure designed to automate data operations, including continuous integration and delivery, cloud migrations, and compliance. The platform integrates with systems ranging from mainframes to cloud-native applications, automating “delivery and access to data, whether on-premises or in a hybrid or multi-cloud environment,” says Delphix.
DataOps users will continue to look for centralized data governance tools and processes that help ensure security and compliance but are easier to use than those they’ve adopted in the past, according to Shimmin. While some DataOps vendors will focus on point solutions and others on global platforms, all have a strong incentive to ensure that their offerings can easily integrate and share information with each other, explains Shimmin. Such capabilities will, he predicts, lead to DataOps giving way to “data fabrics” or “data meshes” that allow local business units to control, analyze and share their own data to meet urgent business needs while meeting corporate security and compliance requirements.
To be successful in DataOps, companies cannot “view data as a discrete, petroleum-like resource that needs to be centralized and processed once to be incorporated into a refined, exhaustible fuel like gasoline,” the Omdia report states. “On the contrary, companies need to see data as an ever-changing and highly malleable form of energy, something that can move freely, combine and recombine again to power a myriad of information across the enterprise.”