Graviti seeks to gather unstructured data for AI


In many ways, unstructured data is the bane of the modern data collector. Compared to the lean nature of structured data, such as numbers stored securely in a database, unstructured data such as words and images is large, chaotic, and difficult to work with. One company that sees a way through that chaos is a startup called Graviti.

Managing the lifecycle of unstructured data – which in its most basic form is just words and pictures – can be very difficult. The data is big, its value obscure, and it resists the kind of natural categorization that structured data lends itself to. It’s no wonder an executive recently dubbed unstructured data “the white whale of the business world.” This stuff is hard to work with.

Despite the difficulty of unstructured data, Ahabs abound in the real world, as companies step up their collection of it. One good reason is that unstructured data represents the vast majority of new data generated. According to IDC, 80% of global data generated by 2025 will be unstructured.

Another reason for the interest in unstructured data is AI. Advances in deep learning, such as natural language processing (NLP) and computer vision models, specifically target unstructured data types as fuel for their training. AI adoption is expected to increase significantly in the coming months and years, largely due to the availability of unstructured data for training AI models, as well as the democratization of AI tools themselves.

A technologist who knows the challenges and benefits of unstructured data is Edward Cui. Before founding Graviti in 2019, Cui was a technical lead and machine learning engineer for Uber, where he worked with the massive store of unstructured data extracted from sensors in self-driving cars.

The sheer volume of unstructured data gathered from Uber’s self-driving car sensors was nearly unfathomable. “We did a statistic that showed that the amount of data we collected from a self-driving car division for a week was equal to the data from all of the restaurant business globally for an entire year,” Cui explains.

Uber is a big company, but even it struggled with the computation needed to manage the data. According to Cui, what was missing from the equation was a platform that automated many mundane tasks involved in unstructured data lifecycle management and downstream AI tasks.

“We tried to develop the infrastructure to handle unstructured data internally, but it’s very expensive and time-consuming,” Cui tells Datanami. “As the autonomous driving industry exploded, the problem of redundant unstructured data was more important for AI developers, and it was a major obstacle in the entire AI industry. This challenge inspired me to create the Graviti Data Platform, which is a modern data infrastructure designed for unstructured data at scale.”

Graviti, which came out of stealth a week ago, aims to address some of the big challenges data scientists and AI engineers face when using unstructured data to train machine learning algorithms. The Graviti platform, which is S3-based and runs in the AWS Cloud, automates the processes needed to effectively manage data and derive value from it.

The industry need is there. A Graviti survey found that 25% of AI researchers spend between half and two-thirds of their time curating unstructured data, including data collection, cleaning, selection, and exploration. Almost all of the developers who took part in the survey said that their current method of handling unstructured data was insufficient.

Graviti’s primary goal with the Graviti Data Platform is to reduce the time users spend on the tedious work of data management, freeing them up to spend more time developing models, which is what AI developers ultimately want to do.

The Graviti Data Platform

It all starts with helping identify valuable data. The software also manages metadata associated with source data, annotations (like labels) and predictions in one place. Users have filters to help them find the best data for their needs. As they work with data, a Git-like version control system tracks their usage, allowing teams to work more efficiently, the company says. The platform also brings automation to data pipelines created for model training.
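As a rough illustration of what "Git-like" version control over a dataset could look like, here is a minimal Python sketch. It is not Graviti's API: the `DatasetRepo` class, its `commit`, `diff`, and `select` methods, and the sample file names and labels are all hypothetical, chosen only to show snapshots, version differences, and metadata filtering in one place.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRepo:
    """Toy in-memory dataset store. Hypothetical, not a real product API."""
    commits: dict = field(default_factory=dict)  # commit id -> commit record
    head: Optional[str] = None                   # id of the latest commit

    def commit(self, records: dict, message: str) -> str:
        """Snapshot item -> metadata and chain it to the previous commit."""
        snapshot = json.dumps(records, sort_keys=True)
        payload = json.dumps(
            {"parent": self.head, "message": message, "snapshot": snapshot},
            sort_keys=True,
        )
        # The id is derived from the content, so identical history gives identical ids.
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = {
            "parent": self.head,
            "message": message,
            "records": json.loads(snapshot),  # stored copy, detached from the caller's dict
        }
        self.head = commit_id
        return commit_id

    def diff(self, old_id: str, new_id: str) -> dict:
        """List items added, removed, or changed between two versions."""
        old = self.commits[old_id]["records"]
        new = self.commits[new_id]["records"]
        return {
            "added": sorted(new.keys() - old.keys()),
            "removed": sorted(old.keys() - new.keys()),
            "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        }

    def select(self, commit_id: str, **criteria) -> list:
        """Filter one version by exact-match metadata fields."""
        records = self.commits[commit_id]["records"]
        return sorted(
            name for name, meta in records.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


repo = DatasetRepo()
v1 = repo.commit(
    {"img_001.jpg": {"label": "car", "source": "camera"},
     "img_002.jpg": {"label": "tree", "source": "camera"}},
    "initial import",
)
v2 = repo.commit(
    {"img_001.jpg": {"label": "truck", "source": "camera"},
     "img_003.jpg": {"label": "car", "source": "lidar"}},
    "relabel and add lidar frame",
)
changes = repo.diff(v1, v2)
car_items = repo.select(v2, label="car")
```

In this sketch only the metadata is versioned, on the assumption that the large source files stay in object storage and are referenced by name. A real system would also need branching, access control, and persistent storage.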

“Data version control, data visualization, and team collaboration are key features of our products that help engineering teams increase their productivity in data management and model training,” says Cui. “The platform has adopted a Git-like framework for managing data versions and collaborating across teams. Role-based access control and visualization of version differences allow your team to work together securely and flexibly. The end result is that Graviti frees developers from drudgery, and they can now spend more time analyzing unstructured data and training models.”

The New York-based company raised $12 million in a pre-Series A round. It counts Motional, Alibaba Cloud and AWS as clients.

