Support for AI and ML with Object Storage – GCN
From Terabytes to Exabytes: Support for AI and ML with Object Storage
From agriculture to defense, federal agencies are increasingly using artificial intelligence and machine learning to improve critical capabilities, accelerate research advancements, and free up personnel resources.
A by-product of this adoption is a rapidly growing volume of unstructured data in the form of images and video footage. The amount of unstructured data produced globally is growing by up to 60% per year and is expected to account for 80% of all data on the planet by 2025, according to IDC.
All of this data must be processed, analyzed, moved and stored. Currently, many organizations are doing this work using public cloud services. However, as the federal government continues to implement AI and ML technologies, many IT managers are looking for a solution better suited to their cost, convenience, and security needs.
Object storage – which allows organizations to build their own private cloud storage environment on-premises, as well as unlock cutting-edge computing capabilities – is quickly emerging as a viable alternative.
So how does object storage work? How do different object stores compare to each other and to the public cloud? And, more importantly, is it easy to implement and use? Keep reading to find out.
First, the basics
Object storage is a fundamentally different approach to storage, in which data is managed and manipulated as discrete units called “objects”.
To create an object, the data is combined with its relevant metadata and assigned a unique identifier. Because every object carries complete metadata, object storage removes the need for a hierarchical directory structure like that used in file storage. It is therefore possible to consolidate large amounts of unstructured data into a single, flat, and manageable “data lake”.
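To make the idea concrete, here is a minimal sketch of a flat object namespace in Python. It is an illustration only, not any vendor's API: the `StoredObject` and `FlatObjectStore` names, the metadata keys, and the sample payloads are all assumptions made for this example.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the payload, its metadata, and a unique identifier."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Hypothetical in-memory store with a single flat namespace (no directories)."""

    def __init__(self):
        self._objects = {}  # object_id -> StoredObject

    def put(self, data: bytes, **metadata) -> str:
        """Combine data with metadata, assign a unique ID, and store the object."""
        object_id = uuid.uuid4().hex
        # Record size and checksum alongside the caller-supplied metadata.
        metadata = {**metadata, "size": len(data),
                    "sha256": hashlib.sha256(data).hexdigest()}
        self._objects[object_id] = StoredObject(object_id, data, metadata)
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Locate objects by metadata rather than by walking a directory tree."""
        return [obj for obj in self._objects.values()
                if all(obj.metadata.get(k) == v for k, v in criteria.items())]


store = FlatObjectStore()
oid = store.put(b"\x89PNG...", sensor="drone-7", content_type="image/png")
store.put(b"frame-bytes", sensor="robot-2", content_type="video/mp4")
matches = store.find(sensor="drone-7")
```

Because each object is self-describing, the lookup in `find` needs no folder path, which is the property that lets very large collections live in one flat pool.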
Object storage has long been a common solution for cold archiving. With recent advances in technology, however, data can now be accessed much faster, making it well suited to applications such as AI and ML, which require higher-performance storage.
Object storage vs public cloud
The emergence of edge computing goes hand in hand with the rise of AI and ML. Using public cloud services to analyze and store data captured by IoT devices and sensors works wonders in urban centers. However, from agricultural drones to bomb disposal robots, connectivity to a central cloud repository is likely to be considerably slower in areas with less dense network infrastructure.
Object stores solve this problem with low-cost remote storage that allows compute to run at the edge. Processing data at the point of collection is significantly faster than sending everything to the cloud, where it needs to be processed and sent back.
Additionally, much of the data used to train AI algorithms needs to be stored long-term for auditing purposes, another area in which object storage excels. Features like versioning, end-to-end encryption, object locking, and continuous monitoring and repair help preserve data for decades at a much lower cost than the public cloud.
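Two of those features, versioning and object locking, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real product's interface: the `VersionedBucket` class, its method names, and the use of a plain day counter for retention are all hypothetical.

```python
class VersionedBucket:
    """Hypothetical in-memory bucket that keeps every version and honors retention locks."""

    def __init__(self):
        self._versions = {}      # key -> list of payloads, oldest first
        self._locked_until = {}  # key -> first day on which deletion is allowed

    def put(self, key: str, data: bytes) -> int:
        """Store a new version without overwriting earlier ones; return its number."""
        self._versions.setdefault(key, []).append(data)
        return len(self._versions[key])

    def get(self, key: str, version: int = None) -> bytes:
        """Return the latest version, or a specific 1-based version for an audit."""
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]

    def lock(self, key: str, until_day: int) -> None:
        """Refuse deletion of this key before the given day (assumed day counter)."""
        self._locked_until[key] = until_day

    def delete(self, key: str, today: int) -> None:
        until_day = self._locked_until.get(key, 0)
        if today < until_day:
            raise PermissionError(f"{key} is locked until day {until_day}")
        del self._versions[key]


bucket = VersionedBucket()
key = "training/labels.csv"
bucket.put(key, b"v1 labels")
bucket.put(key, b"v2 labels")
bucket.lock(key, until_day=3650)  # roughly ten years, as an example

try:
    bucket.delete(key, today=100)
    deleted = True
except PermissionError:
    deleted = False
```

An auditor can still retrieve the first version with `bucket.get(key, version=1)`, and the delete attempt on day 100 is refused because the lock has not expired.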
Comparing different object stores
When weighing object storage options, it is important to carefully consider the technical characteristics of the various products. For example, some object stores make multiple copies of each object to protect against data loss, which can consume storage very quickly.
On the other hand, more advanced object stores take advantage of erasure coding, which breaks a unit of data into fragments and stores them, together with additional parity fragments, across multiple physical disks. If some of the data is erased or corrupted – whether by accident or due to malicious activity – it can be reconstructed from the fragments stored on other drives. This reduces storage costs because it does not require organizations to keep multiple full copies of each object.
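The reconstruction step can be illustrated with the simplest possible erasure code: a single XOR parity fragment. This is a deliberately reduced sketch, and the function names are made up for the example. Production object stores typically use stronger codes (such as Reed-Solomon) that survive several simultaneous failures, whereas this version can repair only one lost fragment.

```python
def split_with_parity(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(frag_len)  # all zeros to start
    for frag in frags:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return frags + [parity]


def rebuild(frags: list, original_len: int) -> bytes:
    """Reconstruct the data when at most one fragment (marked None) is missing."""
    missing = [i for i, frag in enumerate(frags) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    if missing:
        present = [frag for frag in frags if frag is not None]
        recovered = bytes(len(present[0]))
        # XOR of all surviving fragments equals the missing one.
        for frag in present:
            recovered = bytes(a ^ b for a, b in zip(recovered, frag))
        frags = list(frags)
        frags[missing[0]] = recovered
    # Drop the parity fragment and the padding.
    return b"".join(frags[:-1])[:original_len]


payload = b"sensor frame 0042 from drone-7"
fragments = split_with_parity(payload, k=4)   # 4 data fragments + 1 parity
fragments[2] = None                           # simulate a failed drive
restored = rebuild(fragments, len(payload))
```

Here five fragments are written for four fragments' worth of data, yet the loss of any one of them leaves the payload fully recoverable.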
Additionally, erasure-coded platforms can achieve very high data durability, reduce disk overhead, and improve overall system performance. Of course, not all vendors implement erasure coding in the same way. Different products will likely scale differently and have varying rebuild and rebalance times.
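The disk-overhead difference is easy to quantify. The short calculation below compares full replication with an erasure-coded layout. The specific figures (three copies, an 8 data + 3 parity layout, 100 PB of usable data) are illustrative assumptions, not numbers from any particular product.

```python
def raw_capacity_replicated(usable_pb: float, copies: int) -> float:
    """Raw capacity needed when every object is stored as full copies."""
    return usable_pb * copies


def raw_capacity_erasure_coded(usable_pb: float, data_frags: int,
                               parity_frags: int) -> float:
    """Raw capacity needed when each object is split into data + parity fragments."""
    return usable_pb * (data_frags + parity_frags) / data_frags


usable = 100.0  # petabytes of actual data (assumed)

replicated = raw_capacity_replicated(usable, copies=3)
erasure_coded = raw_capacity_erasure_coded(usable, data_frags=8, parity_frags=3)

print(f"3 copies:      {replicated:.1f} PB raw, survives 2 lost copies")
print(f"8+3 fragments: {erasure_coded:.1f} PB raw, survives 3 lost fragments")
```

Under these assumptions, replication needs 300 PB of raw disk while the erasure-coded layout needs 137.5 PB and tolerates one more failure.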
Another important feature to look at is the data consistency model used by different object stores. “Strong consistency” is best for AI and ML applications. In short, this means that after a successful write, overwrite, or delete, any subsequent read request immediately receives the latest version of the object. Some object stores still use “eventual consistency,” where there is a lag before read operations return the updated data. This means that an application will occasionally work from older versions of the objects.
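The difference between the two models can be shown with a toy two-replica store. This is a simplified sketch under stated assumptions: the `ReplicatedStore` class is hypothetical, reads are always served by the second replica, and the `sync` call stands in for whatever background replication a real system performs.

```python
class ReplicatedStore:
    """Hypothetical two-replica store; reads are always served by the second replica."""

    def __init__(self, strong: bool):
        self.strong = strong
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet propagated to the replica

    def put(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.strong:
            # Strong consistency: update every replica before acknowledging.
            self.replica[key] = value
        else:
            # Weaker model: acknowledge now, propagate later.
            self._pending.append((key, value))

    def get(self, key: str):
        return self.replica.get(key)

    def sync(self) -> None:
        """Stand-in for background replication catching up."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


strong = ReplicatedStore(strong=True)
strong.put("model-weights", "v1")
strong.put("model-weights", "v2")
strong_read = strong.get("model-weights")   # latest version immediately

lagging = ReplicatedStore(strong=False)
lagging.put("model-weights", "v1")
lagging.sync()
lagging.put("model-weights", "v2")
stale_read = lagging.get("model-weights")   # still the old version
lagging.sync()
fresh_read = lagging.get("model-weights")   # updated once replication catches up
```

In this sketch the strongly consistent store returns "v2" right after the overwrite, while the lagging store returns "v1" until replication completes, which is the window in which a training job could pick up outdated data.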
Is it easy to set up and use?
Ease of use is subjective, of course. However, object storage has several advantages. For example, it requires less daily attention than a traditional storage area network because the resiliency of the system allows multiple drives to fail without incurring data loss. As a result, a single administrator can manage more than 200 petabytes.
There is no doubt that managing the data captured by AI and ML applications will continue to challenge government IT teams. Object storage is not a panacea, but it addresses the problems of cost, speed and security. Going forward, agencies embracing object storage should focus on implementing modular end-to-end data management solutions. These allow components to be swapped out for more advanced technologies as they become available.
Robert Renzoni is a Federal Presales Engineer at Quantum.