Microsoft Azure Data Share: how to use this big data tool
Microsoft’s cloud-hosted data sharing tools are for anyone who needs to work with big data.
We live in a world of big data, with multi-terabyte databases and data warehouses holding billions of rows. It’s a world with a lot of analytical opportunities and, at the same time, a whole new set of problems. Scale has definite benefits, but it makes it difficult to move data across our data centers and clouds, especially when we want to share it with other teams in the business.
Traditionally, we’ve just copied the data, passing it on to developers and business analysts as required. What is needed instead is a way to share the source data quickly and securely, while still allowing users to make changes and have full access to the data.
Why use Azure Data Share?
Azure Data Share is Microsoft’s managed data sharing platform. It works with Azure storage to deliver data snapshots or to share data in place, giving you the best of both worlds. Alongside the data management tools there is a governance layer, so you can see who has access and control how and when they receive updates.
Setting up a data sharing environment is difficult; you need to find efficient ways to partition data and provide download capabilities. This means having dedicated infrastructure and bandwidth, especially if you have many partners or if you market the data you have and sell it to your customers.
These requirements are a significant barrier to creating an effective data economy, because both sides of a partnership have to invest heavily before they can work with shared data. Working in Azure with Azure Data Share means you have a scalable data environment that expands on demand, while serverless systems hosted in the cloud handle the process of extracting, compressing, and delivering data for you. There is no need to create or manage any software or infrastructure; everything is managed for you automatically.
Azure Data Share offers different sharing models for different types of data storage in Azure. Most require sharing snapshots of your data, updating them as new snapshots are published. This means that anyone consuming your data will need connectivity and storage, although things are considerably simpler if you are both in the same Azure region. Some options, like Azure Data Lake, provide support for incremental snapshots, sending changes rather than entire tables or databases.
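The difference between a full and an incremental snapshot can be sketched in a few lines of Python. This is only an illustration: it assumes an incremental snapshot copies the files modified since the previous snapshot, and the `StoredFile` type, the `files_for_snapshot` function, and the sample paths are hypothetical rather than part of any Azure SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical stand-in for a file in a shared storage account."""
    path: str
    last_modified: datetime


def files_for_snapshot(files: List[StoredFile],
                       previous_snapshot: Optional[datetime]) -> List[StoredFile]:
    """Return the files a snapshot would need to copy.

    With no previous snapshot, everything is copied (a full snapshot).
    Otherwise only files changed after the previous snapshot are copied
    (the assumed incremental behavior).
    """
    if previous_snapshot is None:
        return list(files)
    return [f for f in files if f.last_modified > previous_snapshot]


# Illustrative data only.
files = [
    StoredFile("sales/2023.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    StoredFile("sales/2024.csv", datetime(2024, 6, 1, tzinfo=timezone.utc)),
]
full = files_for_snapshot(files, None)
incremental = files_for_snapshot(files, datetime(2024, 3, 1, tzinfo=timezone.utc))
```

In this example the full snapshot copies both files, while the incremental one copies only `sales/2024.csv`, which is why incremental snapshots need less bandwidth and storage on the consumer side.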
How to get started with Azure Data Share
Working with Azure Data Share is pretty straightforward; all you need is storage in Azure and an Azure account with the appropriate permissions for your storage account. Different sources are shared in different ways, so make sure you are familiar with the techniques needed for the data you plan to share. You will need to start by giving Azure access to your data source, using Azure firewall tools.
With the appropriate prerequisites in place, you’re ready to start sharing data. Select the data you want to share and set up a publishing schedule. Users receive an email invitation and, once they accept it, receive their first snapshot of data in their Azure storage account. It is not necessary to share all your data; you can select a set of records to share, giving access to a range of storage.
When the data is updated regularly, you can set a snapshot schedule for new versions or for incremental updates. It can be hourly or daily, and users can subscribe to versions when they need them. An important aspect of the sharing process is that users can choose where the data is delivered, so if you are sharing, say, key values from an Azure Blob, the user can choose to have them delivered directly to an Azure Data Lake, ready for analysis.
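As a rough picture of what an hourly or daily snapshot schedule means for a consumer, the following Python sketch lists the next few snapshot times from a chosen start. The `next_snapshots` function and the `"hourly"` and `"daily"` labels are hypothetical names used for illustration; they are not the Azure Data Share API.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# The two recurrence options mentioned in the text.
INTERVALS = {"hourly": timedelta(hours=1), "daily": timedelta(days=1)}


def next_snapshots(first_run: datetime, recurrence: str, count: int) -> List[datetime]:
    """Return `count` snapshot times, starting at `first_run`."""
    if recurrence not in INTERVALS:
        raise ValueError(f"unsupported recurrence: {recurrence!r}")
    step = INTERVALS[recurrence]
    return [first_run + i * step for i in range(count)]


start = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
hourly = next_snapshots(start, "hourly", 3)  # 06:00, 07:00, 08:00 UTC on Jan 1
daily = next_snapshots(start, "daily", 3)    # 06:00 UTC on Jan 1, 2, 3
```

Keeping times in UTC avoids surprises when provider and consumer sit in different regions.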
If you are using Azure Data Explorer, you can configure in-place sharing as an alternative to snapshots. This provides a direct link to your store, so users can read and query the data directly, treating it as if it were in their own subscription. Any changes you make are available to them instantly. Not everyone will need this level of access, although it is extremely useful for internal development teams who need live data to test applications.
While many of the Azure Data Share tools are available through the Azure portal, there are also REST APIs, which allow you to build software around your data shares. The APIs let you add a data sharing portal to a site, or help you build and manage a consortium where data is provided by different organizations and the resulting aggregate is shared with all members of the consortium.
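To give a feel for the REST APIs, here is a minimal Python sketch that only builds the URL for listing the shares in a Data Share account through Azure Resource Manager. The path shape and the `api-version` value are assumptions to check against the current Data Share REST reference, and the subscription, resource group, and account names are placeholders. The sketch makes no network call; a real request would also need an Azure Active Directory bearer token in the `Authorization` header.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-08-01"  # assumption: verify against the REST reference


def list_shares_url(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the (assumed) URL for listing shares in a Data Share account."""
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataShare"
        f"/accounts/{quote(account_name, safe='')}/shares"
    )
    query = urlencode({"api-version": API_VERSION})
    return f"{ARM_ENDPOINT}{path}?{query}"


# Placeholder identifiers, not real resources.
url = list_shares_url(
    "00000000-0000-0000-0000-000000000000",
    "example-rg",
    "example-account",
)
```

A portal or consortium tool would send a GET request to this URL with its token and then page through the returned list of shares.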
How secure is Azure Data Share and how much does it cost?
At the heart of Azure Data Share are Azure security tools, in particular Azure Active Directory support for managed identities. This allows controlled access to stores, without either side of the connection having access to the other’s credentials. There are three types of users: owners, contributors, and readers. Owners and contributors can manage their share directly, while readers can only view shared data. You remain in control of the data you share, with tools to manage and monitor your shares. It is important to note that data is never kept in the Azure Data Share service; it is purely a way to connect two Azure storage accounts. Some metadata about the offered data is kept, but that’s it.
This level of control is perhaps the most important aspect of the Azure Data Share platform. This means that as a provider you can control who has access and how often they can get updates to shared data. Users have some control, managing invitations to shared data and choosing how they use that data.
The price is reasonable: 5 cents to move a snapshot from source to destination, and 50 cents per vCore-hour to create the snapshots (billed per minute and rounded). This compares well with the cost of building and operating your own infrastructure, and it could make hybrid data sharing an option if you have a direct connection or high-speed VPN connection between your data center and Azure. Data can be transferred between Azure regions: a source in the western US can be used in East Asia, with all transfers occurring within Azure’s own network.
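A quick back-of-the-envelope calculation shows how these two charges add up. The Python sketch below uses the prices quoted above and assumes that "rounded" means each snapshot run is rounded up to a whole minute; the function name and the workload figures are made up for illustration, so check the current Azure pricing page before budgeting.

```python
import math

SNAPSHOT_MOVE_USD = 0.05  # per snapshot moved, as quoted in the article
VCORE_HOUR_USD = 0.50     # per vCore-hour of snapshot execution, as quoted


def estimate_cost(snapshots: int, vcores: int, minutes_per_snapshot: float) -> float:
    """Estimate the cost in US dollars for a batch of snapshots.

    Assumption: each run is billed per minute, rounded up to a whole minute.
    """
    billed_minutes = math.ceil(minutes_per_snapshot)
    movement = snapshots * SNAPSHOT_MOVE_USD
    execution = snapshots * vcores * billed_minutes * VCORE_HOUR_USD / 60
    return round(movement + execution, 2)


# Hypothetical workload: one snapshot a day for 30 days,
# each using 4 vCores for 12.5 minutes (billed as 13).
cost = estimate_cost(snapshots=30, vcores=4, minutes_per_snapshot=12.5)
```

For this made-up workload the estimate comes to $14.50: $1.50 for moving 30 snapshots plus $13.00 for 26 vCore-hours of execution.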
If you’re a data consumer, using Azure Data Share gives you more data to use in your apps. The datasets can be combined with your own data, used with your own analysis algorithms, or added to your own machine learning training data. There really is no limit to what you can do with it: whether it arrives as a snapshot or through in-place sharing, it’s data.