Google Cloud Launches BigLake: New Cross-Platform Data Storage Engine

At its Data Cloud Summit, Google introduced BigLake, a new data lake storage engine that makes it easier for companies to analyze data across their data warehouses and data lakes.

The idea, in essence, is to take Google's experience running and managing its BigQuery data warehouse and extend it to data lakes on Google Cloud Storage, combining the best of data lakes and warehouses in a single service that abstracts away the underlying storage, formats, and systems.

Notably, this data could live in BigQuery, AWS S3, or Azure Data Lake Storage Gen2. Through BigLake, developers get a consistent storage engine and can query the underlying data stores through a single system, without moving or duplicating data.
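To make "query without moving data" concrete: a BigLake table is declared in BigQuery as a table definition that points at files in object storage, reached through a connection resource. The minimal sketch below only builds the DDL text, assuming the `CREATE EXTERNAL TABLE ... WITH CONNECTION ... OPTIONS (format, uris)` form found in current BigQuery documentation (the launch-era preview may have differed). The dataset, table, connection ID, and bucket names are all hypothetical placeholders.

```python
from typing import Sequence


def biglake_table_ddl(
    table: str,
    connection: str,
    uris: Sequence[str],
    fmt: str = "PARQUET",
) -> str:
    """Build a BigQuery DDL statement declaring a BigLake table over object-storage files.

    `table` is "dataset.table" and `connection` is "project.region.connection_id";
    every name used below is a hypothetical placeholder.
    """
    if not uris:
        raise ValueError("at least one source URI is required")
    # The files stay where they are; the table only records where to read them from.
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )


print(biglake_table_ddl(
    "analytics.sales_events",            # hypothetical dataset.table
    "my-project.us.lake-connection",     # hypothetical connection ID
    ["gs://example-bucket/sales/*.parquet"],
))
```

The generated statement could then be submitted like any other query, for example through the `google-cloud-bigquery` client's `Client.query` method; after that, the Parquet files are queryable from BigQuery in place.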

“Managing data across disparate lakes and warehouses creates silos and increases risk and cost, especially when data needs to be moved,” said Gerrit Kazmaier, vice president and general manager of databases, data analytics and business intelligence at Google Cloud. “BigLake enables enterprises to unify their data lakes and warehouses to analyze data without worrying about the underlying storage system or format, eliminating the need to duplicate or move data from one source and reducing costs and inefficiencies.”

Image credits: Google

Using a set of policy tags, BigLake lets administrators configure security at the table, row, and column level. This covers data stored in Google Cloud Storage as well as the two supported third-party systems, where BigQuery Omni, Google's multi-cloud analytics service, enforces these security controls. The same controls ensure that only the right data flows into tools like Spark, Presto, Trino, and TensorFlow. The service also integrates with Google's Dataplex tool to provide additional data management capabilities.
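Row-level controls of this kind are expressed in BigQuery as row access policies. The sketch below builds one, assuming the `CREATE ROW ACCESS POLICY ... ON ... GRANT TO (...) FILTER USING (...)` syntax from BigQuery's DDL documentation; the policy name, table, group address, and `region` column are hypothetical. Column-level restrictions use policy tags from a Data Catalog taxonomy instead and are not shown here.

```python
from typing import Sequence


def row_access_policy_ddl(
    policy: str,
    table: str,
    grantees: Sequence[str],
    filter_expr: str,
) -> str:
    """Build a BigQuery row access policy statement.

    Grantees use BigQuery's principal format, e.g. "group:name@example.com";
    all names here are hypothetical.
    """
    if not grantees:
        raise ValueError("at least one grantee is required")
    # Each principal is a double-quoted string inside the GRANT TO list.
    grantee_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Members of the (hypothetical) EMEA sales group only see rows where region = 'EMEA'.
print(row_access_policy_ddl(
    "emea_only",
    "analytics.sales_events",
    ["group:sales-emea@example.com"],
    "region = 'EMEA'",
))
```

Because the filter is attached to the table rather than to any one query engine, the same restriction applies however the table is read, which is the property the article describes for Spark, Presto, Trino, and TensorFlow.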

Google notes that BigLake will provide fine-grained access controls and that its API will span Google Cloud as well as open file formats such as Apache Parquet and open-source processing engines like Apache Spark.

Image credits: Google

“The volume of valuable data that organizations have to manage and analyze is growing at an incredible rate,” explained Google Cloud software engineer Justin Levandoski and product manager Gaurav Saxena. “This data is increasingly distributed across many locations, including data warehouses, data lakes, and NoSQL stores. As an organization's data becomes more complex and proliferates across disparate data environments, silos emerge, creating increased risk and cost, especially when that data needs to be moved. Our customers have made it clear; they need help.”

In addition to BigLake, Google also announced that Spanner, its globally distributed SQL database, will soon get a new feature called “change streams,” which lets users track every change to a database in real time, whether inserts, updates, or deletes. “This ensures that customers always have access to the freshest data, as they can easily replicate changes from Spanner to BigQuery for real-time analytics, trigger downstream application behavior using Pub/Sub, or store changes in Google Cloud Storage (GCS) for compliance,” Kazmaier explains.
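In Spanner, a change stream is declared with DDL that names which tables (or all tables) to watch. The sketch below builds that statement, assuming the GoogleSQL-dialect `CREATE CHANGE STREAM name FOR {tables | ALL} [OPTIONS (retention_period = ...)]` form from Spanner's current documentation, which may differ from the syntax at the time of this announcement. The stream and table names are hypothetical.

```python
from typing import Optional, Sequence


def change_stream_ddl(
    name: str,
    tables: Optional[Sequence[str]] = None,
    retention: Optional[str] = None,
) -> str:
    """Build a Spanner (GoogleSQL) CREATE CHANGE STREAM statement.

    With no tables given, the stream watches the whole database (FOR ALL).
    `retention` is a duration string such as "7d"; names are hypothetical.
    """
    target = ", ".join(tables) if tables else "ALL"
    ddl = f"CREATE CHANGE STREAM {name} FOR {target}"
    if retention:
        # How long change records are kept for readers to consume.
        ddl += f" OPTIONS (retention_period = '{retention}')"
    return ddl


print(change_stream_ddl("EverythingStream"))
print(change_stream_ddl("OrdersStream", ["Orders"], retention="7d"))
```

Once such a stream exists, the inserts, updates, and deletes it records are what downstream consumers (a BigQuery replication pipeline, a Pub/Sub-triggered service, or an archive in GCS) would read.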

Google Cloud also moved Vertex AI, a tool for managing the entire lifecycle of a data science project, out of beta and into general availability, and released Connected Sheets for Looker, as well as the ability to access Looker data models in its Data Studio BI tool.
