
Google Cloud Launches BigLake: New Cross-Platform Data Storage Engine

At its Data Cloud Summit, Google introduced BigLake, a new data lake storage engine that makes it easier for companies to analyze the data in their data warehouses and data lakes.

The idea, in essence, is to take Google's experience running and managing its BigQuery data warehouse and extend it to data lakes on Google Cloud Storage, combining the best of data lakes and warehouses into a single service that abstracts away the underlying storage formats and systems.

It's worth noting that this data could sit in BigQuery or live on AWS S3 or Azure Data Lake Storage Gen2. Through BigLake, developers get access to one uniform storage engine and the ability to query the underlying data stores through a single system, without moving or duplicating data.

"Managing data across disparate lakes and warehouses creates silos and increases risk and cost, especially when data needs to be moved," said Gerrit Kazmaier, vice president and general manager of databases, data analytics and business intelligence at Google Cloud. "BigLake enables enterprises to unify their data lakes and warehouses to analyze data without worrying about the underlying storage system or format, eliminating the need to duplicate or move data from a source and reducing costs and inefficiencies."


Using policy tags, BigLake lets administrators configure security policies at the table, row, and column levels. This covers data stored in Google Cloud Storage as well as in the two supported third-party systems, where BigQuery Omni, Google's multi-cloud analytics service, enforces these security controls. The same controls also ensure that only the right data flows into tools like Spark, Presto, Trino, and TensorFlow. The service also integrates with Google's Dataplex tool to provide additional data management capabilities.
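To make the idea of table, row, and column-level controls concrete, here is a minimal in-memory sketch. It is not BigLake's API: `TablePolicy`, `read_table`, the role names, and the sample rows are all hypothetical and only illustrate how a single policy can filter both rows and columns before data reaches a downstream tool.

```python
from dataclasses import dataclass, field


@dataclass
class TablePolicy:
    """Hypothetical policy object (not a BigLake type)."""
    column_grants: dict  # role -> set of column names the role may read
    row_filters: dict = field(default_factory=dict)  # role -> predicate on a row


def read_table(rows, policy, role):
    """Return only the rows and columns that `role` is allowed to see."""
    allowed = policy.column_grants.get(role, set())
    if not allowed:
        # Table-level control: a role with no grants cannot read at all.
        raise PermissionError(f"role {role!r} has no access to this table")
    # Row-level control: roles without a filter see every row.
    keep_row = policy.row_filters.get(role, lambda row: True)
    return [
        # Column-level control: drop columns the role was not granted.
        {col: val for col, val in row.items() if col in allowed}
        for row in rows
        if keep_row(row)
    ]


# Illustrative data and roles (assumptions, not from the announcement).
ROWS = [
    {"customer": "a", "region": "EU", "email": "a@example.com", "spend": 120},
    {"customer": "b", "region": "US", "email": "b@example.com", "spend": 80},
]

POLICY = TablePolicy(
    column_grants={
        "analyst_eu": {"customer", "region", "spend"},
        "admin": {"customer", "region", "email", "spend"},
    },
    row_filters={"analyst_eu": lambda row: row["region"] == "EU"},
)

eu_view = read_table(ROWS, POLICY, "analyst_eu")
```

In this sketch the `analyst_eu` role sees only EU rows and never the `email` column, while `admin` sees everything. The point is that the policy lives in one place, regardless of which engine ends up reading the data.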

Google notes that BigLake will provide fine-grained access controls and that its API will span Google Cloud as well as open file formats such as Parquet and open source processing engines like Apache Spark.


"The volume of valuable data that organizations have to manage and analyze is growing at an incredible rate," explain Google Cloud software engineer Justin Levandoski and product manager Gaurav Saxena. "This data is increasingly distributed across many locations, including data warehouses, data lakes, and NoSQL stores. As an organization's data becomes more complex and proliferates across disparate data environments, silos emerge, creating increased risk and cost, especially when that data needs to be moved. Our customers have made it clear; they need help."

In addition to BigLake, Google also announced that Spanner, its globally distributed SQL database, will soon get a new feature called "change streams". With it, users can track any changes to a database in real time, be they inserts, updates, or deletes. "This ensures that customers always have access to the most up-to-date data, as they can easily replicate changes from Spanner to BigQuery for real-time analytics, trigger downstream application behavior via Pub/Sub, or store changes in Google Cloud Storage (GCS) for compliance," Kazmaier explains.
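The general pattern behind a change stream can be sketched in a few lines. This is not Spanner's API: `TrackedTable`, `ChangeRecord`, and `apply_changes` are hypothetical names, and the example only shows how an ordered log of inserts, updates, and deletes lets a downstream copy stay in sync with the source.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in the hypothetical change log."""
    op: str                    # "INSERT", "UPDATE" or "DELETE"
    key: str
    new_value: Optional[Any]   # None for deletes


class TrackedTable:
    """Toy key-value table that records every mutation in order."""

    def __init__(self):
        self.rows = {}
        self.changes = []

    def upsert(self, key, value):
        # An existing key is an update, a new key is an insert.
        op = "UPDATE" if key in self.rows else "INSERT"
        self.rows[key] = value
        self.changes.append(ChangeRecord(op, key, value))

    def delete(self, key):
        # Only log a delete if the row actually existed.
        if key in self.rows:
            del self.rows[key]
            self.changes.append(ChangeRecord("DELETE", key, None))


def apply_changes(replica, changes):
    """Replay change records, in order, onto a replica dict."""
    for change in changes:
        if change.op == "DELETE":
            replica.pop(change.key, None)
        else:
            replica[change.key] = change.new_value
    return replica


source = TrackedTable()
source.upsert("order-1", {"status": "new"})
source.upsert("order-1", {"status": "shipped"})
source.upsert("order-2", {"status": "new"})
source.delete("order-2")

# A downstream consumer (an analytics copy, for example) replays the log.
replica = apply_changes({}, source.changes)
```

After replaying the log, `replica` matches `source.rows`. The same ordered records could just as well be published to a message queue or archived, which mirrors the three uses named in the quote above.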

Google Cloud rounded out the announcements by bringing Vertex AI Workbench, a tool for managing the entire life cycle of a data science project, out of beta and into general availability, and by launching Connected Sheets for Looker, as well as the ability to access Looker data models in its Data Studio BI tool.
