data lakehouse

From Wiktionary, the free dictionary

English

Etymology

Blend of data lake + data warehouse. Coined by the American company Databricks in a 2020 corporate whitepaper.[1][2]

Noun

data lakehouse (plural data lakehouses)

  1. (databases) A hybrid architecture combining data lake and data warehouse principles.
    • 2023, Greg Beaumont, Power BI Machine Learning and OpenAI […], Packt Publishing Ltd, →ISBN, page 27:
      In examples of data lakes or data lakehouse architectures, you will often see bronze/silver/gold or raw, curated, optimized layers, which serve as both reference points within the transformation process that happens to data and for data that might have been referred to as staging tables in older data warehouse terminology.

References

  1. Michael Armbrust, Ali Ghodsi, Reynold Xin, Matei Zaharia (2020 December 20) “Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics”, in Databricks – Research at Databricks and MosaicML, Databricks, archived from the original on 2023-02-23:
    In this paper, we discuss the following technical question: is it possible to turn data lakes based on standard open data formats, such as Parquet and ORC, into high-performance systems that can provide both the performance and management features of data warehouses and fast, direct I/O from advanced analytics workloads? We argue that this type of system design, which we refer to as a Lakehouse (Fig. 1), is both feasible and is already showing evidence of success, in various forms, in the industry.
  2. Denise Schlesinger (2023 April 23) “Part 1 - Building a Data Lakehouse using Azure Data Explorer”, in Microsoft – Startups at Microsoft, Microsoft, archived from the original on 2023-12-07:
    What is a Data Lakehouse? The Data Lakehouse term was coined by Databricks on an article in 2021 and it describes an open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management, data mutability and performance of data warehouses.