We're happy to announce a new destination in the Stitch Data Loader ecosystem: Delta Lake on Databricks. Our partnership extends the Databricks Data Ingestion Network, a new data source ecosystem that lets Databricks customers use the speed and simplicity of Stitch to move data from more than 100 sources into Delta Lake.
Data lakes have historically excelled at holding unstructured data. With a data lake, users don't have to transform data to fit a defined schema, so data pipelines can be simple. However, given the unstructured nature of a data lake, the data stored within often requires processing to make it suitable for analytics.
To facilitate analytics, most companies turn to data warehouses: cloud-based repositories that hold data defined by schemas. The structured nature of that data makes warehouses a natural foundation for data analytics.
Delta Lake is an open source storage layer that brings data reliability and performance to data lakes. It follows a “lakehouse” paradigm, implementing data structures and data management features similar to those in a data warehouse directly on the kind of low-cost storage used for data lakes. In addition to being a desirable locale for a summer weekend, a lakehouse provides architectural and cost advantages that help accelerate machine learning workloads for users working with the kind of unstructured data often found in a data lake.
The lakehouse architecture combines the best elements of data lakes and data warehouses. It provides data management features like transaction support, schema enforcement, and support for business intelligence (BI) tools directly on the kind of low-cost storage used for data lakes. The result is an affordable, reliable platform with compute resources decoupled from storage for flexible scalability.
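To make "schema enforcement" and "transaction support" a little more concrete, here is a toy sketch in plain Python. It is not Delta Lake's API or implementation: the `SCHEMA` columns and the `enforce_schema` and `append_rows` helpers are hypothetical names invented for illustration. The idea is only that a write is checked against a declared schema, and that a batch is appended in full or not at all.

```python
from typing import Any, Dict, List

# Hypothetical table schema (illustrative only): column name -> required type.
SCHEMA: Dict[str, type] = {"user_id": int, "channel": str, "spend": float}


def enforce_schema(row: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Return the row unchanged if it matches the schema, else raise ValueError."""
    if set(row) != set(schema):
        raise ValueError(
            f"column mismatch: expected {sorted(schema)}, got {sorted(row)}"
        )
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise ValueError(
                f"{column!r} should be {expected.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    return row


def append_rows(
    table: List[Dict[str, Any]],
    rows: List[Dict[str, Any]],
    schema: Dict[str, type],
) -> None:
    """Validate every row first, then append the whole batch or nothing."""
    validated = [enforce_schema(row, schema) for row in rows]  # may raise
    table.extend(validated)  # only reached if every row passed


if __name__ == "__main__":
    table: List[Dict[str, Any]] = []
    append_rows(table, [{"user_id": 1, "channel": "ads", "spend": 2.5}], SCHEMA)
    try:
        # The second row has the wrong type for "spend", so the batch is rejected.
        append_rows(
            table,
            [
                {"user_id": 2, "channel": "email", "spend": 1.0},
                {"user_id": 3, "channel": "ads", "spend": "n/a"},
            ],
            SCHEMA,
        )
    except ValueError as err:
        print("rejected batch:", err)
    print(len(table), "row(s) in table")
```

A plain data lake would accept the malformed row and leave the cleanup to whoever reads it later. Rejecting it at write time is what keeps the stored data ready for analytics.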
Data ingestion for the lakehouse with Stitch
The lakehouse architecture supports the use of business intelligence (BI) tools directly on source data. Combined with broad and diverse data type support, a lakehouse provides access to structured data for BI applications as well as unstructured data suited for machine learning, without the need for users to navigate between systems.
For both use cases, however, accessing and ingesting data can be a serious time suck. Scalability concerns and code maintenance overhead for data ingestion can steal hours from data science teams that could otherwise be training and deploying their models. With Stitch, Databricks users can easily move data from more than 100 sources directly to Delta Lake and avoid the cost of pulling multiple data sources together by hand into a single source of truth.
We’re excited about the opportunity to add value for sales and marketing use cases on the Databricks Unified Analytics Platform. Across thousands of customer proof points, we’ve consistently seen business-driven customers leveraging Stitch together with analytics platforms like Databricks to push the boundaries of customer insights. Using the power of Delta Lake and the Databricks platform, Stitch customers can ingest data from sources like Google Analytics, Google Ads, and more to unlock new insights for their businesses.
Sign up for Stitch Data Loader today, or visit the Databricks Partner Gallery, to start moving data into Delta Lake.
Image credit: Photo of Delta Lake, Grand Teton National Park, by Randy Johnson