As businesses become more data-driven, they generally want to use their data for faster decision-making and to improve business processes. Enterprise data contains insights on customer behavior, spending, and revenue. Modern data analysis and business intelligence (BI) involves integrating data from disparate sources, and harnessing it for analysis and BI, usually with the aid of an enterprise data warehouse (EDW).
An EDW is a central repository of data from multiple sources. It gathers enterprise data and makes it available for analysis, BI, and data-driven decision-making. Users (with privileges) across the organization can access and benefit from the data contained there.
EDWs contain current data, such as real-time feeds or the latest snapshots from source systems, as well as historical data. EDWs are the most accessible single version of the truth for enterprise data, because they consistently store final, nonredundant business information in one place. EDWs are also the storage platforms that underlie live analytical processes.
An EDW is a structured and centralized location where users can access business data. Business intelligence is a set of methods and software used by an enterprise to aggregate, summarize, analyze, and ultimately derive value from business data.
Having data from disparate sources, and data that's siloed within individual SaaS platforms or business units, makes it difficult to integrate data across an organization, and to share insights based upon all the data. Data analysis and BI suffer as a result. An EDW solves the problems associated with data silos by bringing all relevant enterprise data into an accessible, available central repository for analysis across the enterprise.
An EDW’s greatest utility comes from the centralization of enterprise data, the increased availability of that data to users in various business units, and improvements to the organization, structure, and automation of data storage and processing.
Try Stitch for free for 14 days
Not long ago, EDWs were on-premises systems with a fixed capacity for processing and storage. They were unable to scale easily or quickly in times of high demand. Today, cloud EDWs have replaced some legacy on-premises systems, and they're a popular choice for organizations adopting new systems. Some advantages of cloud EDWs include:
Let's look at some of the most popular cloud EDWs.
Amazon Redshift has been around since 2013 — longer than any other cloud data warehouse — and boasts the most deployments of any cloud data warehouse provider. Like all cloud data warehouses, it leverages column-oriented storage for fast data access and processing.
Redshift is highly scalable, provisioning clusters of nodes to customers as their storage and computing needs evolve. Each node has individual CPU, RAM, and storage space, facilitating the massive parallel processing (MPP) needed for any big data application, especially the data warehouse.
Redshift uses familiar SQL syntax based on that of PostgreSQL. Redshift integrates with a host of other services, including many BI tools and platforms, from Amazon QuickSight to platforms like IBM Cognos, Periscope Data, and Tableau.
Snowflake's architecture allows customers to scale computing and storage resources separately, unlike Redshift. This makes Snowflake appropriate and cost-effective for many different use cases.
Customers hoping to build a warehouse containing massive but slow data can focus on scaling out storage space. On the other hand, organizations hoping to build sophisticated transformations and processing into their warehouse can focus on increasing computational power.
Snowflake’s automatically managed storage layer can contain structured or semistructured data, such as nested JSON objects. The compute layer is composed of clusters, each of which can access all data but work independently and concurrently to enable automatic scaling, distribution, and rebalancing.
Snowflake is deployed on top of Amazon Web Services or Microsoft Azure hosting infrastructure, and support for Google Cloud Platform is expected soon.
Azure SQL Data Warehouse is Microsoft’s EDW platform. Like other cloud storage and computing platforms, it uses a distributed, MPP architecture. It gathers data from databases (including Azure SQL Database, Microsoft's relational database for the cloud) and SaaS platforms into one powerful, fully-managed centralized repository.
Azure is part of Microsoft’s cloud computing ecosystem, and it provides connectors for a wide range of third-party systems and software. This means customers can take advantage of, and get support for, technologies such as Microsoft SQL Server as a cloud SQL engine, or use a range of other options with which they are familiar.
Microsoft also provides a wider range of pricing options and management tiers than most other cloud EDW providers. Many customers that use Azure for cloud data storage eventually use Azure SQL Data Warehouse to host data for analytics.
Google BigQuery is a web service that allows interactive analysis of massive data. It provides all the functionality needed to build a modern EDW, and can be used alongside Google Cloud Storage and technologies like MapReduce.
BigQuery differentiates itself with a serverless architecture. Users cannot see details of resource allocation, as computational and storage provisioning happens continuously and dynamically.
BigQuery features a transparent pricing plan that scales directly with user needs, billing per query so customers pay for exactly what they use. BigQuery is appropriate for users that need the most ad-hoc system, as it is less tuneable than other cloud EDWs, but requires the least amount of management and allows for greater flexibility.
A critical component to a functioning EDW is the extract, transform, load (ETL) process. ETL consolidates data from multiple sources and transforms it into a useful, modeled, and consistent format for the EDW.
Older on-premises data warehouses run transformations in the data pipeline to avoid taxing limited analytical resources. Cloud-based EDWs use a variant of that process, ELT (extract, load, transform), because cloud platforms have the horsepower to perform necessary transformations on data after replication.
Stitch is a cloud ETL service that provides connectors from more than 90 sources to the most popular data warehouse destinations. We make it easy to extract data data from more than 90 sources and load it to an EDW in a flash. Our approach is simple, straightforward, and ready to go right out of the box. Give Stitch a try, on us — set up a free trial in minutes and get your data into an EDW today.