Organizations have choices when it comes to systems on which to base their data analytics stack. Data managers may consider a centralized data warehouse, a group of more specialized data marts, or some combination of the two. Data warehouses and data marts are similar, but they perform different duties, and a business may choose to use one or both for different use cases.
A data warehouse is a repository that stores all of an organization's current and historical data from disparate sources — it's sometimes called a single source of truth. It's a key component of a data analytics architecture that creates an environment for decision support, analytics, business intelligence (BI), and data mining.
A data mart is similar to a data warehouse, but it holds data only for a specific department or line of business, such as sales, finance, or human resources. A data warehouse can feed data to a data mart, or a data mart can feed a data warehouse.
Data warehouses and data marts hold structured data, and they're associated with traditional schemas, which are the ways in which records are described and organized. Whichever repository they choose, businesses use an ETL tool to extract data from various sources and load it into the destination.
Try Stitch for free for 14 days
Two data pioneers — Bill Inmon and Ralph Kimball — hold different philosophies on the organizational architecture and relationship between the two data repositories.
In Inmon's approach — the enterprise data warehouse — a data professional first integrates and centralizes data in a data warehouse before loading it into data marts. This approach makes the data marts a subset of the data in the data warehouse.
Advantages to this approach are: 1) the data warehouse acts as a single source of truth for the entire organization, because it integrates all the organizational data; and 2) when data is first centralized in the data warehouse, it's easier for data managers to enforce structural requirements before the data is distributed to data marts.
Kimball, on the other hand, favors the opposite approach: a dimensional data warehouse that begins with mission-critical data marts that are set up quickly to serve analytic needs of departments or lines of business. In this approach, the data warehouse is a union of the data marts, but there is no single source of truth because data isn't integrated before reporting.
When an organization uses a data warehouse, it doesn't also have to have data marts. In organizations that do use both, most tend toward Inmon's top-down model.
A data warehouse contains data from all parts of a business, which makes it useful for cross-departmental analyses. For example, businesses could create a comprehensive customer profile that reconciles omnichannel retail data, CRM records, marketing campaigns, and social media data. By integrating and modeling this data, data analytics experts can empower employees in all departments to make strategic decisions about how to interact with customers.
A data mart, on the other hand, contains data from a few sources with information specific to a business line or department. If a manufacturing manager wants to analyze production delays, she can go to her data mart, query the data, and run reports to determine where faults lie in the production line. She can extract and analyze it quickly because of the limited scope and size of the data.
Data warehouses and data marts address distinct use cases, and there are major differences in the ways in which they're built and used, in the types of decisions they enable, and in the ways they're priced and implemented.
|Data warehouse||Data mart|
|Objective||Centralize data, become single source of truth across business||Provide easy access to data for a department or specific line of business|
|Uses||Business-wide analysis||Department-specific analysis|
|Decision types||Strategic decision-making||Operational or tactical decision-making|
|Scope||Wide; contains data from all departments and lines of business||Specific; individual data marts for individual departments|
|Size||Typically more than 100GB||Less than 100GB|
|Data held||All organizational data||Single business line|
|Data sources||Dozens or hundreds||Typically just a few|
|Time to implement||Months to years (on-premises); days to weeks (cloud-based)||Weeks to months (on-premises); days to weeks (cloud-based)|
|Cost||$100K+ (on-premises); on-demand pricing varies (SaaS)||$10K (on-premises); on-demand pricing varies (SaaS)|
Data warehouses can address high-level business decisions. They store current and historical data from dozens or hundreds of disparate sources, making them a single source of truth for a data-driven organization.
Data marts are great for tactical, department-specific analyses, they're easy to use, design, and implement, and they are department-specific. Each department that requires these types of analytic capabilities needs its own data mart.
Years ago, setting up a data warehouse was an expensive, labor-intensive process that could take months. Data warehouses ran on expensive hardware servers architected to provide high performance for analytics tasks. At that time, a data mart was easier and more cost-effective to set up if a department needed to get insights from its data.
Today, nearly all organizations opt for a cloud data warehouse, which is scalable and cost-effective. In fact, a cloud-data warehouse can be implemented so quickly — within hours or days — that it's just as easy to set up a data warehouse as it is to set up a data mart. Once a cloud data warehouse is up and running, employees can create data marts — as a subset of the data warehouse — as needed. And cloud data warehouses provide fast and elastic scaling of resources, allowing businesses to scale up resources for periodic or seasonal processing and scale them down again when they're not utilizing them.
If you choose to work with a cloud data warehouse, you need a way to populate it with the data in your existing databases and SaaS tools. That's where Stitch comes in.