Modern enterprises store and process diverse sets of big data, and they can use that data in different ways, thanks to tools like databases and data warehouses. Databases efficiently store transactional data, making it available to end users and other systems. Data warehouses aggregate data from databases and other sources to create a unified repository that can serve as the basis for sophisticated reporting and analytics.
A database is software that stores a collection of data under a set of consistent rules. Users interact with this data and its governing constraints using a database management system (DBMS) and an associated query language, the most common of which is Structured Query Language (SQL).
Databases are often relational, arranging records in tables with predefined rows and columns, with rules concerning everything from relationships between individual tables to the types and formats of contained values. Data in databases is stored and retrieved by record or row, where each row represents a single event — a transaction for a customer, for instance. Relational databases are suitable for storing transactional data where records are frequently read, inserted, updated, and deleted.
Non-relational databases (collectively referred to as NoSQL databases) are becoming an alternative to the older relational models. NoSQL databases are a good choice for storing large amounts of unstructured data, among other uses.
Databases can be deployed on premises, completely in the cloud, or in a hybrid configuration that involves both. The largest databases now run on massively distributed networks, while lighter databases run on cell phones, simple DIY hardware, and compact IoT devices.
Databases serve a number of different use cases within a modern enterprise. They store and process transactions for operations, logistics, administration, and even content management systems. Different databases can serve the needs of a small independent bookstore to track inventory and purchases, or a multinational travel agency that provides an online flight reservation system.
Databases are primarily associated with transactional systems, which require fast, fault-tolerant processing of mission-critical data. Online transaction processing (OLTP) describes a type of system optimized for dealing with numerous simple transactions. Banks use databases for OLTP in customer-facing applications, because high latency for financial transactions is unacceptable, and mistakes are disastrous.
Try Stitch for free for 14 days
Data warehouses have a lot in common with databases. A data warehouse is a central, integrated repository for both historical and current data, gathered from various internal and external sources.
Data warehouses often include data from multiple individual databases and other disparate sources. They aggregate records into a system intended as a complete, updated storehouse for an organization’s transactional and informational data. They allow for more complex historical queries than in the individual component data stores and sources. In data warehouses, online analytical processing (OLAP) focuses on resolving such queries efficiently.
While effective data warehouse management requires a general understanding of database concepts, it also requires understanding the warehouse’s distinct architectural paradigms and particular utility.
The architecture of a modern data warehouses has three “layers”: storage, compute, and client services. The storage layer holds all data loaded into the data warehouse. Data is extracted from a source (or multiple sources) and loaded into the data warehouse using an ETL tool. The compute layer executes data processing tasks required for queries. The client services layer may be a combination of tools that allow users to connect with and get data out of the data warehouse. It may include query, reporting, analysis, and business intelligence tools.
Data warehouses are a tool for data analysis and reporting. Through data mining and other analytical techniques they allow analysts to synthesize information and insights that would be difficult to glean from individual data sources.
Companies selling products or services can use a data warehouse for market research by analyzing the transactional data from one application in combination with information from multiple, disparate sources. Many enterprises also use their data warehouse for forecasting, as the integrated view they provide yields improved financial reporting and guidance for future budgeting.
Small, simpler data warehouses that cover a specific business area are called data marts.
Sometimes multiple data marts are fed by one master data warehouse, and each mart is built and owned by an individual department, such as operations or sales. In other businesses, individual data marts feed into an organizational master data warehouse.
Because a data mart’s scope is usually a single department and covers less ground, it is quicker and easier to implement than an enterprise data warehouse.
Overall, databases house day-to-day operational data, while data warehouses aggregate and analyze data. Individual databases often directly connect to production systems and user-facing applications, while data warehouses are internal tools for managers and stakeholders.
Databases sustain an enterprise’s day-to-day transactional systems. From processing a customer’s ATM withdrawal to logging the books borrowed by a library user, databases are best suited for the mundane but foundational elements of a business.
Meanwhile, data warehouses sustain business intelligence and analytics. A data warehouse can provide market research for product development, enterprise-level reporting for managers to accurately gauge performance, or data mining for online businesses’ recommendation systems.
Data must be extracted from its source, transformed into a useful format for analytics, and loaded into a warehouse where those analytics take place — a process called ETL. In an alternative approach (“ELT”), data engineers extract and load the raw data into the data warehouse, and data scientists and business users can transform it as needed. Either kind of data integration can connect databases to data marts and data warehouses for accurate, timely business intelligence.
Stitch can replicate data from a broad spectrum of databases, including MySQL, Oracle, PostgreSQL, and MongoDB, as well as from SaaS sources. It supports all of the most popular cloud data warehouse destinations, including Amazon Redshift, Google BigQuery, Snowflake, and Microsoft Azure SQL Data Warehouse.
If your business is ready to integrate its databases into a data warehouse, sign up for Stitch for free and begin optimizing a powerful, fast, available, and centralized data pipeline.