While there's no single definition of the term "big data," most definitions include a large dataset (terabytes, petabytes, exabytes, or even zettabytes) with at least several thousand discrete components. An organization can mine and analyze big data to discover patterns or anomalies that lead to insights on which it can base decisions.

Many people don't realize, however, that handling big data isn't just an issue for search engines, media companies, and e-commerce. Organizations in nearly every industry generate and collect big data. They analyze it and use it as the basis for business decisions that can improve operations, customer satisfaction, and productivity.

The concept of big data

In 2001, Doug Laney, an analyst at META Group (now part of Gartner), formalized the concept of big data in a report that predicted "leading enterprises will increasingly use a centralized data warehouse to define a common business vocabulary that improves internal and external collaboration."

Laney's formulation credited big data with three "V's": velocity, which refers to the speed at which data is processed; volume, which refers to the amount of data in a dataset; and variety, which refers to the different types of data in a dataset. Since then, pundits have proposed additional big data V's, including veracity, which refers to the accuracy of a dataset, and value, meaning the ability of a dataset to fulfill a given goal.

In the years since Laney's report, corporate IT infrastructures grew, due in part to the widespread adoption of the internet for e-commerce and social media. Organizations generated and processed large volumes of data as part of their ordinary operations, and many businesses realized they could use the data to better understand their own operations and their customers' needs.

Soon, specialized tools for storing and working with big data, such as Hadoop and Spark, appeared, as did new approaches to storing data, such as NoSQL databases and in-memory databases. Today, we see the migration of big data workflows to the cloud, where it's easy and cost-efficient to scale tools and processes as big data gets bigger by the day.

Big data use cases

Big data can impact nearly every imaginable business goal. Here are a few specific ways organizations use big data today:

Banking: Analysis of big data helps banks fight fraud by detecting unusual account or payment activity.

Government: Government agencies examine big data to discover patterns. For instance, the IRS uses big data to uncover tax underpayments, while the City of Boston uses it to combat potholes.

Health care: An analysis of big data can help doctors and researchers interpret the results of medical interventions or experiments, and help predict patients' risks for certain types of diseases.

Insurance: The price of automobile insurance is usually based on factors such as the driver's age, location, credit score, claims history, and type of vehicle. But insurers that offer usage-based insurance (UBI) policies can use telematics to access a digital history of a vehicle, including automobile diagnostics and crash avoidance systems, and capture actual driving data via onboard sensors, cameras, and built-in tracking devices. A connected car provides streams of disparate data, including velocity, turns, braking, weather, and road conditions, along with distracted driving information. The big data generated by telematics lets insurers bring data that reflects actual driving behavior into their business operations. Analysts predict that self-driving cars will generate massive amounts of data in the future; according to a Barclays analyst, a single self-driving car could generate 100GB of data every second.

Law enforcement: Police departments use real-time data and software that integrates, analyzes, and shares otherwise hidden clues from myriad law enforcement data sources in order to anticipate and possibly prevent crimes.

Legislation: Instead of writing laws based on ideas of how people _should_ behave, lawmakers and lawyers can analyze big data to help assess how people _actually_ behave. For example, datasets from court records can help reveal which aspects of a given law are most frequently broken, or how difficult a certain law is for ordinary citizens to understand. In these ways, big data can facilitate the creation of laws that are more effective and easier to enforce.

Manufacturing: Recent studies indicate that, for many manufacturers, unplanned factory downtime can cost a company as much as $260,000 an hour, so predictive maintenance is critical. IoT-based predictive maintenance employs machine learning algorithms to forecast potential risks and predict when equipment is likely to fail.

Retail: When feedback is anecdotal, it's difficult to extrapolate a single customer's views across all existing or potential customers. But, with the help of big data, companies analyze customer experience systematically by collecting survey responses from thousands of customers, and identify trends within the responses.
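To make the banking use case above more concrete, here is a minimal Python sketch of one common way to flag unusual payment activity: the modified z-score, which measures how far each value sits from the median. The function name, the sample amounts, and the 3.5 threshold are all assumptions chosen for illustration; production fraud detection typically combines many more signals than a single amount.

```python
from statistics import median


def flag_unusual_payments(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so one extreme payment does not distort the baseline.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # At least half the values equal the median; nothing can be scored.
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical payment amounts for a single account.
payments = [42.0, 38.5, 51.2, 47.9, 40.3, 2950.0, 44.1]
print(flag_unusual_payments(payments))  # prints [5]
```

A median-based score is used here rather than a mean-based one because a single very large payment would inflate the mean and standard deviation and could mask itself.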

6 big data challenges

Big data is powerful, but collecting it, storing it, and leveraging it can be difficult for organizations. They face challenges with big data in several areas:

  • Growing data: As data grows over time, an organization must ensure that its tools and processes can scale as needed. A cloud-based storage system, such as a cloud data warehouse or data lake, is ideal because it offers independently scalable provisions for storage and compute needs.
  • Generating insights: Big data must be mined, and insights communicated, quickly enough for businesses to make impactful decisions. This requires having tools in place to ensure that people can interpret and communicate results effectively.
  • Prioritizing cultural change: According to a Gartner report, chief data officers contend with cultural barriers when building data-driven enterprises, including an information language barrier (data literacy), the need to demonstrate the business value of data, and the need to confront the ethical implications of data and analytics. The report states: "CDOs must consciously address and reinforce the desire, attitude, and behaviors that are needed to become a data-driven organization."
  • Recruiting and retaining big data talent: Companies sometimes struggle to identify and keep data professionals. One recruiting firm predicts that by 2020, 2.7 million new job postings will require data science and analytics skills. Organizations must adopt strategies for recruiting data professionals and training current employees.
  • Integrating disparate data: Integrating data from different sources can be challenging, even when an organization uses an ETL tool and a cloud data warehouse as a base for analysis. Businesses must adopt a data strategy that addresses the integration and consolidation of disparate data sources.
  • Ensuring data quality: Large datasets tend to have issues with duplicate, missing, and inaccurate data, which makes it difficult to derive accurate insights. Data professionals can use data quality tools, including data deduplication tools, to address duplicates in the data source, and they may use automated tools to minimize the risk of human error when moving data between systems.
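As a rough illustration of the data quality challenge, the Python sketch below separates a batch of records into clean rows, duplicates, and rows with missing required values. The function name, field names, and sample records are hypothetical, and matching on a normalized key is only one simple deduplication strategy; dedicated data quality tools go much further.

```python
def clean_records(records, key_fields, required_fields):
    """Split records into (clean, duplicates, incomplete).

    Two records count as duplicates when their key fields match after
    trimming whitespace and lowercasing.
    """
    seen = set()
    clean, duplicates, incomplete = [], [], []
    for rec in records:
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append(rec)
            continue
        seen.add(key)
        if any(rec.get(f) in (None, "") for f in required_fields):
            incomplete.append(rec)
        else:
            clean.append(rec)
    return clean, duplicates, incomplete


# Hypothetical customer records pulled from two systems.
records = [
    {"email": "ana@example.com", "name": "Ana", "country": "BR"},
    {"email": "ANA@example.com ", "name": "Ana", "country": "BR"},
    {"email": "li@example.com", "name": "Li", "country": ""},
]
clean, dupes, incomplete = clean_records(
    records, key_fields=["email"], required_fields=["name", "country"]
)
print(len(clean), len(dupes), len(incomplete))  # prints 1 1 1
```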

Getting started with big data

Using data analytics and business intelligence (BI) tools with big data has the potential to improve customer experience, increase retention and sales, and optimize back-end processes for managing inventory and labor. Organizations need three basic components to make the most of big data:

  • Data repository: Cloud data warehouses are attractive options for businesses that run analytics against data from multiple sources. They can scale up and down with latency measured in seconds or minutes, so businesses can get the performance they need without the need to provision for peak load, as they used to have to do if they ran on-premises data warehouses.
  • Data integration and replication: Data integration, or data ingestion, refers to the tools and processes that allow organizations to collect data, process it, and make sure it's available in a format that business analysts can use.
  • Data analytics and BI: Once data is integrated and stored, it's ready for employees to use analytics and BI tools to discover insights hidden within.
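The three components above can be sketched end to end at toy scale. In this Python example, an in-memory SQLite database stands in for a cloud data warehouse, a CSV string and a list of dictionaries stand in for two source systems, and a single SQL query stands in for a BI tool. Every table name, field name, and value is an assumption made for illustration; this is not how Stitch or any particular warehouse works internally.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes.
CRM_CSV = "customer_id,region\n1,EMEA\n2,APAC\n"
ORDERS = [
    {"customer": 1, "total_cents": 1250},
    {"customer": 1, "total_cents": 800},
    {"customer": 2, "total_cents": 4300},
]


def run_pipeline():
    # Data repository: an in-memory SQLite database as a stand-in warehouse.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
    )
    db.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")

    # Data integration: extract from each source and map to a shared schema.
    customers = [
        (int(row["customer_id"]), row["region"])
        for row in csv.DictReader(io.StringIO(CRM_CSV))
    ]
    orders = [(o["customer"], o["total_cents"] / 100) for o in ORDERS]
    db.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    db.executemany("INSERT INTO orders VALUES (?, ?)", orders)

    # Analytics: revenue per region across the combined data.
    rows = db.execute(
        "SELECT c.region, SUM(o.total) "
        "FROM customers c JOIN orders o USING (customer_id) "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()
    db.close()
    return rows


print(run_pipeline())  # prints [('APAC', 43.0), ('EMEA', 20.5)]
```

The query at the end is only possible because both sources were loaded into one place with a shared customer_id, which is the point of the integration step.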

Stitch can help with the data integration component. It's an easy-to-use ETL tool for replicating data from more than 100 databases and SaaS platforms to cloud data warehouses, centralized and ready for analytics and BI solutions. Take advantage of big data analytics with Stitch today.
