Data engineers have two ways of moving data from source to destination for data analytics: stream processing and batch processing.

Stream processing handles a continuous flow of data from sources such as point-of-sale systems, mobile apps, e-commerce websites, GPS devices, and IoT sensors. In batch processing, by contrast, data is bundled up and processed at regular intervals.

Whether your business needs real-time data depends on what you need to do with it. If you're a book retailer checking an inventory dashboard, you're probably fine with data that's hours old. If you're analyzing data from a heart-monitoring implant, you might tolerate no more than a second of latency. If you're doing algorithmic trading in the financial markets, you'll want up-to-the-microsecond pricing information.

Stream processing vs. batch processing

Stream processing handles data in motion — like moving water through a fire hose in a continuous stream. Batch processing is like opening the fire hose every day at midnight and running it until the tank is empty. For example, a day’s worth of data may be batch processed overnight to produce reports the following day.

Stream vs. batch processing: a comparison

| | Stream processing | Batch processing |
| --- | --- | --- |
| What | A single transaction, record, or set of data points | Large datasets composed of multiple transactions or data points |
| When | Continuously, as data is received from sources | Periodically, often run automatically on a set schedule |
| How | Find new data, then process it | Examine the dataset and determine the most up-to-date records to include in the batch |
| How fast | Milliseconds to seconds | Minutes to hours |
| Why | Real-time or near real-time interaction with people, sensors, or devices | Periodic in-depth analysis or reporting |
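To make the contrast concrete, here's a minimal Python sketch of the same events handled both ways. The events list and update_inventory function are hypothetical stand-ins for a real message queue and inventory store.

```python
from datetime import datetime, timezone

def update_inventory(event):
    # Stand-in for a real write to an inventory store.
    print(f"{event['sku']}: qty change {event['qty']}")

# Hypothetical event feed; in practice this would be a message queue,
# a socket, or a change-data-capture stream.
events = [
    {"sku": "book-123", "qty": -1, "ts": datetime.now(timezone.utc)},
    {"sku": "book-123", "qty": -2, "ts": datetime.now(timezone.utc)},
    {"sku": "book-456", "qty": -1, "ts": datetime.now(timezone.utc)},
]

def process_stream(event_source):
    """Stream processing: act on each record the moment it arrives."""
    for event in event_source:
        update_inventory(event)  # latency: milliseconds to seconds

def process_batch(event_source):
    """Batch processing: accumulate records, then process on a schedule."""
    batch = list(event_source)  # e.g., everything since the last run
    # Keep only the most up-to-date record per SKU, as the table describes.
    latest = {}
    for event in batch:
        latest[event["sku"]] = event
    for event in latest.values():
        update_inventory(event)  # latency: minutes to hours

process_stream(events)
process_batch(events)
```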

Stream processing: use cases

Many industries use stream processing to add value to their products and services. Streaming data gives companies real-time, actionable insights.

Finance

Streaming data from ATMs makes it possible for banks to offer consumers continuous access to their bank accounts without human interaction. The ATM can't rely on a nightly batch process; it must know the consumer’s account balance at all times.

Fraud detection is another ATM feature made possible by streaming data. If you use an ATM in Philadelphia and your ATM card is used five minutes later in Tampa, the bank will decline the Tampa transaction when real-time analysis determines it is fraudulent.
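A simplified version of that check can be sketched as an "impossible travel" rule: flag any two uses of the same card that are too close together in time and too far apart in space. The speed threshold and coordinates below are illustrative, not a real bank's rules.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3956 * asin(sqrt(a))  # 3956 = Earth's radius in miles

MAX_MPH = 600  # anything faster than a commercial flight is suspicious

def is_impossible_travel(prev_txn, curr_txn):
    """Flag a card use that would require implausibly fast travel."""
    hours = (curr_txn["ts"] - prev_txn["ts"]).total_seconds() / 3600
    if hours <= 0:
        return True
    miles = distance_miles(prev_txn["lat"], prev_txn["lon"],
                           curr_txn["lat"], curr_txn["lon"])
    return miles / hours > MAX_MPH

# A Philadelphia withdrawal, then the same card in Tampa five minutes later:
# roughly 920 miles in 5 minutes is far beyond MAX_MPH, so decline it.
philly = {"lat": 39.95, "lon": -75.17, "ts": datetime(2024, 1, 1, 12, 0)}
tampa = {"lat": 27.95, "lon": -82.46, "ts": philly["ts"] + timedelta(minutes=5)}
print(is_impossible_travel(philly, tampa))  # True
```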

E-commerce

Hyperpersonalization examines a user's real-time website browsing behavior to build an up-to-the-minute, 360-degree customer view. This allows e-commerce retailers to upsell and customize the shopping experience. Another trend is to link the website with apps and physical locations. For example, if a customer views a product on a website and then walks into a store that sells it, stream processing enables the seller to send a coupon to the customer's mobile device at that moment.
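As a sketch of how that coupon trigger might work, a stream processor can correlate recent website views with store-entry events per customer. The event names, the send_coupon function, and the 24-hour window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

RECENT = timedelta(hours=24)      # how long a website view stays actionable
recent_views = defaultdict(dict)  # customer_id -> {product_id: view time}

def send_coupon(customer_id, product_id):
    # Stand-in for a push notification to the customer's mobile device.
    print(f"Coupon for {product_id} sent to customer {customer_id}")

def handle_event(event):
    """Correlate website product views with store-entry events in real time."""
    now = event["ts"]
    if event["type"] == "product_view":
        recent_views[event["customer_id"]][event["product_id"]] = now
    elif event["type"] == "store_entry":
        for product_id, viewed_at in recent_views[event["customer_id"]].items():
            if now - viewed_at <= RECENT and product_id in event["store_stock"]:
                send_coupon(event["customer_id"], product_id)

now = datetime.now(timezone.utc)
handle_event({"type": "product_view", "customer_id": "c1",
              "product_id": "sku-42", "ts": now})
handle_event({"type": "store_entry", "customer_id": "c1",
              "store_stock": {"sku-42"}, "ts": now + timedelta(hours=2)})
```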

Sensors/monitors and IoT

Streaming data also appears in businesses as ordinary as laundromats. The Washlava laundry tech platform has turned washing machines into IoT devices to create a better laundromat experience. Customers use an app to reserve a machine and pay for their wash, and the wash cycle status is updated in real time on the customer’s app. That means no more waiting around for your laundry. Of course, this is only possible with streaming data monitoring machine availability and status.

Gaming

In a CIO article, "How big data is disrupting the gaming industry," Dan Schoenbaum, CEO of Cooladata, talks about the importance of data in gaming. "Graphics and creative storylines are no longer enough," he says. "Today’s online game developers should be investing in business intelligence to understand user likes, dislikes, what’s off-putting, when they’re leaving and not returning."

Game developers can get that user information from streaming data during gameplay. Some game development companies even alter a game in progress to provide a more satisfying experience and keep players in the game longer.

Streaming data meets the demand for real-time and near real-time responsiveness. But consider whether you really need real-time data replication: it degrades data warehouse performance, bogging down data loading and consuming processing resources that could be spent creating reports. If your goal is to give people the information they need to make better decisions, it doesn't make sense to update your business intelligence faster than the human brain can process it.

Streamline the data ingestion process

Real-time data with webhooks

You may have heard the term "webhooks" or "push API." Webhooks are another way to connect two applications based on events as they happen. To set up a webhook, a developer creates a URL; thereafter, whenever a relevant event occurs, the source app pushes data to that URL, where a connected app can pick it up. Webhooks fire as discrete events, so they're not the same as stream processing, and they're not recommended for high-volume applications. But developers use webhooks because they can trigger events from system to system, enabling real-time workflows.
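As an illustration (not Stitch's implementation), a webhook receiver is just an HTTP endpoint that accepts one POST per event. This sketch uses Python's standard library, so it runs without dependencies; the port and payload shape are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Receives one POST per event from the source application."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        # In a real pipeline you'd verify a signature and enqueue the event
        # for downstream processing rather than handling it inline.
        print("received event:", event)
        self.send_response(200)  # acknowledge quickly so the sender doesn't retry
        self.end_headers()

if __name__ == "__main__":
    # The source app is configured to POST its events to http://host:8000/
    HTTPServer(("", 8000), WebhookHandler).serve_forever()
```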

Stitch simplifies data ingestion

If you want to push events as they happen to your data warehouse, you can use Stitch's webhooks implementation. Your data source notifies Stitch as events happen, and Stitch's Import API ingests the data from each event.

Stitch provides connectors from more than 100 data sources to the most popular data warehouse destinations. The Stitch Incoming Webhooks integration offers a simple, flexible way to integrate webhook APIs with Stitch.

Give Stitch a try, on us

Stitch streams all of your data directly to your analytics warehouse.

Set up in minutes. Unlimited data volume during trial.