What is Snowflake?

Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts, allowing secure data sharing across the organization. Its platform sits on public clouds and allows organizations to easily unify and connect to a single copy of all their data.

In 2020, Snowflake unveiled the Snowflake Data Cloud as the next step in its effort to help organizations simplify data management and get more value from their data. The Data Cloud creates an ecosystem of businesses and organizations that can publish and consume shared data and data services.

Snowflake: Causing an avalanche of data!

Many data workloads. One data platform. Snowflake is built for the cloud from the ground up. It delivers the flexibility and efficiency that simply isn’t possible with a traditional approach.

What is Snowflake Data Cloud?

The Snowflake Data Cloud addresses common business data challenges such as access, availability, and performance. It aims to democratize data and break down data silos to improve business performance.

Snowflake runs on Amazon Web Services, Microsoft Azure, and Google Cloud infrastructure. There's no hardware or software to select, install, configure, or manage, so it's ideal for organizations that don't want to dedicate resources to the setup, maintenance, and support of in-house servers. Data can be moved into Snowflake using an ETL solution such as Stitch.

But what sets Snowflake apart is its architecture and data sharing capabilities. The Snowflake architecture allows storage and compute to scale independently, so customers can use and pay for storage and computation separately. The sharing functionality makes it easy for organizations to quickly share governed and secure data in real time.

The role of Snowflake in data warehousing and data lakes

The Snowflake Data Cloud supports multiple data workloads across cloud providers, including data warehouses, data lakes, data engineering, data science, and data applications. Its architecture gives many concurrent users near-unlimited storage and compute that scale on demand.

Snowflake architecture: A unique approach to data storage and processing

Snowflake's architecture consists of three layers, each of which scales independently: storage, compute, and cloud services. Because the layers are separate, each can grow or shrink with demand without affecting the others.

Database storage: decoupling storage and compute resources

Snowflake decouples the storage and compute functions, so organizations with high storage demands but little need for CPU cycles (or vice versa) don't have to pay for an integrated bundle of both. Users can scale up or down as needed and pay only for the resources they use. Storage is billed per terabyte stored per month, and compute is billed per second.
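
To make the billing model concrete, here is a minimal sketch of how separate storage and compute charges add up. The rates below are invented for illustration and are not Snowflake's prices; actual pricing depends on factors such as edition, region, and contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices, chosen only to make the arithmetic easy to follow."""
    storage_per_tb_month: float  # currency units per terabyte stored per month
    compute_per_second: float    # currency units per second a warehouse runs


def monthly_cost(rates: Rates, tb_stored: float, compute_seconds: float) -> dict:
    """Price storage and compute separately, then add them together."""
    storage = tb_stored * rates.storage_per_tb_month
    compute = compute_seconds * rates.compute_per_second
    return {"storage": storage, "compute": compute, "total": storage + compute}


# Assumed rates for illustration only.
rates = Rates(storage_per_tb_month=25.0, compute_per_second=0.002)

# Storage-heavy workload: 200 TB stored, 10 hours of compute in the month.
archive = monthly_cost(rates, tb_stored=200, compute_seconds=10 * 3600)

# Compute-heavy workload: 2 TB stored, 300 hours of compute in the month.
analytics = monthly_cost(rates, tb_stored=2, compute_seconds=300 * 3600)

print(archive)
print(analytics)
```

Under these assumed rates, the archive workload pays almost entirely for storage and the analytics workload almost entirely for compute, which is the point of decoupling the two.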

The database storage layer holds all data loaded into Snowflake, including structured and semi-structured data. Snowflake automatically manages all aspects of how the data is stored: organization, file size, structure, compression, metadata, and statistics. This storage layer runs independently of compute resources.

Compute layer: virtual warehouses and scalability

Snowflake’s compute layer is made up of virtual warehouses that execute the data processing tasks required for queries. Each virtual warehouse (or cluster) can access all the data in the storage layer but works independently, so warehouses neither share nor compete for compute resources. This enables nondisruptive, automatic scaling: compute resources can scale while queries are running, without redistributing or rebalancing the data in the storage layer.

Cloud services: metadata management, optimization, and automation

Finally, Snowflake’s cloud services layer uses ANSI SQL and coordinates the entire system. It eliminates the need for manual data warehouse management and tuning. Services in this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Five key benefits of Snowflake for your business

Snowflake is built specifically for the cloud, and it's designed to address many of the problems found in older, hardware-based data warehouses, such as limited scalability, data transformation issues, and delays or failures due to high query volumes. Here are five ways Snowflake can benefit your business:

1. High performance and speed

The elastic nature of the cloud means if you want to load data faster, or run a high volume of queries, you can scale up your virtual warehouse to take advantage of extra compute resources. Afterward, you can scale down the virtual warehouse and pay for only the time you used.
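
As a rough sketch of that scale-up, load, scale-down pattern, the helper below builds the SQL statements you might send through any Snowflake client. The warehouse name, the COPY statement, and the helper functions themselves are hypothetical; only a subset of warehouse sizes is listed, and identifiers are assumed to be trusted rather than user-supplied.

```python
# A subset of Snowflake warehouse sizes, smallest to largest.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def resize_statement(warehouse: str, size: str) -> str:
    """Return an ALTER WAREHOUSE statement that changes the warehouse size."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'"


def bulk_load_plan(warehouse: str, load_sql: str,
                   busy_size: str = "LARGE", idle_size: str = "XSMALL") -> list:
    """Scale up, run the heavy statement, then scale back down."""
    return [
        resize_statement(warehouse, busy_size),
        load_sql,
        resize_statement(warehouse, idle_size),
    ]


# Hypothetical warehouse, table, and stage names.
for statement in bulk_load_plan("LOAD_WH", "COPY INTO orders FROM @orders_stage"):
    print(statement)
```

Executing the statements in order keeps the larger warehouse running only for the duration of the load.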

2. Flexible storage: supporting structured and semi-structured data

You can combine structured and semi-structured data for analysis and load it into the cloud database without the need for conversion or transformation into a fixed relational schema first. Snowflake automatically optimizes how the data is stored and queried.
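
For example, JSON documents can be stored as-is in a VARIANT column and queried with path notation and casts. The sketch below only assembles the SQL text; the table, column, and field names are made up for illustration.

```python
# Hypothetical landing table with a single semi-structured column.
CREATE_TABLE = "CREATE TABLE raw_events (payload VARIANT)"


def variant_query(table: str, column: str, fields: dict) -> str:
    """Build a SELECT that pulls typed values out of a VARIANT column.

    `fields` maps an output alias to a (dotted JSON path, SQL type) pair.
    """
    parts = [
        f"{column}:{path}::{sql_type} AS {alias}"
        for alias, (path, sql_type) in fields.items()
    ]
    return f"SELECT {', '.join(parts)} FROM {table}"


query = variant_query(
    "raw_events",
    "payload",
    {
        "customer_name": ("customer.name", "STRING"),
        "total": ("purchase.total", "FLOAT"),
    },
)

print(CREATE_TABLE)
print(query)
```

The nested fields are read directly at query time, so no fixed relational schema has to be defined before loading.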

3. Concurrency and accessibility for real-time data applications with a multi-cluster architecture

With a traditional data warehouse and a large number of users or use cases, you could experience concurrency issues (such as delays or failures) when too many queries compete for resources.

Snowflake addresses concurrency issues with its unique multi-cluster architecture: Queries from one virtual warehouse never affect the queries from another, and each virtual warehouse can scale up or down as required. Data analysts, engineers, and scientists can get what they need, when they need it, without waiting for other loading and processing tasks to complete.
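
One way to picture this isolation is to give each team its own virtual warehouse. The sketch below generates CREATE WAREHOUSE statements under that assumption; the team names, sizes, and helper function are hypothetical, and the multi-cluster options may depend on the account's edition.

```python
def team_warehouse(team: str, size: str = "XSMALL", max_clusters: int = 1) -> str:
    """Return a CREATE WAREHOUSE statement for a team-specific warehouse."""
    name = f"{team.upper()}_WH"
    options = [
        f"WAREHOUSE_SIZE = '{size}'",
        "AUTO_SUSPEND = 60",   # suspend after 60 idle seconds
        "AUTO_RESUME = TRUE",  # wake up when the next query arrives
    ]
    if max_clusters > 1:
        # Let the warehouse add clusters when many queries run at once.
        options += ["MIN_CLUSTER_COUNT = 1", f"MAX_CLUSTER_COUNT = {max_clusters}"]
    return f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH " + " ".join(options)


# Separate compute for loading, BI dashboards, and data science.
statements = [
    team_warehouse("etl", size="MEDIUM"),
    team_warehouse("bi", size="SMALL", max_clusters=3),
    team_warehouse("science", size="LARGE"),
]

for statement in statements:
    print(statement)
```

Because every warehouse reads the same stored data but uses its own compute, a heavy ETL job on one does not slow dashboards running on another.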

4. Seamless data sharing and integration across the ecosystem

Snowflake's architecture enables data sharing among Snowflake Data Cloud users. It also allows organizations to seamlessly share data with any data consumer — whether they are a Snowflake customer or not — through reader accounts that can be created directly from the user interface. This functionality allows the provider to create and manage a Snowflake account for a consumer.
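
On the provider side, sharing with another Snowflake account typically comes down to a handful of SQL statements. The following sketch assembles them; the share, database, table, and consumer account names are placeholders, and the helper function is hypothetical.

```python
def share_statements(share: str, database: str, schema: str,
                     tables: list, consumer_account: str) -> list:
    """Return the provider-side SQL to expose tables through a share."""
    statements = [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
    ]
    # Grant read access table by table so only chosen objects are visible.
    statements += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share}"
        for table in tables
    ]
    # Finally, make the share visible to the consumer's account.
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}")
    return statements


sharing_sql = share_statements(
    share="sales_share",
    database="sales_db",
    schema="public",
    tables=["orders", "customers"],
    consumer_account="partner_account",  # placeholder account identifier
)

for statement in sharing_sql:
    print(statement)
```

The consumer queries the shared tables in place; no copy of the data is made or transferred.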

5. Advanced availability and security in the cloud

Snowflake is distributed across availability zones of the platform on which it runs (AWS, Google Cloud, or Azure) and is designed to operate continuously and tolerate component and network failures with minimal impact to customers. It is SOC 2 Type II certified, and additional levels of security are available, such as support for PHI data for HIPAA customers and encryption across all network communications.

Harnessing the Snowflake Data Cloud for data science and engineering

The Snowflake Data Cloud is ideal for data science, data engineering, and data analytics teams as they source and share data for business intelligence, product development, and other business decision making. It’s easy to use and supports citizen users in several ways:

Snowflake's SQL and API support for Python, Java, and other languages

Snowflake uses SQL and provides APIs for Python, Java, and other programming languages. It connects to leading applications and systems to support data management across industries. To reach a wider range of developers, Snowflake also offers a developer experience called Snowpark.

Leveraging Snowpark for machine learning and advanced analytics

Snowpark is a developer experience that lets developers write code in their preferred language and run it directly on Snowflake. It exposes interfaces in Python, Scala, and Java that supplement Snowflake’s original SQL interface and support a wider range of developers in building the applications and solutions they need. Snowpark is often described as a machine learning and data science framework that combines the power of SQL with the flexibility of Python; it can be used to train machine learning models.
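
A minimal Snowpark sketch in Python might look like the following. It assumes the snowflake-snowpark-python package is installed and that an ORDERS table with STATUS, REGION, and AMOUNT columns exists; the table and column names are invented for illustration.

```python
def revenue_by_region(session, table_name: str = "ORDERS"):
    """Build a Snowpark DataFrame of completed-order revenue per region.

    `session` is expected to be a snowflake.snowpark.Session.
    """
    # Imported lazily so this module still loads where Snowpark is not installed.
    from snowflake.snowpark.functions import col, sum as sum_

    return (
        session.table(table_name)
        .filter(col("STATUS") == "COMPLETE")
        .group_by("REGION")
        .agg(sum_(col("AMOUNT")).alias("REVENUE"))
        .sort(col("REVENUE").desc())
    )
```

With a session created through `Session.builder.configs(connection_parameters).create()`, calling `revenue_by_region(session).show()` would run the query. The DataFrame operations are evaluated lazily and executed inside Snowflake rather than on the client.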

The Snowflake Marketplace and data exchange for rich data services

Snowflake Marketplace, powered by Snowflake Data Sharing, enables organizations to securely offer, discover, consume, and share live, governed data and data services at scale while eliminating the cost and latency often associated with traditional marketplaces. Data can be shared internally among business units and departments, and externally with partners and customers. Snowflake customers can access datasets from Zillow, Weather Source, Epsilon, FactSet, and SafeGraph, among numerous other data providers.

Connecting your data ecosystem to Snowflake with Stitch

To load data into a Snowflake data repository, companies often use an extract, transform, load (ETL) process. The right ETL tool makes this process easier and more efficient. Stitch is a simple, powerful ETL service built for developers. It connects to your first-party data sources and replicates that data to Snowflake. Using Stitch to extract and load data simplifies migration, and users can then run transformations on the data stored in Snowflake.

As a Snowflake Partner, we make it easy to connect with Stitch from the Snowflake Partner Connect Portal. New users get a free 14-day trial, during which they can move an unlimited amount of data from more than 140 data sources, including popular platforms such as Google Analytics, Google Ads, Shopify, Salesforce, and Stripe.

Get started with your free, 14-day Stitch trial today.
