Today’s enterprises, from small local businesses to international online retailers who generate a large amount of data, are looking to move to the cloud for increased data availability, better data security, and a smaller financial footprint.

However, when it comes to choosing where to store data in the cloud, there are many choices. One of the simplest yet most powerful options is Amazon Simple Storage Service (S3).

Amazon S3 overview

Amazon Simple Storage Service (S3) is a popular cloud storage service with critical features like encryption, access control, and high availability.

First deployed in the US in 2006 and Europe in 2007, Amazon S3 is available in every AWS region and can be accessed worldwide through a web interface. The service is integrated with the wider Amazon Web Services (AWS) data ecosystem.

Using reliable, scalable infrastructure and a key-based object storage architecture, Amazon S3 is well suited to host massive amounts of structured and unstructured data in the form of data lakes. AWS S3 also offers powerful functionality with minimal complexity.

Amazon S3 use cases

Companies like Airbnb, Netflix, Pinterest, and Reddit use S3 to host their web content, images, archives, backups of on-premises data for disaster recovery, and systems of record. Makers of apps can also store the data they collect. Even Amazon itself relies on S3 to store critical project data.

The four main Amazon S3 use cases are:

Data lake creation. An S3 data lake enables users to unlock insights to maximize the full value of their data. This is achieved by running applications involving big data analytics, high performance computing (HPC), artificial intelligence (AI), and machine learning (ML).
Critical data backup and restoration. Robust replication features make it easier for organizations to meet Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) in the event of a disaster. Backup features also support compliance measures.
Low-cost data archiving. Moving data archives to lower-cost AWS S3 storage classes, such as the Glacier storage classes, allows businesses to save money and streamline operations while still keeping data available for generating additional insights.
Operation of cloud-native applications. Developers in particular will enjoy the ability to build robust, speedy mobile and web-based cloud-native apps that are configured to be highly available and scale automatically.

Working with AWS S3

To start using this AWS cloud service, you simply need to sign up for an AWS account. AWS resources, including S3, are managed through a web application known as the AWS Management Console. This customizable home page provides a single place to access everything you need to manage AWS tasks.

AWS S3 buckets

Amazon S3 objects are organized in buckets. Buckets are the main containers in S3, and every object must be stored in one. All of S3’s main features, such as the interfaces and APIs, can act either on buckets or individual objects.

When users upload data, they first create a bucket and give it a name, then move as many objects as they need into it. AWS S3 uses an object key, along with a version identifier, to uniquely identify each object in a bucket. This helps users organize data.

Organizations may use naming conventions to identify data owners, improve access control, and make the store more navigable for end users.
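
As a minimal sketch of such a naming convention, the helper below composes object keys from a team, a dataset, and a date. The team/dataset/date layout and the function name are illustrative assumptions, not an S3 requirement; any consistent scheme works.

```python
from datetime import date


def build_object_key(team: str, dataset: str, day: date, filename: str) -> str:
    """Compose an object key from a hypothetical team/dataset/date convention."""
    # Convention segments are lowercased; the filename is kept as given
    segments = [team.strip("/").lower(), dataset.strip("/").lower()]
    segments.append(day.strftime("%Y/%m/%d"))
    segments.append(filename.strip("/"))
    return "/".join(segments)


key = build_object_key("Finance", "invoices", date(2024, 1, 15), "batch-001.csv")
print(key)
```

Because the owning team comes first in the key, access rules and listings can be scoped to a single key prefix.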

Amazon S3 users can work with the console or web-service interfaces to access raw objects or buckets.

Capacity and data structures

Amazon sets no cap on the total volume or number of items that can be stored in S3. A single upload request is limited to 5 gigabytes, but individual objects can be far larger (AWS has long documented a maximum of 5 terabytes) when they are uploaded in parts. S3 provides tools for uploading large objects in parts and migrating big data into storage.
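
Uploading in parts comes down to simple arithmetic. The sketch below plans the byte ranges for a multipart upload, assuming the commonly documented limits of 5 MiB to 5 GiB per part and at most 10,000 parts. The function name and the 64 MiB default are illustrative choices, and no AWS call is made.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # assumed minimum for every part except the last
MAX_PART_SIZE = 5 * GIB   # assumed maximum for a single part
MAX_PARTS = 10_000        # assumed maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 64 * MIB) -> list:
    """Return (part_number, start_byte, end_byte_exclusive) for each part."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size outside the 5 MiB to 5 GiB range")
    count = math.ceil(object_size / part_size)
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, object_size))
        for n in range(count)
    ]


plan = plan_parts(200 * MIB)
print(len(plan), plan[-1])
```

A real upload would send each range as one part and then ask S3 to assemble them; the plan only shows how an object is divided.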

AWS S3 is a key-value store, one of the major categories of NoSQL databases used for accumulating voluminous, mutating, unstructured, or semistructured data. Uploaded objects are referenced by a unique key, which can be any string. This high-level and generic storage structure affords users near-infinite flexibility.

S3 is capable of storing diverse and generally unstructured data, but it's also suited for hierarchical data and all kinds of structured information. Features such as metadata support, prefixes, and object tags allow users to organize data according to their needs.
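
Prefixes are what make a flat set of keys look hierarchical. The sketch below imitates, in plain Python, how listing with a prefix and a delimiter groups keys into direct objects and folder-like common prefixes. It models the listing behavior and is not the S3 API itself; the sample keys are made up.

```python
def list_by_prefix(keys, prefix="", delimiter="/"):
    """Split matching keys into direct objects and folder-like common prefixes."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Keep everything up to and including the first delimiter after the prefix
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "logs/2024/01/app.log",
    "logs/2024/02/app.log",
    "logs/readme.txt",
    "images/cat.png",
]
print(list_by_prefix(keys, prefix="logs/"))
```

Listing under `logs/` returns `logs/readme.txt` as an object and `logs/2024/` as a common prefix, which is how a console can present "folders" without any real directory structure.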

APIs and integrations

The REST API for managing Amazon S3 buckets allows developers to connect stored data to other web applications and services. Objects are available to HTTP clients (S3 can be used as a static website host), and URLs can point directly to stored resources.
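
To illustrate how a URL can point at a stored resource, the sketch below builds a virtual-hosted-style object URL. The bucket name, region, and key are hypothetical, and the URL only resolves for a client that is allowed to read the object.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object."""
    # Percent-encode the key but keep "/" so the path stays readable
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


url = object_url("example-bucket", "us-east-1", "reports/2024/q1 summary.pdf")
print(url)
```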

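S3 has also historically supported the BitTorrent protocol for downloading data, which let file-sharing users leverage peer-to-peer bandwidth savings instead of HTTP GET requests. Check the current AWS documentation before relying on it, since AWS has moved to retire this feature.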

Data ingestion and analysis

Developers can query and manage data in S3 using built-in features such as S3 Select. Beyond the ability to run simple SQL on stored objects, users can turn to Amazon Athena for fast, interactive querying, and employ Redshift Spectrum for more complex data retrieval.

S3 integrates with AWS analytics services — no additional data migration or processing is required. Analysts can perform data mining directly on stored objects, metadata, tags, and S3 log information. Amazon also recommends third-party tools such as AWStats, Splunk, and S3Stat for users who wish to manage and analyze their logs with external software.

Storage types and pricing

S3 offers several storage classes for different use cases and expected volumes, including a free tier. The standard storage class is designed for 99.99% availability, but several other options are available:

Standard Infrequent Access is ideal for average data archival, backup, and recovery use cases.
One Zone-Infrequent Access is best for data that is rarely requested but still needs to be quickly retrieved.
Amazon Glacier is optimal for cost-effective, long-term data storage. It offers high durability and suits data with no low-latency retrieval requirements.

S3 pricing scales with usage. Price is typically reduced in regions where Amazon’s infrastructure is less costly to maintain.

Amazon S3 cloud storage benefits

Cloud computing offers several advantages over on-premises systems, including low latency levels and scaling functionality that allows loads to be spread evenly to avoid the impact of traffic spikes on any one application. The S3 Standard storage class, for example, is designed for 99.99% availability, while the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive classes offer an SLA of 99.9%.
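
To put those percentages in perspective, the short calculation below converts an availability figure into the maximum downtime it allows per year. It is plain arithmetic on the numbers quoted above, not an AWS formula, and it ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be unavailable at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.99, 99.9):
    print(f"{pct}% -> {max_downtime_minutes(pct):.1f} minutes per year")
```

The gap is larger than the figures suggest: 99.99% allows under an hour of downtime per year, while 99.9% allows close to nine hours.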

Manageability

Amazon S3’s simple underlying architecture and web service interface make initial deployment and configuration easy.

Management of AWS S3-hosted stores is straightforward yet flexible. From a graphical console, customers can work directly with objects. The platform provides a REST interface that lets developers manage stored information at the account level, within buckets, or within individual objects. S3 also supports batch operations across all levels, and a related service, AWS Lambda, can allow these operations to perform arbitrarily complex tasks.

Reliability and support

S3 is designed to handle a high volume of requests without interruption, including during traffic spikes. The service level agreement (SLA) makes uptime commitments that depend on the user's chosen storage class. Users are also protected against AWS site-level failures, and Amazon states durability targets alongside those for availability.

AWS S3 includes a version control system to protect against unwanted deletions or accidents. Users can also turn on logging, which saves detailed information about interactions with stored data for troubleshooting and repair.
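
The toy model below sketches how versioning protects against accidental deletion: each write gets a new version identifier, and a delete adds a marker rather than erasing earlier versions. It is an in-memory illustration with invented class and method names, not the S3 API.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A delete adds a marker (data=None) instead of erasing earlier versions
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # The latest version wins; a delete marker hides the object
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.csv", b"a,b\n1,2\n")
bucket.put("report.csv", b"a,b\n3,4\n")
bucket.delete("report.csv")  # the latest version is now a delete marker
recovered = bucket.get("report.csv", version_id=first)
print(recovered)
```

After the delete, a plain read fails, but the first version is still retrievable by its identifier.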

Security and compliance

Amazon S3 keeps stored data secure with the aid of several tools. Built-in support for user policies and bucket policies regulates who and what can download or upload data and prevents unauthorized access.
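
A bucket policy is a JSON document. The sketch below builds one that lets a single principal download objects from a bucket and nothing else. The bucket name, account ID, and role name are placeholders, and a real policy should be reviewed against current AWS documentation before use.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy granting one principal read access to all objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnly",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


# Placeholder bucket and role ARN, for illustration only
policy = read_only_policy("example-bucket", "arn:aws:iam::123456789012:role/analyst")
print(json.dumps(policy, indent=2))
```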

More sophisticated access policies, as well as several encryption options, are available:

AWS Identity and Access Management (IAM) designates specific users and manages their data clearance.
Access control lists enable object-level granularity when setting permissions.
Security permissions can be set at the bucket level or globally.
Authentication options are available for queries, and both server-side and client-side encryption can be enabled for uploads.
Managers can access audit logs to review data access and activity.
Customers can use Amazon Macie to detect sensitive data in uploads, such as other people's intellectual property (IP) or personally identifiable information (PII).

Finally, Amazon follows all data privacy regulations and states that it tracks data in S3 for pricing purposes only.

Using AWS S3 as a data lake

Data lakes store massive (terabytes and even petabytes) amounts of raw, mostly unstructured documents and objects. They can hold copies of all of an enterprise’s business data, whether original and unique or replicated from other functional systems. The data is safely stored and easily accessible for reference or for downstream data analytics.

Because Amazon S3 is designed for securely holding data in any format and built for maximum scalability, it is an ideal data lake hosting platform. In fact, AWS offers a Lake Formation service specifically for creating and managing data lakes in Amazon S3. AWS S3’s features align well with the benefits of setting up and administering a data lake, including:

Flexibility. With AWS S3, you can store relational, hierarchical, semi-structured, or completely unstructured data. There’s no need to transform data to fit into a standardized schema, so stakeholders are free to focus their energies on using the data. In fact, a single S3 bucket can hold different objects in different Amazon S3 storage classes.
Collaborative flow. With a centralized and common data store, issues stemming from rigid business boundaries and opaque silos can disappear. All members of an enterprise can bring their preferred tools to bear on the data relevant to them.
Faster transformation. A data lake creates clear separation between storage and processing. Because transformation takes place downstream with data analysts, complex transformation processes are eliminated from the data pipeline.
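
The separation between storage and downstream processing described in the list above can be illustrated with a small sketch. Raw objects are kept exactly as they arrived, and a reader interprets them only when an analyst needs them. The keys, sample data, and reader function are all invented for illustration, and a dictionary stands in for a bucket.

```python
import csv
import io
import json

# Raw objects as they might sit in a lake: key -> bytes, stored untransformed
raw_objects = {
    "raw/orders/2024-01-15.csv": b"order_id,amount\n1,19.99\n2,5.00\n",
    "raw/events/2024-01-15.json": b'[{"order_id": 1, "event": "shipped"}]',
}


def read_records(key: str, body: bytes) -> list:
    """Parse an object at read time, choosing a parser from its key suffix."""
    text = body.decode("utf-8")
    if key.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if key.endswith(".json"):
        return json.loads(text)
    raise ValueError(f"no reader for {key}")


orders = read_records("raw/orders/2024-01-15.csv", raw_objects["raw/orders/2024-01-15.csv"])
events = read_records("raw/events/2024-01-15.json", raw_objects["raw/events/2024-01-15.json"])
total = sum(float(row["amount"]) for row in orders)
print(len(orders), len(events), round(total, 2))
```

Nothing was reshaped on the way in; each consumer applies the structure it needs when it reads.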

In fact, a data lake optimizes data pipelines, enabling a modern approach to ETL. With Amazon S3, it’s easy to replicate data to a safe storage destination where it can be accessed by data analytics tools.

Amazon S3 and Stitch integrate

Even though it's one of the most straightforward cloud object storage services — “simple” is in the name, after all — customers can choose among dozens of options to get data into S3. However, bringing your data into cloud storage is easier than ever using the right tool. With Stitch, you can replicate data from 140+ sources to an S3 instance. Stitch also supports Amazon S3 as a data source, with tools to extract CSV file data from buckets. Best of all, setup takes just a few minutes and incremental updates ensure you're always working with the latest data.

Analysts can build business intelligence processes immediately on top of the data store, while any authorized user can access information for reference or ad-hoc analysis.

To remain competitive, businesses must choose powerful and performant destinations for their data. Sign up for Stitch today for free and leverage your AWS account’s Amazon S3 data lake to its fullest potential.
