Enterprises from small local businesses to international online retailers are looking to move to the cloud for increased data availability, better data security, and a smaller financial footprint. Amazon Simple Storage Service (S3) is a popular cloud storage service with critical features like encryption, access control, and high availability. Customers access S3 through a web console or programmatic APIs, and the service itself plugs into the wider Amazon Web Services (AWS) data ecosystem.
Even though it's one of the most straightforward cloud object storage services — simple is in the name, after all — customers can choose among dozens of options to get data into S3. Some tools only migrate data, while others allow users to copy or replicate data using processes such as ETL (extract, transform, load). Effectively transferring data to S3 requires knowing the available data transfer services, how they work, and their use cases.
When transferring data to S3, enterprises should avoid interrupting the systems that generate and update the data.
The first step is choosing the service that will perform the transfer. AWS provides options to help businesses transfer their datasets, and third-party options are available as well. When deciding how to transfer data, businesses should take into account data size and considerations such as backup level — whether backups contain all data, snapshots, or only changes, and how often they are saved — data availability, and whether transfers are one-time or recurring.
For example, anyone could move a database with just a few tables and a moderate number of records using built-in tools. Recurring and real-time transfers, like moving transactional data on an ongoing basis, should make use of online services. However, a time-sensitive transfer of a massive customer database that includes personally identifiable information and other sensitive data may require the use of offline services that involve using physical storage devices to move data. Large one-time transfers, like uploading a massive transaction history file, are also more easily managed with offline services.
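The decision between online and offline transfer often comes down to how long the upload would take over the available connection. A minimal back-of-the-envelope sketch, not an AWS tool — the seven-day threshold and the 80% link-utilization figure are illustrative assumptions:

```python
def transfer_days(data_gb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to push data_gb over a bandwidth_mbps link,
    assuming only a fraction (utilization) of the link is usable."""
    seconds = (data_gb * 8 * 1000) / (bandwidth_mbps * utilization)
    return seconds / 86400


def suggest_method(data_gb: float, bandwidth_mbps: float, max_days: float = 7) -> str:
    """Suggest an online transfer if the upload fits the window, offline otherwise."""
    return "online" if transfer_days(data_gb, bandwidth_mbps) <= max_days else "offline"
```

For instance, 10 TB over a 100 Mbps link works out to roughly 11 to 12 days of sustained transfer, which pushes such a project toward an offline service.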
Both AWS and third parties offer online services for transferring data to S3. These options are useful for simple transfers, large lift-and-shift migration projects, linking on-premises systems to S3 for continuous archive or backup, and hybrid cloud deployments.
AWS offers several online transfer services, ranging from a simple command line interface to options that support ongoing data migration and replication.
The S3 command line interface (S3 CLI), part of the broader AWS CLI, provides the most direct way for users to manage their S3 buckets. It includes simple Unix shell-like commands to copy, move, filter, and sync data and directories between local systems and S3 buckets.
The S3 CLI is appropriate for small, simple tasks that don't require the automated assistance other options provide. Individuals can also use it to make minor adjustments to existing deployments, as long as these changes are noninvasive and performed by users with command line expertise.
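Most S3 CLI work starts with `aws s3 cp` and `aws s3 sync`. The snippet below composes a sync command from Python rather than running it, so the flags are easy to see; the bucket name and paths are placeholders, and actually executing the command requires the AWS CLI to be installed and configured with credentials:

```python
import shlex

def s3_sync_command(local_dir: str, bucket: str, prefix: str = "", exclude: str = "") -> str:
    """Build an `aws s3 sync` command that uploads local_dir to s3://bucket/prefix.
    sync only copies files that are new or changed, which makes reruns cheap."""
    args = ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"]
    if exclude:
        # Skip files matching this glob pattern, e.g. "*.tmp"
        args += ["--exclude", exclude]
    return shlex.join(args)

# Once credentials are set up, the result can be run with
# subprocess.run(shlex.split(cmd), check=True)
```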
AWS Direct Connect allows customers to transfer data to a cloud location such as S3 over a dedicated network link. It creates private connections between on-premises data sources and Amazon's network, moving data along a dedicated pathway that bypasses the public internet and the variable latency and congestion that come with it. Customers can host components of both their public and private networks through the service.
While the interfaces are highly configurable, Direct Connect is just one step beyond the S3 CLI in terms of automated management and overall simplicity. This method is appropriate for larger data transfers that are too complex to handle with the S3 CLI.
AWS DataSync can transfer data between the same systems as Direct Connect, but it offers more management and automation options. Enterprises can use DataSync for both data migration and replication projects. The service monitors transfer processes, automates scheduling, and optimizes networks to maximize transfer speeds or reduce bandwidth. This additional automation reduces operational costs during large or complex transfers.
AWS Storage Gateway and similar partner gateways allow customers to use their Amazon cloud storage as if it were just another piece of their on-premises deployment. An administrator sets up a gateway so both local applications and users can access scalable cloud storage through the same APIs and interfaces that they're accustomed to. Using traditional access methods while keeping data in the cloud conserves on-premises resources.
Amazon S3 Transfer Acceleration is an alternative to Amazon's standard transfer protocols that is designed to speed S3 data transfers across long physical distances. It's most appropriate for customers that need to upload data to a centralized S3 deployment on an ongoing basis from many data silos or varied geographic locations. For example, websites that involve recurring media uploads from around the world or data warehouses regularly updated by a diverse set of clients would likely benefit from this service.
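Under the hood, Transfer Acceleration routes uploads through a distinct endpoint (`<bucket>.s3-accelerate.amazonaws.com`) that hands data to the nearest AWS edge location. A small illustration of the endpoint naming — the bucket name is a placeholder, acceleration must also be enabled on the bucket itself, and the non-accelerated hostname shown is the simplified global form (regional endpoints include a region segment):

```python
def s3_endpoint(bucket: str, accelerate: bool = False) -> str:
    """Return the S3 REST endpoint for a bucket, using the
    accelerated hostname when accelerate=True."""
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"
```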
Amazon Kinesis Data Firehose is a fully managed service for streaming data into Amazon cloud storage such as S3. Firehose includes many of the same features as other automatically scaled and managed AWS services, requiring little customer setup, configuration, or monitoring. Only businesses that need to ingest data in near real time need to consider Firehose.
AWS Glue is a complete ETL solution that's integrated with S3. Glue lets customers run ETL jobs while automatically detecting data formats, schemas, and potential transformations or adjustments to perform. AWS Glue is appropriate for customers who need an ETL solution that works with S3 and many other AWS services.
AWS Database Migration Service (DMS) specializes in database migrations. Though it has some NoSQL support, it's primarily focused on migrating large relational database deployments with minimal disruption. This service is best for customers who want to move existing relational database systems and data warehouses to the cloud with minimal fuss or change to schemas.
Finally, Amazon offers a managed data pipeline service, AWS Data Pipeline, that allows customers to create entire data processing workflows connected to a variety of other AWS offerings, including S3, or external systems, such as the most popular relational databases.
Beyond the first-party services offered by AWS, many third-party tools and services exist for planning seamless data transfers. Customers may use utilities such as rclone, which speaks the S3 API and can copy data directly into S3 buckets with granular control over the migration process (rsync, a close relative, lacks native S3 support and is better suited to staging data on an intermediate server). These tools are unmanaged and require expertise to wield effectively. Many robust third-party data integration and transfer services are also available for handling more complex transfers. For example, Stitch offers tools designed to replicate data to S3 in minutes, while also supporting a vast collection of data sources and destinations beyond AWS products.
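As an example of the unmanaged route, rclone treats S3 as just another remote. A minimal configuration sketch — the remote name, region, bucket, and paths are placeholders, and `env_auth = true` tells rclone to read credentials from the standard AWS environment variables:

```ini
# ~/.config/rclone/rclone.conf
[s3remote]
type = s3
provider = AWS
env_auth = true
region = us-east-1
```

With that in place, `rclone copy ./exports s3remote:my-bucket/exports --transfers 8` uploads a directory with eight parallel transfers, and `rclone sync` keeps a bucket path mirrored to a local one.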
For massive amounts of data, or for data with particular security requirements, offline services — moving data through a physical storage intermediary instead of a network — may be more appropriate. This intermediary may be a device the size of a briefcase or an entire vehicle devoted to carrying hard drives from customers to S3 data centers.
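To make the scale concrete, sizing an offline transfer is largely a question of how many devices the dataset fills. A back-of-the-envelope sketch — the 80 TB default reflects the usable capacity of an AWS Snowball Edge Storage Optimized device, but treat it as illustrative:

```python
import math

def devices_needed(total_tb: float, usable_tb_per_device: float = 80.0) -> int:
    """Number of physical transfer devices needed to carry total_tb of data."""
    return math.ceil(total_tb / usable_tb_per_device)
```

A 500 TB archive, for example, would need seven such devices, shipped and ingested in parallel or in waves.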
Transferring data is rarely straightforward or easy, even for an established platform like S3. Stitch offers a simple data pipeline for replicating data from more than 100 sources to S3 and many other destinations as well. Try Stitch for free today.