Stitch has added Amazon S3 to our list of supported destinations as an open beta. Amazon S3 is a simple, reliable, and cost-effective object store that offers virtually unlimited cloud storage capacity.
Most of the destinations Stitch has supported until now have been traditional data warehouses with standard relational tables. S3, by contrast, can store unstructured data, which makes it an excellent choice for a data lake: a repository for all of an organization's data assets. Maintaining both a data warehouse and a data lake lets organizations pipe the data they know they need to analyze into the warehouse while retaining a wider range of data in the lake for possible later analysis.
S3 pricing and performance
Like many other AWS products, S3 is priced on a pay-as-you-go basis: you pay only for the storage you use. Pricing varies by AWS region, but in most regions a gigabyte of storage costs 2.5 cents or less per month. In the US East region, for instance, S3 storage costs 2.3 cents per gigabyte per month ($23 per terabyte) for the first 50 TB, and the per-gigabyte rate drops at higher volumes.
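To make the arithmetic concrete, here is a minimal Python sketch of a monthly storage estimate. It assumes only the US East first-tier rate quoted above ($0.023 per GB-month, up to 50 TB) and treats 1 TB as 1,000 GB, matching the $23-per-terabyte figure. The function name `monthly_storage_cost` is hypothetical, and because the lower rates above 50 TB aren't listed here, the sketch refuses to estimate beyond that tier rather than guess.

```python
# Assumption: only the first-tier US East rate quoted in the post is modeled.
FIRST_TIER_RATE_USD_PER_GB = 0.023   # dollars per GB per month
FIRST_TIER_LIMIT_GB = 50 * 1000      # 50 TB, with 1 TB = 1,000 GB ($23/TB)


def monthly_storage_cost(gigabytes: float) -> float:
    """Estimate one month's S3 storage charge within the first pricing tier.

    Hypothetical helper for illustration; real bills also include requests,
    data transfer, and other charges that are not modeled here.
    """
    if gigabytes < 0:
        raise ValueError("storage size cannot be negative")
    if gigabytes > FIRST_TIER_LIMIT_GB:
        # Rates above 50 TB are lower, but they are not modeled in this sketch.
        raise ValueError("storage above 50 TB is outside the modeled tier")
    return round(gigabytes * FIRST_TIER_RATE_USD_PER_GB, 2)


if __name__ == "__main__":
    print(monthly_storage_cost(1000))  # 1 TB  -> 23.0
    print(monthly_storage_cost(250))   # 250 GB -> 5.75
```

Rounding to cents keeps floating-point noise out of the displayed dollar amounts.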
Unlike the hard-drive storage you may be used to, S3 organizes storage into containers called buckets, which are roughly analogous to directories on a hard disk drive. Instead of files, buckets hold objects, each of which can range from zero bytes to 5 terabytes in size.
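The bucket/object model can be illustrated with a toy in-memory store. This is not the S3 API: the `ToyObjectStore` class and its `put`/`list_keys` methods are hypothetical names chosen for the sketch. It shows that a bucket maps flat string keys to byte blobs, that zero-byte objects are allowed, and that shared key prefixes such as `orders/` are what make a bucket feel like a directory tree. The 5 TB limit is treated here as 5 × 1024⁴ bytes.

```python
from collections import defaultdict

# Upper bound on a single object's size (5 TB, taken here as 5 * 1024**4 bytes).
MAX_OBJECT_BYTES = 5 * 1024**4


class ToyObjectStore:
    """Hypothetical in-memory model of S3's bucket -> key -> object layout."""

    def __init__(self) -> None:
        # Each bucket is a flat mapping from string keys to object bytes.
        self._buckets: dict[str, dict[str, bytes]] = defaultdict(dict)

    def put(self, bucket: str, key: str, body: bytes) -> None:
        """Store an object; sizes from 0 bytes up to the 5 TB cap are accepted."""
        if len(body) > MAX_OBJECT_BYTES:
            raise ValueError("a single object cannot exceed 5 TB")
        self._buckets[bucket][key] = body

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        """Return sorted keys that start with prefix, like listing a 'folder'."""
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("my-data-lake", "orders/2017/01.csv", b"id,total\n1,9.99\n")
    store.put("my-data-lake", "orders/2017/02.csv", b"")  # zero-byte object
    store.put("my-data-lake", "customers/all.json", b"[]")
    print(store.list_keys("my-data-lake", prefix="orders/"))
    # ['orders/2017/01.csv', 'orders/2017/02.csv']
```

There are no real directories in this model: `orders/2017/` exists only as part of key strings, which mirrors how S3 keys are flat names.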
In short, S3 is inexpensive, effectively unlimited in capacity, and flexible about the kinds of data it can store.
Stitch and S3
Stitch currently supports CSV and JSON as output formats on S3. You can control how your data is organized in S3 by customizing the object key and metadata. If that sounds confusing, Amazon has you covered with a handy Getting Started Guide for S3, and we've documented the process of setting up S3 as a Stitch destination.
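As a rough illustration of what "output format" and "object key" mean in practice, the sketch below serializes a batch of records as CSV or JSON and builds a key from a template. The template `{table}/{date}/{batch_id}.{ext}` and the helpers `build_key` and `serialize` are hypothetical; Stitch's actual key options and exact file layout are described in its own documentation, not reproduced here.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical key template: groups objects by table, then by load date.
KEY_TEMPLATE = "{table}/{date}/{batch_id}.{ext}"


def build_key(table: str, batch_id: str, fmt: str, when: datetime) -> str:
    """Fill the hypothetical template to produce an S3 object key."""
    return KEY_TEMPLATE.format(
        table=table, date=when.strftime("%Y-%m-%d"), batch_id=batch_id, ext=fmt
    )


def serialize(records: list[dict], fmt: str) -> bytes:
    """Encode a batch of records as CSV or JSON bytes for one object body."""
    if fmt == "csv":
        buf = io.StringIO()
        fieldnames = list(records[0].keys()) if records else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue().encode("utf-8")
    if fmt == "json":
        return json.dumps(records).encode("utf-8")
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    rows = [{"id": 1, "total": 9.99}, {"id": 2, "total": 24.5}]
    when = datetime(2017, 5, 1, tzinfo=timezone.utc)
    print(build_key("orders", "batch-0001", "csv", when))
    # orders/2017-05-01/batch-0001.csv
    print(serialize(rows, "json"))
    # b'[{"id": 1, "total": 9.99}, {"id": 2, "total": 24.5}]'
```

Putting the table name and date at the front of the key means tools that list by prefix can pick out one table or one day's loads without scanning the whole bucket.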
There is a wide array of potential uses for data that Stitch delivers to S3. Using tools like Amazon Athena or Qubole, you can query the data directly where it lives on S3. And because the data is stored as files, you can also manipulate and transform the raw data directly and do whatever you like with the result, whether that's loading it into another system or data warehouse, backing it up, or integrating it into other parts of your data analytics stack.
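Here is one small example of working with the raw files: a sketch that takes a CSV object's bytes, filters rows, and emits JSON Lines ready to load into another system. The column names `id` and `total` and the function `csv_object_to_json_lines` are hypothetical. In a real pipeline the bytes might come from an S3 client (for example, reading the `Body` stream returned by boto3's `get_object`), but the sketch works on in-memory bytes so it runs without AWS credentials.

```python
import csv
import io
import json


def csv_object_to_json_lines(body: bytes, min_total: float) -> str:
    """Parse CSV object bytes, keep rows with total >= min_total, return JSON Lines.

    Assumes hypothetical columns 'id' (integer) and 'total' (decimal number).
    """
    reader = csv.DictReader(io.StringIO(body.decode("utf-8")))
    lines = []
    for row in reader:
        total = float(row["total"])
        if total >= min_total:
            # One JSON document per line, with types restored from CSV strings.
            lines.append(json.dumps({"id": int(row["id"]), "total": total}))
    return "\n".join(lines)


if __name__ == "__main__":
    body = b"id,total\r\n1,9.99\r\n2,24.5\r\n"
    print(csv_object_to_json_lines(body, min_total=10.0))
    # {"id": 2, "total": 24.5}
```

Because CSV stores everything as text, the transform restores numeric types explicitly before writing JSON.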
S3 support has been one of our most-requested enhancements because it opens up new ways to use Stitch, and it's available today in open beta.
Let us know what other destinations – and sources – you'd like us to work on next.