MongoDB snapshot

  • Release status: Released
  • Supported by: Stitch
  • Availability: Paid
  • Supported versions: 2.4 through 3.4
  • SSL connections: Supported
  • VPN connections: Unsupported
  • Data selection: Tables only
  • View replication: Unsupported
  • Destination incompatibilities: Possible incompatibilities

Connecting MongoDB

MongoDB Setup Requirements

To set up MongoDB in Stitch, you need:

  • A paid Stitch plan. Free Trial users can also set up MongoDB, but replication will be paused when the trial ends until a paid plan is selected.
  • Permissions in MongoDB that allow you to create/manage users. This is required to create the Stitch database user.

  • A MongoDB server that uses Auth mode. Auth mode requires every user who connects to Mongo to have a username and password. These credentials must be validated before the user will be granted access to the database.

  • MongoDB version 2.4 through 3.4. While older versions may be connected to Stitch, we may not be able to provide support for issues that arise from unsupported versions.

    We recommend keeping your version current as a best practice. If you encounter connection issues or other unexpected behavior, verify that your MongoDB version is one supported by Stitch.

Additionally, note that:

  • If using SSL, your server must require SSL connections. Note that SSL is not required to connect a MongoDB database to Stitch.
  • If connecting via Atlas, Stitch can only connect to instances using a paid Atlas plan. The Free Atlas plan utilizes a setup that Stitch doesn’t currently support.

Step 1: Index Replication Key Fields

Before you jump into the actual setup, consider how the documents in your Mongo database are updated.

Our Mongo integration uses Incremental Replication to replicate Mongo data, which means that only new and updated data will be replicated to your data warehouse when a sync runs. Stitch uses a field you designate - called a Replication Key - to identify new and updated data.

There are two requirements for Mongo Replication Keys:

  1. The field must be indexed. Only indexed fields will display in the Replication Key drop-down.
  2. The field must exist in the root of the document.
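
As a minimal sketch of meeting both requirements, the snippet below builds an ascending index on a root-level field and, when run inside the mongo shell, applies it. The collection name orders and the field name updated_at are hypothetical placeholders. Shells older than 3.0 typically used ensureIndex for the same purpose.

```javascript
// Hypothetical names, used for illustration only.
var collectionName = "orders";
var replicationKeyField = "updated_at";

// A Replication Key must sit in the root of the document, so a dotted
// path such as "meta.updated_at" (a nested field) does not qualify.
function isRootLevelField(fieldName) {
  return fieldName.length > 0 && fieldName.indexOf(".") === -1;
}

// Build an ascending single-field index specification.
function buildIndexSpec(fieldName) {
  if (!isRootLevelField(fieldName)) {
    throw new Error("Replication Key must be a root-level field: " + fieldName);
  }
  var spec = {};
  spec[fieldName] = 1;
  return spec;
}

var indexSpec = buildIndexSpec(replicationKeyField);

// The `db` global only exists inside the mongo shell, so this part is
// skipped when the file is run anywhere else.
if (typeof db !== "undefined") {
  db.getCollection(collectionName).createIndex(indexSpec);
  // List the collection's indexes to confirm the new one is present.
  printjson(db.getCollection(collectionName).getIndexes());
}
```

Once the index exists, the field should become eligible to appear in the Replication Key drop-down.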

Additionally, while this is not a strict requirement, Replication Key fields should only contain a single, auto-incrementing data type. If a field contains multiple data types or a data type that doesn’t auto-increment, Stitch may have issues with detecting new/updated data.
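
One way to spot a Replication Key field that holds more than one data type is to sample documents and tally the types found. The sketch below is illustrative only: the orders collection, the updated_at field, and the 1000-document sample size are hypothetical, and the type labels are JavaScript-level approximations rather than exact BSON types.

```javascript
// Label a value with a coarse JavaScript-level type name.
function typeLabel(value) {
  if (value === null) { return "null"; }
  if (value instanceof Date) { return "date"; }
  return typeof value; // "number", "string", "undefined" (field missing), ...
}

// Return the sorted list of distinct type labels found for `field`.
function replicationKeyTypes(docs, field) {
  var seen = {};
  for (var i = 0; i < docs.length; i++) {
    seen[typeLabel(docs[i][field])] = true;
  }
  return Object.keys(seen).sort();
}

// In-memory example: a date, a string, and a document missing the field.
var sample = [
  { _id: 1, updated_at: new Date(0) },
  { _id: 2, updated_at: "2017-01-01" },
  { _id: 3 }
];
// More than one label in the result means the field mixes types.
var typesInSample = replicationKeyTypes(sample, "updated_at");

// Inside the mongo shell, the same check can run on real documents.
if (typeof db !== "undefined") {
  var docs = db.getCollection("orders").find().limit(1000).toArray();
  printjson(replicationKeyTypes(docs, "updated_at"));
}
```

A result with a single label suggests the sampled documents are consistent; it does not prove the whole collection is.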

For a detailed look at Mongo Replication Keys, check out the Selecting & Changing Mongo Replication Keys guide before continuing.


Step 2: Whitelist Stitch's IP addresses

For the connection to be successful, you’ll need to configure your firewall to allow access from our IP addresses. Whitelist the following IPs before continuing on to the next step:

  • 52.23.137.21/32

  • 52.204.223.208/32

  • 52.204.228.32/32

  • 52.204.230.227/32
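
If you want to confirm that a connection attempt seen in your firewall or SSH logs came from one of these addresses, a small CIDR check can help. This is a sketch rather than Stitch tooling: the function names are hypothetical, it handles IPv4 only, and it does not validate its input.

```javascript
// The four CIDR blocks listed above.
var STITCH_CIDRS = [
  "52.23.137.21/32",
  "52.204.223.208/32",
  "52.204.228.32/32",
  "52.204.230.227/32"
];

// Convert a dotted-quad IPv4 address to an unsigned integer.
function ipv4ToInt(ip) {
  var parts = ip.split(".");
  var value = 0;
  for (var i = 0; i < 4; i++) {
    value = value * 256 + parseInt(parts[i], 10);
  }
  return value;
}

// True when `ip` falls inside the CIDR block, e.g. "10.0.0.0/24".
function inCidr(ip, cidr) {
  var pieces = cidr.split("/");
  var prefixLength = parseInt(pieces[1], 10);
  // Number of addresses in the block; a /32 block holds exactly one.
  var blockSize = Math.pow(2, 32 - prefixLength);
  return Math.floor(ipv4ToInt(ip) / blockSize) ===
         Math.floor(ipv4ToInt(pieces[0]) / blockSize);
}

// True when `ip` matches any of the listed Stitch blocks.
function isStitchAddress(ip) {
  for (var i = 0; i < STITCH_CIDRS.length; i++) {
    if (inCidr(ip, STITCH_CIDRS[i])) { return true; }
  }
  return false;
}
```

Because every listed block is a /32, each one matches a single address; the general CIDR arithmetic is included only so the check keeps working if a wider block is ever added.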


Step 3: Retrieve your Stitch Public Key

The Stitch Public Key

The Public Key is used to authorize the Stitch Linux user. If the key isn’t properly installed, Stitch will be unable to access your database.

To retrieve the key:

  1. Sign in to your Stitch account.

  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the MongoDB icon.

  4. When the credentials page displays, click the Encryption Type menu and select the SSH Tunnel option.

  5. The Public Key will display, along with the other SSH fields.

Leave this page open for now - you’ll need it to wrap things up at the end.


Step 4: Create a Stitch Linux user

Note: Anything inside square brackets - [like this] - is something you need to define when running the commands yourself.

  1. To create the new user, run the following commands as root on your Linux server:

    adduser --disabled-password [stitch_username]
    mkdir /home/[stitch_username]/.ssh
    
  2. Next, import the Public Key into authorized_keys. This will ensure the Stitch user has access to the database.

    Copy the entire key into the authorized_keys file by running:

    echo "[PASTE KEY HERE]" >> /home/[stitch_username]/.ssh/authorized_keys
    
  3. Alter the permissions on the /home/[stitch_username] directory to allow access via SSH:

    chown -R [stitch_username]:[stitch_username] /home/[stitch_username]
    chmod -R 700 /home/[stitch_username]/.ssh
    

Step 5: Create a Stitch database user

To successfully connect and replicate your Mongo data, Stitch requires the ability to:

  • Run the listDatabases command. This permission is required so Stitch can detect the databases available for replication.
  • Run the listIndexes command. Because Stitch will only display indexed fields as Replication Key options, this permission is required to identify fields that can be used as Replication Keys.
  • Run count and query operations on all the databases you want to replicate data from. These permissions are required to replicate your data.
  • Run the dbVersion command. While this isn’t mandatory, it’s beneficial for Stitch to have access to the information this command yields to troubleshoot any connection or replication issues that may arise.

You can assign a role to the Stitch user if you like, as long as the role has the necessary permissions to perform the actions listed above.

When connecting to multiple databases, you can add the user by logging into Mongo as an admin user and running the following command. This example uses createUser; MongoDB versions older than 2.6 use addUser instead. Refer to the MongoDB documentation for addUser if you need it.

Replace [authentication_database] with the name of the database where the user is authenticated or created:

use [authentication_database]
db.createUser( {  user: "[stitch_username]",
                  pwd: "[secure password here]",
                  roles: ["roles here", "if you want them"]
               }
             )

Note: For Atlas-based instances, the authentication_database will be admin.
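
As one hedged example of a role that covers the abilities listed at the start of this step, the sketch below builds a custom role document and, inside the mongo shell, creates it. The role name stitch_read and the database names sales and inventory are hypothetical. The mapping from the listed commands to the privilege actions listDatabases, find, listCollections, and listIndexes is an assumption based on MongoDB's privilege model, not a list confirmed by Stitch. User-defined roles require MongoDB 2.6 or later.

```javascript
// Build a custom role document. Role and database names are hypothetical.
function buildStitchRole(roleName, databaseNames) {
  var privileges = [
    // Cluster-level action needed to run the listDatabases command.
    { resource: { cluster: true }, actions: ["listDatabases"] }
  ];
  for (var i = 0; i < databaseNames.length; i++) {
    privileges.push({
      // An empty collection name targets all non-system collections.
      resource: { db: databaseNames[i], collection: "" },
      // "find" covers queries and counts; the others cover discovery.
      actions: ["find", "listCollections", "listIndexes"]
    });
  }
  return { role: roleName, privileges: privileges, roles: [] };
}

var stitchRole = buildStitchRole("stitch_read", ["sales", "inventory"]);

// Role reference suitable for the `roles` array passed to createUser.
var stitchUserRoles = [{ role: stitchRole.role, db: "admin" }];

// The `db` global only exists inside the mongo shell.
if (typeof db !== "undefined") {
  // A role created in admin can hold privileges on other databases.
  db.getSiblingDB("admin").createRole(stitchRole);
}
```

A narrowly scoped role like this keeps the Stitch user read-only. If you prefer built-in roles, check that whichever ones you pick grant at least the same actions.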


Step 6: Connect Stitch

  1. Sign in to your Stitch account, if you haven’t already.
  2. On the Stitch Dashboard page, click the Add Integration button.
  3. Click the MongoDB icon.
  4. Fill in the fields as follows:

    • Integration Name: Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your data warehouse.

      For example, the name “Stitch MongoDB” would create a schema called stitch_mongodb in the data warehouse. Note: The schema name cannot be changed after the integration is saved.

    • Host (Endpoint): Enter the host address (endpoint) used by the MongoDB instance.

      In general, this will be 127.0.0.1 (localhost), but could also be some other network address (ex: 192.68.0.1) or your server’s public IP address. Note: This must be the actual address - entering localhost into this field will cause connection issues.

    • Port: Enter the port used by the MongoDB instance. The default is 27017.

    • Username: Enter the Stitch MongoDB database user’s username.

    • Password: Enter the password for the Stitch database user.

    • Database: Enter the name of the MongoDB database where the Stitch user is to be authenticated. Stitch will ‘find’ all the databases you gave the Stitch user access to - this is needed only to complete the connection.

      Note: If you’re connecting an Atlas-based MongoDB instance, this must be the admin database. See Step 5: Create a Stitch database user for more info on this requirement.

Enter SSH connection details

If you’re using an SSH tunnel to connect your MongoDB database to Stitch, you’ll also need to complete the following:

  1. Click the Encryption Type menu.
  2. Select SSH Tunnel to display the SSH fields.

  3. Fill in the fields as follows:

    • Remote Address: Enter the IP address or hostname of the server Stitch will SSH into.

    • SSH Port: Enter the SSH port on your server. (22 by default)

    • SSH User: Enter the Stitch Linux (SSH) user’s username.

In addition, click the Connect using SSL checkbox if you’re using an SSL connection. Note: The database must support and allow SSL connections for this setting to work correctly.
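
Before saving the integration, it can help to test the same values with a MongoDB client of your own. The sketch below assembles a standard MongoDB connection string from the fields described in this step. The helper name and the example values are hypothetical, it ignores the SSH tunnel, and it is not necessarily how Stitch itself connects.

```javascript
// Assemble a mongodb:// connection string from the credential fields.
function buildMongoUri(settings) {
  // Percent-encode credentials so characters like "@" and ":" are safe.
  var credentials = encodeURIComponent(settings.username) + ":" +
                    encodeURIComponent(settings.password);
  var uri = "mongodb://" + credentials + "@" + settings.host + ":" +
            settings.port + "/" + settings.database;
  // Mirrors the "Connect using SSL" checkbox.
  if (settings.ssl) {
    uri += "?ssl=true";
  }
  return uri;
}

// Example values only; 203.0.113.10 is a documentation-range address.
var exampleUri = buildMongoUri({
  host: "203.0.113.10",
  port: 27017,
  username: "stitch",
  password: "p@ss:word",
  database: "admin",
  ssl: true
});
// exampleUri is "mongodb://stitch:p%40ss%3Aword@203.0.113.10:27017/admin?ssl=true"
```

If a client can authenticate with this string from an allowed network location, the host, port, credentials, and authentication database are likely correct.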


Step 7: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Stitch offers two methods of creating a replication schedule:

  • Replication Frequency: This method requires selecting the interval at which you want replication to run for the integration. Start times of replication jobs are based on the start time and duration of the previous job. Refer to the Replication Frequency documentation for more information and examples.
  • Anchor scheduling: Based on the Replication Frequency, or interval, you select, this method “anchors” the start times of this integration’s replication jobs to a time you select to create a predictable schedule. Anchor scheduling is a combination of the Anchor Time and Replication Frequency settings, which must both be defined to use this method. Additionally, note that:

    • A Replication Frequency of at least one hour is required to use anchor scheduling.
    • An initial replication job may not begin immediately after saving the integration, depending on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.

    • You’ll need to contact support to request using an Anchor Time with this integration.
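
To make the point about delayed initial jobs concrete, here is one plausible model of anchored start times: jobs fall on a fixed grid that begins at the Anchor Time and repeats every Replication Frequency interval. This is an assumption about the arithmetic for illustration, not Stitch's documented algorithm, and the function name is hypothetical.

```javascript
var HOUR_MS = 60 * 60 * 1000;

// Return the first start time at or after `nowMs` that falls on the grid
// anchorMs, anchorMs + intervalMs, anchorMs + 2 * intervalMs, ...
// All arguments are timestamps or durations in milliseconds.
function nextAnchoredStart(anchorMs, intervalMs, nowMs) {
  if (intervalMs < HOUR_MS) {
    throw new Error("Anchor scheduling needs a frequency of at least one hour");
  }
  if (nowMs <= anchorMs) {
    return anchorMs;
  }
  var intervalsElapsed = Math.ceil((nowMs - anchorMs) / intervalMs);
  return anchorMs + intervalsElapsed * intervalMs;
}

// Anchor at 08:00 UTC, replicate every 6 hours, integration saved at 15:30 UTC.
var anchor = Date.UTC(2024, 0, 1, 8, 0);
var savedAt = Date.UTC(2024, 0, 1, 15, 30);
var nextStart = nextAnchoredStart(anchor, 6 * HOUR_MS, savedAt);
// new Date(nextStart).toISOString() is "2024-01-01T20:00:00.000Z"
```

Under this model, an integration saved at 15:30 would wait until 20:00 for its first job, which is why the initial replication job may not begin right away.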

To help prevent overages, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.


Step 8: Select data to replicate

The last step is to select the collections you want to replicate. When you track a collection, you’ll also need to define its Replication Key.

When selecting collections to replicate, keep in mind that:

  • Mongo data can only be tracked at the collection level. When a collection is set to replicate, all fields in the collection will also be tracked by default.
  • Only Key-based Incremental Replication is supported for Mongo integrations at this time. If a collection ever requires full replication - for example, to backfill existing rows with a new field’s values - this will require a full re-replication of the integration’s data. Refer to the Reset Replication Keys guide for more info.
  • Mongo Replication Keys require special consideration. Refer to the Mongo Replication Keys guide before you define the Replication Keys for your collections, as incorrectly defining Replication Keys can cause data discrepancies.
  • Nested records will be de-nested if your destination doesn’t natively support nested structures. Refer to the Nested Data Structures guide for more info.

Track collections

To track collections:

  1. In the Integration Details page, click the Tables to Replicate tab.
  2. Locate a collection you want to replicate.
  3. Click the checkbox next to the collection’s name. A green checkmark means the collection is set to replicate.
  4. If there are child objects, they’ll automatically display and you’ll be prompted to select some.
  5. After you set a collection to replicate, the Collection Settings page will display. Note: When you track a collection, all fields will also be tracked by default.
  6. In the Collection Settings page, define the collection’s Replication Key.

  7. Repeat this process for every collection you want to replicate.

Initial and historical replication jobs

After you finish setting up MongoDB, its Sync Status may show as Pending on either the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


Extracting data from MongoDB

When you connect a database as an input, Stitch only needs read-only access to the databases, collections, and fields you want to replicate. There are two processes Stitch runs during the Extraction phase of the replication process: a structure sync and a data sync.

Structure syncs

This is the first part of the Extraction process. During this phase, Stitch will detect any changes to the structure of your database. For example: A new field is added to one of the collections you set to replicate in Stitch. Structure syncs are how Stitch identifies the databases, tables, and columns to display in the Stitch app.

Stitch runs the following queries on Mongo databases to perform a structure sync:

  • db.getMongo().getDBNames()
  • db.getCollectionNames()

For every collection in the database - even those that aren’t set to replicate - Stitch also runs the following queries:

  • db.collection.count()
  • db.collection.getIndexes()

Data syncs

This is the second part of the Extraction process. During this phase, Stitch extracts data from the source and replicates it.

The tabs below contain info about the queries Stitch runs during the data syncs for each type of Replication Method supported for MongoDB integrations.

Note: MongoDB integrations only support Key-based Incremental replication.

Data syncs for tables using Key-based Incremental

Initial (historical) replication jobs

During the initial replication job for a table using Key-based Incremental Replication, Stitch will replicate the table in full by running a SELECT query and reading out of the resulting cursor in batches:

  SELECT field_a, field_b <,...>
    FROM collection_a
ORDER BY replication_key_field

Ongoing replication jobs

During subsequent jobs, Stitch will use the last saved maximum value of the Replication Key column to identify new and updated data.

Stitch will run the following query and read out of the associated cursor in batches:

  SELECT field_a, field_b <,...>
    FROM collection_a
   WHERE replication_key_field >= 'last_maximum_replication_key_value'
ORDER BY replication_key_field
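
The SQL-style statements above are a notation. In mongo shell terms, roughly equivalent reads can be sketched as a find with a $gte filter and an ascending sort. This is an approximation for illustration, not the exact calls Stitch issues: the helper name is hypothetical, updated_at stands in for replication_key_field, and field projection is omitted.

```javascript
// Build the filter and sort documents for a replication read.
// Pass null as lastMaxValue for the initial (historical) job.
function buildIncrementalQuery(replicationKeyField, lastMaxValue) {
  var filter = {};
  if (lastMaxValue !== null && lastMaxValue !== undefined) {
    // Matches the ">=" in the WHERE clause: documents holding the last
    // saved maximum value are selected again on the next job.
    filter[replicationKeyField] = { $gte: lastMaxValue };
  }
  var sort = {};
  sort[replicationKeyField] = 1; // ascending, like ORDER BY
  return { filter: filter, sort: sort };
}

// Initial job: no filter, ordered by the Replication Key.
var initialQuery = buildIncrementalQuery("updated_at", null);

// Ongoing job: resume from a (hypothetical) last saved maximum of 1500.
var ongoingQuery = buildIncrementalQuery("updated_at", 1500);

// The `db` global only exists inside the mongo shell.
if (typeof db !== "undefined") {
  var cursor = db.getCollection("collection_a")
    .find(ongoingQuery.filter)
    .sort(ongoingQuery.sort);
  while (cursor.hasNext()) {
    printjson(cursor.next());
  }
}
```

Because the sort runs on the Replication Key, an index on that field lets the server walk the index in order rather than sort the matching documents in memory.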

Recommendations

While we make every effort to ensure the queries that Stitch executes don’t impose significant load on your databases, we still have some recommendations for protecting database performance:

  • Use a replica database instead of connecting directly. We recommend using read replicas in lieu of directly connecting production databases with high availability and performance requirements.
  • Apply indexes to Replication Key fields. We restrict and order our replication queries by this field, so applying an index to the fields you’re using as Replication Keys can improve performance. Indexes are required to use Mongo fields as Replication Keys.

Troubleshooting

SSL Connection Errors

Prematurely reached end of file/stream

Applicable only to MongoDB integrations, this error usually means that SSL has been incorrectly configured.

Connecting a database integration to Stitch via SSL has two parts: configuration on the database’s server and in the Stitch app. For the connection to be successful, the settings in both Stitch and on the database server must align.

For example: a MongoDB server doesn’t support SSL connections but the SSL option is checked in Stitch. This will result in a connection error.

First, verify if the MongoDB server is configured to support SSL connections. Then:

  • If SSL connections aren’t supported, make sure the Connect using SSL box in Stitch is unchecked and try saving the integration again.

  • If SSL connections are required, make sure the Connect using SSL box in Stitch is checked and try saving the integration again.

Fields Missing from Replication Key Menu

If fields you expect to see are missing from a collection’s Replication Key menu, it may be that the fields aren’t indexed. Refer to the Mongo Replication Keys guide for more info.

