Data migration is a one-time process of transferring internal data from one storage system to another; it may include preparing, extracting, and, if necessary, transforming the data.

This may sound a bit like data replication or data integration, but each process is different. Data replication is the periodic copying of data from a data source on one platform to a destination on another, while data integration combines data from disparate sources in a data warehouse destination or analysis tool.

Projects that require data migration range from upgrading a server to moving to a new data center and from launching a new application to integrating the resources of a newly acquired company. Ideally, moving data to a new platform, location, or architecture can be completed with no data loss and minimal manual data manipulation or re-creation.

Types of data migration tools

Organizations can write their own data migration scripts or use on-premises or cloud-based tools. Self-scripted data migration is a do-it-yourself in-house solution that may suit small projects, but it doesn’t scale well. On-premises tools work well if all of the data is at a single site. Cloud-based data migration tools may be a better choice for organizations moving data to a cloud-based destination.

Self-scripted
  Use cases:
    • Small projects
    • Quick fixes
    • A specific source or destination is unsupported by other tools
  Pros:
    • Can be quick to develop
    • May be inexpensive if requirements are simple
  Cons:
    • Coding skills required
    • Changing needs can increase cost
    • Diverts engineers from more strategic tasks
    • Changes can be difficult if code is not well-documented

On-premises
  Use cases:
    • Compliance requirements prohibit cloud-based or multitenant solutions
    • All data sources and destinations are located at a single site
    • Static data requirements with no plans to scale
    • A capex model is preferred over opex
  Pros:
    • IT team has control of the full stack, from the physical to the application layer
    • Low latency
  Cons:
    • IT team must manage security and software updates
    • IT team must keep tools up and running

Cloud-based
  Use cases:
    • Data sources and/or destinations are at multiple sites
    • Need to scale up and down to meet dynamic data requirements
    • Data scientists and business analysts/users at different sites need access to common data warehouses and tools
    • An opex model is preferred over capex
  Pros:
    • Agile and scalable enough to handle changing business needs
    • Pay-as-you-go pricing eliminates spending on unused resources
    • On-demand compute power and storage handle demand from temporary or bursty events
    • Geographically dispersed users can access data tools
    • Redundant architecture provides the best reliability
  Cons:
    • Security concerns – real or perceived – may lead to internal resistance
    • Solution may not support all required data sources and destinations
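
To make the self-scripted option above concrete, here is a minimal sketch of the extract-and-load core of such a script. It uses Python’s built-in sqlite3 module and a hypothetical customers table purely as stand-ins; a real migration would connect to whatever source and destination systems are involved.

```python
import sqlite3

# Hypothetical source and destination; in practice these would be
# connections to two different database systems.
source = sqlite3.connect("legacy_app.db")
destination = sqlite3.connect("new_platform.db")

# Prepare the destination schema (here it simply mirrors the source).
destination.execute("""
    CREATE TABLE IF NOT EXISTS customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        created_at TEXT
    )
""")

# Extract in batches so large tables don't have to fit in memory.
cursor = source.execute("SELECT id, name, email, created_at FROM customers")
while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    # Any transformation step (e.g., normalizing date formats) would go here.
    destination.executemany(
        "INSERT INTO customers (id, name, email, created_at) VALUES (?, ?, ?, ?)",
        batch,
    )

destination.commit()
source.close()
destination.close()
```

Even this toy version hints at why hand-rolled scripts get expensive: every additional table, source, or schema difference means more code to write, test, and maintain.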

IT pros can write software to migrate data, but that process can be taxing and time-consuming. Hand-coding big data integrations often means performing integration tasks manually and reimplementing functionality, such as machine learning algorithms, that dedicated tools already provide.

Using data migration software is a better way to go. The software does the heavy lifting, but data engineers still must understand what data they are migrating, how much will be migrated, and the differences between the source and destination platforms and schemas. They must define the migration strategy, run the migration, test the results, and resolve any issues.
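
For example, testing the results can start with something as simple as comparing row counts between source and destination after the load completes. The sketch below assumes sqlite3 connections and hypothetical table names as stand-ins for the real systems; a thorough check would also compare schemas, checksums, and sample records.

```python
import sqlite3

# Hypothetical connections to the source and destination systems.
source = sqlite3.connect("legacy_app.db")
destination = sqlite3.connect("new_platform.db")

# Assumed table names for illustration only.
tables_to_check = ["customers", "orders", "invoices"]

for table in tables_to_check:
    src_count = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dst_count = destination.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    status = "OK" if src_count == dst_count else "MISMATCH"
    print(f"{table}: source={src_count} destination={dst_count} [{status}]")
```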

How to select the right data migration tool

Proper planning is the most important part of any data migration effort and should include consideration of data sources and destinations, security, and cost. Selecting a data migration tool is a key component in the planning process, and should be based on the organization’s use case and business requirements.

Data sources and destinations

The number and kinds of data sources and destinations are an important consideration. Self-scripting may be able to support any source and destination, but it doesn’t scale: it may work for a small project, but you probably don’t want to be coding data extraction scripts for hundreds of sources.

One caveat for on-premises tools is that the supported sources and destinations may vary depending on the operating system on which your tool runs.

Most on-premises and cloud-based data migration tools handle a variety of data sources and destinations. Cloud-based SaaS tools don’t have OS limitations, and vendors upgrade them to support new versions of sources and destinations automatically.

Reliability

Cloud-based data migration tools typically deliver close to 100% uptime thanks to their highly redundant architectures; that level of reliability is difficult to match with on-premises tools.

Performance and scalability

Cloud-based migration tools also have the edge on performance and scalability: compute power and storage in the cloud can scale to meet dynamic data migration requirements. On-premises tools can’t automatically scale up and down as needed because they’re limited by the hardware on which they run.

Security

Data migration tools may have to meet security and compliance requirements. This may rule out some cloud-based tools, but many are compliant with SOC 2, HIPAA, GDPR, and other governance regulations.

Pricing

Many factors affect pricing, including the quantity of data, number and types of sources and destinations, and service level. No particular type of data migration tool will always be the lowest-cost solution for any given data migration project.

Cloud-based data migration tools have pay-as-you-go pricing. For most data migration projects, a cloud solution provides the best pricing; however, some of the pricing models can be a bit confusing. Some cloud services have a free tier that businesses may be able to leverage.

Getting started with cloud data migration

Are you planning a data migration or replication? Stitch offers an easy-to-use ETL tool that can replicate or migrate data from sources to destinations; it makes the job of getting data for analysis faster, easier, and more reliable, so that businesses can get the most out of their data analysis and BI programs.

Stitch is built on the open source Singer project, which allows you to build new integrations if you need to support in-house custom data sources. Sign up for a free trial and migrate your data to its destination in minutes.
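
As a rough illustration of how Singer works, a tap is just a program that writes JSON-formatted SCHEMA, RECORD, and STATE messages to standard output, which a Singer target then loads into the destination. The sketch below emits those three message types for a hypothetical users stream using only the Python standard library; the Singer project’s singer-python package provides helper functions for the same job.

```python
import json
import sys

def emit(message):
    # Singer taps communicate by writing one JSON message per line to stdout.
    sys.stdout.write(json.dumps(message) + "\n")

# Describe the stream (a hypothetical "users" source).
emit({
    "type": "SCHEMA",
    "stream": "users",
    "key_properties": ["id"],
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": "string"},
        },
    },
})

# Emit the data itself, one RECORD message per row.
emit({"type": "RECORD", "stream": "users", "record": {"id": 1, "email": "ada@example.com"}})

# Record progress so an interrupted run can resume where it left off.
emit({"type": "STATE", "value": {"users": {"last_id": 1}}})
```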
