Introduction to Data Migration with SSIS and Azure Data Factory
Data migration is a critical phase in Microsoft Power Platform implementation projects. In the Microsoft ecosystem, the two main tools for managing ETL (Extract, Transform, Load) processes are SQL Server Integration Services (SSIS) and Azure Data Factory (ADF). Both can extract data from legacy systems, transform it, and load it into Microsoft Dataverse or into intermediate systems such as SQL Server or Azure SQL Database.
SSIS is a mature, well-established solution for on-premises or hybrid scenarios, while Azure Data Factory is its cloud-native counterpart, designed to orchestrate integration and data migration pipelines in Azure environments.
SQL Server Integration Services (SSIS)
SSIS is a proven enterprise ETL tool for complex migrations to Dataverse. It can connect to multiple data sources, consolidate data in a staging database, and manage advanced transformations. Its architecture is based on SSIS packages, which combine control flows (process orchestration) with data flows (row-level movement and transformation).
Each package can be executed manually or on a schedule and can draw on connectors for many different technologies. For Dataverse, Microsoft recommends the SSIS Integration Toolkit for Microsoft Dynamics 365 developed by KingswaySoft, which provides native connectors for Dynamics 365 and Dataverse.
Typical Phases of a Migration with SSIS
- Extraction: importing data from legacy sources into the staging database via connectors or linked servers (a minimal sketch of this step follows the list).
- Transformation: consolidating, cleansing, and mapping data to match the Dataverse schema.
- Loading: importing consolidated data into Dataverse, optimizing throughput using batch and multithreading configurations.
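To make the extraction step concrete, here is a minimal Python sketch that copies rows from a legacy source into a staging table. The connection strings, table names, and columns are hypothetical placeholders; in a real SSIS migration this step would typically be a data flow task or a linked-server query rather than a script.

```python
import pyodbc

# Hypothetical connection strings; adjust driver, server, and credentials.
LEGACY_CONN = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=legacy-srv;DATABASE=LegacyCRM;Trusted_Connection=yes;"
STAGING_CONN = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=staging-srv;DATABASE=Staging;Trusted_Connection=yes;"

def extract_accounts_to_staging() -> int:
    """Copy account rows from the legacy source into the staging table."""
    with pyodbc.connect(LEGACY_CONN) as src, pyodbc.connect(STAGING_CONN) as dst:
        rows = src.cursor().execute(
            "SELECT AccountNumber, Name, City FROM dbo.Accounts"
        ).fetchall()

        cur = dst.cursor()
        cur.fast_executemany = True  # bulk parameter binding for speed
        cur.executemany(
            "INSERT INTO stg.Accounts (AccountNumber, Name, City) VALUES (?, ?, ?)",
            [tuple(r) for r in rows],
        )
        dst.commit()
        return len(rows)

if __name__ == "__main__":
    print(f"Staged {extract_accounts_to_staging()} rows")
```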
During the loading phase, the KingswaySoft adapter lets you configure the batch size and the number of parallel threads to maximize throughput while staying within Dataverse service protection (API) limits. Testing multiple configurations is essential to find the right balance between speed and reliability.
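The batch size and thread count are set in the KingswaySoft destination component itself, but the underlying pattern can be sketched in Python against the Dataverse Web API: split the records into batches and let a pool of worker threads post them in parallel. The environment URL, token, and table are placeholders, and a production load would use the $batch endpoint or an SDK rather than one POST per record.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

ORG_URL = "https://yourorg.crm.dynamics.com"  # hypothetical environment URL
TOKEN = "<access-token>"                      # acquire via MSAL / Azure AD in practice
BATCH_SIZE = 100                              # records handed to each worker call
MAX_THREADS = 4                               # parallel workers

HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
}

def post_batch(batch: list[dict]) -> int:
    """Create one batch of account records (simplified: one POST per record)."""
    for record in batch:
        resp = requests.post(f"{ORG_URL}/api/data/v9.2/accounts",
                             json=record, headers=HEADERS)
        resp.raise_for_status()
    return len(batch)

def load(records: list[dict]) -> int:
    """Split records into batches and load them with parallel workers."""
    batches = [records[i:i + BATCH_SIZE]
               for i in range(0, len(records), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        return sum(pool.map(post_batch, batches))
```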
Automation and Iterations
Data migration is rarely a one-shot activity. SSIS allows you to develop automated processes that can be rerun iteratively for testing, validation, and corrections. Automation ensures consistency and traceability over time.
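One pattern that makes reruns safe is upserting instead of inserting: with the Dataverse Web API, a PATCH against an alternate key creates the record when it is missing and updates it otherwise, so repeated runs do not produce duplicates. A minimal sketch, with a placeholder environment URL and token, assuming an accountnumber alternate key is defined on the table:

```python
import requests

ORG_URL = "https://yourorg.crm.dynamics.com"   # hypothetical environment URL
HEADERS = {"Authorization": "Bearer <token>",  # acquire a real token via MSAL
           "Content-Type": "application/json"}

def upsert_account(account_number: str, data: dict) -> None:
    """Idempotent write: PATCH on an alternate key creates the record if it
    is missing and updates it if it exists, so reruns do not duplicate."""
    resp = requests.patch(
        f"{ORG_URL}/api/data/v9.2/accounts(accountnumber='{account_number}')",
        json=data,
        headers=HEADERS,
    )
    resp.raise_for_status()
```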
Azure Data Factory (ADF)
Azure Data Factory is Microsoft’s cloud ETL platform designed to orchestrate large-scale data pipelines. It is especially suited for fully cloud environments and integrates seamlessly with Azure SQL, Azure Data Lake, Azure Cosmos DB, and Microsoft Dataverse.
ADF offers a modular approach based on pipelines, activities, and datasets:
- Pipelines: contain logical flows of activities for data migration and transformation.
- Activities: represent individual operations such as copying data, invoking an API, or executing a Power Query dataflow.
- Datasets: define connections to sources and destinations such as Azure SQL, Data Lake, or Dataverse.
A common example is a pipeline that copies data from a Dataverse dataset to an Azure SQL dataset, with column mappings and transformations applied through a dataflow.
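ADF pipelines are authored as JSON. The Python dictionary below mirrors the shape of such a definition for this copy scenario; the dataset names and column mappings are invented for illustration, and the structure is a sketch rather than a complete, deployable pipeline.

```python
# Shape of an ADF pipeline definition (normally authored as JSON in ADF
# Studio or deployed via ARM templates); all names here are placeholders.
copy_pipeline = {
    "name": "CopyAccountsToSql",
    "properties": {
        "activities": [
            {
                "name": "CopyDataverseToAzureSql",
                "type": "Copy",
                "inputs": [{"referenceName": "DataverseAccounts", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "AzureSqlAccounts", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "CommonDataServiceForAppsSource"},
                    "sink": {"type": "AzureSqlSink"},
                    "translator": {
                        "type": "TabularTranslator",
                        "mappings": [
                            {"source": {"name": "name"}, "sink": {"name": "AccountName"}},
                            {"source": {"name": "accountnumber"}, "sink": {"name": "AccountNumber"}},
                        ],
                    },
                },
            }
        ]
    },
}
```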
Dataflows and Transformations
ADF mapping data flows perform complex transformations such as splitting, merging, pivoting, and filtering. These operations run on scalable, Spark-based compute managed by ADF and can replace many SSIS transformations in cloud-native scenarios.
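Mapping data flows are built visually in ADF Studio, so there is no code to show directly, but the operations are conceptually relational. As a rough analogy, the pandas sketch below applies a filter, a derived column, and a pivot of the kind a data flow would express as transformation steps; the sample data is invented.

```python
import pandas as pd

# Conceptual stand-in for mapping data flow transformations.
orders = pd.DataFrame({
    "region": ["EU", "EU", "US", "US"],
    "year": [2023, 2024, 2023, 2024],
    "amount": [100.0, 150.0, 200.0, 250.0],
})

# Filter transformation: keep only recent rows.
recent = orders[orders["year"] >= 2024]

# Derived column transformation: add a computed field.
recent = recent.assign(amount_with_tax=recent["amount"] * 1.22)

# Pivot transformation: one column per year, summed amounts.
pivoted = orders.pivot_table(index="region", columns="year",
                             values="amount", aggfunc="sum")
print(pivoted)
```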
Automation and Orchestration
ADF enables process automation through triggers and continuous monitoring. Pipelines can run on a schedule or in response to events like new data arriving in an Azure Data Lake container. Pipelines can also invoke Azure Functions or Power Automate flows to implement custom logic.
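Beyond schedule and event triggers, a pipeline run can also be started on demand through the Azure management REST API, which is how external automation typically kicks off a migration. A minimal Python sketch follows; the subscription, resource group, factory, and pipeline names are placeholders, and it assumes the azure-identity package and appropriate RBAC permissions.

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholders: substitute your own identifiers.
SUB = "<subscription-id>"
RG = "<resource-group>"
FACTORY = "<data-factory-name>"
PIPELINE = "CopyAccountsToSql"

def trigger_pipeline_run() -> str:
    """Start an ADF pipeline run via the Azure management REST API."""
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
    url = (
        f"https://management.azure.com/subscriptions/{SUB}"
        f"/resourceGroups/{RG}/providers/Microsoft.DataFactory"
        f"/factories/{FACTORY}/pipelines/{PIPELINE}/createRun"
        "?api-version=2018-06-01"
    )
    resp = requests.post(url, headers={"Authorization": f"Bearer {token.token}"})
    resp.raise_for_status()
    return resp.json()["runId"]  # use this id to poll the run status
```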
When to Choose SSIS vs. ADF
- SSIS: ideal for on-premises or hybrid environments, with direct SQL Server integration and familiar developer tools.
- ADF: perfect for cloud-first environments, offering auto-scaling, centralized management, and native integration with Azure services.
- Performance: ADF offers distributed parallelism, while SSIS benefits from locally controlled multithreading.
- Licensing: SSIS requires SQL Server licenses, while ADF uses a pay-per-use consumption model.
Data Migration Best Practices
- Design a staging area to consolidate data from multiple sources before importing into Dataverse.
- Define clear and documented mappings between source and target fields.
- Use optimized batches to mitigate Dataverse API limitations.
- Automate pipelines to allow repeatable testing and iterations.
- Monitor performance and configure retries for transient pipeline errors (see the retry sketch below).
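On the retry point specifically: Dataverse signals service protection throttling with HTTP 429 and a Retry-After header that callers should honor. SSIS (via the KingswaySoft adapter) and ADF both expose retry settings; the plain-Python sketch below illustrates the same pattern, with exponential backoff as a fallback for transient server errors.

```python
import time

import requests

def post_with_retry(url: str, payload: dict, headers: dict,
                    max_retries: int = 5) -> requests.Response:
    """POST with retries: honor Retry-After on 429 (Dataverse service
    protection limits) and back off exponentially on transient 5xx errors."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code == 429:
            # The server tells us how long to wait before retrying.
            wait = int(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
        elif resp.status_code >= 500:
            time.sleep(2 ** attempt)  # transient server error: back off
        else:
            resp.raise_for_status()   # raise on any other 4xx client error
            return resp
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```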