# Replicate Assets in Microsoft Fabric
In a Microsoft Fabric context, the concept of "replicating assets" differs from classic IaaS-based lift-and-shift migrations. Instead of virtual machines, replication in Fabric focuses on data, metadata, and pipelines across the Fabric platform and into OneLake.
## Fabric-specific replication steps
Microsoft Fabric workloads depend on replicated data sources, semantic models, pipelines, and optionally lakehouse tables or data warehouses. The replication process consists of:

- SQL Server Replication: In hybrid environments, transactional or snapshot replication can run from SQL Server to Fabric, with Azure SQL serving as an intermediate hop.
- Source System Capture:
  - Use Change Data Capture (CDC) from SQL Server or Azure SQL via Azure Data Factory or Fabric Dataflows Gen2.
  - Use Eventstream to ingest streaming data.
  - Use Azure Data Explorer (ADX) if time-series ingestion is required.
- Seeding (see the sketch after this list):
  - Seed historical data using Copy activities in Dataflows Gen2, Azure Data Factory pipelines, or a direct copy to the Lakehouse via Spark notebooks.
  - For large-volume ingestion, consider PolyBase, BULK INSERT, or Azure Data Box for offline loads.
- Synchronization:
  - Keep datasets updated via:
    - Dataflows with scheduled refresh
    - Pipelines with triggers
    - SQL CDC for low-latency sync
  - Ensure schema drift handling is defined in pipelines or staging Lakehouses.
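In a Fabric notebook, the seeding and synchronization steps above often reduce to an initial overwrite followed by Delta merges. A minimal PySpark sketch, assuming illustrative names (`bronze_customers`, `customer_id`, an `operation` marker column in the CDC feed) and staging paths:

```python
# Minimal sketch of the seed-then-sync pattern in a Fabric notebook.
# Table and column names (bronze_customers, customer_id, operation)
# and staging paths are illustrative assumptions.
from delta.tables import DeltaTable

# 1. Seeding: one-time historical load into a Lakehouse Delta table.
historical = spark.read.format("parquet").load("Files/staging/customers_full/")
historical.write.format("delta").mode("overwrite").saveAsTable("bronze_customers")

# 2. Synchronization: apply an incremental CDC batch as an upsert.
changes = spark.read.format("parquet").load("Files/staging/customers_cdc/")

target = DeltaTable.forName(spark, "bronze_customers")
(
    target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.operation = 'DELETE'")
    .whenMatchedUpdateAll(condition="s.operation != 'DELETE'")
    .whenNotMatchedInsertAll(condition="s.operation != 'DELETE'")
    .execute()
)
```

The `operation` column stands in for whatever delete/upsert marker the CDC feed carries; a real pipeline would also checkpoint the last LSN or watermark it processed.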
## Typical replication targets in Fabric
| Source | Method | Target |
|---|---|---|
| Azure SQL DB | CDC + Dataflow Gen2 | Lakehouse or Warehouse |
| SQL Server | Self-hosted IR + ADF | Lakehouse (Bronze) |
| Blob Storage | Dataflow Gen2 / Eventstream | Lakehouse (Bronze) |
| On-prem SQL | Azure Data Factory / Data Box | Lakehouse or Warehouse |
| SAP / Oracle | Azure Data Factory connectors | Lakehouse |
| REST API / SaaS | Dataflow Gen2 (API support) | Lakehouse or Warehouse |
| SQL Server (Transactional Replication) | Native SQL Replication to Azure SQL + Dataflow Gen2 | Lakehouse |
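For the CDC-based rows above, the change feed on a SQL source is exposed through CDC's table-valued functions and can be polled from any Python client. A hedged sketch using pyodbc; the server, database, and capture instance name (`dbo_customers`) are assumptions for illustration:

```python
# Sketch: poll a SQL Server / Azure SQL CDC capture instance for new changes.
# Connection details and the capture instance name (dbo_customers) are illustrative.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net;Database=mydb;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Determine the LSN window to read: from the capture instance minimum
# (or the last checkpoint you persisted) up to the current maximum.
cursor.execute("SELECT sys.fn_cdc_get_min_lsn('dbo_customers'), sys.fn_cdc_get_max_lsn()")
from_lsn, to_lsn = cursor.fetchone()

# 'all' returns every change row; the __$operation column encodes
# whether each row is an insert, update, or delete.
cursor.execute(
    "SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_customers(?, ?, 'all')",
    from_lsn, to_lsn,
)
for row in cursor.fetchall():
    print(row)
```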
## Risks and constraints in Fabric replication
- Schema drift (the Fabric analogue of disk drift in VM migrations): Ongoing schema evolution in source systems can desynchronize pipelines and must be monitored continuously (see the detection sketch after this list).
- Replication latency and snapshot intervals: When using SQL Server Replication, understand the impact of Snapshot Agent scheduling and Log Reader/Distribution Agent lag, especially in systems with tight SLA constraints.
- WAN/Networking: Consider the bandwidth to OneLake, especially with hybrid or federated environments.
- Concurrency limits: Lakehouse write performance is influenced by concurrent ingestion and Spark pool limits.
- Semantic replication: Reports and Datasets (Power BI) can reference replicated models, but need to be validated after promotion.
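The schema drift risk in the first bullet can be reduced by diffing each incoming batch against the target table before merging. A minimal detection sketch for a Fabric notebook, with illustrative table and staging names:

```python
# Sketch: detect schema drift between an incoming batch and the target
# Lakehouse table before merging. Names (bronze_customers, staging path)
# are illustrative.
incoming = spark.read.format("parquet").load("Files/staging/customers_cdc/")
target_schema = spark.table("bronze_customers").schema

incoming_cols = {f.name: f.dataType for f in incoming.schema.fields}
target_cols = {f.name: f.dataType for f in target_schema.fields}

added = set(incoming_cols) - set(target_cols)
removed = set(target_cols) - set(incoming_cols)
retyped = {c for c in incoming_cols.keys() & target_cols.keys()
           if incoming_cols[c] != target_cols[c]}

if added or removed or retyped:
    # Fail fast (or route the batch to a quarantine area) instead of
    # silently desynchronizing downstream pipelines.
    raise ValueError(f"Schema drift detected: +{added} -{removed} ~{retyped}")
```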
## Tools and services involved
- Microsoft Fabric Dataflows Gen2
- OneLake Shortcuts
- Eventstream + KQL DB
- Azure Data Factory / Synapse Pipelines
- Azure Data Box (for large data volume initial load)
- Power BI REST APIs (for semantic asset rehydration)
- SQL Server Replication (Transactional Replication or Snapshot Replication to Azure SQL or Fabric Lakehouse)
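For Eventstream, the custom-endpoint source exposes an Event Hubs-compatible connection string, so standard Event Hubs clients can feed it. A hedged sketch using the `azure-eventhub` Python SDK; the connection string and event payload are placeholders:

```python
# Sketch: push events into a Fabric Eventstream through its custom-endpoint
# source, which hands out an Event Hubs-compatible connection string.
# The connection string and payload below are illustrative placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=es_orders",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"order_id": 42, "amount": 19.99})))
    producer.send_batch(batch)
```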
## Example Promotion Model
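A common promotion model, consistent with the Bronze landing targets in the table above, is medallion-style: raw data lands in a Bronze table and is promoted to a validated Silver table, which semantic models then reference. A hedged PySpark sketch with illustrative table names and cleansing rules:

```python
# Sketch: promote validated Bronze data to a Silver table in a Fabric
# Lakehouse. Table names and the cleansing rules are illustrative.
from pyspark.sql import functions as F

bronze = spark.table("bronze_customers")

silver = (
    bronze
    .dropDuplicates(["customer_id"])           # de-duplicate on the business key
    .filter(F.col("customer_id").isNotNull())  # drop rows failing basic checks
    .withColumn("promoted_at", F.current_timestamp())
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver_customers")
```

Reports and semantic models would then bind to `silver_customers` (or a Gold layer derived from it); this is the promotion step after which the validation noted above should run.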
## Related Resources
- Get started with Fabric Dataflows Gen2
- Migrate databases using Azure Data Factory
- Use Eventstreams in Microsoft Fabric
- Azure Data Box
- Transactional replication to Azure SQL
- Migrate with SQL Server replication
✅ Replication in Microsoft Fabric focuses on data pipelines, CDC, lakehouses, and semantic assets, not on VMs. Promote early, guard against schema drift, and build pipelines for robustness.