An American retail company wanted to migrate its legacy data warehouse from Netezza to a cloud-native, centralized architecture to improve performance, reduce maintenance cost and architectural complexity, and ensure business continuity.
Specifically, the company wanted to:
- Sunset Netezza and migrate all their data warehouse workloads to Azure Synapse
- Migrate Informatica ETL workflows to Azure Databricks for processing
- Move Tableau and SAP BO reports to Power BI on the cloud for reporting
The Impetus team used LeapLogic for end-to-end automated workload assessment to strategize the migration plan of Netezza schema and data, Informatica ETL workflows, and BI reports.
The solution used Azure Data Factory (ADF) pipelines for the one-time migration of 27 TB of compressed historical data and ~100 TB of uncompressed data from Netezza to Azure Synapse. The incremental migration of 10 GB of data per day was performed using Databricks ADF pipelines. The team also transformed 140 Informatica ETL scripts of medium-to-high complexity to Databricks-native PySpark code, and converted 20 SAP BO reports and 16 Tableau reports to Power BI reports.
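A daily incremental load of this kind is commonly driven by a high-water mark: each run copies only the rows changed since the last successful run. The case study does not say how the pipelines selected each day's delta, so the Python sketch below is an assumption for illustration only. The `modified_at` column, the `Row` shape, and the `select_increment` helper are hypothetical and are not part of ADF, Databricks, or LeapLogic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    key: int
    modified_at: datetime  # hypothetical change-tracking column
    payload: str


def select_increment(rows, watermark):
    """Return the rows changed after `watermark`, plus the new watermark."""
    delta = [r for r in rows if r.modified_at > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r.modified_at for r in delta), default=watermark)
    return delta, new_watermark


# Illustrative source rows (made-up data, not from the case study).
source = [
    Row(1, datetime(2023, 1, 1, 8), "a"),
    Row(2, datetime(2023, 1, 2, 9), "b"),
    Row(3, datetime(2023, 1, 3, 10), "c"),
]

# Only rows modified after the last recorded watermark are copied.
delta, watermark = select_increment(source, datetime(2023, 1, 1, 23, 59))
print([r.key for r in delta], watermark)
```

Persisting the returned watermark only after the target load succeeds lets a failed run be retried without skipping rows.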
A high-level functional component architecture of the implemented solution is given below:
- Used multiple Synapse instances for data segregation and better maintenance
- Brought data from DB2 and a SQL Server external source into Azure Data Lake Storage (ADLS)
- Replaced Tivoli scheduler with ADF
- Used HDInsight for data validation and cell-to-cell comparison between Netezza and Azure Synapse tables
- Built 55 pipelines to ingest data in ADLS via File Sweeper
- Cleansed data residing in ADLS raw layer using custom utilities of the Workload Transformation Solution
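The cell-to-cell comparison mentioned in the list above can be pictured with a small sketch. The case study does not describe how the validation was implemented on HDInsight, so what follows is a minimal, in-memory Python illustration under stated assumptions: rows are plain dictionaries, both tables share a unique key column, and the `compare_cells` helper and the sample rows are hypothetical. A production check at this data volume would run on a distributed engine rather than in a single process.

```python
def compare_cells(source_rows, target_rows, key):
    """Compare two tables cell by cell, matching rows on `key`.

    Returns a list of (key_value, column, source_value, target_value)
    tuples. A row missing on one side is reported with column None.
    """
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    mismatches = []
    for k in sorted(src.keys() | tgt.keys()):
        s, t = src.get(k), tgt.get(k)
        if s is None or t is None:
            # Row exists in only one of the two tables.
            mismatches.append((k, None, s, t))
            continue
        for col in sorted(s.keys() | t.keys()):
            if s.get(col) != t.get(col):
                mismatches.append((k, col, s.get(col), t.get(col)))
    return mismatches


# Made-up sample rows standing in for a Netezza table and its Synapse copy.
netezza = [
    {"id": 1, "sku": "A-100", "qty": 5},
    {"id": 2, "sku": "B-200", "qty": 7},
    {"id": 3, "sku": "C-300", "qty": 1},
]
synapse = [
    {"id": 1, "sku": "A-100", "qty": 5},
    {"id": 2, "sku": "B-200", "qty": 9},
]

for mismatch in compare_cells(netezza, synapse, key="id"):
    print(mismatch)
```

Reporting the key, column, and both values for each difference makes it possible to trace a mismatch back to a specific record instead of only learning that two tables differ.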
The team also provided end-to-end production transition assistance, enabled operationalization on the Azure Synapse platform, and recommended areas for optimization on the target data warehouse.
Our comprehensive solution helped the client sunset Netezza and realize the following benefits:
- Reduced cost by 25%
- Improved performance by 30% using automation
- Improved scalability by transforming workloads to a modern, cloud-native stack
- Ensured data availability in a single layer across the enterprise, enabling innovation and eradicating data silos
- Operationalized Power BI reporting for clients