30% performance improvement by converting Netezza and Informatica to Azure-Databricks stack
140 Informatica ETL scripts transformed using automation and operationalized in 16 weeks
An American retail company wanted to migrate their legacy data warehouse from Netezza to a cloud-native centralized architecture to improve performance, reduce maintenance cost and architectural complexity, and ensure business continuity.
They wanted to:
- Sunset Netezza and migrate all their data warehouse workloads to Azure Synapse
- Migrate Informatica ETL workflows to Azure Databricks for processing
- Move Tableau and SAP BO reports to Power BI on the cloud for reporting
The Impetus team used their Workload Transformation Solution for end-to-end automated workload assessment to strategize the migration plan of Netezza schema and data, Informatica ETL workflows, and BI reports.
The solution used Azure Data Factory (ADF) pipelines for the one-time migration of 27 TB compressed historical data and ~100 TB of uncompressed data from Netezza to Azure Synapse. The incremental migration of 10GB data per day was performed using Databricks ADF pipelines. The team also transformed 140 Informatica ETL scripts with medium-high complexity to Databricks-native PySpark code and 20 SAP BO reports and 16 Tableau reports to Power BI reports.
A high-level functional component architecture of the implemented solution is given below:
- Used multiple Synapse instances for data segregation and better maintenance
- Brought data from DB2 and a SQL Server external source into Azure Data Lake Storage (ADLS)
- Replaced Tivoli scheduler with ADF
- Used HDInsight for data validation and cell-to-cell comparison between Netezza and Azure Synapse tables
- Built 55 pipelines to ingest data in ADLS via File Sweeper
- Cleansed data residing in ADLS raw layer using custom utilities of the Workload Transformation Solution
The team also provided end-to-end production transition assistance, enabled operationalization on the Azure Synapse platform, and recommended areas for optimization on the target data warehouse.
Our comprehensive solution helped the client sunset Netezza and realize the following benefits:
- Reduced cost by 25%
- Improved performance by 30% using automation
- Improved scalability by transforming workloads to a modern, cloud-native stack
- Ensured data availability in a single layer across the enterprise, enabling innovation and eradicating data silos
- Operationalized Power BI reporting for clients
You may also be interested in…