Case Study

A Fortune 500 bank migrated 3.4 billion records and 7000 Vertica DDLs in 3 months

Freed up 20% of capacity and reduced TCO by 40% using automation for end-to-end transformation and validation


Challenge

The Fortune 500 multinational bank wanted to accelerate time-to-market and reduce TCO by moving from Vertica to a Hadoop-based system. Their existing architecture was complex, imposed data size limitations, and was becoming expensive as data volumes grew.

The bank wanted to minimize data storage costs and simplify their data lake architecture. They also wanted real-time insights from customer interactions so they could offer customers recommendations based on their interests, risk appetite, and capital commitment.


Discovery

Transforming legacy Vertica workloads to Hadoop on a Cloudera-based platform involved identification and transformation of costly and resource-consuming ETL and analytical workloads. The bank was struggling to convert complicated and mission-critical logic. They were looking for a solution that would simplify and accelerate the transformation process with minimum risk to ensure data integrity and quality. The solution scope involved assessment, identification, recommendations, and transformation of workloads to be offloaded to Hadoop.


Impact

Impetus Technologies Inc. used the Workload Transformation Solution to automatically identify and migrate resource-intensive Vertica workloads like DDLs and KSH scripts to Hadoop, thereby offering expanded capabilities and opportunities for data exploration and analytics.


The solution covered the following phases:

  • Workload profiling
  • Workload assessment for schema optimization
  • Schema creation and data migration
  • Logic transformation
  • Workload validation and execution
  • Movement of analytical data to the EDW

An end-to-end solution was delivered by automatically converting:

  • Approximately 7000 Vertica DDLs and 1500 views to Hive
  • Around 3000 KSH scripts to the Linux version of the Hadoop cluster
  • Around 2500 modularized and embedded Vertica SQL (VSQL) queries to Shell-compatible HiveQL/Spark SQL
  • Awk scripts to the Linux version of the Hadoop cluster
  • 50 Autosys scheduling jobs
  • Around 100 tables, containing about 200 TB of compressed data, from Vertica to Hive
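Converting Vertica DDLs to Hive largely comes down to rewriting column types and replacing Vertica-specific physical design clauses with a Hive storage format. The case study does not describe the tool's internal rules, so the following is only a minimal Python sketch of the idea; the type map, the `STORED AS ORC` choice, and the sample table are illustrative assumptions.

```python
import re

# Hypothetical mapping of common Vertica column types to Hive equivalents.
# (Vertica's INT is 64-bit, hence BIGINT on the Hive side.)
TYPE_MAP = {
    "INT": "BIGINT",
    "NUMERIC": "DECIMAL",
    "FLOAT": "DOUBLE",
}

def convert_ddl(vertica_ddl: str) -> str:
    """Rewrite column types in a simple CREATE TABLE statement for Hive."""
    hive_ddl = vertica_ddl
    for vtype, htype in TYPE_MAP.items():
        # Whole-word replace so e.g. BIGINT is not matched by the INT rule
        hive_ddl = re.sub(rf"\b{vtype}\b", htype, hive_ddl, flags=re.IGNORECASE)
    # Hive tables declare a storage format instead of Vertica projections/encodings
    return hive_ddl.rstrip(";") + "\nSTORED AS ORC;"

print(convert_ddl(
    "CREATE TABLE txns (id INT, amount NUMERIC(18,2), note VARCHAR(100));"
))
```

A production converter would also have to handle views, sequences, constraints, and length-bearing types, which is why automating it across roughly 7000 DDLs mattered here.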


With fast and reliable code transformation, the solution enabled the bank to leverage their existing investments, shorten the development, testing, and validation cycles, and improve developer productivity. The delivery of the solution included:

  • SQOOP-based data migration scripts for all Vertica tables
  • Migrated and validated sample data for 140 tables used in 8 VAPs (3.4 billion records)
  • High-level solution architecture and design
  • Partitioning, bucketing, and clustering recommendations for all tables
  • Functional testing support and defect fixing
  • Hive UDF support for native Vertica functions that were not available in Hive
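The SQOOP-based migration scripts mentioned above essentially amount to one parameterized import command per Vertica table. The actual scripts are not shown in the case study, so this is a hedged Python sketch of how such commands might be generated; the JDBC URL, table list, split columns, and mapper count are all placeholder assumptions.

```python
# Illustrative table list: (qualified table name, numeric column to split on)
TABLES = [
    ("public.transactions", "txn_id"),
    ("public.customers", "customer_id"),
]

def sqoop_command(table: str, split_column: str) -> str:
    """Build a Sqoop import command that lands a Vertica table in Hive."""
    hive_table = table.split(".")[-1]
    return " ".join([
        "sqoop import",
        "--connect jdbc:vertica://vertica-host:5433/warehouse",  # placeholder URL
        "--driver com.vertica.jdbc.Driver",
        f"--table {table}",
        f"--split-by {split_column}",  # parallelize the extract across mappers
        "--num-mappers 8",
        "--hive-import",
        f"--hive-table {hive_table}",
    ])

for table, key in TABLES:
    print(sqoop_command(table, key))
```

Generating the commands from a table inventory, rather than hand-writing them, is what makes this approach tractable for the hundreds of tables and 200 TB involved here.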