01
Business needs
A Fortune 500 global hospitality chain wanted to modernize its data platform by migrating from the on-premises Hortonworks Data Platform (HDP) to Amazon EMR. With its HDP license nearing expiration, the client needed to address limited agility and scalability, high maintenance costs, and heavy management overhead. Simply renewing the HDP license would not solve these problems in the long term, so the client sought a migration partner. Its goals were to:
- Reduce the maintenance costs of the on-premises Hadoop data platform
- Adopt the pay-as-you-go model, sizing infrastructure to each workload and minimizing operational expenses
- Gain cloud benefits such as scalability, flexibility, agility, security, and compliance
- Improve data accessibility and modernize analytics capabilities to strengthen internal operations and enhance the customer experience

Seamlessly migrated 700+ scripts and 61 data refineries spanning Hive, Spark Scala, PySpark, MapReduce, Pig, and data science workloads to Amazon EMR
02
Solution
Impetus analyzed the hospitality chain's existing data ecosystem in depth and architected a migration to Amazon EMR designed to help the client make the most of its data. The solution covered:
- Ingestion of real-time feeds into Amazon S3 buckets, along with extraction of the transformed feeds from Snowflake tables
- Migration to Amazon EMR of the HDP data refineries that consumed the data feeds and hosted the ETL transformation scripts (HQL, Spark Scala, PySpark, MapReduce, Pig, and data science)
Solution highlights
- Upgraded technology versions and made the code changes needed to keep the scripts compatible
- Improved refinery-flow performance by rearchitecting the code to read data from Amazon S3 through an external table, eliminating redundant S3-to-HDFS data transfers
- Optimized query performance by fine-tuning multiple JOIN queries
- Implemented security measures, including multi-factor authentication (MFA) for both the root user and IAM users, to meet security requirements
- Strengthened security controls by running production in an Amazon VPC, restricting traffic through route table entries and security group inbound/outbound rules
- Secured data by encrypting it at rest, with server-side encryption using AWS Key Management Service keys (SSE-KMS) as the default method, and by encrypting it in transit
- Automated the execution of all data refineries on the transient EMR cluster with Jenkins and Airflow pipelines, using Terraform scripts
- Enabled proactive monitoring of resource health and status through Amazon CloudWatch metrics

50% reduction in overall delivery effort with LeapLogic’s automated capabilities
03
Impact
With the migration to Amazon EMR, the client can take full advantage of advanced data processing and analytics. The solution enabled the client to:
- Migrate its HDP scripts to Amazon EMR with zero user acceptance testing (UAT) defects
- Enhance flexibility by transitioning to a scalable infrastructure
- Boost performance by eliminating redundant data flows and optimizing multiple JOIN queries
- Rearchitect outdated components, such as Chef-Client, by replacing them with more efficient code
- Cut infrastructure costs by 40%