40% cost savings through automated Hadoop migration to AWS EMR - Impetus

Enhanced data availability fuels unforgettable customer experiences and streamlined internal processes in a global hospitality chain.


Business needs

A Fortune 500 global hospitality chain wanted to optimize its data platform by migrating from the on-premises Hortonworks Data Platform (HDP) to AWS Elastic MapReduce (EMR). With its HDP license nearing expiration, the client wanted to address challenges related to agility, scalability, high maintenance costs, and excessive management overhead. Merely renewing the HDP license was not a long-term solution, prompting the client to seek a migration partner. Its goals were to:

  • Reduce the maintenance cost of the on-premises Hadoop data platform
  • Take advantage of the pay-as-you-use model, optimizing infrastructure for workloads and minimizing operational expenses
  • Leverage cloud benefits such as scalability, flexibility, agility, security, and compliance
  • Improve data accessibility and modernize analytics capabilities to boost internal operations and enhance customer experience

Seamlessly migrated 700+ scripts and 61 refineries involving Hive, Spark Scala, PySpark, MapReduce, Pig, data science, etc. to AWS EMR



Impetus conducted a comprehensive analysis of the hospitality chain’s existing data ecosystem and architected a migration solution to AWS EMR that can help them make the most of their data.

  • Seamless ingestion of real-time feeds into S3 buckets, along with the extraction of the transformed feeds from Snowflake tables
  • Smooth migration to AWS EMR of the data refineries running on HDP, which consumed data feeds and hosted the transformation scripts (HQL, Spark Scala, PySpark, MapReduce, Pig, data science) for the ETL process

Solution highlights

  • Conducted technology version upgrades and implemented necessary code changes to ensure compatibility
  • Enhanced the performance of the refinery flow by rearchitecting the code, eliminating redundant data transfers from Amazon S3 to HDFS through an external table
  • Optimized query performance by fine-tuning multiple JOIN queries
  • Implemented security measures, including Multi-factor Authentication (MFA) for both the root user and IAM users to meet security requirements
  • Enhanced security controls by utilizing Amazon VPC for production, restricting traffic through route table entries and security group inbound/outbound rules
  • Ensured data security by encrypting data at rest and in transit, with server-side encryption using AWS Key Management Service keys (SSE-KMS) as the default encryption method
  • Automated the execution of all data refineries on the transient cluster through Jenkins and Airflow pipelines, using Terraform scripts
  • Enabled proactive monitoring of resource health and status through Amazon CloudWatch metrics
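
To illustrate the external-table approach mentioned in the highlights, here is a minimal sketch of how a refinery could define a Hive external table directly over data in Amazon S3, so that queries read the data in place rather than copying it to HDFS first. This is not the client's actual code: the helper `external_table_ddl`, the table name, the columns, the bucket path, and the Parquet file format are all hypothetical and chosen only for illustration.

```python
def external_table_ddl(table, columns, s3_location, file_format="PARQUET"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement over an S3 location.

    Hypothetical helper: every value is supplied by the caller, and nothing
    here reflects the client's real schema or bucket layout.
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("expected an s3:// location")
    # Render each (name, type) pair as one column definition.
    column_list = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    # Assumed example values, not taken from the case study.
    print(
        external_table_ddl(
            "refinery.bookings",
            [("booking_id", "BIGINT"), ("stay_date", "DATE")],
            "s3://example-bucket/bookings/",
        )
    )
```

Because the table is external, dropping it removes only the metadata and leaves the underlying S3 objects untouched, which suits transient clusters that are torn down after each run.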

50% reduction in overall delivery effort with LeapLogic’s automated capabilities



With a seamless migration to AWS EMR, the client harnessed the full potential of advanced data processing and analysis. Our comprehensive solution delivered remarkable outcomes, empowering them to:

  • Achieve a flawless migration of HDP scripts to AWS EMR, with zero user acceptance testing (UAT) defects
  • Enhance flexibility by transitioning to a scalable infrastructure
  • Boost performance by eliminating redundant data flows and optimizing multiple JOIN queries
  • Rearchitect outdated features, such as Chef-Client, by introducing efficient code alternatives
  • Realize significant infrastructure cost savings of 40%
