United Airlines wanted to migrate its 20-year-old legacy data pipelines and codebase to an AWS data lake to personalize the user experience. In the COVID-19 era, when airlines need to check passengers’ travel readiness in real time, United wanted an intelligent data platform that would meet the following business and technical needs:
- Provide actionable “in-the-moment” insights at scale using AI and ML-based models
- Handle diverse and complex use cases leveraging real-time and batch analytics
- Process structured, semi-structured, and unstructured data
- Provide a single source of truth
- Ingest, cleanse, catalog, optimize, and analyze data from diverse sources in real time
- A self-service framework to facilitate:
  - Data onboarding and AWS pipeline creation
  - Automated resource creation (pipeline processing components like Glue jobs, Lambda functions, etc.)
  - Automated data quality checks
  - Metadata capture and cataloging
  - Cloud cost monitoring and optimization
- Intelligent monitoring and alerting to ensure:
  - High availability
  - AWS cost optimization
  - Operational readiness
- Support for 200+ batch and 30+ real-time multi-TB data pipelines with automated infrastructure setup and deployment
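The source does not describe how the self-service framework works internally, but the general idea of turning a declarative feed description into the pipeline components it needs can be sketched as follows. This is a minimal, hypothetical sketch: `FeedSpec`, `plan_resources`, the field names, and the S3 prefix layout are illustrative assumptions, not United’s or Impetus’s actual framework, and the code only builds a plan rather than calling any AWS API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical feed description a data owner might submit to a
# self-service onboarding framework; all field names are illustrative.
@dataclass
class FeedSpec:
    name: str
    mode: str            # "batch" or "realtime"
    source_format: str   # e.g. "csv", "json", "parquet"
    quality_rules: List[str] = field(default_factory=list)


def plan_resources(spec: FeedSpec) -> List[Dict[str, str]]:
    """Derive the processing components a feed would need.

    Returns plain dictionaries instead of provisioning anything, so the
    plan can be reviewed, stored as metadata, or rendered into a template.
    """
    if spec.mode not in ("batch", "realtime"):
        raise ValueError(f"unknown mode: {spec.mode!r}")

    base = f"feeds/{spec.name}"
    plan = [
        # Raw and curated zones in the data lake (assumed prefix layout).
        {"type": "s3_prefix", "name": f"{base}/raw/"},
        {"type": "s3_prefix", "name": f"{base}/curated/"},
        # Catalog entry so the feed is discoverable and governed.
        {"type": "catalog_table", "name": spec.name.replace("-", "_"),
         "format": spec.source_format},
    ]
    if spec.mode == "batch":
        # Batch feeds get a scheduled transformation job.
        plan.append({"type": "glue_job", "name": f"{spec.name}-etl"})
    else:
        # Real-time feeds get a stream plus a per-record processor.
        plan.append({"type": "kinesis_stream", "name": f"{spec.name}-stream"})
        plan.append({"type": "lambda_function", "name": f"{spec.name}-processor"})
    # One quality-check step per declared rule.
    for rule in spec.quality_rules:
        plan.append({"type": "quality_check", "name": f"{spec.name}:{rule}"})
    return plan
```

Keeping planning separate from provisioning is a design choice made for this sketch: a plan like this could later be rendered into a deployment template or recorded in a metadata catalog, and it is easy to review and test without touching AWS.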
The data and analytics platform migration and modernization journey involved three steps:
- Rearchitecting the platform
- Using frameworks and accelerators
- Onboarding key use cases
The Impetus team helped automate common, repeatable patterns and helped create templates and reusable components for designing and architecting workloads on an AWS Lakehouse architecture. United also used LeapLogic, a cloud transformation accelerator from Impetus Technologies, to convert a significant portion of its data transformation code to AWS, which substantially accelerated the building of data pipelines on AWS. Compared with manual coding, LeapLogic’s automation capabilities helped save 22% of the time and 70% of the effort. The team also created ingestion, monitoring, and validation frameworks to ingest data feeds from various upstream systems into Amazon S3.
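The source does not say how the validation framework checks incoming feeds. One common approach, shown here purely as an assumption, is to reconcile each landed file against control totals (a row count and a checksum) supplied by the upstream system. `validate_landed_file` and its parameters are hypothetical, and the sketch works on in-memory bytes rather than reading objects from Amazon S3.

```python
import hashlib
from typing import List


def validate_landed_file(data: bytes, expected_rows: int,
                         expected_sha256: str, has_header: bool = True) -> List[str]:
    """Compare a landed file against control totals from the source system.

    Returns a list of human-readable problems; an empty list means the file passed.
    """
    problems = []
    # Integrity: the SHA-256 digest must match the one computed at the source.
    actual_digest = hashlib.sha256(data).hexdigest()
    if actual_digest != expected_sha256:
        problems.append("checksum mismatch")
    # Completeness: count non-blank lines, excluding an optional header row.
    lines = [ln for ln in data.decode("utf-8").splitlines() if ln.strip()]
    rows = len(lines) - (1 if has_header and lines else 0)
    if rows != expected_rows:
        problems.append(f"row count {rows} != expected {expected_rows}")
    return problems
```

Returning a list of problems instead of raising on the first failure lets a monitoring framework report every issue with a feed at once.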
Building the next-generation unified data and analytics platform also involved writing new, modern data processing logic on AWS Glue, Spark on Amazon EMR, Amazon S3, and Amazon Redshift. To accomplish this, United used Gathr, Impetus’s all-in-one data pipeline platform. The AWS Lakehouse architecture was built with AWS services such as AWS Glue, Amazon Kinesis, Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, and Amazon SageMaker.
United’s centralized data platform provides the following capabilities:
- Support for 200+ batch and 30+ real-time multi-TB data feeds
- Unified data catalog and governance to authorize, manage, and audit access to data
- Automated data quality checks (null, regex, data type, etc.)
- Single-click deployment for data pipelines and platforms using AWS CloudFormation templates and AWS CodeDeploy
- Serverless data pipelines leveraging AWS Lambda, Amazon Managed Workflows for Apache Airflow (MWAA), AWS Glue, Amazon EMR, and Amazon CloudWatch
- End-to-end DevOps for data platform and use cases
- Intelligent data profiling and data quality checks
- Unified consumption layer for seamless onboarding of diverse use cases
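The automated data quality checks listed above (null, regex, data type) can be illustrated with a small rule-based validator. This is a minimal sketch under assumed names: `Rule`, `run_checks`, and the example columns are hypothetical, and the source does not describe how the platform actually implements these checks.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Rule:
    column: str
    check: str                 # "not_null", "regex", or "type"
    arg: Optional[Any] = None  # regex pattern or expected Python type


def run_checks(rows: List[Dict[str, Any]], rules: List[Rule]) -> List[str]:
    """Apply every rule to every row; return one message per failing (row, rule)."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.column)
            if rule.check == "not_null":
                # Treat both None and empty strings as null.
                ok = value is not None and value != ""
            elif rule.check == "regex":
                # None is left to the not_null rule; check only present values.
                ok = value is None or re.fullmatch(rule.arg, str(value)) is not None
            elif rule.check == "type":
                ok = value is None or isinstance(value, rule.arg)
            else:
                raise ValueError(f"unknown check: {rule.check!r}")
            if not ok:
                failures.append(f"row {i}: {rule.column} failed {rule.check}")
    return failures


# Example rules for a hypothetical bookings feed.
rules = [
    Rule("pnr", "not_null"),
    Rule("flight_no", "regex", r"[A-Z]{2}\d{1,4}"),
    Rule("fare", "type", float),
]
```

The regex and type rules skip `None` values so that a missing field is reported once, by the `not_null` rule, rather than by several rules at the same time.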
The AWS-based data platform enabled easy onboarding of use cases and ensured data accuracy and quality for downstream applications. It helped the airline improve the passenger experience by:
- Personalizing bundle offers with ML models to include customer-preferred amenities (e.g., Wi-Fi and entertainment options)
- Reducing wait times at airport gates by predicting passengers’ adherence to COVID-19 travel requirements
- Auto-approving travel readiness by verifying the validity of COVID-19 documents such as test reports, vaccination records, and government forms
The flexible, scalable platform enabled unified governance and consumption, equipped the airline’s business teams to process data faster for real-time decision-making, and improved cost efficiency.