Over the past decade, cloud services have transformed software development. Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) have let enterprises retire physical hardware and stop managing operating systems, respectively. Similarly, serverless computing has simplified deploying code to production.
With serverless computing, cloud providers like Amazon Web Services (AWS) run the servers and infrastructure required for computation, data storage, routing, event notification, and visualization in data applications. AWS offers a suite of fully managed services on a pay-as-you-use model for building and running business applications, and it takes care of capacity planning, scaling, and maintenance.
This blog post describes how a leading digital solutions enterprise migrated from legacy data processing pipelines to a high-performance, scalable cloud solution to minimize pipeline cost and data ingestion time.
The organization, which provides data-driven insights to personalize customer experiences, faced several challenges with its legacy data processing pipelines:
- A monolithic architecture, which restricted multi-tenancy support
- Manual triggers to poll raw data from the FTP server
- Manual intervention for data fallouts and report generation
- Tightly coupled architecture, which impacted flexibility and reusability
They were looking for a scalable, multi-tenant, performant, flexible, and fault-tolerant solution.
To migrate the legacy pipelines, Impetus Technologies Inc. proposed an event-driven, serverless ETL pipeline built on AWS.
The solution provides:
- Data ingestion support from the FTP server using AWS Lambda, Amazon CloudWatch Events, and Amazon SQS
- Data processing using AWS Glue (crawler and ETL job)
- Failure email notifications using Amazon SNS
- Data storage on Amazon S3
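To make the event-driven flow more concrete, here is a minimal Python sketch of how a Lambda function might connect these services: it reads SQS records, starts a Glue ETL job run for each one, and publishes a failure notice to SNS when a run cannot be started. This is an illustration under stated assumptions, not the actual implementation. The message shape (a JSON body with an `s3_path` key), the `--input_path` job argument, the function names, and the `GLUE_JOB_NAME` and `FAILURE_TOPIC_ARN` environment variables are all hypothetical. The clients are passed in as parameters so the logic can be exercised without AWS credentials.

```python
import json
import os


def handle_sqs_event(event, glue_client, sns_client, job_name, topic_arn):
    """Start one Glue job run per SQS record; notify via SNS on failure.

    Assumes each record body is JSON with an "s3_path" key (hypothetical).
    """
    started, failed = [], []
    for record in event.get("Records", []):
        try:
            body = json.loads(record["body"])
            response = glue_client.start_job_run(
                JobName=job_name,
                # "--input_path" is a hypothetical job argument name.
                Arguments={"--input_path": body["s3_path"]},
            )
            started.append(response["JobRunId"])
        except Exception as exc:
            # Record the failure and send the notification
            # (delivered as email if the topic has email subscribers).
            failed.append(record.get("messageId"))
            sns_client.publish(
                TopicArn=topic_arn,
                Subject="ETL pipeline failure",
                Message=f"Could not start Glue job {job_name}: {exc}",
            )
    return {"started": started, "failed": failed}


def lambda_handler(event, context):
    """Lambda entry point; wires real boto3 clients into the handler."""
    import boto3  # imported lazily so the logic above runs without boto3

    return handle_sqs_event(
        event,
        boto3.client("glue"),
        boto3.client("sns"),
        os.environ["GLUE_JOB_NAME"],        # hypothetical variable name
        os.environ["FAILURE_TOPIC_ARN"],    # hypothetical variable name
    )
```

Keeping the AWS clients as parameters is a design choice for testability: the same function can run against stub clients locally and against real `boto3` clients inside Lambda.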
Here are some details about the application architecture on AWS.