A centralized data lake on AWS enabled a single source of truth with real-time integration of sources - Impetus

Data lake implementation on AWS resulted in 60% infrastructure cost reduction and improved application performance


The organization, which provides data-driven insights to personalize customer experience, wanted to move away from their enterprise CRM application and consolidate all their solutions on Salesforce. However, storing petabytes of customer data on Salesforce involved high licensing costs and performance issues.

The enterprise was looking for a high-performance, scalable cloud solution to integrate its enterprise data with Salesforce Cloud and provide a single source of truth. It wanted to go beyond the stock machine learning capabilities of enterprise CRM applications to help its customers find, convert, nurture, and retain more revenue.


The growth solutions provider was facing performance issues and was unable to ingest high-velocity customer data using Salesforce and enterprise CRM applications. Moreover, delays in data synchronization were creating multiple versions of the truth, giving an inconsistent picture of customer interactions and often resulting in missed opportunities.

The client was looking for a cost-effective, cloud-based solution that could offer scalability, multi-tenancy, performance, and flexibility. The solution also needed to use the existing enterprise data and apply advanced analytics to a single source of truth for customer data.


The Impetus team created a microservice architecture with a tiered storage/polyglot layer solution to address scalability and performance issues. A centralized data lake on AWS utilized the existing enterprise customer data and used serverless architecture to ingest data from multiple sources automatically.
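As a rough illustration of the serverless ingestion idea, the sketch below shows an AWS Lambda-style handler that reads an S3 event notification and lists the newly arrived objects. It assumes ingestion is triggered by S3 object-created events, which the case study does not state; the bucket name, key, and the `objects_to_ingest` helper are hypothetical, and a real function would go on to load each object into the lake.

```python
from urllib.parse import unquote_plus


def objects_to_ingest(event):
    """Return (bucket, key) pairs for each object named in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Lambda entry point: only lists new objects; loading them is left out of this sketch."""
    pairs = objects_to_ingest(event)
    return {"ingested": len(pairs), "objects": pairs}


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-landing-bucket"},
                "object": {"key": "crm/accounts+export.json"},
            }
        }
    ]
}
result = handler(sample_event, None)
```

Keeping the parsing in a plain function makes it possible to exercise the logic locally without any AWS credentials or SDK.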

The team used integration-platform-as-a-service (iPaaS) to move transactional data at scale both into and out of the Salesforce applications.

Re-architecting the solution on AWS reduced infrastructure cost by 60%.


The hybrid cloud-based solution provisioned an enterprise data lake to:

  • Host customer data from multiple sources
  • Store a copy of active transactional data on Salesforce
  • Maintain a single source of truth with real-time integrations with Salesforce/external services

The solution was more cost-effective than Salesforce for both storing data and analyzing it to provide real-time insights to customers.
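The tiering described above, where the data lake holds everything and Salesforce keeps only a copy of active transactional data, might be sketched as a simple routing rule. This is a minimal sketch under stated assumptions: the 90-day window, the `storage_targets` function, and the target names are all hypothetical, and the case study does not say how "active" was defined.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff for "active" data; not taken from the case study.
ACTIVE_WINDOW = timedelta(days=90)


def storage_targets(last_activity, now, active_window=ACTIVE_WINDOW):
    """Every record goes to the data lake; recently active ones are also copied to the CRM."""
    targets = ["data_lake"]
    if now - last_activity <= active_window:
        targets.append("salesforce")
    return targets


# Arbitrary example dates, chosen only to show both branches.
now = datetime(2020, 6, 30)
recent = storage_targets(datetime(2020, 6, 1), now)
stale = storage_targets(datetime(2019, 1, 15), now)
```

Under this rule the more expensive CRM storage only ever holds the active slice, while the lake remains the complete record.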


  • Used microservices and a polyglot architecture (MongoDB/Solr/ElastiCache) to improve speed, scalability, and performance
  • Created a serverless architecture and process automation on AWS (Lambda, DynamoDB, S3, API Gateway, SNS, Data Pipeline, etc.)
  • Used Prometheus and Grafana for application monitoring
  • Used Sensu for incident management
  • Used Amazon Elastic Kubernetes Service (EKS) and Docker for container orchestration
  • Used Jenkins/Rundeck for build and deployment
