19 Oct 2021

‘Shift-left’ reduces cloud cost by 50% for a Fortune 500 insurance firm

While cloud adoption continues to accelerate, with 36% of enterprises spending more than $12 million per year on public clouds, businesses are looking for ways to optimize their cloud spend. According to the Flexera 2021 State of the Cloud Report, organizations waste 30% of cloud spend, making cost control the top challenge in public cloud adoption.

Source: IDC Cloud Survey 2020.

Gartner predicts that through 2024, nearly all legacy applications migrated to public cloud infrastructure as a service (IaaS) will require optimization to become more cost-effective. This blog details how we helped a Fortune Global 500 insurance brokerage and risk management company halve their cloud cost by optimizing their spend.

A few months after implementing an AWS-based data lake for a Fortune 500 firm, we discovered that their in-house IT team was unable to control cloud consumption costs. They were using AWS Cost Explorer, which monitored cloud costs for different teams in silos, without optimizing usage.

The realization prompted us to explore why the cost spiraled with increasing amounts of data. To optimize cost, we adopted a “shift-left” approach, which is widely used for maintaining code quality. Our focus was on empowering the Dev teams, the primary consumers of cloud services and resources, to gain more visibility into the costs based on their usage. This, in turn, would make them more accountable and enable them to monitor costs effectively.


To optimize cloud cost, the Impetus DevOps team:

  1. Defined tagging policy and guidelines for AWS resources with tag keys like application, business unit, environment, etc.
  2. Updated the automation scripts to ensure all created resources were tagged
  3. Created custom rules to scan all resources and report non-tagged resources to the management
  4. Created a tagging strategy for various services to filter resources for cost analysis.

    The pie chart below highlights the cost distribution between tagged and untagged resources. As most of the cost was incurred by tagged resources, we identified the teams and applications incurring the cost, which helped avoid resource wastage.

    Cost distribution of tagged vs non-tagged resources
  5. Created alerts for services reaching/exceeding threshold
  6. Enabled automated termination of long-running EMR clusters
  7. Enabled termination of idle EMR clusters on weekends
  8. Compared historical price patterns and peaks with existing usage to effectively plan an auto-scaling mechanism
  9. Reported idle/stopped resources and security vulnerabilities
  10. Created a tagging solution for all services and components
  11. Created dashboards for billing and custom dashboards for tagged resources
  12. Designed a strategy to gather information from all accounts in a single billing dashboard on AWS. This helped to compare consumption across Dev, Stage, Prod, and Labs accounts. The report was further analyzed to optimize Dev spending and formulate stricter policies on resource provisioning and termination
  13. Implemented lifecycle rules on Amazon S3 buckets to automatically move old data to lower storage tiers, resulting in reduced storage costs

    The following graph highlights the cost reduction after implementing S3 policies.

    Cost reduction using S3 lifecycle management
  14. Used spot instances for EMR core nodes, which are up to 50% cheaper than on-demand instances
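The tagging steps in the list above can be illustrated with a small compliance check. This is a minimal sketch, not the team's actual tooling: the tag key names, the inventory records, and the cost figures are all hypothetical, and a real scan would read resources from the cloud provider rather than from an in-memory list.

```python
from collections import defaultdict

# Hypothetical tag keys, modeled on the policy described in the list above.
REQUIRED_TAGS = ("application", "business_unit", "environment")


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return [key for key in required if not tags.get(key)]


def cost_by_compliance(resources, required=REQUIRED_TAGS):
    """Sum monthly cost for fully tagged vs. non-compliant resources."""
    totals = defaultdict(float)
    for resource in resources:
        bucket = "untagged" if missing_tags(resource, required) else "tagged"
        totals[bucket] += resource.get("monthly_cost", 0.0)
    return dict(totals)


# Illustrative inventory; IDs, tags, and costs are made up for the example.
inventory = [
    {"id": "emr-1", "monthly_cost": 900.0,
     "tags": {"application": "ingest", "business_unit": "claims",
              "environment": "prod"}},
    {"id": "s3-raw", "monthly_cost": 300.0,
     "tags": {"application": "lake", "environment": "prod"}},
    {"id": "ec2-tmp", "monthly_cost": 50.0, "tags": {}},
]

# Report only the resources that violate the tagging policy.
report = {r["id"]: missing_tags(r) for r in inventory if missing_tags(r)}
print(report)
print(cost_by_compliance(inventory))
```

Splitting cost into tagged and untagged buckets gives the same kind of view as the pie chart described in the list, and the per-resource report is the sort of output that could be forwarded to management.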

The customized cost analysis solution with built-in controls provided specific access to Dev teams to continuously monitor and track cloud cost usage for taking corrective actions based on their consumption. The solution enabled the team to:

  1. Connect with AWS, Azure, and GCP accounts to collect and analyze cloud costs in one place
  2. Get quick insights categorized by region, services, tags, and more
  3. Simplify workflows to manage the resource tags and maintain tag hygiene for accurate cost analysis
  4. Get rule-based alerts for any spikes and recommendations to improve cost-efficiency
  5. Reduce multi-cloud wastage by analyzing spends against budgets and forecasting costs and usage
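A rule-based spike alert of the kind listed above could be as simple as comparing each day's cost with a trailing average. The sketch below is one possible rule under stated assumptions: the 7-day window, the 1.5x threshold, and the sample figures are illustrative choices, not values from the solution.

```python
from statistics import mean


def find_spikes(daily_costs, window=7, threshold=1.5):
    """Return indexes of days whose cost exceeds threshold x the trailing mean."""
    spikes = []
    for i in range(window, len(daily_costs)):
        # Baseline is the mean of the previous `window` days only.
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            spikes.append(i)
    return spikes


# Made-up daily costs with one obvious jump on day index 7.
costs = [100, 102, 98, 101, 99, 100, 100, 240, 101]
print(find_spikes(costs))
```

A trailing window keeps the rule cheap to evaluate, although a spike also raises the baseline for the following days, so a production rule would likely need a more robust statistic.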

The solution helped the Fortune 500 insurance firm reduce their monthly cloud expenditure from $120,000 to $60,000.

Dashboard of the customized cost analysis solution

Monitoring cloud costs continuously and optimizing resource utilization are critical for reducing cloud spend and realizing the benefits of the cloud. While every cloud provider offers reports and dashboards to track resource consumption and costs, correlating the data, identifying provisioning inefficiencies, controlling virtual sprawl, and analyzing cloud spend across multiple providers can be challenging. The implemented solution (a custom cost explorer) simplifies hybrid and multi-cloud cost analysis by consolidating expense data from different cloud platforms and accounts. It helps businesses optimize their cloud expenditure and predict future costs.

12 Mar 2021

Enabling enterprise-grade Kubernetes security for a Fortune 100 credit card company

Kubernetes use in production has increased to 83%, up from 78% last year. - Cloud Native Computing Foundation (CNCF) Survey 2020

Containers and microservices are driving enterprise IT innovation and digital transformation across industries. Companies are embracing container technologies like Kubernetes to realize greater flexibility, scalability, and speed across application development and deployment processes. Yet, for many enterprises, security remains one of the biggest concerns for kick-starting containerization initiatives.

This blog focuses on how we helped a Fortune 100 credit card company secure their Kubernetes clusters and effectively monitor security practices leveraging automation. The customer was looking to move their enterprise solution for risk and compliance to Kubernetes to meet expanding business needs. Here are some highlights of the security practices we implemented as part of this project:

Used an internal, private registry for container images

As hackers often prey on image vulnerabilities, using trustworthy registries (private, wherever possible) for container images is crucial. We helped the client set up an internal private registry for storing approved images. Any external images had to be downloaded using reverse proxies, validated, scanned, and pushed into the local repository. Their Kubernetes Admin team had permission to upload these to the internal repository after extensive security and compliance checks.
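One way to enforce a private-registry rule is to check every image reference before it is deployed. The following is a minimal sketch, assuming a hypothetical registry host name; the host-detection rule mirrors how Docker interprets the first path component of an image reference.

```python
# Hypothetical internal registry host; replace with the real one.
APPROVED_REGISTRIES = {"registry.internal.example.com"}


def registry_of(image_ref):
    """Return the registry host of an image reference, or None if implicit."""
    first, sep, _ = image_ref.partition("/")
    # The first path component is a registry host only if it contains a dot
    # or a port, or is "localhost"; otherwise the default registry applies.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def is_approved(image_ref, approved=APPROVED_REGISTRIES):
    """True only for images pulled explicitly from an approved registry."""
    return registry_of(image_ref) in approved


for ref in ("registry.internal.example.com/risk/api:1.4.2", "nginx:1.25"):
    print(ref, "->", "allowed" if is_approved(ref) else "rejected")
```

A check like this could run in a build step or an admission hook; it complements, rather than replaces, the validation and scanning applied before images are pushed to the internal repository.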

Integrated vulnerability scanning tools with the code build process

For continuous scanning of images and code, we integrated tools like Black Duck and WhiteHat with the code build process. This ensured that unsecured code was not included in any image. In addition, we performed image scanning leveraging Clair, an open-source tool that can be easily deployed in Kubernetes and integrated directly with the container registry.

Isolated environments using namespaces and role-based access control

To isolate environments for different teams and users, we leveraged Kubernetes namespaces. We applied Kubernetes role-based access control (RBAC) at the namespace level, restricting access to each Kubernetes service to specific users based on business needs, and used the Cluster Admin role to restrict cluster-level access. Additionally, to streamline RBAC management and prevent access by unauthorized users, we periodically removed all unused/inactive roles.
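Per-namespace RBAC comes down to a Role that lists the permitted verbs and a RoleBinding that attaches it to a subject. The sketch below generates both as JSON, which kubectl accepts alongside YAML. The namespace, user, and role names are hypothetical, and read-only access to pods is just one example of a restriction.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespaced_read_only(namespace, user, role_name="pod-reader"):
    """Build a Role and RoleBinding granting one user read access to pods."""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group
            "resources": ["pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-{user}", "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }
    return [role, binding]


# Hypothetical namespace and user, for illustration only.
manifests = namespaced_read_only("risk-dev", "jane")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests},
                 indent=2))
```

Because both objects are scoped to a namespace, the user gains nothing outside it; cluster-wide permissions would need a ClusterRole and ClusterRoleBinding instead.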

Secured all Kubernetes components

To secure all Kubernetes components, we enabled and configured Kubernetes RBAC and limited the number of users and service accounts accessing the API server. All traffic between the API server and other infrastructure components, like etcd and the kubelet, was served over HTTPS and encrypted with Transport Layer Security (TLS). The API server did not serve any requests on insecure ports, and all server audit logs were collected and retained. To limit damage in the event of an attack, we locked down the ownership and permissions needed to access critical configuration and PKI files on the master node. In addition, we configured the kubelet config file to prevent the kubelet server from serving any anonymous/unauthenticated requests.
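The kubelet setting mentioned above lives under `authentication.anonymous.enabled` in the KubeletConfiguration file. A minimal audit sketch follows; it assumes the file has already been parsed into a dictionary, and it deliberately takes a conservative stance by reporting the setting unless it is explicitly disabled. The finding messages are illustrative.

```python
def audit_kubelet_config(config):
    """Return a list of findings for a parsed KubeletConfiguration dict."""
    findings = []
    anonymous = config.get("authentication", {}).get("anonymous", {})
    # Conservative choice for this sketch: require an explicit `false`
    # rather than relying on whatever default applies.
    if anonymous.get("enabled") is not False:
        findings.append("anonymous authentication is not explicitly disabled")
    if config.get("authorization", {}).get("mode") == "AlwaysAllow":
        findings.append("authorization mode is AlwaysAllow")
    return findings


hardened = {
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "kind": "KubeletConfiguration",
    "authentication": {"anonymous": {"enabled": False}},
    "authorization": {"mode": "Webhook"},
}
print(audit_kubelet_config(hardened))
print(audit_kubelet_config({"authorization": {"mode": "AlwaysAllow"}}))
```

Checks of this kind are easy to run across many nodes and clusters, which is what makes periodic, automated verification practical.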

Leveraged egress policies and controllers for secure networking

To securely route network traffic to internal Kubernetes services, we used Nginx as a load balancer and network gateway for managing inbound and outbound connections. Nginx also served as a frontend proxy to limit the exposure of these services to end users.

Developed an automated solution for continuous monitoring

We developed an automated solution to continuously monitor Kubernetes’ security practices across multiple clusters with minimal effort. A high-level blueprint of the solution is given below:


High-level overview of the automated solution for continuous monitoring

This helped us periodically run the security benchmark in an automated manner and promptly identify any misconfigurations/incorrect practices. We also leveraged a security dashboard to perform security audits at the beginning of the project. A sample snapshot is given below:


Kubernetes security dashboard

Additionally, we stored Kubernetes events in the ELK Stack and generated insights on the dashboard. This helped proactively monitor any suspicious events and generate alerts.

Our integrated approach helped the credit card company improve their security posture by identifying security threats and misconfigurations across multiple clusters in real time through automation. This eliminated the need for manual monitoring, which would have required significant time and effort. What’s more, the security dashboard provided a single, consolidated view of all critical security tests, enabling 360-degree visibility across clusters. The automated scanning utility provided daily updates on cluster stats, allowing the client’s Admin team to focus on other strategic tasks. Most importantly, the client was able to run their applications on secure, stable, resilient clusters.

29 Dec 2020

Ten best practices for containerization on the cloud

By 2022, more than 75% of global organizations will be running containerized applications. – Gartner Inc.

Containerization represents a breakthrough for DevOps teams as it lets them focus on application architecture and deployment, rather than worrying about the underlying infrastructure plumbing. As lightweight software units that package applications with their code dependencies, containers make it easy to create cloud-native applications running on physical or virtual infrastructure. Based on our recent engagements with Fortune 1000 companies, we have put together ten best practices that can help you accelerate application deployment on the cloud using containers:

1. Use a hybrid strategy for application modernization

A popular approach to modernizing monolithic applications is to “lift and shift,” wherein the entire application is bundled as a container and deployed. While this is easy and fast to execute, development teams find it difficult to push frequent and small changes to applications that have been deployed as a whole. Another approach is to completely rearchitect an existing application, but this process often proves to be complex and time-consuming. To save time and effort, we recommend a hybrid approach that combines the best of both these worlds. This involves analyzing your applications based on usage patterns and identifying modules that can be decoupled from the application and containerized. You can use conversion tools like AWS App2Container to discover your on-premises applications and automatically create Docker images for these. You can also leverage services like AWS Fargate, Azure Container Instances, or Google Cloud Run to rapidly productionize your containers on the cloud.

We used this hybrid approach to help a US-based healthcare technology company containerize their Windows-based legacy applications and move to a cloud-agnostic architecture leveraging Azure Kubernetes Service. This helped them navigate deployment complexities and reduce the time and effort spent on customer support while eliminating Windows dependencies.

2. Follow the Twelve-Factor App methodology for cloud-native development

Developers and DevOps engineers can leverage the Twelve-Factor App methodology to build containerized applications that are resilient, portable, and cloud-agnostic. This methodology spells out best practices across twelve important factors, including codebase, dependencies, configurations, processes, concurrency, and logs, among others. Implementing these guidelines helps enterprises achieve the level of innovation, speed, and agility they need to succeed in the marketplace. You can apply the Twelve-Factor methodology to applications written in any programming language, regardless of which combination of backing services (database, queue, memory cache, etc.) you use.

We successfully used this methodology to develop a self-service web portal for a Fortune Global 500 insurance brokerage firm, leveraging containers for deployment. The portal enabled business users to seamlessly ingest data on their AWS data lake and reduced ingestion time from hours to minutes.

3. Include only necessary dependencies in the build

To keep containers as lightweight as possible, it is important to include only the necessary dependencies. Picture this – you are building a classic Apache/MySQL/PHP stack and are tempted to run all the components in a single container. However, the best practice is to use different containers for Apache, MySQL, and PHP (if you are running PHP-FPM). We suggest following the single-responsibility principle (SRP) to write clean, well-structured code focused on a single functionality, as this helps limit the dependency on other application components. We also recommend building smaller container images to enable faster uploads and downloads – the smaller an image, the faster it can be downloaded and run. In addition, smaller images can help you lower cloud storage costs and quickly scale up or down in response to application user traffic.

The Fortune Global 500 insurance brokerage firm also wanted to enable their end-users to upload data files to AWS and interact with Amazon SQS. By reducing the number of dependencies in the build, we were able to reduce the image size by almost 200 MB, enabling a seamless user experience.

4. Optimize for build cache

Creating effective, clean images is a vital step in containerization. When building an image, Docker skims through the instructions in your Dockerfile and executes them in the specified order. It also looks for reusable images in its cache and reuses layers from previous builds. This helps avoid the potentially costly step of recreating an image and helps improve build time significantly. To eliminate intermediate image layers, try reducing the number of instructions in your Dockerfile. For instance, you can choose to have a single command with all installation inputs instead of having multiple commands for the same.

We recently used this practice while building a Docker container for a machine learning algorithm. The image required multiple Python dependencies, and the initial Dockerfile had a separate “yum install” command for each installation. By including all the installations in a single command, we could reduce intermediate image layers and reuse images easily across models.
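The effect of collapsing separate install commands can be shown with a small text transformation. This is a sketch only: the base image and package names are hypothetical, and the function handles just the simple case of consecutive single-line RUN instructions.

```python
def merge_runs(dockerfile_text):
    """Merge consecutive single-line RUN instructions into one RUN."""
    merged, pending = [], []

    def flush():
        # Emit any collected commands as a single RUN joined with &&.
        if pending:
            merged.append("RUN " + " && \\\n    ".join(pending))
            pending.clear()

    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        # Lines ending in a backslash are multi-line RUNs; leave them alone.
        if stripped.upper().startswith("RUN ") and not stripped.endswith("\\"):
            pending.append(stripped[4:].strip())
        else:
            flush()
            merged.append(line)
    flush()
    return "\n".join(merged)


# Hypothetical Dockerfile with one RUN per installation.
original = """FROM centos:7
RUN yum install -y python3
RUN yum install -y gcc
RUN pip3 install numpy
COPY model.py /app/
"""
merged_text = merge_runs(original)
print(merged_text)
```

There is a trade-off to keep in mind: one merged instruction yields one layer, so a change to any command in it invalidates the cache for the whole instruction. Grouping commands that change together is usually a reasonable compromise.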

5. Integrate image scanning as part of your CI/CD pipeline

In today’s ever-changing threat landscape, using trusted image sources is not enough. To ensure airtight security, you must integrate container image scanning with CI/CD tooling and processes. All images should be scanned in line with the organization’s security policy each time CI/CD is run to minimize the risk of attack vectors being installed on the organization’s network. As image build policies typically reside in a security engine, any security policy failure should trigger a failed build directly within your CI/CD system and provide the necessary remediation steps. To enable this, we recommend integrating image scanning as part of your pre-created CI pipelines. You can also leverage tools like Anchore and Qualys for performing image vulnerability scans each time an image is created with new code.
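The "fail the build on a policy violation" step can be reduced to a small gate over the scanner's findings. The sketch below assumes a hypothetical, scanner-agnostic report format (a list of records with an ID and a severity) and a made-up severity scale; real scanners each have their own output schema that would need mapping first.

```python
# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]


def gate(findings, fail_on="high"):
    """Return (passed, blocking) for a list of {'id', 'severity'} findings."""
    limit = SEVERITY_ORDER.index(fail_on)
    blocking = [f for f in findings
                if SEVERITY_ORDER.index(f["severity"].lower()) >= limit]
    return (not blocking, blocking)


# Placeholder finding IDs, not real vulnerability identifiers.
findings = [
    {"id": "VULN-0001", "severity": "Low"},
    {"id": "VULN-0002", "severity": "Critical"},
]
passed, blocking = gate(findings)
print("PASS" if passed else f"FAIL: {[f['id'] for f in blocking]}")
```

In a pipeline, a failed gate would typically end the job with a non-zero exit code so that the image is never pushed, and the list of blocking findings would be surfaced as the remediation starting point.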

For the insurance brokerage firm mentioned earlier, we developed a single-click Jenkins pipeline to automatically deploy containerized services to development, staging and production environments. This helped the client shorten their release cycle from 6 weeks to 1 week.

6. Monitor telemetry data for your entire stack

In the containerization universe, monitoring should not be limited to infrastructure. You should closely monitor all aspects of applications, such as logs, load time, and the number of HTTP requests. In terms of errors, ensure that your monitoring strategy covers application exceptions, database errors/warnings, and weblogs indicating unusual requests. It is equally important to monitor cloud-specific telemetry and check for outages and internet latency. While cloud providers offer monitoring services like AWS CloudWatch, Azure Monitor, and Google Stackdriver, you should augment these with advanced tools to monitor network ingress and egress traffic, security breaches, and platform availability. You can also integrate monitoring and log analytics capabilities with cluster creation using tools like Prometheus, Grafana, Elastic, Fluentd, Kibana, and Jaeger. These, coupled with a unified monitoring dashboard, can help you realize 360-degree visibility and observability across your containerized applications and environments on the cloud.

For a digital customer journey experience company, we set up a monitoring dashboard using Datadog and Splunk for platform and application level monitoring. We also integrated their Docker environment and containers with ELK to support application debugging. These tools helped the customer achieve complete visibility of their infrastructure, Docker platform, and applications.

7. Tag your images

Docker images are generally identified by two components – name and tag. For instance, for the image “google/cloud-sdk:193.0.0”, “google/cloud-sdk” is the name, and “193.0.0” is the tag. If you do not provide a tag in your Docker commands, the system uses the latest tag by default. At any given time, the name and tag pair is unique, but the same tag can be reassigned to a different image if needed. When you build a container image, be sure to tag it accurately as this helps in versioning and easy rollback during deployment. We recommend following a consistent, well-documented tagging policy that can be easily understood by image users.
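The name and tag split described above can be expressed in a few lines. This is a minimal sketch that covers plain `name[:tag]` references, including registries with a port; it does not handle digest references of the form `name@sha256:...`.

```python
def split_image_ref(ref):
    """Split 'name[:tag]' into (name, tag), defaulting the tag to 'latest'."""
    # Only a colon after the last slash separates the tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


for ref in ("google/cloud-sdk:193.0.0", "google/cloud-sdk", "localhost:5000/app"):
    print(split_image_ref(ref))
```

The implicit `latest` default is the reason explicit version tags matter: two builds pushed without a tag silently overwrite each other, which makes rollback difficult.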

The digital customer journey experience company leveraged these tagging practices to efficiently manage images in Amazon ECR, purge old images, and save cloud storage costs.

8. Decouple containers from infrastructure

Containers enable “write once, run anywhere” portability and performance isolation on shared infrastructure. This means that databases and containers are decoupled from the operating system and IT infrastructure to provide workload portability from one host to another, anywhere, any time. However, sometimes we come across stateful containers tightly coupled with a storage layer on the infrastructure, which becomes a major bottleneck for scaling. You can address this by using network-based storage or cloud object storage on Amazon Elastic Kubernetes Service (EKS) or Azure Kubernetes Service (AKS), as these can be easily accessed from any node in the cluster.

While recently containerizing a data intelligence and analytics application on Kubernetes, we used NFS storage to decouple stateful components like RabbitMQ and Elastic from the host infrastructure. We could then run these components across any node in the cluster, making the deployment scalable and easy to manage.

9. Declare resource requirements

All container deployments should declare resource requirements like storage, compute, memory, and network so that no single workload can consume resources without bound. It is equally important that your applications stay within these declared requirements, as containers that do so are less likely to be terminated or migrated if resource starvation occurs. Declaring resources also helps DevOps teams set monitoring alerts and make informed decisions related to scaling. This is especially important in the cloud, where resources can scale automatically, leading to increased costs. For effective and optimized scheduling, ensure that your containers declare resource limits and requests clearly.
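In Kubernetes, these declarations appear as `requests` and `limits` under each container's `resources` field. The sketch below checks a pod spec, represented as a dictionary, for containers that omit CPU or memory in either section. The container names and quantities are hypothetical.

```python
def containers_missing_resources(pod_spec):
    """Return names of containers lacking CPU/memory requests or limits."""
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            declared = resources.get(section, {})
            if "cpu" not in declared or "memory" not in declared:
                missing.append(container["name"])
                break  # one gap is enough to flag the container
    return missing


# Illustrative pod spec: one container declares resources, one does not.
pod_spec = {
    "containers": [
        {"name": "api",
         "resources": {
             "requests": {"cpu": "250m", "memory": "256Mi"},
             "limits": {"cpu": "500m", "memory": "512Mi"},
         }},
        {"name": "sidecar"},
    ]
}
print(containers_missing_resources(pod_spec))
```

A check like this fits naturally into a pre-deployment review, so that manifests without declared resources are caught before they reach the scheduler.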

For the data analytics application mentioned above, we used Kubernetes deployment artefacts at the pod and namespace level to restrict memory and CPU core usage. Declaring requirements like CPU usage helped us effectively procure resources from the cluster nodes and ensure hassle-free scheduling.

10. Automate the cluster creation process

It is usually easy for small teams to build containers for simple applications and then deploy these on the cloud or on-premises. But when multiple teams work on complex applications, management can become an issue. To isolate resources across applications, we recommend using smaller clusters instead of leveraging one large cluster. This makes application and infrastructure management simple and hassle-free. Automating the cluster creation process helps create smaller clusters seamlessly. We recommend leveraging single-click deployment scripts to set up Amazon EKS or AKS clusters with best practices for availability, monitoring, and security. These scripts can also be integrated with automation tools like Jenkins, enabling your DevOps teams to quickly create clusters for application onboarding.

Leveraging automated Kubernetes scripts, we helped a US-based Fortune 100 credit card company save ~50 hours of setup time per cluster.

Containerization is rapidly gaining traction as it helps enterprises shorten their application development and release cycle, while reducing hardware expenses. Impetus Technologies offers ready-to-use enablers and innovative automation levers to accelerate, simplify, and de-risk your containerization initiatives in a cloud-first world. To learn more, get in touch with us today.

Mustufa Batterywala
Senior DevOps Architect
22 Jul 2020

A holistic approach to securing data in a cloud-based data lake

Data-driven decision-making is a key driver for enterprises in their digital transformation journey. Businesses are now switching to scalable, unified data storage repositories like enterprise data lakes, built on cloud storage options such as Amazon Simple Storage Service (S3), Google Cloud Storage, Azure Data Lake Storage (ADLS), and Azure Blob Storage. But while the cloud offers unmatched speed, flexibility, and cost savings, security remains a major concern. This blog delves into the key pillars of cloud security and outlines how a holistic approach can help enterprises protect the confidentiality, integrity, and availability of their data.

Data access

Role-based access control, authentication, and authorization are vital security components of a healthy data lake. We recommend developing fine-grained controls and defining appropriate roles for key tasks – like moving data to cloud storage, deleting data, and accessing metadata.

While building a data lake for a Fortune Global 500 insurance brokerage and risk management company on AWS, we created different storage buckets for raw data, processed data, and consumption layers. We leveraged the cloud’s Identity and Access Management services to restrict access to each of these layers. No individual users had direct access to the raw data bucket – only the service account and ETL tools could copy data to this layer. We also created roles like Power Admin, Data Analyst, and Data Admin, and gave each of them different access permissions to read and write data. Further, to restrict access to underlying tables via Hive and Presto, we configured Ranger policies. Ranger offers easy management capabilities and enables granular control for role-based access at both table and column level.

Data transfer

It is critical to secure data as it moves across networks, devices, and services. Often, this can easily be configured for each storage service through built-in features. We recommend using standard Transport Layer Security (TLS)/Secure Sockets Layer (SSL) with associated certificates. This allows you to securely upload/download data to the cloud through encrypted endpoints, accessible via the internet and within the Virtual Private Cloud (VPC).

While implementing an AWS data lake for an IoT solutions provider, we created SSL-enabled VPC endpoints to transfer all data to the cloud storage. This ensured that data never moved through the public internet, thereby bolstering security. In AWS, we used SSL for communication between the on-premises and cloud networks, between the raw ingestion layer and intermediate data processing, and between the BI tools and the consumption layer, ensuring end-to-end data security.
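On Amazon S3, encrypted transport can also be enforced at the bucket level with a policy that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document; the bucket name is hypothetical, and this is a common pattern rather than a description of the policy used in the project.

```python
import json


def deny_insecure_transport(bucket):
    """Build an S3 bucket policy that rejects requests not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


# Hypothetical bucket name, for illustration only.
policy = deny_insecure_transport("example-raw-bucket")
print(json.dumps(policy, indent=2))
```

Because an explicit Deny overrides any Allow, this statement applies to every principal regardless of the permissions granted elsewhere.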

Data storage

As a best practice, encryption-at-rest should always be enabled in the cloud. This includes encryption for storage services as well as persistent disk volumes used by compute instances. For implementing encryption-at-rest effectively, we recommend allowing your cloud provider to manage the encryption keys to eliminate the risk of accidental key deletion/loss.

For the insurance brokerage and risk management customer mentioned earlier, we used cloud-managed keys to encrypt data in S3 and EBS. This enabled easy rotation of keys periodically. To further strengthen security of data residing in the raw layer, we used custom PGP encryption keys for Third-party Auditors (TPAs). Each TPA was provided a specific encryption key, which allowed them to send the necessary files in an encrypted format. These files were then decrypted for processing in the data lake using the PGP keys, ensuring fully secure transfers.

Data availability

The cloud is designed to provide high resilience and availability, which means objects are redundantly stored on multiple devices across different facilities. However, this availability is applicable in a specific region, and data is not automatically replicated across different regions.

To create a robust disaster recovery environment for a leading management e-publication, we enabled data replication in a region different from the source storage. This ensured that the data remained available, even in the event of a region failure. We also leveraged automated lifecycle management policies for cloud storage, which enabled automated movement of data from one storage tier to another. To meet the customer’s compliance requirements, we specified a retention period of 7 years, after which the raw archive data was automatically moved to cold storage. This helped reduce overall storage costs, and enabled users to seamlessly retrieve data as and when necessary.
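A lifecycle policy like the one described above is expressed on Amazon S3 as a rule with a transition. The sketch below builds such a configuration in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`. The `raw/` prefix and the `GLACIER` storage class are assumptions made for the example, and the 7-year period is approximated as 7 x 365 days.

```python
def archive_after(days, prefix="raw/", storage_class="GLACIER"):
    """Build an S3 lifecycle configuration that moves objects to cold storage."""
    return {
        "Rules": [{
            "ID": f"archive-{prefix.strip('/')}-after-{days}d",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": storage_class}],
        }]
    }


SEVEN_YEARS_IN_DAYS = 7 * 365  # approximation; ignores leap days
config = archive_after(SEVEN_YEARS_IN_DAYS)
print(config)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule as data makes it straightforward to review, version, and apply the same policy to several buckets.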

You can also ensure high availability and strengthen protection against data loss through versioning, which lets you preserve, retrieve, and restore different versions of an object stored, enabling smooth recovery from human error and application failures.

In conclusion

Securing data in the cloud is a critical business need. Enterprises cannot afford to overlook the myriad security risks that arise while warehousing their data on a cloud-based data lake. With extensive experience in provisioning cloud-based data lakes for large-scale enterprises, Impetus Technologies can help secure your data so that you can focus on your business goals with complete peace of mind.