08 Dec 2020

AI-enabled DevOps: Reimagining enterprise application development

Today, advances in artificial intelligence (AI) and machine learning (ML) have opened up significant application possibilities, from sensor-driven weather prediction to driverless cars to intelligent chatbots. Development teams have enabled these breakthroughs by leveraging automation to rapidly prototype, iterate, and improve applications. As the scale, scope, and complexity of AI use cases increase, DevOps is fast becoming the preferred mode of build and delivery, as it helps reduce the development lifecycle and provides continuous delivery with high software quality.

This article outlines how AI can help DevOps teams better monitor, alert, and resolve issues in production pipelines to drive strategic business benefits, and explores the internal changes needed to ensure enterprise-readiness for AI-enabled DevOps.

The need for AI-enabled DevOps

DevOps engineers manage the development, testing, and operationalization of data platforms by monitoring network stability, availability, and other key metrics. Some of the common challenges faced by DevOps teams include managing multiple libraries and versions of code, factoring in adequate deployment parameters to avoid application failure, and customizing scripts in a short timespan to ensure optimal performance.

Given adequate data, AI models can detect specific patterns in DevOps processes to help identify bottlenecks and address these challenges. AI can also help overcome the limitations of traditional DevOps tools, especially when it comes to monitoring events in development and production workflows. Typically, alerts for monitored events are triggered when a user-set threshold is crossed. An incident is then logged, and the team is alerted.

This reactive approach, however, depends on specific individuals’ response capacity. What’s more, in the case of security monitoring, alarms triggered by such thresholds can generate many false positives. AI algorithms can help address these challenges by proactively identifying patterns and warning teams of potential disruptions before they occur.
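The contrast between a user-set threshold and a learned baseline can be sketched in a few lines of Python. This is a minimal illustration, not a production monitor: the function names, the window size, the z-score limit, and the sample latency values are all assumptions chosen for the example, and a rolling z-score is only one simple stand-in for the pattern-detection models described above.

```python
from collections import deque
from statistics import mean, stdev


def static_alerts(values, threshold):
    """Return indices where a value crosses a fixed, user-set threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(values, window=5, z_limit=3.0):
    """Return indices where a value deviates strongly from its recent history.

    The baseline (mean and standard deviation) is recomputed from the last
    `window` observations, so no absolute threshold has to be chosen.
    """
    history = deque(maxlen=window)
    alerts = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(v - mu) / sigma > z_limit:
                alerts.append(i)
        history.append(v)
    return alerts


# Hypothetical response-time samples in milliseconds, with one spike.
latency_ms = [100, 102, 98, 101, 99, 100, 160, 101, 99, 100]

print(static_alerts(latency_ms, threshold=200))  # [] - the spike stays under the fixed limit
print(rolling_zscore_alerts(latency_ms))         # [6] - the spike stands out from its baseline
```

In this example a threshold of 200 ms misses the spike entirely, while the rolling baseline flags it because it is far outside the recent range of values.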

Enabling rapid innovation

AI and ML play a key role in accelerating digital transformation across use cases – from data gathering and management to analysis and insight generation. Enterprises that have adopted AI and ML effectively are better positioned to enhance productivity and improve the customer experience by swiftly responding to changing business needs. DevOps teams can leverage AI for seamless collaboration, incident management, and release delivery. They can also quickly iterate and personalize application features via hypothesis-driven testing.

For instance, Tesla recently enhanced its cars’ performance through over-the-air updates without having to recall a single vehicle. Similarly, periodic performance updates to biomedical devices can help extend their shelf-life and improve patient care significantly. These are just a few examples of how AI-enabled DevOps can foster innovation to drive powerful outcomes across industries.

Accelerating innovation with AI on the cloud

DevOps teams can innovate using the next-gen, cost-effective AI and ML capabilities offered by major cloud providers like AWS, Microsoft Azure, and Google Cloud. These providers offer access to virtual machines with all required dependencies, helping data scientists build and train models on high-power GPUs for demand and load forecasting, text/audio/video analysis, fraud prevention, and similar tasks.

DevOps teams can leverage these capabilities to improve the quality, stability, scalability, and release frequency of enterprise applications. The cloud also makes it easy to collect and analyze large volumes of data to understand user preferences, which can be leveraged by recommendation engines. This helps businesses deliver a smoother, more personalized user experience. In addition, AI-enabled DevOps can strengthen cybersecurity on the cloud by improving data collection, securing models, and analyzing data from IoT sensors and other devices.

Adding value across use cases

Across industries, AI can play a major role in strengthening security and preventing outages/build failures through automated alerting and predictive monitoring. AI-enabled tools can ensure reliable and secure DevOps by improving monitoring capabilities to detect anomalies, bugs, code performance issues, etc.

They can automatically identify vulnerabilities and minimize the risk of security breaches. AI is also the cornerstone of advances in Natural Language Processing (NLP) and Natural Language Generation (NLG), which can be used to effectively document workflows and processes for creating a DevOps playbook. This, in turn, can accelerate and improve the training of DevOps engineers to help boost operational excellence. Chatbots built on NLP frameworks can also facilitate faster communication among engineers to improve customer support.

DevOps for AI-based applications

Enterprises are now deploying AI-based applications for a variety of use cases, including loan applications, customer churn, customer experience, lead generation, sales forecasting, recommendation systems, and risk scoring, among others. To optimize model training results for such use cases in the shortest turnaround time, appropriate compute resources must be allocated on distributed computing platforms. Compliance/regulatory requirements demand additional focus on handling data bias and bolstering model interpretation capabilities.

DevOps teams can meet these requirements using CI/CD, containerized applications, and microservices, which support experimentation and enable MLOps. Further, they can harden AI applications by adopting best practices of AI-driven security analytics to significantly improve DevSecOps. These practices can help detect and prevent adversarial attacks, data poisoning, breaches, and system disruptions. This is especially vital for enterprises working in critical infrastructure areas like nuclear energy, oil and gas, and water treatment. These enterprises are particularly vulnerable to cybersecurity threats because they use industrial control systems comprising IoT sensors and safety mechanisms.

Readying the enterprise for AI-enabled DevOps

Many organizations are still in the early stages of digital transformation. They continue to work with legacy systems and have large amounts of historical data in silos. AI can help extract insights from such data for creating well-designed applications to enhance customer experiences. To realize these benefits, organizations should upskill and empower their existing DevOps and data science personnel.

Data science teams may need to be educated about the benefits of adopting strategic DevOps practices like version control for development, model lineage tracking, model training and testing frameworks, etc. These practices can improve incremental feature delivery and enhance personalization by identifying user-specific patterns in application usage and tailoring features accordingly. In addition, DevOps engineers should work closely with data scientists and ML engineers to accelerate response time and efficiently track and manage all aspects of model development and production.

Close collaboration can also help ML engineers initiate model retraining and manage model versions using CI/CD and containerized applications, as part of MLOps. While this is an ambitious undertaking, it can help improve key metrics across the DevOps lifecycle – from idle time to mean time to repair (MTTR) to release frequency. Achieving desired metrics through AI-enabled workflows that continuously learn and improve performance can help enterprises produce world-class applications while saving cost.
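As a small worked example of one of these metrics, MTTR is the average time between detecting an incident and resolving it. The sketch below assumes incidents are available as (detected, resolved) timestamp pairs; the function name and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average (resolved - detected) over a list of incident timestamp pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: one repaired in 30 minutes, one in 90 minutes.
incidents = [
    (datetime(2020, 12, 1, 10, 0), datetime(2020, 12, 1, 10, 30)),
    (datetime(2020, 12, 2, 14, 0), datetime(2020, 12, 2, 15, 30)),
]

print(mean_time_to_repair(incidents))  # 1:00:00
```

Tracking this value per release or per week is one way to check whether AI-enabled workflows are actually shortening repair times.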

Since orchestration and monitoring form the backbone of DevOps, AI offers myriad opportunities to automate operations and deliver real-time insights for improving product development and releases with quality and efficiency. As both AI and DevOps become more mainstream, enterprises will increasingly break organizational silos and adopt new automation-led tools and strategies to improve business outcomes.

Reprinted with permission from Datanami

Ravishankar Rao Vallabhajosyula
Director, Data Science
16 Aug 2018

New Approaches to Real-time Anomaly Detection for Streaming Data

Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains. Whether it is detecting roaming abuse and service disruptions in the telecom industry, identifying anomalous employee behavior that signals a security breach, or preventing out-of-pattern medical spends in incoming health insurance claims, anomaly detection has a wide range of applications.

Anomaly detection has traditionally been driven by rule-based techniques applied to static data processed in batches, which makes it difficult to scale out as the number of scenarios grows. Modern data science techniques are far more efficient. Complex machine learning models can now be built using large amounts of unstructured and semi-structured data from disparate sources including business applications, emails, social media, chat messages, voice, text, and more. Moreover, the massive increase in streaming time-series data is leading to a shift to real-time anomaly detection, creating a need for techniques such as unsupervised learning and continuous models.

The following are some examples of how leading enterprises are using real-time anomaly detection to gain deeper insights and to respond swiftly to a dynamic environment:

[Figure: Real-time Anomaly Detection Use Cases Across Verticals]

A shift in anomaly detection techniques

Real-time anomaly detection for streaming data is distinct from batch anomaly detection. Streaming analytics calls for models and algorithms that learn continuously in real time without storing the entire stream, and that run fully automated, without manual supervision. Although both supervised and unsupervised anomaly detection approaches exist, most methods were designed for batch data processing and do not fit real-time streaming scenarios and applications.
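To make "learning continuously without storing the entire stream" concrete, the sketch below keeps a running mean and variance using Welford's online algorithm, which needs only three numbers of state no matter how many events arrive. The class name and sample values are illustrative assumptions; real streaming models are usually richer than summary statistics, but the constant-memory update pattern is the same.

```python
import math


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two observations have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for an unbounded event stream
    stats.update(value)

print(stats.count, stats.mean, round(stats.variance, 4))
```

Each event is processed once and then discarded, which is what allows the same code to run over a stream of any length.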

Moreover, detecting anomalies accurately in streaming data can be difficult; the definition of an anomaly is continuously changing as systems evolve and behaviors change. Furthermore, because anomalies are unexpected, an efficient detection system must be able to determine whether new events are anomalous without relying on preprogrammed thresholds.
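One simple way to approximate a baseline that evolves with the system is an exponentially weighted mean and variance, where recent events count more than old ones. The following is a sketch under stated assumptions: the class name, the smoothing factor `alpha`, the `z_limit` multiplier, and the warm-up length are illustrative choices, not a prescribed method. The detector avoids fixed thresholds on raw values, but these tuning parameters still have to be selected.

```python
class EwmaDetector:
    """Flags values far from an exponentially weighted moving baseline."""

    def __init__(self, alpha=0.1, z_limit=4.0, warmup=10):
        self.alpha = alpha      # weight given to the newest observation
        self.z_limit = z_limit  # allowed deviation, in baseline standard deviations
        self.warmup = warmup    # observations to absorb before flagging anything
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def observe(self, x):
        """Return True if x looks anomalous, then fold x into the baseline."""
        self.seen += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        is_anomaly = (
            self.seen > self.warmup
            and std > 0
            and abs(diff) > self.z_limit * std
        )
        # Incremental exponentially weighted update of mean and variance.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


# Hypothetical metric: steady oscillation, one spike at index 30, then steady again.
stream = [10, 12] * 15 + [50] + [10, 12] * 15
detector = EwmaDetector()
flagged = [i for i, x in enumerate(stream) if detector.observe(x)]
print(flagged)  # [30]
```

Because the baseline is updated on every event, a gradual change in normal behavior shifts the mean and variance along with it instead of triggering a permanent alarm.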

Another critical aspect is early detection. With streaming data, the focus lies not only in identifying anomalies but also in predicting and curbing anomalous events in real time. Predictions must therefore be made online, with the algorithm identifying an anomaly before the actual event occurs, unlike batch processing, where the model is trained to look back.

A new approach to effective and reliable anomaly detection

One way to implement new approaches to anomaly detection is to hand-code everything from scratch. However, real-time anomaly detection is significantly more complex than batch detection, and developing a custom solution comes with its own set of challenges, such as:

  • Long implementation cycles
  • Finding the right talent
  • Multiple QA cycles
  • Continuous monitoring, and the ability to scale up with increasing loads, once the solution is developed

Another approach is the platform approach to anomaly detection. Imagine a platform that not only handles the complexities of building anomaly detection models for streaming data but also provides a unified solution to train, calibrate, and deploy models and to monitor them after production, on both real-time and batch data.

StreamAnalytix is one such real-time anomaly detection platform. It is a specialized platform to rapidly build, run, and continually update anomaly detection models using a visual UI and machine learning capabilities. It leverages open source engines like Apache Spark to create analytics applications at scale and has a drag-and-drop interface to build and manage your application workflows visually.

It is an integrated framework that not only creates models but also provides end-to-end functionality to build enterprise anomaly detection applications. It maps to the modern platform approach to anomaly detection by exposing features like:

  • Real-time data integration and processing
  • Rapid development and operationalizing applications
  • A/B testing
  • Monitor, debug, and diagnose at scale
  • Version management
  • Promoting workflows to different environments: Dev-Test-Prod
  • Multi-tenancy

For a more in-depth view of real-time anomaly detection and the new platform approach to it, download the whitepaper Guide to Real-time Anomaly Detection for Enterprise Data.

Ravishankar Rao Vallabhajosyula
Director, Data Science