Blog
06 Feb 2015

We have ARRIVED!

Hello World!

February 17, 2015 is a date that everyone at StreamAnalytix will treasure. It is the day when StreamAnalytix announced its General Availability! This milestone for StreamAnalytix comes after strong validation from our Beta customers and equally strong support from our growing eco-system of technology partners.

The competitive landscape for streaming analytics platforms is growing every week with new entrants announcing products or funding, but our experience with our customers and their feedback informs us that StreamAnalytix stands tall and distinct.

To start with, StreamAnalytix is the ONLY commercially supported and viable streaming analytics platform today based on Open Source technologies. Our value proposition of “Enterprise class on Open Source” is being strongly validated by customers acknowledging how far our platform takes them beyond just a set of Open Source components stitched together. Some of these value-adds and features that customers appreciate are:

  • Rapid, easy application development and deployment with a series of pre-built operators
  • Visual monitoring of real-time streaming applications with performance-based alerts
  • Complex event processing integrated on streaming data (get going with no coding or scripting)
  • Seamless integration with modern data platforms
  • Integration of a powerful real-time dashboarding engine to visualize streaming data

All of these, along with an intuitive and powerful user interface, make it easy and fast for enterprises to go live with their much-awaited real-time stream processing use cases in a matter of days, or at most a few weeks when significant custom application development is needed. The acceleration is tangible and valuable, and everything underneath is familiar, proven Open Source technology – a combination enterprises are finding hard to say NO to. We have also decided to bring in support for Spark Streaming later this year, so that customers don’t need to choose between Storm and Spark: both options will be supported when they go with our platform.

The use cases we are enabling in various projects are in areas including Internet-of-Things (IoT), sensor data analytics, e-commerce and Internet advertising, security, fraud, insurance claim validation, credit-line-management, call center analytics, and log analytics. Additionally, a common pattern that we are noticing is enterprise IT and business transformation with our Streaming ETL capability that speeds up slow batch processes to near-real-time. We will also announce a partner in the streaming ETL domain soon.

We have certified our product with MapR and Hortonworks, with a third, Cloudera, coming soon. The certifications help, but it is important to note that we work with any modern data platform quite naturally, simply because of the way we have built our data abstraction layer. It’s the same reason we work seamlessly with NoSQL databases like Apache Cassandra. We have even integrated with MarkLogic, a commercial NoSQL database – in about two weeks, for a specific customer.

All things considered (including cost) – we feel StreamAnalytix is really the best choice for enterprises wanting to develop and deploy real-time streaming applications quickly and have their batch, speed and service layers tightly integrated. We will be glad to engage, show demos, and have a Q&A session with you if you are considering or evaluating streaming analytics platforms.

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
25 Aug 2015

Experience StreamAnalytix in action first-hand and for free!

We are excited to make the power of StreamAnalytix available to a wide audience with today’s launch of our Free versions. Click the download button to see what we have for you – it is truly immense value and power at no cost. For example, the popular problem of streaming ingest into NoSQL plus indexing is solved with no need to write any code if you use StreamAnalytix Lite. We want everybody who is using or considering Apache Storm to benefit from our current mission of making streaming analytics applications quick and easy to develop and deploy! And for users of Spark Streaming or other streaming engines: our stated commitment is to support Spark Streaming in an upcoming release (to be announced in Q4 this year), in our effort to future-proof your choice of streaming technologies.

For more information about the free versions of StreamAnalytix, and to download the software, please visit: http://streamanalytix.com/download.

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
24 Dec 2015

StreamAnalytix Releases Version 1.2

We’re happy to announce the release of StreamAnalytix 1.2!

In this release, we have taken a big leap toward developer enablement, with the vision of minimizing custom coding. Developers of streaming analytics applications get a refreshed UI look-and-feel and a rich set of pre-built operators and stream processing functions, along with convenience features such as versioning of applications/topologies, roll-back to a previous version, and the ability to add and share custom operators/libraries so the rest of the team can re-use them in a modular manner. We are quite excited about this release and feel that, with all these additions, we have created a very compelling reason to buy rather than build.

We encourage you to download the Sandbox and experience the power of StreamAnalytix 1.2 and send us your feedback.

What’s Next

We’ve got big things in store, and we know that this journey is far from over – in fact, it’s just beginning. Next, we’re working hard on StreamAnalytix 2.0, the version that will consolidate all our platform features for Storm and make them available across streaming engines by providing a level of abstraction above the open source engine, future-proofing you from the rapidly evolving technology options. Stay tuned!

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
22 Jul 2016

Experience the Power of Spark Streaming with StreamAnalytix Free Cloud Trial

Real-time Streaming Analytics Made Easy!

We are happy to announce the Cloud Trial of StreamAnalytix, our real-time streaming application development platform. We are excited to bring the experience and power of Apache Spark Streaming application development to a wide audience by way of our platform’s cloud instance. By making StreamAnalytix available on the cloud, for free, our intent is to fuel the growth of the streaming analytics community. This gives new-generation developers, data scientists, and data engineers a way to build their analytical applications in just a few clicks and get them up and running in minutes.

StreamAnalytix Cloud Trial is a fully functional version of StreamAnalytix for everyone to get started fast on Spark Streaming. Hit a URL and you’re good to go – no hardware problems or setup worries. It provides a powerful medium to quickly evaluate StreamAnalytix for your enterprise needs or a proof-of-concept (POC), or to build analytical applications for learning and research. It comes fully loaded with features such as a built-in Scala processor, an out-of-the-box data generator, advanced data transformation and analytical operators, and a built-in data visualization tool. With this launch, we have also introduced new online learning material to familiarize you with new product features and help you get started with building your application.

To experience the power of StreamAnalytix for Spark Streaming, sign up for your free trial now!

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
28 Aug 2017

Structured Streaming: Simplifying the Building of Stream Analytics Applications

Last week the StreamAnalytix team hosted a webinar on Structured Streaming, “The Structured Streaming Upgrade to Apache Spark and How Enterprises Can Benefit,” and received overwhelming participation from the industry, including many of you reading this. Amit Assudani (Sr. Technical Architect – Spark, StreamAnalytix) and I took a deep dive into Structured Streaming and shared our views on how it enables the real-time enterprise and simplifies building stream processing applications on Spark. Here is a summary of our current view on Structured Streaming:

Need for Structured Streaming

Open source engines such as Apache Storm, Apache Spark, and Apache Flink have made it possible to build applications for fault-tolerant processing of real-time data streams. However, building robust stream processing applications is still hard and involves many complexities. The biggest complexity is the data itself: it comes in various formats, needs to be cleansed, can arrive at different speeds, can become corrupt, and needs seamless integration with other data in external storage. And streaming applications don’t work in isolation; they usually involve other workloads such as interactive queries, batch jobs, and machine learning on top of streaming.

Apache Spark’s evolution has been driven by the need to address these complexities. Since its release, Spark Streaming has become one of the most widely used distributed streaming engines. As per a Nov ’16 survey by Taneja Group, nearly 54% of the 7,000 enterprise respondents said they were actively using Spark, and 65% planned to increase their use of Spark in the next 12 months. Top use cases include ETL, real-time processing, data science, and machine learning. This concurred with the results of the poll we conducted during the webinar, where 53% of ~200 attendees said they were already using Spark and 32% were planning to use it in the near future.

Structured Streaming, introduced experimentally in Spark 2.0, is designed to further simplify building stream processing applications with Spark. It is a fast, scalable, and fault-tolerant stream processing engine built on top of Spark SQL, and it provides unified, high-level APIs for dealing with complex data, workloads, and systems. It comes with a growing ecosystem of data sources that allow streaming applications to integrate with evolving storage systems.

What does Structured Streaming bring?

Structured Streaming starts from the premise that building stream processing applications normally requires strong ‘reasoning’ (i.e., worrying about and designing mechanisms) for end-to-end guarantees, intermediate aggregates, and data consistency. The philosophy it follows is that, to perform stream processing effectively, the developer should not have to reason about streaming at all. As an end user, you shouldn’t have to reason about what happens when data is late or the system fails. Structured Streaming provides strong guarantees of consistency with batch jobs: it takes care to process data exactly once and to update output sinks regularly.

New concepts such as late data handling and watermarking enable these guarantees. Structured Streaming handles delayed data by maintaining intermediate state for partial aggregates, allowing late data to correctly update the aggregates of old windows. A watermark defines the time interval until which late data is still allowed to update those aggregates. Another feature is event time: earlier, Spark only considered the time when data entered the system, not the actual event time. Now, Structured Streaming allows aggregates and windows to be computed on event time, and these aggregates are maintained by Spark itself.
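The mechanics can be illustrated outside Spark. Below is a minimal plain-Python sketch (not Spark code; the window size, watermark delay, and event timestamps are invented for illustration) of how per-window partial aggregates can keep accepting late events until the watermark passes them by:

```python
# Minimal illustration of event-time windows with a watermark.
# Events carry their own event time; windows are 10-second buckets.
# A late event still updates its window's aggregate, unless it is
# older than (max event time seen) - (allowed lateness).

WINDOW = 10       # window size in seconds
LATENESS = 20     # allowed lateness (watermark delay) in seconds

windows = {}      # window start -> running count (intermediate state)
max_event_time = 0
dropped = []      # events that arrived behind the watermark

def process(event_time):
    """Update the per-window count, dropping events behind the watermark."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - LATENESS
    if event_time < watermark:
        dropped.append(event_time)   # too late: window state is finalized
        return
    start = (event_time // WINDOW) * WINDOW
    windows[start] = windows.get(start, 0) + 1

# In-order events, then a late-but-acceptable one (13), then a
# too-late one (2) that falls behind the watermark (31 - 20 = 11).
for t in [12, 14, 25, 31, 13, 2]:
    process(t)

print(windows)   # {10: 3, 20: 1, 30: 1} - late event 13 updated window 10
print(dropped)   # [2]
```

Real Structured Streaming expresses the same idea declaratively (a `withWatermark` plus a window aggregation), with Spark managing the intermediate state for you.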

One of the biggest capabilities Structured Streaming brings is simplified event-time stream processing that works on both batches and streams. This was not possible with DStreams, the real-time processing API in earlier versions of Spark. To achieve it, Spark introduces a new model, a new way to treat streams: tables. A stream is treated as a conceptual table – append-only, unbounded, and continuously growing. In actual execution, the query over this unbounded table is incrementalized, allowing a single Dataset/DataFrame API to work with both static tables and unbounded ones. As new data arrives on the stream, new rows are appended to the table, unifying batch and streaming data under the single concept of tables.

To explain it further, Structured Streaming lets the developer write the business logic once and conveniently apply it to either batch or streaming data, changing only a line or two of code. For instance, if you have a batch query written with DataFrames, converting it to a streaming query is just a matter of changing ‘read’ to ‘readStream’ and ‘write’ to ‘writeStream’; the actual query – the business logic – does not change. Essentially, Structured Streaming turns periodic batch jobs into a real-time data pipeline: it takes your batch-like query and automatically incrementalizes it so that it operates on batches of new data.
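The “write once, run in either mode” idea can be sketched in plain Python (this is a conceptual simulation, not the Spark API – in real Spark the only change would be ‘read’ → ‘readStream’ and ‘write’ → ‘writeStream’): the same grouped-count query runs over a full static table, and then incrementally over micro-batches appended to an unbounded table, producing the same result.

```python
from collections import Counter

def count_by_action(rows):
    """The 'business logic', written once: count rows per action."""
    return Counter(r["action"] for r in rows)

events = [{"action": "click"}, {"action": "view"}, {"action": "click"}]

# Batch mode: run the query over the whole static table at once.
batch_result = count_by_action(events)

# 'Streaming' mode: rows are appended to an unbounded table; instead of
# recomputing from scratch, keep a running result and incrementally
# fold in each micro-batch of new rows as it arrives.
stream_result = Counter()
for micro_batch in ([events[0]], [events[1], events[2]]):
    stream_result.update(count_by_action(micro_batch))

print(batch_result == stream_result)   # True: same logic, two modes
```

The incremental fold is what Spark’s planner derives automatically from the batch-style query; the developer never writes it by hand.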

With Spark 2.2, Structured Streaming moves out of the experimentation phase. This version marks Structured Streaming as production-ready, shedding the experimental tag, and its now-stable API will be supported in future versions of Spark.

But let’s also look at what Structured Streaming is not. It is not a very big change to Spark itself; it is a collection of additions to Spark Streaming that retains the fundamental concept of micro-batching at the core of Spark’s streaming architecture. It enables users to continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. It maintains tight integration with the rest of Spark, supports serving interactive queries on streaming state with Spark SQL, and integrates with MLlib.

Though a big leap toward true streaming, Structured Streaming is not there yet for use cases requiring 10–20 ms turnaround time. You still need to consider true event-streaming engines like Storm and Flink. But there is a promise to move in that direction: the same code running on a dramatically different engine, with plans to support latencies as low as 1 ms. Such a move would further expand the use of Spark for building true event-time processing applications.

You can access our webinar to take a deeper dive into the functionality of Structured Streaming, its features and highlights, the mid- to long-term outlook, and the challenges that still persist. Look out for some very interesting questions from the audience that we answered live. We also got very encouraging feedback and look forward to bringing you more such content in the future.

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
14 Nov 2017

StreamAnalytix Lite, a Visual IDE to Build, Test and Run Apache Spark Applications on Your Desktop for Free

Apache Spark is growing, but shortage of Spark skills is a significant adoption barrier

Apache Spark is one of the most popular unified analytics engines for data processing today. It has moved beyond the early-adopter phase and is now mainstream in large data-driven enterprises. Our customers, primarily Fortune 500 companies, are looking at Spark for all data processing tasks. These range from ingest through ETL and data quality processing to advanced analytical tasks and machine learning jobs.

Even though Spark’s popularity has grown significantly, the shortage of Spark talent is holding back wider and deeper adoption of Spark.

Why Apache Spark skills are not growing at the same pace

With the rapidly changing technology landscape, Spark itself is evolving, and developers and enterprise IT teams can find it challenging to keep up with the pace.

Moreover, though Spark’s open source availability provides an easily accessible platform for experimentation, it demands a steep learning curve. It also requires a lot of development, integration and testing time to write code and solve the complexities of building Spark-based production-ready applications.

Simplifying Apache Spark is the answer

StreamAnalytix Lite provides a solution to the complexities involved in building enterprise-grade applications on Spark for both batch and streaming mode. It is a free, quick-start lightweight product, which anybody can download and use to accelerate their Spark learning and usage.

A Spark only version of StreamAnalytix (an open-source enabled, enterprise-grade, stream processing and machine learning platform), StreamAnalytix Lite offers the same powerful visual interface that dramatically increases developer productivity by providing ready-to-use operators to select, drag-and-drop, connect, and configure to realize a fully functional Spark pipeline.

It can connect to a wide array of local data sources and data targets and offers all the advanced analytics and machine learning capabilities of StreamAnalytix on a single instance.

StreamAnalytix Lite offers the following key features:

A GUI-based development and operations tool:

Use it to learn, experiment, develop, and put Spark applications into production

A full portfolio of debugging, administration, and monitoring functions:

Build, test, run, and manage Spark applications end-to-end

Extremely easy to work with:

Lightweight at 2 GB on disk; can be downloaded onto your Windows, Mac, or Linux desktop or a server node

No need for coding, yet enables custom logic:

Comprehensive set of pre-built tools and drag and drop operators

A web-based tool with powerful multi-tenancy features:

Allows multiple users to connect to a single node

StreamAnalytix Lite Interface

It offers a powerful visual interface with a pipeline designer for rapid application development and built-in dashboards for real-time data visualization.

Detailed list of in-built Apache Spark Operators in StreamAnalytix Lite

Operators include an array of data sources, processors, analytical operators, and emitters

Recommended usage of StreamAnalytix Lite

Though StreamAnalytix Lite makes Apache Spark development easy, it is not recommended as an execution platform for production applications. For that, pipelines built on StreamAnalytix Lite can be seamlessly exported to the production-grade (Enterprise) edition of the StreamAnalytix platform to run at full enterprise scale on multi-node Spark clusters.

Support for StreamAnalytix Lite

Developers who need help or support can consult with StreamAnalytix experts and a forum of peer developers on the web-based community portal for this tool. Similarly, experienced developers can contribute their knowledge or sample pipelines. Click here to reach the Support Forum.

To download StreamAnalytix Lite, please click this link. To know more about StreamAnalytix Lite, visit the StreamAnalytix Lite page.

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
12 Feb 2018

Low-code Application Development Can Drive Higher Apache Spark Adoption in the Enterprise

Apache Spark adoption is growing but the complexities remain

Apache Spark has moved beyond the early-adopter phase and is now mainstream. Large data-driven enterprises are looking at Spark for all data processing tasks ranging from ingest through ETL and data quality processing to advanced analytics and machine learning jobs.

However, despite its growing popularity, Spark is still evolving. Along with a steep learning curve, developers need time to develop, integrate, and test code on Spark to solve the underlying complexities.

Moreover, building functionally rich Spark applications requires integration with a wide array of data sources and data targets (like multiple, disparate live data sources such as Kafka, HDFS, Hive, RabbitMQ, Amazon S3 and more), data processors, advanced analytics, and machine learning tools.

Hence, it might be difficult for developers and enterprise IT teams to keep up with the evolving analytics landscape and the complexities of using Spark.

Low-code development abstracts Apache Spark complexities

A visual low-code tool is a solution to the complexities involved in building enterprise-grade Spark applications. A low-code platform enables visual workflows instead of manual programming, reducing the time to develop and operationalize applications. It also helps to visualize an application’s data sources, data preparation, business logic, and third-party interfaces. This approach can empower a range of users, from developers to business users, and can create efficiencies of up to 10x vs. hand-coding Spark pipelines.

StreamAnalytix Lite is one such lightweight development tool that simplifies Spark application development. It is a Spark-only version of StreamAnalytix – an enterprise-grade real-time analytics and machine learning platform. Available free for download, it enables developers to build production-ready functionally-rich Spark applications with the aid of an intuitive drag-and-drop user interface and a wide array of pre-built Spark operators.

Learn how you can use StreamAnalytix Lite to simplify Spark application development

Features of a low-code development tool

  • An abstraction layer to simplify the use of complex technologies: The underlying infrastructure of the development platform must be well-tuned to help you focus on the business logic. For instance, StreamAnalytix Lite provides a layer of abstraction for Spark and a comprehensive set of data sources and data targets (like Kafka, HDFS, Hive, RDBMS, RabbitMQ, Azure Event Hubs, Amazon Kinesis, Amazon S3, and Elasticsearch), a set of data processors, and an array of advanced analytics and Spark machine learning tools like Spark MLlib, ML, PMML, TensorFlow, and H2O.
  • Visual elements: Low-code development Spark platforms offer a compelling visual interface that dramatically increases the developer’s productivity by providing ready-to-use operators to select, drag-and-drop, connect, and configure. StreamAnalytix Lite provides a visual Spark pipeline designer, monitoring and debugging tools, and built-in real-time dashboards to support rapid Spark application development and faster time to deployment.
  • End-to-end application lifecycle management: Low-code development platforms are not only focused on application development; they must also provide an Integrated Development Environment (IDE) to support the entire application delivery lifecycle. StreamAnalytix Lite seamlessly moves applications along the lifecycle, from design and build through test, deploy, and manage, on a single node. Apart from the visual development tools, StreamAnalytix Lite also includes a one-click deployment option, application governance tools (such as data inspect and data lineage), and an option to scale out on multiple clusters using StreamAnalytix.
  • Extensible: Though an easy-to-use drag-and-drop UI considerably accelerates application development, the demand for custom applications has never been higher. The platform must minimize hand-coding, yet make it easy to integrate hand-written custom logic into your Spark pipelines. StreamAnalytix Lite supports SQL queries over Spark streaming as well as over your static data store, along with inline support for languages and tools like Java, Scala, and MVEL.

Simplifying Apache Spark can drive higher adoption in the enterprise

Visual low-code development tools are accelerating the pace of software development. Continued innovations are bringing unprecedented levels of usability and power to these platforms.

Hand-coding and deploying a functionally rich, production-ready Spark application might take months. With a low-code Spark platform, you can deliver a more flexible application within a few weeks, with only 30% of your team, at a fraction of the estimated cost.

Platforms like StreamAnalytix Lite also address the shortage of Spark talent. With minimal coding requirements, the existing teams can dramatically increase their Spark usage and productivity and support existing Spark initiatives.

Also, the use of AI in low-code development platforms is emerging as a disruptive trend. Low-code Spark platforms are taking the abstraction of coding to a level that enables enterprises to adopt AI-supported, model-driven approaches to software development, giving developers auto-build capabilities for everything from complex process logic to application construction.

Adoption of low-code platforms is poised to increase, as more and more enterprise IT teams become faster and more flexible in using Spark and deliver enterprise applications with little or no hand coding. Business users will also start leveraging these platforms to build functional applications without having to write a single line of code. New AI driven approaches and future innovation will make these platforms more declarative to the business and will pave the roadmap for the future of these solutions.

You can download StreamAnalytix Lite to start building Spark applications within minutes. You can also sign up here for your free trial of the enterprise version of StreamAnalytix.

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix
Blog
06 Apr 2018

Data Trends for 2018

Data is a powerful corporate asset that enterprises are now beginning to fully harness. Enterprises are looking to derive breakthrough value through investments in cloud-migration, data lakes, in-memory computing, modern business intelligence, and data science technologies.

The following predictions represent the views of multiple Fortune 500 companies that are actively investing in 2018 to transition into future-ready, data-driven, real-time enterprises.

Big Data Trends for 2018

Author
Anand Venugopal
Asst. Vice President and Global Business Head - StreamAnalytix