Comparing key features of Amazon MSK and Confluent
How to choose the right tool for seamless Kafka services
Apache Kafka is a real-time event streaming platform that helps enterprises gain reliable insights for quick decision-making and improved customer experience. While it meets enterprise streaming requirements, maintaining and managing Kafka clusters adds operational overhead. To reduce this overhead, many enterprises use Amazon MSK (Amazon Managed Streaming for Apache Kafka) or Confluent Cloud for event streaming with Apache Kafka.
At a high level, Confluent Cloud is cloud-native, whereas Amazon MSK is cloud-hosted. But this is just the tip of the iceberg: there are many subtle differences that enterprises need to consider when choosing the right service.
This blog will compare Confluent and Amazon MSK to help you understand which fits your requirements best.
Introduced by the creators of Kafka, Confluent Cloud is a simple, scalable, resilient, and secure event streaming platform with pre-built, fully managed Kafka connectors that make it easy to connect instantly to popular data sources and sinks.
Amazon Managed Streaming for Apache Kafka (MSK), which runs open-source versions of Apache Kafka, is a fully managed service that lets you create, update, and delete clusters and produce and consume data using your existing applications, plug-ins, and tools. It also reduces infrastructure provisioning and monitoring overhead and lets you scale capacity up or down as requirements change.
Confluent vs. MSK
Confluent Cloud clusters are self-service and on-demand, and can be provisioned as:
Basic, mainly for experimentation, early development, and basic use cases
Standard, for production-ready use cases that need an extended feature set and elastic scaling up to 1 GBps
Dedicated, recommended for production-critical throughput at GBps+ scale and for workloads that require private networking
On the other hand, Amazon MSK comes with two deployment options:
Provisioned, in which you manage the broker instances and storage
Serverless, which automatically provisions and scales compute and storage resources
The section below compares Confluent clusters with MSK (provisioned and serverless variants).
Infrastructure
Creating and managing Kafka clusters can be tedious. Confluent and MSK both provide easy-to-set-up infrastructure with multiple options to match your requirements, so you can focus on building use cases rather than on deployment.
| Criteria | Confluent Cloud | Amazon MSK – Serverless | Amazon MSK – Provisioned |
|---|---|---|---|
| Deployment options | Supports multi-cloud (AWS, GCP, Azure) and hybrid deployment | AWS-native, fully managed streaming service for Kafka | AWS-native, partially managed service for Kafka with deployment options for capacity planning |
| Pricing | Pay-per-use, based on ingress and egress throughput | Based on the number of partitions, throughput, and duration | Based on the number and type of brokers and on storage |
Scalability
Kafka is highly scalable. However, managing brokers and partitions according to the load is cumbersome. Auto-scaling enables customers to automatically balance the load and take care of idle brokers and storage.
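On a provisioned cluster, adding brokers does not move existing partitions onto them. The partitions have to be reassigned, for example with Kafka's `kafka-reassign-partitions.sh` tool, which reads a JSON plan. The sketch below builds such a plan by spreading each partition's replicas round-robin across all brokers, new ones included. The topic name and broker IDs are hypothetical, and the plain round-robin layout is a simplifying assumption (it ignores rack/zone placement and current load). The tool's own `--generate` mode can also propose a plan.

```python
import json


def build_reassignment_plan(topic, num_partitions, broker_ids, replication_factor):
    """Return a plan in the JSON layout read by
    kafka-reassign-partitions.sh --reassignment-json-file.

    Replicas are placed round-robin across all brokers. This is an
    assumption for illustration: it ignores rack/zone awareness and load.
    """
    brokers = sorted(broker_ids)
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for p in range(num_partitions):
        # Shift the starting broker for each partition so that preferred
        # leaders (the first replica in each list) are spread evenly.
        start = p % len(brokers)
        replicas = [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical example: brokers 4-6 were just added to a 3-broker cluster.
    plan = build_reassignment_plan("orders", 6, [1, 2, 3, 4, 5, 6], 3)
    print(json.dumps(plan, indent=2))
```

Saved to a file, the plan would then be applied with `kafka-reassign-partitions.sh --bootstrap-server <broker-endpoint> --reassignment-json-file reassignment.json --execute` and checked with `--verify`. Moving partitions copies data between brokers, so large reassignments are usually done in batches.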
| Criteria | Confluent | Amazon MSK – Serverless | Amazon MSK – Provisioned |
|---|---|---|---|
| Cluster scaling | Automatic resource allocation to manage consumer lag according to ingress/egress throughput; ingress throughput up to 50 MBps per CKU, egress throughput up to 150 MBps per CKU | Auto-scalable resources; ingress throughput up to 200 MBps and egress throughput up to 400 MBps per cluster | Auto-scalable storage, but broker count and type must be scaled manually |
| Re-balancing | Self-balancing clusters for automated load balancing | Managed automatically by the service | After new brokers are added, partitions must be reassigned using native Kafka tools |
| Storage | Infinite storage available | Up to 250 GB per partition and up to 120 partitions per cluster (limits can be raised through AWS Support) | Up to 16 TB of storage per broker and up to 30 brokers per cluster |
| Retention | Message retention in topics ranges from 1 hour to infinite; 7-day retention for metrics and logs | Message retention of 4 hours (can be increased through a support case) | Default retention for new topics is 7 days |
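The throughput limits above translate directly into capacity planning. The sketch below estimates how many Confluent CKUs a target load needs, using 50 MBps ingress and 150 MBps egress per CKU. It also checks whether the same load fits within the MSK Serverless limits, which it treats as cluster-wide quotas of 200 MBps in and 400 MBps out. The `min_ckus` parameter is an assumption: the real minimum depends on the cluster's availability configuration. Treat the result as a first estimate, not a sizing guarantee.

```python
import math

# Confluent Cloud Dedicated limits per CKU, in MBps (megabytes per second).
CONFLUENT_INGRESS_PER_CKU = 50
CONFLUENT_EGRESS_PER_CKU = 150

# MSK Serverless throughput quotas, treated here as cluster-wide limits (MBps).
MSK_SERVERLESS_MAX_INGRESS = 200
MSK_SERVERLESS_MAX_EGRESS = 400


def ckus_needed(ingress_mbps, egress_mbps, min_ckus=1):
    """Smallest CKU count whose ingress AND egress capacity covers the load.

    min_ckus is an assumed floor; the actual minimum varies by cluster setup.
    """
    by_ingress = math.ceil(ingress_mbps / CONFLUENT_INGRESS_PER_CKU)
    by_egress = math.ceil(egress_mbps / CONFLUENT_EGRESS_PER_CKU)
    return max(min_ckus, by_ingress, by_egress)


def fits_msk_serverless(ingress_mbps, egress_mbps):
    """True if the load stays within the assumed MSK Serverless cluster quotas."""
    return (ingress_mbps <= MSK_SERVERLESS_MAX_INGRESS
            and egress_mbps <= MSK_SERVERLESS_MAX_EGRESS)


if __name__ == "__main__":
    # Hypothetical load of about 55 MBps in each direction.
    print(ckus_needed(55, 55), fits_msk_serverless(55, 55))  # prints: 2 True
```

Ingress is usually the binding constraint, because each CKU allows three times as much egress as ingress. A fan-out-heavy workload with many consumer groups can flip that, which is why the function takes the larger of the two estimates.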
Operational management
Operational overheads include:
Monitoring your cluster’s vital statistics
Setting alarms for abnormal behavior
Monitoring logs for debugging and analysis, including best practices for cost optimization
| Criteria | Confluent | Amazon MSK – Serverless | Amazon MSK – Provisioned |
|---|---|---|---|
| Monitoring and logging | Dashboard monitoring of cluster-level metrics such as throughput, storage, topics, and connectors; free aggregation of key metrics at the topic and cluster level; integration with third-party tools such as Prometheus, Datadog, and Grafana | Only consumer, topic, and consumer-group metrics are available, at no additional cost | Free basic cluster-level monitoring; optional enhanced broker-, topic-, and partition-level monitoring at additional cost; integration with third-party tools such as Prometheus, Datadog, and Grafana |
| Updates and bug fixes | Rolling upgrades to the latest stable Kafka version with zero intervention; high availability maintained through non-disruptive upgrades | Kafka version and upgrades managed internally by AWS | Rolling upgrades that maintain high availability and keep cluster I/O available throughout the version upgrade |
| Tech support | Expert 24×7 enterprise-level Kafka support | General AWS support | General AWS support |
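Consumer lag is one of the most common signals behind the monitoring and alarms listed above. For each partition, lag is the log-end offset minus the consumer group's committed offset. The sketch below computes per-partition, total, and maximum lag and flags partitions over a threshold. The offsets are supplied by hand here. In practice they would come from an admin client, or you would alarm directly on a service metric such as MSK's `MaxOffsetLag` or `SumOffsetLag`. The threshold value is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    partition: int
    log_end_offset: int    # offset the next produced record will receive
    committed_offset: int  # next offset the consumer group will read


def lag_by_partition(offsets):
    """Records written to each partition but not yet consumed by the group."""
    return {o.partition: max(0, o.log_end_offset - o.committed_offset) for o in offsets}


def lag_alarm(offsets, max_lag):
    """Return (total lag, worst partition lag, partitions above max_lag)."""
    lag = lag_by_partition(offsets)
    breaching = sorted(p for p, value in lag.items() if value > max_lag)
    return sum(lag.values()), max(lag.values(), default=0), breaching


if __name__ == "__main__":
    sample = [
        PartitionOffsets(0, 1500, 1500),
        PartitionOffsets(1, 5000, 2000),
        PartitionOffsets(2, 800, 100),
    ]
    total, worst, breaching = lag_alarm(sample, max_lag=1000)  # hypothetical threshold
    print(total, worst, breaching)  # prints: 3700 3000 [1]
```

Alarming on the worst partition (max lag) catches a single stuck consumer that a total-lag alarm might hide on a topic with many partitions.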
Eco-system integration
Core Kafka comprises brokers, topics, logs, partitions, clusters, producers, and consumers. The broader Kafka ecosystem builds on this core with Kafka Streams, Kafka Connect, Kafka REST Proxy, and Schema Registry.
| Criteria | Confluent | Amazon MSK – Serverless | Amazon MSK – Provisioned |
|---|---|---|---|
| Connectors | Drag-and-drop configuration for 130+ pre-built and self-managed Confluent source and sink connectors | Must be configured using EC2 instances | Can be integrated with community-built connectors using custom plug-ins |
| Kafka ecosystem | Compatible with fully managed Confluent Schema Registry; provides a fully managed solution for creating and managing ksqlDB clusters | Compatible with fully managed AWS Glue Schema Registry | Compatible with fully managed AWS Glue Schema Registry; can be integrated with Confluent Schema Registry and ksqlDB by installing them on EC2 instances |
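The choice of schema registry also shapes the bytes on the wire. Serializers for Confluent Schema Registry and AWS Glue Schema Registry each add their own header to every message, so consumers generally need the deserializer that matches the producer's registry. As an illustration, Confluent's wire format prefixes each serialized record with a zero magic byte and a 4-byte big-endian schema ID. The sketch below shows only that framing. The schema ID and payload are hypothetical, and real serializers also perform registry lookups and the Avro/Protobuf/JSON encoding itself.

```python
from __future__ import annotations

import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bi")  # 1-byte magic + 4-byte big-endian schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized record with the Confluent wire-format header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a wire-format message into (schema_id, serialized record)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for the wire-format header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    framed = frame(42, b'{"id": 1}')  # hypothetical schema ID and payload
    print(framed[:5])       # prints: b'\x00\x00\x00\x00*'
    print(unframe(framed))  # prints: (42, b'{"id": 1}')
```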
Comparative analysis: Confluent Cloud vs. Amazon MSK
To put this in perspective, we ran one topic with 100 partitions on both Confluent and MSK for 30 minutes. The following configuration was identical across all runs:
Topics and Partitions
Number of topics = 1, number of partitions = 100
Replication factor = 3
Connectors
Source: Confluent Datagen, max tasks = 20
Sink: Confluent S3 sink, max tasks = 50, flush count = 50K
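For readers reproducing a similar setup on self-managed Kafka Connect workers (for example on EC2, or as the connector configuration for MSK Connect), the settings above map roughly onto the connector properties below, sent as a request body to the Kafka Connect REST API's `POST /connectors` endpoint. The topic name, bucket, region, Datagen quickstart schema, and output format are hypothetical placeholders, since they are not given above. Confluent Cloud's fully managed connectors use their own property names.

```python
import json

TOPIC = "benchmark-topic"  # hypothetical topic name (100 partitions, RF 3)


def connector_request(name: str, config: dict) -> str:
    """Request body for POST /connectors on the Kafka Connect REST API."""
    return json.dumps({"name": name, "config": config}, indent=2)


datagen_source = {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": TOPIC,
    "quickstart": "orders",  # hypothetical choice of generated schema
    "tasks.max": "20",
}

s3_sink = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": TOPIC,
    "s3.bucket.name": "example-bucket",  # hypothetical
    "s3.region": "us-east-1",            # hypothetical
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "50000",  # records written per S3 object ("flush count = 50K")
    "tasks.max": "50",
}

if __name__ == "__main__":
    print(connector_request("datagen-source", datagen_source))
    print(connector_request("s3-sink", s3_sink))
```

With 100 partitions and `tasks.max` set to 50, each sink task consumes about two partitions. Raising `tasks.max` beyond the partition count would leave the extra sink tasks idle.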
| Cluster type | Cluster configuration | Launch time | Avg. ingress throughput | Avg. egress throughput | Cost (per hour) |
|---|---|---|---|---|---|
| Confluent Cloud (Standard) | Fully managed cluster and connectors | 1 min | 46.63 MB/s | 45.52 MB/s | $17.190 |
| Confluent Cloud (Dedicated) | 2 CKUs; fully managed connectors | 3-4 hours | 55.06 MB/s | 55.05 MB/s | $30.943 |
| MSK Serverless | Source/sink connector VMs: m5.4xlarge each | 5 mins | 59.5 MB/s | 60 MB/s | $23.39 |
| MSK Provisioned | Broker type: m5.large; 3 brokers (1 per zone); source/sink connectors: 4 workers each, 4 MCUs per worker | 25-30 mins | 66.3 MB/s (aggregated) | 62.7 MB/s (aggregated) | $8.85 |
While configuring connectors and launching clusters is easy in Confluent, you spend less with MSK to achieve the same throughput.
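One way to compare the runs on a like-for-like basis is to divide each hourly cost by the data ingested per hour. The sketch below uses the cost and average ingress figures from the results table. It assumes decimal units (1 GB = 1000 MB) and takes the hourly cost column at face value, without breaking it down into cluster and connector costs.

```python
# Hourly cost (USD) and average ingress throughput (MB/s) from the results table.
RESULTS = {
    "Confluent Cloud (Standard)": (17.190, 46.63),
    "Confluent Cloud (Dedicated)": (30.943, 55.06),
    "MSK Serverless": (23.39, 59.5),
    "MSK Provisioned": (8.85, 66.3),
}


def cost_per_gb(cost_per_hour: float, ingress_mb_per_s: float) -> float:
    """USD per GB ingested, assuming decimal units (1 GB = 1000 MB)."""
    gb_per_hour = ingress_mb_per_s * 3600 / 1000
    return cost_per_hour / gb_per_hour


def rank_by_cost_per_gb(results):
    """Return (name, USD per GB) pairs, cheapest first."""
    return sorted(((name, cost_per_gb(*vals)) for name, vals in results.items()),
                  key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, usd in rank_by_cost_per_gb(RESULTS):
        print(f"{name:<28} ${usd:.3f} per GB ingested")
```

On these figures, MSK Provisioned has the lowest cost per GB ingested (about $0.037, compared with about $0.102 for Confluent Standard). Each configuration was run once for 30 minutes, so treat the numbers as indicative rather than as a general price comparison.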
Launch times for Confluent Standard and MSK Serverless, both fully managed, are comparable (5 minutes or less). However, Confluent Dedicated clusters take more than 2 hours to launch (3-4 hours in our test) and about as long to upscale. In comparison, provisioned MSK takes approximately 30 minutes to launch and about the same time to upscale.
Considering both time and cost, provisioned MSK, which also offers options to configure managed connectors, is the winner.
Which one to choose – Confluent or Amazon MSK?
Benefits of Confluent Cloud
Cloud-agnostic architecture: Users can extend a consistent data architecture to multi-cloud, on-premises, or private cloud environments
Fully managed: Feature-rich platform with built-in connectors and seamless integration with fully managed Schema Registry and ksqlDB
Drag-and-drop UI for an improved experience
Reasons to choose Amazon MSK
Despite Confluent Cloud's many benefits, enterprises often choose Amazon MSK over Confluent for the following reasons:
Enhanced network security: Apache Kafka on Amazon MSK is deployed within your VPC, so Kafka traffic between clients and brokers can stay on your private network instead of crossing the public internet. Therefore, enterprises whose primary concern is security often prefer MSK over Confluent.
Seamless integration with AWS services: With most enterprise infrastructure already hosted on AWS, MSK seems a natural choice as it integrates seamlessly with a wide range of AWS services like Glue ETL, Glue Schema Registry, Kinesis, Lambda, etc.
Cost-effective: Our comparative analysis of MSK and Confluent showed that, for the same throughput, you can save up to 30% in cost by configuring optimized MSK clusters.
Impetus has helped multiple Fortune 100 companies take advantage of Kafka seamlessly using Amazon MSK. To learn how we can help you choose the right tools to achieve your business goals, write to us at inquiry@impetus.com.