Data mesh: Redefining modern data architecture
The rapid pace of cloud adoption and digital transformation has driven a massive change in the technology landscape. Self-service tools, cloud-native applications, and data-driven technologies are redefining the traditional data stack. Within this landscape, the data mesh is fast emerging as a new paradigm for analytics architecture.
The data mesh is a modern architectural approach based on microservices, distributed ownership, and domain-based design. It helps enterprises access and query their data without first transporting it to a centralized data lake or warehouse. It decentralizes data ownership to domain-specific teams that manage, own, and serve their own data.
Why is the data mesh significant?
To understand the need for a data mesh, let’s take a deeper dive into the evolution of data architecture over the past few decades.
The first generation of data architecture was built around an enterprise data warehouse, multiple relational databases, and standalone business intelligence platforms. ETL jobs were manually executed, and BI reports were generated with insights for business stakeholders.
Eventually, enterprises transitioned into Hadoop-based data lakes that unified an organization’s relational databases under a single umbrella, enabling easier querying from large datasets and greater visibility into enterprise data.
In recent years, the need for real-time analytics has given rise to a modern data architecture paradigm based on stream processing, cloud-based data lakes, and BI tools. However, for many enterprises, architectural limitations continue to pose challenges like:
Ever-growing data sources and volumes make it difficult to scale centralized data platforms
Monolithic, domain-agnostic data platforms often have high failure rates
Coupling pipeline architecture for ingestion, cleansing, aggregation, serving, etc. is complex
Delivering consumption-ready data requires data engineers with niche expertise
Enterprises can leverage the following four key principles to address these challenges with a data mesh:
Domain-oriented ownership and architecture: Decentralizes data ownership and transfers it to the domain teams most familiar with specific datasets and use cases. Each domain team manages processes like data ingestion, cleansing, and transformation, which improves data agility and scalability.
Data as a product: Applies product thinking to datasets, encouraging developers to treat data consumers as “product customers.” This makes domain teams responsible for maintaining quality across the entire lifecycle, from product creation to maintenance.
Self-service infrastructure: Rests on an underlying common platform and set of easy-to-use, self-service tools that can be used regardless of technical skill sets. This enables domain teams to build and maintain data products independently, rather than relying on a centralized IT team.
Federated computational governance: Sets metadata and documentation standards that each domain can implement for their data products while enabling teams to combine and share independent data products securely.
Fig. 1: A high-level overview of a data mesh-based architecture
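The four principles above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `DataProduct` fields, the `REQUIRED_METADATA` standard, and the `MeshCatalog` class are all hypothetical names chosen for the example. Each domain publishes its own data product, a shared catalog stands in for the self-service platform, and a simple metadata check stands in for federated governance.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset published by a domain team (all field names are illustrative)."""
    name: str
    domain: str                 # owning domain team
    owner: str                  # contact accountable for quality
    description: str = ""
    schema: dict = field(default_factory=dict)  # column name -> type


# Hypothetical federated standard: metadata every domain must supply.
REQUIRED_METADATA = ("owner", "description", "schema")


def governance_violations(product: DataProduct) -> list:
    """Return the required metadata fields that are missing or empty."""
    return [name for name in REQUIRED_METADATA if not getattr(product, name)]


class MeshCatalog:
    """Shared registry that domains publish to without a central data team."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        # Reject products that do not meet the shared metadata standard.
        missing = governance_violations(product)
        if missing:
            raise ValueError(f"{product.name}: missing metadata {missing}")
        self._products[(product.domain, product.name)] = product

    def find(self, domain: str) -> list:
        """List the names of the products published by one domain."""
        return sorted(n for (d, n) in self._products if d == domain)


catalog = MeshCatalog()
catalog.publish(DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data@example.com",
    description="Daily order totals",
    schema={"order_date": "date", "total": "decimal"},
))
print(catalog.find("sales"))
```

In a real deployment the catalog and the governance rules would live in platform services rather than in-process objects; the point here is only that ownership stays with the domain while the standard is shared.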
Is your organization ready for the data mesh?
While the data mesh seems to be an ideal solution for all types of data platform architecture, it is not feasible for all use cases from an implementation, deployment, and management perspective. For those considering this methodology, here are some key questions to determine the path forward:
Q: Is data mesh recommended for enterprises of all sizes?
A: It is more suited for enterprises with massive-scale data management needs.
Q: Can data mesh be implemented on-premises?
A: It is more suited for a cloud setup as it requires huge infrastructure along with ubiquitous monitoring, governance, and security.
Q: Are any specialized tools or frameworks needed to implement the data mesh?
A: Since the data mesh is an architectural approach, data architects and engineers can use the company’s existing cloud services, tools, and frameworks for implementation. There is no need for any new investment.
Q: Can a data lake or data warehouse form a part of the data mesh architecture?
A: Yes, data lakes and warehouses act as nodes within the data mesh architecture. In a typical data mesh setup, core processes like ingestion, processing, and pipelining are self-service/automated, with the data lake/warehouse working in a domain-bounded context.
Q: Does data mesh work for multi-cloud and hybrid cloud setups?
A: Data mesh infrastructure (including the self-service platform) needs to be built on a highly available, scalable, and cost-optimized computing backbone. Therefore, it works better with a multi-cloud setup, where the dependency is not on a single cloud provider. However, in certain cases, it can also be implemented for a hybrid cloud environment, depending on enterprise business needs.
Q: Is data mesh architecture complex? Does it require niche expertise?
A: Laying the foundation for the data mesh requires specialized expertise in the initial stages, as any oversights can have a cascading effect on the complexity of maintenance and operations.
Enterprise architects need to carefully evaluate the need for a data mesh based on their existing technology architecture, use cases, and business goals. Here is a step-by-step flow to help you assess your readiness:
Fig. 2: A step-by-step flowchart to assess data mesh readiness
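As a rough companion to the questions above, the readiness criteria from the Q&A can be expressed as a simple checklist. The sketch below does not reproduce the flowchart in Fig. 2; it only encodes three points stated in the answers (scale, cloud setup, and initial expertise), and the `Profile` fields and `readiness_concerns` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical summary of an enterprise's current situation."""
    massive_scale_data: bool    # very large data management needs
    cloud_based: bool           # runs on cloud (or multi-cloud) infrastructure
    has_mesh_expertise: bool    # specialists available for the initial foundation


def readiness_concerns(profile: Profile) -> list:
    """Return the concerns that suggest a data mesh may not be a good fit yet."""
    concerns = []
    if not profile.massive_scale_data:
        concerns.append("data mesh is more suited to massive-scale data needs")
    if not profile.cloud_based:
        concerns.append("data mesh is more suited to a cloud setup")
    if not profile.has_mesh_expertise:
        concerns.append("laying the foundation requires specialized expertise")
    return concerns


# An empty list means none of these three concerns apply.
print(readiness_concerns(Profile(True, True, False)))
```

An empty result does not by itself mean a data mesh is the right choice; it only means none of these three concerns apply.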
A data mesh approach can help enterprises move away from monolithic data architecture, break down silos, and enable analytics at scale. It may also help significantly reduce operational and storage costs. However, the data mesh is not a “one-size-fits-all” solution to address all data platform challenges. Its merits need to be carefully weighed against those of unified data architectures (like a data lakehouse powered by cloud services).
To learn more, get in touch with our cloud and data engineering experts today.