Data mesh: Redefining modern data architecture

The rapid pace of cloud adoption and digital transformation has driven a major shift in the technology landscape. Self-service tools, cloud-native applications, and data-driven technologies are redefining the traditional data stack. Within this landscape, the data mesh is fast emerging as a new paradigm for analytics architecture.

It is an architectural approach that draws on microservices, distributed ownership, and domain-based design. It lets enterprises access and query their data where it lives, without first moving it into a central data lake or warehouse. The data mesh decentralizes data ownership to domain-specific teams, which manage, own, and serve their own data.

Why is the data mesh significant?

To understand the need for a data mesh, let’s take a deeper dive into the evolution of data architecture over the past few decades.

The first generation of data architecture was built around an enterprise data warehouse, multiple relational databases, and standalone business intelligence platforms. ETL jobs were manually executed, and BI reports were generated with insights for business stakeholders.

Eventually, enterprises moved to Hadoop-based data lakes that brought data from an organization’s relational databases under a single umbrella, making it easier to query large datasets and giving greater visibility into enterprise data.

In recent years, the need for real-time analytics has given rise to a modern data architecture paradigm based on stream processing, cloud-based data lakes, and BI tools. However, for many enterprises, architectural limitations continue to pose challenges such as:

Ever-growing data sources and volumes make it difficult to scale centralized data platforms
Monolithic, domain-agnostic data platforms often have high failure rates
Tightly coupled pipelines for ingestion, cleansing, aggregation, and serving are complex to build and change
Delivering consumption-ready data requires data engineers with niche expertise

Enterprises can leverage the following four key principles to address these challenges with a data mesh:

Domain-oriented ownership and architecture: Decentralizes data ownership and transfers it to the domain teams most familiar with specific datasets and use cases. Each domain team manages its own data ingestion, cleansing, and transformation, which improves data agility and scalability.

Data as a product: Applies product thinking to datasets, encouraging developers to treat the end-users of their data as product customers. This makes domain teams responsible for quality across the entire lifecycle of a data product, from creation to maintenance.

Self-service infrastructure: Rests on an underlying common platform and set of easy-to-use, self-service tools that can be used regardless of technical skill sets. This enables domain teams to build and maintain data products independently, rather than relying on a centralized IT team.

Federated computational governance: Sets metadata and documentation standards that each domain can implement for their data products while enabling teams to combine and share independent data products securely.
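To make the last two principles more concrete, here is a minimal Python sketch of how a shared catalog might enforce a global metadata standard while each domain publishes its own data products. It is an illustration under stated assumptions, not a reference implementation: the names DataProduct, MeshCatalog, governance_violations, and the REQUIRED_METADATA fields are all hypothetical and do not come from any specific data mesh tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed global standard: metadata every domain must publish.
# The field names are hypothetical, chosen only for illustration.
REQUIRED_METADATA = ("owner", "description", "schema_version")


@dataclass
class DataProduct:
    """A dataset published by a domain team and treated as a product."""
    name: str
    domain: str
    metadata: dict = field(default_factory=dict)


def governance_violations(product: DataProduct) -> list[str]:
    """Return the required metadata keys that are missing or empty."""
    return [key for key in REQUIRED_METADATA if not product.metadata.get(key)]


class MeshCatalog:
    """Shared registry: domains publish independently, policy is checked in code."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # The governance rule is applied automatically at publish time.
        missing = governance_violations(product)
        if missing:
            raise ValueError(f"{product.name} is missing metadata: {missing}")
        self._products[f"{product.domain}.{product.name}"] = product

    def by_domain(self, domain: str) -> list[str]:
        """List the qualified names of all products owned by one domain."""
        return sorted(k for k, p in self._products.items() if p.domain == domain)


catalog = MeshCatalog()
catalog.publish(DataProduct(
    name="orders",
    domain="sales",
    metadata={"owner": "sales-team", "description": "Daily orders", "schema_version": "1.0"},
))
print(catalog.by_domain("sales"))
```

The design choice worth noting is that the standard is defined once, centrally, but checked automatically whenever a domain publishes. That is one plausible reading of "computational" governance: the policy runs as code rather than as a manual review step.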

The figure below gives a high-level overview of a data mesh-based architecture:

Fig. 1: A high-level overview of a data mesh-based architecture

Is your organization ready for the data mesh?

While the data mesh may look like an ideal solution for every type of data platform, it is not feasible for all use cases from an implementation, deployment, and management perspective. For those considering this approach, here are some key questions to determine the path forward:

Q: Is data mesh recommended for enterprises of all sizes?

A: Not necessarily. It is best suited to enterprises with massive-scale data management needs.

Q: Can data mesh be implemented on-premises?

A: It is better suited to a cloud setup, because it requires large-scale infrastructure along with pervasive monitoring, governance, and security.

Q: Are any specialized tools or frameworks needed to implement the data mesh?

A: Since the data mesh is an architectural approach, data architects and engineers can use the company’s existing cloud services, tools, and frameworks for implementation. There is no need for any new investment.

Q: Can a data lake or data warehouse form a part of the data mesh architecture?

A: Yes, data lakes and warehouses act as nodes within the data mesh architecture. In a typical data mesh setup, core processes like ingestion, processing, and pipelining are self-service/automated, with the data lake/warehouse working in a domain-bounded context.

Q: Does data mesh work for multi-cloud and hybrid cloud setups?

A: Data mesh infrastructure (including the self-service platform) needs to be built on a highly available, scalable, and cost-optimized computing backbone. Therefore, it works better with a multi-cloud setup, where the dependency is not on a single cloud provider. However, in certain cases, it can also be implemented for a hybrid cloud environment, depending on enterprise business needs.

Q: Is data mesh architecture complex? Does it require niche expertise?

A: Laying the foundation for the data mesh requires specialized expertise in the initial stages, as any oversights can have a cascading effect on the complexity of maintenance and operations.

Enterprise architects need to carefully evaluate the need for a data mesh based on their existing technology architecture, use cases, and business goals. Here is a step-by-step flow to help you assess your readiness:

Fig. 2: A step-by-step flowchart to assess data mesh readiness
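As a rough companion to the questions above, the sketch below turns three of the answers (scale, cloud setup, and in-house expertise) into a simple checklist in Python. It does not reproduce the flowchart in Fig. 2. The ReadinessProfile fields and the assess_readiness function are hypothetical names, and the criteria are only a simplified reading of the Q&A, so treat the result as a conversation starter rather than a verdict.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReadinessProfile:
    """Hypothetical yes/no inputs drawn from the Q&A above."""
    large_scale_data: bool     # massive-scale data management needs
    cloud_based: bool          # platform runs in the cloud, not only on-premises
    has_mesh_expertise: bool   # specialized skills available for the foundation


def assess_readiness(profile: ReadinessProfile) -> list[str]:
    """Return a list of concerns; an empty list means no obvious blockers."""
    concerns = []
    if not profile.large_scale_data:
        concerns.append("Data volumes may not justify a data mesh.")
    if not profile.cloud_based:
        concerns.append("A data mesh is better suited to a cloud setup.")
    if not profile.has_mesh_expertise:
        concerns.append("Laying the foundation needs specialized expertise.")
    return concerns


profile = ReadinessProfile(large_scale_data=True, cloud_based=False, has_mesh_expertise=True)
for concern in assess_readiness(profile):
    print(concern)
```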

A data mesh approach can help enterprises move away from monolithic data architecture, break down silos, and enable analytics at scale. It may also help significantly reduce operational and storage costs. However, the data mesh is not a “one-size-fits-all” solution to address all data platform challenges. Its merits need to be carefully weighed against those of unified data architectures (like a data lakehouse powered by cloud services).
