Virtual Data Lakes

News, background, and in-depth information about virtual data lakes: what they do, why you should use them, and their role in enterprise and solution architectures.

Data Fabric and Data Mesh Will Converge

Data Fabric and Data Mesh are both hot topics in the world of data integration. There are lively debates on what the differences are, and which is better. I believe that they are founded on a common data platform technology, and will converge.

Although their principles are different, data fabric and data mesh are very similar from a technical perspective. Both follow the model of applications connecting to data sources through a mesh or fabric of data platforms.

While they have a common technical basis, data fabric and data mesh deliver different added value. Data fabric adds the dimension of automated analysis, with the data platforms building knowledge graphs within the common metadata. Data mesh adds the human dimension, with guidance on how people should produce, use, and manage the data in an enterprise.

The power of analytics and AI is becoming increasingly evident. Enterprises now have to use these technologies to be competitive. At the same time, a successful enterprise cannot ignore the human dimension.

Enterprises will deploy data platforms, use automated analysis, and organise how their people use data. The result may be described as a data fabric or as a data mesh - but it will be both.

This is the future of data integration. (More)

Virtual data lakes enable applications to mix and match data from different sources, applying distributed access control to ensure the right people have the right data. They provide a form of data virtualization, and are key building blocks for data-centered architecture.

Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located. It can provide a single customer view (or a single view of any other entity) across the overall data. A virtual data lake is a data virtualization server whose essential component is a triple store. It need not be a large, enterprise-wide resource; a large number of small virtual data lakes may together serve an enterprise.
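
To make the triple-store idea concrete, here is a minimal sketch in Python, using the open-source rdflib library as a stand-in for a virtual data lake's triple store. The vocabulary, entity names, and source systems are invented for illustration; a real virtual data lake would map such triples in from its connected sources rather than have them hand-coded.

    from rdflib import RDF, Graph, Literal, Namespace

    EX = Namespace("http://example.org/crm/")   # hypothetical vocabulary

    lake = Graph()   # the triple store at the heart of the virtual data lake

    # Triples that, in practice, would be mapped in from separate sources
    lake.add((EX.customer42, RDF.type, EX.Customer))          # from a CRM system
    lake.add((EX.customer42, EX.name, Literal("Acme Ltd")))   # from a CRM system
    lake.add((EX.customer42, EX.balance, Literal(1250)))      # from a billing system

    # One SPARQL query gives a single customer view, regardless of where
    # each triple originated or how it was formatted at source.
    single_customer_view = """
        SELECT ?name ?balance WHERE {
            ?c a <http://example.org/crm/Customer> ;
               <http://example.org/crm/name> ?name ;
               <http://example.org/crm/balance> ?balance .
        }
    """
    for name, balance in lake.query(single_customer_view):
        print(f"{name}: balance {balance}")

This is the sense in which the application needs no technical details about how the data is formatted at source or where it is physically located.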

Data-centered architecture is an architecture style in which the data is designed first and applications are then designed to create and use it. In a data-centered architecture, programs access data at source, rather than exchanging complex information-rich messages. This reduces dependencies between programs and avoids the proliferation of modified versions of the data. The result is systems that are simpler, more robust, and less prone to error. Data-centered architectures are made possible by, and are a natural development of, universal Internet connectivity and the World-Wide Web.
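
As a rough contrast between the two styles, the sketch below (again using rdflib, with invented names) shows a billing function and a reporting function that both access the shared data at source, rather than the billing function sending the reporting function an information-rich message.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/crm/")
    shared_store = Graph()   # the data, designed first and held in one place

    def record_payment(customer, new_balance):
        # The billing program writes the balance directly to the shared data...
        shared_store.set((customer, EX.balance, Literal(new_balance)))

    def report_balance(customer):
        # ...and the reporting program reads it at source; no message passes
        # between the two programs, so neither depends on the other's interface.
        return shared_store.value(customer, EX.balance)

    record_payment(EX.customer42, 1250)
    print(report_balance(EX.customer42))   # prints: 1250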

No Data Products Without Representation

Here's one of my conclusions from the February 2022 Starburst and Thoughtworks State of Data Mesh virtual events, with a historical parallel.

Data products are a great idea for managing enterprise data but, while they may be free to their consumers, they aren't easy. Imposing them without consultation will lead to trouble. (More)

Governance and Data Mesh

Lack of data governance and stewardship is a major concern for enterprise and solution architects. Data mesh is an emerging data architecture that includes a radically different approach to data governance.

Will it resolve the concerns?

This report from the 2022 Datanova virtual event addresses that question. (Read the report)

The Data-Centric Manifesto starts from the premise that the Information Architecture of large organizations is a mess. Until we recognize and take action on the core problem, the situation will continue to deteriorate. The root cause is the prevailing application-centric mindset that gives applications priority over data. The remedy is to flip this on its head. Data is the center of the universe; applications are ephemeral.

Data to Combat Climate Change

Meaningful reductions of greenhouse gas emissions will come from analysis of the sources of emissions. Data professionals will play a crucial role. (More)

Semantic Data Platforms Come of Age

The 2021 Data-Centric Architecture Forum was the second annual event of its kind. It was held virtually, with a full three-day agenda. Produced by Semantic Arts, it featured the data-centric architectural approach described in Semantic Arts president Dave McComb's book The Data-Centric Revolution. (More)

The Knowledge Perspective

This article looks at why knowledge processing is important, how traditional and agile methods of enterprise architecture have evolved in The Open Group, and how they can evolve further to architect knowledge-based systems. (More)

Architecting for Achievement

The need for business agility is prompting a move to outcome-driven architecture.

A capability is what you can do. An outcome is what you have done. So how do you architect for achievement? (More)

The Digital Practitioner

Digital transformation is not just changing business processes, culture, and customer experiences; it is also changing professional skillsets.

The Open Group has defined a skillset for an emerging profession, the digital practitioner. (More)

Zero-Trust Architecture

First described by John Kindervag in 2010, zero trust is emerging as the best approach to IT security today. The Open Group featured Zero Trust Architecture on the first day of its July 2020 virtual event. (More)

Design for Data!

The book Designed for Digital by Ross, Beath and Mocker, published in September 2019, was eagerly awaited by the Enterprise Architecture community. It gives us a new way of looking at the business architecture of digital enterprises, but falls short when it comes to the supporting technology.

(Read the full review)