Introduction
In this moment of technological advancement brought about by the release of ChatGPT three years ago, there is no denying that how we build and support software applications, and the role we humans will play in that new reality of automation, will change drastically. How quickly it will progress towards AGI is up for debate, but the value LLMs already deliver today is unmistakably profound.

One might wonder how this will specifically impact the data platforms in use and being built in the cloud today, the enterprise architectures that have been defined, and the processes and people organised around the data-driven ambition you can find in pretty much every company’s vision and strategy document. This question matters not just for the future but for the present, because it shapes how you integrate AI into your organisation today. One thing is certain: data needs to get smarter fast, and preferably today.

For as long as information technology has existed and business applications have been introduced to automate processes, data fragmentation, and the resulting lack of a unified meaning of data, has been one of the foremost challenges companies have had to deal with. A massive industry of managing data and extracting value through new insights has evolved over the last few decades. We have now reached a pivotal moment where data can no longer be stale and fragmented; it needs to be part of a bigger picture of leveraging intelligence residing in the cloud or on-premise. Data therefore needs to be organised in such a way that it can itself be empowered by intelligence, in order to get the most out of it for broader use cases.
The Challenges I See
Making data smart is not only about making it adhere to defined data quality dimensions and carefully drafted governance and compliance principles. We seem to have been chasing these forever, as the variables that create and define data are dynamic in nature. Sometimes it feels like a dog chasing its own tail, despite frameworks like the DAMA-DMBOK that provide a comprehensive approach to tackling the data challenge head on. Despite all these attempts to provide clarity and structure, it’s hard to see the forest for the trees. It’s complex, and once you think you have the data challenge in check, it seems to elude you again: newly introduced business applications, or incomplete and poorly adopted data governance processes, slowly but steadily degrade your data quality with significant downstream impact. And these issues tend to surface sooner rather than later in precisely the critical data-driven initiatives that depend on the data most. Managing and governing data is a continuous process, but one that every enterprise wants to handle as smartly and efficiently as possible.
Concepts like Data Mesh, widely adopted in recent years and applied in ongoing data transformation programmes, address many of these concerns by bringing data closer to those who possess the knowledge to manage it effectively. This empowers business domains to control their own capabilities to manage and leverage data, extracting the value they need, with their own resources and backlogs to prioritise accordingly. It is supported by a centrally managed data platform, equipped with standardised methodologies and technologies aimed at reducing the total cost of ownership: preventing ‘reinventing the wheel’ by propagating accelerators for sourcing, processing and making data accessible, ensuring the platform remains compliant with data security policies, and optimising the technology stack to deliver optimal value at the lowest possible operating cost.
But like most concepts and methodologies introduced to deal with the data complexity created in our digital age, it all stands or falls on whether a company has truly embraced a data culture. Those who should own the data, namely those who own the business process that depends on it as input or that generates it, need to be incentivised to take control of it. Business and IT domains must operate as one, removing friction and creating clarity and transparency on who is responsible for what, so that nothing is lost in translation.
Data culture is an elusive concept and therefore hard to quantify. One might be tempted to boil it down to measurable data points: what you pay for the technology stack that stores and processes your data; how many hours people spend managing it; how many reports and insights are delivered; and whether these are actually trusted, used and therefore valuable. But all of this is diluted by organisational change over time, and by new paradigms and technological enablers as they emerge, sometimes spanning multiple leadership cycles.
But if such a data culture is not well established, the benefits these methodologies and architectural principles could deliver become marginal, and the hidden costs are particularly hard to capture in large and complex organisations. Change takes a long time, and the baseline against which you could measure progress is easily lost in an ever-changing organisational environment.
Some takeaways
Some thoughts I take away from all of this:
- Getting data as close as possible to those who use it and have the knowledge to maintain it is popularised by Data Mesh for all the right reasons, but challenging to put into practice. Making it part of the day-to-day work of subject matter experts, rather than a separate task delegated to people without the right context, avoids unintentionally creating friction and, with it, a lack of clarity. The gap between specific business application knowledge and general business domain knowledge creates friction as well, and is one of the main causes of hidden costs in managing data. Data glossaries are often introduced to deal with these discrepancies, but they carry the inherent risk of adding complexity to your organisation if not well adopted. This is where AI can remove the friction.
- The boundary between Business and IT is increasingly irrelevant. Business processes are what make a company tick, but companies are also more technologically driven than they sometimes show in how they organise themselves, and the two should go hand in hand as much as possible. Synergy between them is essential to create value, and AI can bridge these two worlds, resulting in leaner processes and greater overall agility.
- What you want is for data to be self-healing, leveraging the technology available to us today. The Data as a Product concept, itself fragmented across standards such as the Linux Foundation specifications, SAP’s ORD and DPROD, can be simplified by introducing AI into the equation. Correlating separate data sets from different sources into derived, uniform data products, recognised across the organisation as the vehicle for sharing data meaning and value, can be a significant undertaking. With AI, we no longer need to manually or semi-automatically analyse and label data before it can be properly transformed by data pipelines in platforms such as Databricks or Snowflake and made accessible in data catalogues. AI will not fully replace humans in these processes in the foreseeable future, but it will make creating and maintaining uniform data sets within your lakehouse significantly more effective and efficient.
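The labelling step mentioned above can be sketched as follows. The semantic labels and pattern rules are illustrative assumptions; a real setup would delegate the classification to an LLM prompted with sample values, with the resulting labels feeding the catalogue entries that downstream pipelines rely on:

```python
import re

def label_column(sample_values: list[str]) -> str:
    """Assign a semantic label to a column from a sample of its values.

    A deterministic stand-in: each pattern check mimics what an LLM asked
    to "classify these values" would return for the same sample.
    """
    patterns = {
        "email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        "date": r"^\d{4}-\d{2}-\d{2}$",
        "amount": r"^-?\d+(\.\d+)?$",
        "identifier": r"^[A-Z]{2,4}-\d+$",
    }
    for label, pattern in patterns.items():
        if sample_values and all(re.match(pattern, v) for v in sample_values):
            return label
    return "free_text"

# Hypothetical column samples as a pipeline might draw them.
print(label_column(["2024-01-31", "2023-12-01"]))  # -> date
print(label_column(["ORD-1001", "ORD-1002"]))      # -> identifier
```

The value of automating this step is consistency: every source column entering the lakehouse gets the same semantic treatment, rather than depending on whoever happened to wire up the pipeline.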
- Build a learning organisation, not only by documenting knowledge and lessons learned over time, but by making it discoverable. Don’t scatter data across wikis, SharePoint and other storage solutions that are poorly structured and maintained because teams change and projects reach end of life. There is no longer any excuse not to automate documentation continuously and centralise knowledge to make it accessible, so that new initiatives can leverage it from the start. The hidden cost of missing this valuable data asset is significant, and enabling AI to manage and surface knowledge is a business case firmly in the ‘low-hanging fruit’ category, yet not always prioritised as such.
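The discoverability point can be illustrated with a toy retrieval sketch. The document entries are invented, and the inverted keyword index is a stand-in for the embedding-based retrieval a production knowledge base would use:

```python
from collections import defaultdict

# Hypothetical knowledge entries harvested from wikis, repos and runbooks.
DOCS = {
    "onboarding-runbook": "Steps to onboard a new data source into the lakehouse",
    "pipeline-lessons": "Lessons learned migrating pipelines to Databricks",
    "glossary-guide": "How to maintain the business glossary in the catalogue",
}

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Invert documents into a word -> document-id index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents matching any query word."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return hits

index = build_index(DOCS)
print(search(index, "lakehouse onboarding"))
```

However minimal, this captures the shift the takeaway argues for: knowledge stops being something you have to know the location of, and becomes something you can ask for.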
Wrap Up
- In my next blog post I will focus on how intelligence can be layered into your data foundation by leveraging AI to help you manage and enrich it. Concepts like ontology and semantics will come into play to enhance the Data as a Product concept.
- Adopting ontology-driven knowledge graphs as data products is only part of the equation, as is enhancing data contracts by focusing on defining ontologies at an abstract level within logical data models, and at attribute level within physical data models.