Towards an Introspective Data Virtualization System

To know, and to know that one knows: a few more words for a profoundly different meaning. They introduce the awareness of knowledge, which is a key element for sharing it and, furthermore, the foundation for deriving from it the value we are looking for.

This conscious competence is surpassed only by unconscious competence, in which what I know becomes so innate and automatic that I forget I know it. It is an essential quality for any company that really wants to call itself data-driven: a company, that is, where data govern operations and ignite the well-known chain that starts from data, passes through information, reaches knowledge and, finally, arrives at wisdom, understood here as pragmatism, as the company’s ability to combine knowledge and context (the real world) and act at its best within it.

So it is not enough to know; it is equally important to be aware of knowing, to be able to find out, at any moment, what the company’s information capital contains: which concepts it holds and which relationships exist among them.

This access to knowledge, this sort of corporate introspection, is far more important than simple access to data as the record of observed phenomena. It carries out the difficult task of making accessible not only the way in which these phenomena are modeled, but the very essence of the phenomena that the company has decided to observe. We are therefore talking about the ontological status of the company, of what it is and wants to be. This status must be knowable and shareable, because the company, which we often treat as a single subject, is actually a set of individualities (ideally a holistic whole) in which individuals should be motivated by a common goal, and that goal cannot exist unless it is based on what the company knows.

We must therefore know, and do so in a conscious, transmissible way. Since every company manages its data, and what can be derived from them, through IT systems, this need must be reflected in those systems: in addition to the functionality for which they are designed, they must also guarantee a capacity for inspection, an ability to look inside them. Moving to the boundary between technology and philosophy, this leads us to a class of systems that we could call self-epistemic: systems able to reveal what they know through direct observation. This implies that they must speak an accessible, understandable language that human beings can read and understand, ideally with little or no training.[1]

For a modern data virtualization solution, this requirement translates into the need to combine effective, efficient, high-performance data management with the ability to make accessible and understandable what the data are, where they are, how they are structured, how they are composed into richer information constructs and, finally, what relationships exist among them.
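As a purely illustrative sketch of these questions (what the data are, where they are, how they are structured, how they are composed and how they relate), a catalog could be pictured as a small registry of self-describing entries. Every name here (`CatalogEntry`, `Catalog`, `describe`, `related_to`, and the sample sources `crm_db` and `erp_db`) is a hypothetical choice made for this example, not the interface of any particular data virtualization product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One modeled data element (hypothetical structure for illustration)."""
    name: str                      # what the data are
    source: str                    # where they are (an assumed location label)
    fields: dict[str, str]         # how they are structured: field name -> type
    composed_of: list[str] = field(default_factory=list)  # entries it is built from
    description: str = ""          # plain-language meaning


class Catalog:
    """A minimal, in-memory catalog that can be asked what it knows."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def describe(self, name: str) -> str:
        """Return a human-readable summary of one entry."""
        e = self._entries[name]
        cols = ", ".join(f"{k} ({v})" for k, v in e.fields.items())
        lines = [
            f"{e.name}: {e.description}",
            f"  located in: {e.source}",
            f"  fields: {cols}",
        ]
        if e.composed_of:
            lines.append("  composed of: " + ", ".join(e.composed_of))
        return "\n".join(lines)

    def related_to(self, name: str) -> list[str]:
        """Entries this one is composed of, plus entries composed from it."""
        uses = set(self._entries[name].composed_of)
        used_by = {e.name for e in self._entries.values() if name in e.composed_of}
        return sorted(uses | used_by)


# Sample content, invented for the example.
catalog = Catalog()
catalog.register(CatalogEntry(
    "customer", "crm_db", {"id": "int", "name": "str"},
    description="A person or company we sell to",
))
catalog.register(CatalogEntry(
    "order", "erp_db", {"id": "int", "customer_id": "int"},
    description="A confirmed purchase",
))
catalog.register(CatalogEntry(
    "customer_orders", "virtual", {"customer_name": "str", "order_id": "int"},
    composed_of=["customer", "order"],
    description="Orders joined with the customer who placed them",
))

print(catalog.describe("customer_orders"))
print(catalog.related_to("customer"))
```

The point of the sketch is only that the description is part of the model itself, so that anyone can ask the system what it knows without reading its internal formalism.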

In other words, a modern data virtualization system must be open to introspection: it must manifest what it knows, and must do so in such a way that this knowledge is immediately understandable to those who use the system, without any need to translate between the necessary internal formalism (or programming language), which is often obscure, and what its users are able to understand.

This openness obviously does not have to be total or absolute, but it must be guaranteed for the essence of a system of this type, an essence that resides where the data are modeled, the place where their value is realized and expressed.

In the context of data virtualization, we often refer to this place as the data catalog, the place where this knowledge is held. Yet the catalog is often treated as an optional feature, secondary in importance to others, forgetting that full awareness of which data are managed and modeled is not a secondary matter at all, and probably outweighs most of the other features that are usually evaluated first.

Knowing that we know which data are managed by a data virtualization solution (or knowing that we can find out, when needed) is, or at least should be, one of the most important items in the viaticum of those who use and analyze such data. Only in this way can their correct use be guaranteed and, above all, can the modeling be kept essential, efficient and consistent. Redundant modeling, unaware of what has already been done, is not only a cause of useless work; far more seriously, it can lead to an ambiguous, contradictory ontology, in which people who should share the meaning of a concept duplicate it only because they are not aware, or cannot be aware, that someone has already modeled that concept.[2]
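To make the risk of redundant modeling concrete, here is a minimal sketch of a check that could run before a new concept is modeled. It assumes that concepts are registered with a canonical name and optional synonyms, and that comparing normalized terms is enough to spot a duplicate; `ConceptRegistry`, `normalize`, `add` and `find` are hypothetical names, and a real catalog would need far richer matching.

```python
from __future__ import annotations


def normalize(term: str) -> str:
    """Lower-case a term and collapse underscores and extra spaces."""
    return " ".join(term.lower().replace("_", " ").split())


class ConceptRegistry:
    """Hypothetical registry mapping every known term to one canonical concept."""

    def __init__(self) -> None:
        self._by_term: dict[str, str] = {}  # normalized term -> canonical name

    def find(self, *terms: str) -> str | None:
        """Return the canonical concept already modeled under any of the terms."""
        for term in terms:
            hit = self._by_term.get(normalize(term))
            if hit is not None:
                return hit
        return None

    def add(self, name: str, synonyms: tuple[str, ...] = ()) -> None:
        """Register a concept, refusing to duplicate one that already exists."""
        existing = self.find(name, *synonyms)
        if existing is not None:
            raise ValueError(f"'{name}' is already modeled as '{existing}'")
        for term in (name, *synonyms):
            self._by_term[normalize(term)] = name


# Sample content, invented for the example.
registry = ConceptRegistry()
registry.add("Customer", synonyms=("client", "account_holder"))

print(registry.find("CLIENT"))
print(registry.find("supplier"))
```

With a lookup of this kind, a second team trying to model "Client" would be pointed to the existing "Customer" concept instead of silently creating a competing one.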

The data catalog thus becomes one of the key elements of a data virtualization solution. It becomes something that cannot be given up, just like other functionalities whose indispensability is well established. It is no longer an inscrutable internal component, a mere technical element like a part of a car engine; on the contrary, it is something accessible and open to investigation, an entry point for the whole solution. It is where intentionality is achieved, which is nothing other than the reason why such a solution is implemented: modeling data does not only mean defining their structure but, above all, showing, through the way they are defined, how they will be used within the company or any of its business departments. This takes a small step towards what Wittgenstein tells us in his “Philosophical Investigations”: that “meaning is use”.

In conclusion, a modern data virtualization solution cannot do without an equally modern data catalog, one that allows its users to be truly autonomous and to know firsthand what the system knows, without the need for any intermediation, which inevitably slows things down.

With a little interpretative freedom, we could then say that a data catalog realizes the spirit of Maria Montessori’s educational method, based on the maxim “help me to do it myself”.

 


[1] This is not the case for so-called sub-symbolic systems, such as neural networks, where knowledge is encoded in a form that cannot be explored. This also affects the trust we instinctively place in them, a trust undermined precisely by the impossibility of reading within them, which leads us to make a sort of act of faith towards them.

[2] If, as mentioned above, a company is a group of individualities, it is also true that these can be grouped into units that share common objectives (think, for example, of Sales and Marketing), for which it is essential to have a single, unambiguous representation of the context in which they operate and of the meaning of the individual concepts that exist in this context.

Andrea Zinno