Virtual Data Lakes for NLP
As natural language processing (NLP) technology matures, it will increasingly be used in specific applications. These applications will deploy standard NLP products and services to serve the purposes of particular enterprises. Often, they will build knowledge graphs from enterprise and other data that has been analysed and processed by language models.
Virtual data lakes provide efficient access both to the text that these applications process and to the binary data, such as sentence embeddings, that they generate during processing. Their schema-less storage and retrieval paradigm suits knowledge graphs well, because new kinds of nodes and relationships can be added without redesigning a schema. Their Python API lets AI professionals work in their language of choice. This makes them well suited both to storing work-in-progress data and to serving as a platform for NLP applications.
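To make the idea of schema-less storage concrete, here is a minimal sketch in Python of how text, a binary sentence embedding, and knowledge-graph facts might sit side by side as subject-predicate-object triples. It is an illustration only: the TripleStore class, the pack_embedding and unpack_embedding helpers, and all names and values are hypothetical, and the sketch uses a plain in-memory dictionary rather than the API of any actual virtual data lake.

```python
import struct


class TripleStore:
    """Hypothetical in-memory stand-in for a schema-less store.

    Facts are (subject, predicate, object) triples. No schema is declared
    up front, so any predicate and any object type can be added at any time.
    """

    def __init__(self):
        self._by_subject = {}

    def add(self, subject, predicate, obj):
        self._by_subject.setdefault(subject, []).append((predicate, obj))

    def get(self, subject, predicate):
        # Return every object stored for this subject and predicate.
        pairs = self._by_subject.get(subject, [])
        return [obj for pred, obj in pairs if pred == predicate]


def pack_embedding(vector):
    """Serialise a list of floats as little-endian 32-bit binary data."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Recover the list of floats from the binary form."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


store = TripleStore()

# Source text and its (made-up) embedding are stored under the same subject.
store.add("sentence:1", "text", "Acme acquired Widget Co.")
store.add("sentence:1", "embedding", pack_embedding([0.25, -0.5, 1.0]))

# Knowledge-graph facts use new predicates without any schema change.
store.add("sentence:1", "mentions", "Acme")
store.add("Acme", "acquired", "Widget Co")

embedding = unpack_embedding(store.get("sentence:1", "embedding")[0])
```

The point of the sketch is that text, binary work-in-progress data, and derived facts share one uniform structure, so an application can add new kinds of information as its processing evolves.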