Natural Language Processing

NLP Technology

Recent years have seen major advances in Natural Language Processing (NLP) technology. These advances enable the Web to interact with people, taking account of their individual hopes, wants, and fears.

Is this all "Big Tech"? Companies such as Facebook, Google, and Amazon are certainly leading NLP development, but the technology is becoming more widely available too. Smaller enterprises can use NLP through specialist commercial partners, and there is a base of open-source NLP software available to everyone.

Here's an overview of some of the key technologies used in the Ideas Browser.

Language Models

Natural Language Processing uses language models. A language model captures the probabilities of words or phrases occurring in particular contexts. A relatively straightforward application is the type-ahead feature found on many cellphones, which presents the most probable next words given what has been typed so far. In more advanced applications, models can be used for many tasks, such as analysing grammatical structure, identifying named entities, summarizing input, and explaining concepts. (This description is very much simplified. For more detail, see the Wikipedia article.)
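To make the type-ahead idea concrete, here is a minimal sketch of a toy bigram language model in Python. It simply counts which words follow which in a tiny made-up corpus; real language models are vastly larger and typically use neural networks.

    from collections import Counter, defaultdict

    # Count how often each word follows each other word in a tiny corpus.
    corpus = "the cat sat on the mat and the cat sat on the carpet".split()
    following = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        following[current][nxt] += 1

    def suggest(word, n=3):
        """Return the n most probable next words after `word`."""
        counts = following[word]
        total = sum(counts.values())
        return [(nxt, count / total) for nxt, count in counts.most_common(n)]

    print(suggest("the"))  # [('cat', 0.5), ('mat', 0.25), ('carpet', 0.25)]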

Language models can be large and complex, and often incorporate neural networks. They are generally developed by a training process in which their probability values are adjusted in the light of their responses to inputs. This may be a manual process (supervised training) or automated (unsupervised training). The probable occurrences of words or phrases in a large body of text can be determined automatically, but manual training may be required to produce particular results: for example, for a sales chatbot to recommend one company's products rather than another's.

Language models trained on very large bodies of text, such as OpenAI's GPT-3 or Google's T5, can produce excellent general results, and can be fine-tuned with manual training for particular tasks.

The HuggingFace website is an excellent place to explore language models.
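As an illustration, here is one way to try a pre-trained model from the Hugging Face hub in Python. This sketch uses the freely downloadable GPT-2 model; any text-generation model on the hub could be substituted.

    # Requires: pip install transformers torch
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    # The model continues the prompt with its most probable next words.
    print(generator("Natural Language Processing allows the Web to",
                    max_new_tokens=20))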

Word Vectors

The calculation weights associated with words in a language model neural network can be used to represent their meanings, because words that occur in similar contexts have similar weights. A list of numbers representing a word is called a Word Vector. It will typically contain several hundred numbers, like the one shown here, which represents one word. The similarity of two vectors can be determined mathematically. Words with similar meanings have vectors that are similar to each other.

-0.014424 -0.0060105 -0.23573 0.055827 0.16621 0.08586 0.056778 -0.12082 -0.2068 1.8227 -0.18968 0.096278 0.3115 -0.3511 0.16807 0.43519 -0.2847 0.86006 -0.083357 -0.2611 0.099658 0.034132 -0.082531 0.32734 0.24354 -0.11849 -0.087074 0.3135 0.11809 -0.16978 0.10025 0.3418 0.24852 0.28507 0.18219 0.21475 0.20191 -0.17904 -0.94466 0.094922 0.082908 0.10934 0.16067 -0.56888 0.34838 -0.13593 0.1084 0.1863 0.28772 0.26694 -0.47447 0.62358 -0.065017 0.17541 0.27851 -0.13052 -0.020737 0.38333 -0.38793 -0.23795 0.078389 0.43421 -0.038339 0.36992 0.075892 -0.12072 -0.35845 -0.087293 0.57002 0.51437 -0.10224 -0.24862 0.2397 -0.79136 -0.27372 -0.17156 0.32221 0.021043 -0.15874 0.38793 0.13417 -0.14157 0.016463 -0.035537 0.27587 -0.38601 -0.1694 -0.77203 -0.020286 -0.26517 0.020528 -0.39141 -0.39731 0.14371 0.045584 0.1605 -0.15453 -0.30671 0.47783 -0.46223 -0.26121 -0.59888 -0.22153 0.039089 0.44889 -1.0306 -0.023177 0.23335 -0.055428 0.19867 -0.31109 0.20258 0.081716 0.050923 -0.3448 0.093402 -0.29908 0.27934 -0.23916 0.13909 0.040115 -0.38913 -0.3187 0.47835 0.068873 0.16903 0.16713 -0.12294 -0.32976 0.14313 0.36823 -0.14122 0.037326 0.16381 -0.17069 -0.44479 0.28705 0.077059 -0.29525 -0.15743 -2.0507 0.32185 0.081956 0.12461 0.1758 -0.33869 0.2302 0.246 -0.16475 -0.1834 0.27745 -0.28752 -0.084271 0.14035 0.28461 -0.09891 -0.27006 -0.60533 0.1031 -0.076984 -0.046912 -0.38553 -0.042492 -0.12971 -0.33791 -0.18947 -0.1507 -0.15085 -0.53592 0.21994 -0.27333 0.12532 0.086636 -0.68528 -0.13592 0.26858 -0.020204 -0.19462 0.081713 0.05043 -0.083394 -0.42698 -0.077398 -0.15445 -0.14221 0.29579 -0.48817 0.10022 0.32991 -0.93171 -0.42618 0.053533 -0.13808 0.33215 0.23071 0.34492 -0.046429 -0.45281 0.14532 0.59281 0.14895 0.090804 -0.14079 0.020545 0.2919 -0.29759 0.35781 -0.030492 0.37118 0.27445 -0.22319 -0.1243 0.2875 0.10292 0.50202 -0.14646 -0.054308 -0.013103 -0.59811 -0.25099 0.26305 -0.31497 0.20552 0.14064 -0.16907 -0.13527 0.29132 0.23074 -0.20524 -0.10592 0.0076145 -0.24053 -0.034934 0.24643 -0.43165 -0.1318 0.26925 0.40882 -0.2167 0.063512 0.29812 -0.049572 0.55242 -0.13645 0.23623 -0.49168 0.17476 -0.55628 0.274 -0.5308 0.068254 0.23861 -0.3133 0.099796 0.1225 0.80555 0.36096 0.40697 0.15055 0.31059 0.14038 -0.00040845 -0.25184 0.31004 0.30279 0.084155 0.049687 0.43186 0.98197 -0.22268 -0.44432 -0.17561 -0.28118 -0.080212 -0.1105 -0.53678 -0.10331 0.019815 0.14769 0.017286 0.13002 -0.0029658 0.5245 -0.59704 -0.35833 -0.26183 -0.31901 0.30774 0.042265 0.044312 0.069384 -0.034785 0.033889 -0.03248 -0.039494 0.2066 -0.28588 0.17215 -0.17326 -0.24794
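Vector similarity is usually computed as cosine similarity: the cosine of the angle between the two vectors. A minimal sketch in Python follows; the three-number vectors here are made up for brevity, whereas real word vectors contain hundreds of numbers, like the one above.

    import numpy as np

    def cosine_similarity(a, b):
        """1.0 means identical direction, 0 unrelated, negative opposed."""
        a, b = np.asarray(a), np.asarray(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Vectors pointing in nearly the same direction score close to 1.
    print(cosine_similarity([0.2, -0.1, 0.5], [0.19, -0.08, 0.52]))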

Example of Vector Similarity

The power of word vectors is often illustrated by the assertion that, if you take the vector for "king", subtract the vector for "man", and add the vector for "woman", the result is similar to the vector for "queen". The table shows similarities for the king/queen example, and for some other words.



    Vector 1               Vector 2     Similarity
    king - man + woman     queen           79%
    king                   queen           73%
    knight - man + woman   bishop          33%
    black                  white           86%
    black                  red             74%
    black                  semantics       -4%


  • The result of taking the vector for "king", subtracting the vector for "man", and adding the vector for "woman", is indeed similar to the vector for "queen". It is more similar to "queen" than "king" is.
  • Performing the same operation on "knight" does not produce a result that is very similar to "bishop" (although there is some similarity).
  • There is strong similarity between "black" and "white". Although they are opposites, they occur in similar contexts. There is considerable, but less, similarity between "black" and "red".
  • There is no similarity between "black" and "semantics", which have no shared meaning or context.

There is no standard mapping of words to vectors. Different language models produce different sets of vectors. The example uses the spaCy en_core_web_lg pipeline, which has 685K vectors, each containing 300 numbers.
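The king/queen arithmetic in the table can be reproduced along these lines with that spaCy pipeline. This is a sketch, and the exact percentage depends on the model version.

    # Requires: pip install spacy
    #           python -m spacy download en_core_web_lg
    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_lg")

    def vec(word):
        return nlp.vocab[word].vector

    result = vec("king") - vec("man") + vec("woman")
    queen = vec("queen")
    similarity = np.dot(result, queen) / (np.linalg.norm(result) *
                                          np.linalg.norm(queen))
    print(f"{similarity:.0%}")  # roughly the 79% shown in the table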

The operation of building a model containing a set of vectors is very computationally intensive. The resulting model may be gigabytes in size but, once it is downloaded, NLP operations that use it can be performed reasonably quickly on a moderately-powered computer.

Sentence Embeddings

Word vectors are also often called word embeddings. As well as representing words, vectors can represent phrases, sentences, or other sets of words, and are then more usually referred to as embeddings. A sentence embedding is a vector of numbers that represents the meaning of a sentence. As with word vectors, the similarity of two embeddings can be computed mathematically, and indicates how similar the sentences are. The table illustrates this for some examples.



    Sentence 1                                                  Sentence 2                              Similarity
    black is white                                              black is not white                          85%
    The tank and the dive-bomber were essential to blitzkrieg   My cold water tank froze last winter        26%
    the cat sat on the mat                                      the cat sat on the carpet                   77%
    the cat sat on the mat                                      the cat dug its claws into my hand          51%
    What time did your friend go home?                          The cat dug its claws into my hand.          5%
    Will you join our club?                                     The clock has stopped.                      -9%
    I think you're right. I didn't think about that. I should
    go talk to them about it. Where is the closest Wal-Mart?    Come with me.                               -1%
    I'd like one more blanket.                                  Tom worked like a madman.                   -1%


  • Sentences score as similar if they have similar content, even when their meanings differ. "Black is white" and "black is not white" have similar content but opposite meanings.
  • A word may have several possible meanings, and content-based assessment may not distinguish them: the two uses of "tank" with different meanings still produced some similarity.
  • Similarity scores do correspond intuitively to probable similarity of context. 
  • (The last four pairs are some Random Common Sentences from Englishinuse.net.)

Transformers

Transformers are language models packaged so that they take a piece of text as input and, in response, generate another piece of text as output. They can often also export their calculation weights as word vectors or sentence embeddings. They can be used for tasks including summarization, explanation, and translation. They are easy to use because the input itself can specify what kind of output is required. For example, a transformer could be asked to summarize the text of the US Declaration of Independence or to translate it into French, and would produce a summary or a French translation accordingly.
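As a sketch of how simple this is in practice, the Hugging Face transformers library wraps such models in pipelines. This example uses the freely downloadable t5-small checkpoint, a small T5 variant; larger checkpoints give better translations.

    # Requires: pip install transformers torch sentencepiece
    from transformers import pipeline

    # T5 is trained so that the pipeline's task prefix selects the behaviour.
    translator = pipeline("translation_en_to_fr", model="t5-small")
    print(translator("We hold these truths to be self-evident, "
                     "that all men are created equal."))
    # [{'translation_text': '...a French rendering of the input...'}]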

The sentence comparison above uses embeddings from sentence-transformers/paraphrase-distilroberta-base-v1. This is a sentence-transformers model that maps sentences and paragraphs to a 768-dimensional dense vector space. GPT-3 and T5 are also transformer models.
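The comparison can be reproduced along these lines with the sentence-transformers library. This is a sketch; scores may differ slightly between library and model versions.

    # Requires: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(
        "sentence-transformers/paraphrase-distilroberta-base-v1")

    # Each sentence becomes a 768-number embedding.
    embeddings = model.encode(["the cat sat on the mat",
                               "the cat sat on the carpet"])

    # Cosine similarity of the two embeddings: roughly the 77% in the table.
    print(util.cos_sim(embeddings[0], embeddings[1]))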

Because they consider the context of a word or phrase, and because they are trained on very large amounts of data, the latest transformers typically produce much better results than you would expect from the similarities shown in the examples on this page.

The ability to generate, from a given piece of text, another piece of text that summarises it, expands it, translates it, or answers its questions is very powerful, but transformers do have limitations. Currently, the amount of input text that even the best of them can handle at one time is less than 2,000 words. This means that they cannot, for example, identify key topics in large bodies of text.  

Word vectors and sentence embeddings can be used for operations such as search and topic analysis on large bodies of text. Transformers give quality results on small amounts of text, while word vectors and sentence embeddings enable natural language processing at scale.
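For example, a simple semantic search can be sketched as follows: embed every document once, embed each query as it arrives, and rank the documents by similarity. The documents here are placeholders; in practice there could be thousands, with their embeddings precomputed and stored.

    # Requires: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(
        "sentence-transformers/paraphrase-distilroberta-base-v1")

    documents = ["My cold water tank froze last winter",
                 "The clock has stopped.",
                 "the cat sat on the mat"]
    doc_embeddings = model.encode(documents, convert_to_tensor=True)

    # Embed the query and return the two most similar documents.
    query = model.encode("Where did the cat sit?", convert_to_tensor=True)
    for hit in util.semantic_search(query, doc_embeddings, top_k=2)[0]:
        print(documents[hit["corpus_id"]], round(hit["score"], 2))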