John2Vec, or embedding Dewey’s philosophy

Interesting case studies arise when computational techniques are applied to the humanities. What happens when the writings of one of the most prolific philosophers in history are used as input to an Artificial Neural Network (ANN)? Which relations and insights can it extract from philosophy corpora? In this article we showcase the capabilities of today’s technologies in processing philosophical texts. First, we introduce the technical aspects of processing text with ANNs; then, we show a few empirical examples of the information that can be extracted from philosophical texts; finally, we focus on a specific philosopher who has the peculiarity of having produced a vast amount of writing.

One of the most popular ANNs for text processing is word2vec: given a textual input, it produces vector representations (or embeddings) of words that carry contextual information. The context of a word w is defined as the set of terms frequently found in the same sentence as w. Given a word, word2vec returns an embedding that lies close, in the embedding space, to the vectors of semantically similar words.
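To make the notion of context concrete, here is a minimal sketch that counts which words share a sentence with a given word. It only illustrates the definition above: it is not how word2vec is trained (word2vec learns its vectors by prediction, typically over a sliding window), and the sentences and the function name `sentence_contexts` are invented for this example.

```python
from collections import Counter, defaultdict

def sentence_contexts(sentences):
    """Map each word to a Counter of the words sharing a sentence with it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # every other token of the sentence counts as context of `word`
            for j, other in enumerate(tokens):
                if i != j:
                    contexts[word][other] += 1
    return contexts

# Toy sentences, invented for illustration (not taken from Dewey's corpus).
toy_sentences = [
    "education shapes experience",
    "experience shapes thought",
    "schools provide education",
]

contexts = sentence_contexts(toy_sentences)
print(contexts["experience"].most_common())
```

Words that end up with similar context counts are the ones an embedding model would tend to place close together.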

Word2vec learns these embeddings through a training process that requires thousands of textual documents, more than a single person can usually write in an entire lifetime. Unless your name is John Dewey.

John Dewey (1859–1952) was a prominent American philosopher, psychologist, and education reformer whose ideas greatly influenced school and social reforms. He was a primary figure in the philosophical tradition of pragmatism and one of the fathers of functional psychology. However, the most interesting fact for us is that, in more than ninety years of life, Dewey produced an incredibly vast amount of text in the form of books, essays, articles, letters, teaching notes and so on. The full collection of his writings consists of 38 volumes grouped into: The Early Works, 1882–1898 (5 volumes), The Middle Works, 1899–1924 (15 volumes), The Later Works (17 volumes) and one Supplementary Volume. Luckily, this collection is available in the Past Masters Commons [1] database. Thus, thanks to web scraping techniques, it was possible to build a dataset of his entire corpus.

Let’s apply word2vec to Dewey’s corpus!

Since words are represented as vectors in a multi-dimensional embedding space, we can run a few experiments to check how vector operations relate to semantic relations. For instance, the nearest embedding to Kant is Hegel; Empiricism is placed next to Rationalism; and the nature and universe embeddings are next to each other. These examples show that Euclidean distances relate to semantic similarities.
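A nearest-neighbour lookup of this kind can be sketched in a few lines. The three-dimensional vectors below are made up for illustration and are not the embeddings learned from Dewey’s corpus; `nearest` is a hypothetical helper name.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "kant": [0.9, 0.8, 0.1],
    "hegel": [0.8, 0.9, 0.2],
    "nature": [0.1, 0.2, 0.9],
    "universe": [0.2, 0.1, 0.8],
}

def nearest(word, embeddings):
    """Return the other word whose vector has the smallest Euclidean distance."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return min(candidates, key=lambda w: math.dist(target, embeddings[w]))

print(nearest("kant", toy_embeddings))    # hegel
print(nearest("nature", toy_embeddings))  # universe
```

With real embeddings the dictionary would hold one vector per vocabulary word, and the same lookup would apply unchanged.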

To further demonstrate word2vec’s potential, we try to extract more complex relations using vector operations. By adding the difference between two semantically related terms, such as idealism and Hegel, to another embedding, such as Kant, we obtain the vector associated with rationalism. In other words, we use the relation between Hegel and idealism to find out which is Kant’s school of thought. This experiment demonstrates the possibility of extracting analogies through vector operations on word2vec’s embeddings [2].
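The arithmetic behind such an analogy can be sketched as follows. The vectors are again invented so that the example works by construction; with real embeddings the result depends entirely on the trained model. The helper name `analogy` is hypothetical.

```python
import math

# Hypothetical vectors, hand-built so the analogy holds (not learned embeddings).
toy_vectors = {
    "hegel": [1.0, 0.0, 1.0],
    "idealism": [0.0, 1.0, 1.0],
    "kant": [1.0, 0.0, -1.0],
    "rationalism": [0.1, 0.9, -0.9],
    "empiricism": [0.0, 1.0, 0.1],
}

def analogy(a, b, c, embeddings):
    """Return the word closest to c + (b - a), excluding a, b and c themselves."""
    query = [
        ci + (bi - ai)
        for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return min(candidates, key=lambda w: math.dist(query, embeddings[w]))

# "Hegel is to idealism as Kant is to ...?"
print(analogy("hegel", "idealism", "kant", toy_vectors))  # rationalism
```

Excluding the three input words from the candidates is a common convention, since the query vector often stays close to one of them.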

There are many other procedures for extracting insights from word2vec’s embedding space; however, we will focus on studying semantic shift. Semantic shift is a phenomenon that concerns the evolution of a word’s usage. Indeed, the meaning of a word is not fixed once and for all and can change over generations, lifetimes or geographical regions.

We can analyse semantic shift in Dewey’s corpus thanks to the time annotations of its documents. The goal here is to compute and compare three different embedding spaces, one for each period of Dewey’s production (Early Works, Middle Works and Later Works). The problem with comparing vectors belonging to different embedding spaces is that their positions in a space are relative to all the other embeddings in that space. Therefore, in order to detect semantic shift, it is necessary to analyse how a vector changes in relation to the other words of the embedding space.

For this reason, we select a subset of words by choosing terms that Dewey uses more frequently than other authors of the same epoch. To find words with similar semantic representations, we run a clustering algorithm on each period’s embedding space. Clustering algorithms group embeddings by trying to minimize distances between vectors of the same group (called a cluster) while maximizing distances between vectors of different clusters. Two words being in the same cluster means that they have a related meaning in a specific period. Looking at how words change clusters across periods can highlight their semantic shift.
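As an illustration of the clustering step, here is a minimal k-means sketch. The article does not state which clustering algorithm was used, so k-means is an assumption made for this example, and the two-dimensional vectors are invented. Starting the centroids at the first k points is a simplification; library implementations usually use a randomized or k-means++ initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its closest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical 2-dimensional embeddings for one period (invented values).
words = ["schools", "kant", "teachers", "hegel", "students", "plato"]
vectors = [
    [0.1, 0.9], [0.9, 0.1], [0.2, 1.0],
    [1.0, 0.2], [0.0, 0.8], [0.8, 0.0],
]

labels = kmeans(vectors, k=2)
for word, label in zip(words, labels):
    print(word, label)
```

Running the same procedure separately on each period’s embedding space gives one cluster label per word and per period, which is what the comparison across periods relies on.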

Here is a Sankey diagram summarizing the overall results. The blue boxes represent clusters decorated with a label indicating the period they come from, an id and a representative word. The three columns correspond to periods. For example, the cluster labelled with “2.0 education” is a cluster from the second period, with id = 0, containing words related to education. The height of a box is proportional to the number of words it contains. The grey streams represent the number of words going from a cluster to another one over periods.
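The streams of such a diagram amount to counting how many words move from each cluster of one period to each cluster of the next. A minimal sketch, assuming each period is available as a word-to-cluster mapping; the assignments and cluster ids below are invented and are not the actual results.

```python
from collections import Counter

# Hypothetical cluster assignments per period: word -> cluster id (invented).
period_1 = {"kant": "1.1", "hegel": "1.1", "russell": "1.1", "schools": "1.0"}
period_2 = {"kant": "2.5", "hegel": "2.5", "russell": "2.1", "schools": "2.0"}

def cluster_flows(before, after):
    """Count words moving from each cluster in `before` to each cluster in `after`."""
    shared = before.keys() & after.keys()  # only words present in both periods
    return Counter((before[w], after[w]) for w in shared)

flows = cluster_flows(period_1, period_2)
for (source, target), size in sorted(flows.items()):
    print(f"{source} -> {target}: {size} word(s)")
```

Each (source, target) pair corresponds to one grey stream, and its count to the width of that stream.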

On the one hand, if there were no semantic shift between periods, we would always find the same clusters. On the other hand, if the embeddings were completely independent across periods, there would be no consistency between clusters, and words would end up in any cluster independently of the cluster they come from. This plot is interesting because it shows that clusters tend to keep some consistency across periods while, at the same time, some semantic shift also takes place. In the following we show some interpretations made possible by the diagram.

The “1.1 philosophy” cluster is composed mainly of philosophers such as Russell, Peirce, Hegel, Kant, Leibniz, Aristotle and Dewey himself. It is interesting to notice that across periods these authors are progressively divided into authors contemporary with Dewey and authors of the past, resulting in two clusters:

  • cluster “3.1 philosophy”, which contains for example: Peirce, Dewey, Russell
  • cluster “3.5 descartes”, which contains: Hegel, Spencer, Locke, Plato, Aristotle, Kant, Leibniz.

This can be interpreted as Dewey’s progression towards referencing contemporary and past authors differently. At the beginning of his career, Dewey must have referred to all these authors equally, but as the years went by, a distinction seems to have arisen between the authors with whom he actually engages in debate and those who remain points of reference for classical philosophy. Basically, all these authors are subject to a semantic shift that brings them to diverge in two different clusters.

Another interesting fact is what happens to the clusters labelled “education”. As previously mentioned, education is one of the most important themes addressed by Dewey. However, his main theories on education are expressed for the first time in various texts from the early 1900s. This is consistent with what the Sankey diagram reports, as the cluster referring to education grows in size in the second period (1899–1924). From containing only words that are strictly related to education (schools, students, teachers and training), this cluster comes, in the second period, to include words of more general philosophical interest (socially, individualism and impulses). Semantic shift detects this phenomenon: words related to education gain greater philosophical interest and centrality in the second period.

This case study shows how semantic shift makes it possible to analyse Dewey’s philosophical interests and the evolution of his career. Moreover, vector operations can be performed on philosophy corpora to extract more insights about philosophers and schools of thought.

Computational natural language processing methods can be of great interest for disciplines such as philosophy: experts in these fields can benefit from these tools and techniques to analyse their history and evolution, automatically extracting relevant concepts and thoughts.

[1] https://www.academicrightspress.com/intelex/past-masters-commons

[2] You can try new experiments using the SUPPORT TO HERMENEUTICS notebook available at github.com/TommasoLocatelli/to-the-master-s-degree-and-beyond-/tree/main/john2vec

  • Elisabetta Rocchetti

    Elisabetta Rocchetti has been a research fellow since December 2022. She graduated in “Computer Science for New Media Communications” at the University of Milan. She completed her studies with an MSc in “Data Science and Economics”, shifting her focus to data science and data analysis. In particular, she is interested in Transformers, Explainable AI and Causality. In addition, she is currently involved in MUSA’s Spoke 5 “Sustainable Fashion, Luxury and Design”, a research project funded by PNRR.

  • Tommaso Locatelli

    Tommaso Locatelli holds a bachelor’s degree in Philosophy and a master’s degree in Data Science and Economics. He likes to have an interdisciplinary approach.
