Tractatus Logico (Phylo)sophicus

Over the Christmas holidays, I read “Maths Meets Myths: Quantitative Approaches to Ancient Narratives,” from the Springer Understanding Complex Systems collection.

The authors present their application of “hard” science techniques to datasets coming from the humanities – mostly large corpora of texts, legends and myths.

One paper in particular uses bioinformatics and phylogenetics to study the spread of a popular folk tale: Little Red Riding Hood. The story that I knew from Perrault and Grimm has patterns that are also found in African and East Asian tales.

The Tractatus Logico-Philosophicus viewed as a phylogenetic tree

Inspired by this, I’ve had a look at Wittgenstein’s Tractatus Logico-Philosophicus (available on Project Gutenberg), which is presented as hierarchically numbered statements and sub-statements.

We start by scraping the book into a dataframe with one row per statement:

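``````
library(rvest)

# `page` is assumed to hold the parsed HTML of the book,
# e.g. the result of read_html() on the edition being scraped
root <- page %>% html_node("#root")

# Build one row per statement: its number and its text
df <- data.frame()
for (item in root %>% html_nodes("li")) {
  label <- item %>% html_attr("data-name")
  content <- item %>% html_text(trim = TRUE)

  temp <- data.frame(label, content)
  df <- rbind(df, temp)
}
``````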

We then run our cluster analysis on the distances between the rows of `df`, hoping that the hierarchical numbering of statements will yield something interesting.

We adopt the `single` method, which the `hclust` documentation describes like so:

The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy.

``````
clusters <- hclust(dist(df), method = "single")
``````
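To see why the numbering alone can drive the grouping, here is a minimal sketch on a handful of made-up labels in the Tractatus style. It assumes the labels are read as decimal numbers, which is an assumption for this illustration and not something the scraping code above guarantees; `toy_clusters` and `groups` are illustrative names.

```r
# Toy labels in the Tractatus numbering style (illustrative, not scraped)
labels <- c("1", "1.1", "1.11", "1.12", "2", "2.01", "2.011", "3", "3.001")

# Assumption: treat each label as a decimal number, one row per statement
m <- matrix(as.numeric(labels), ncol = 1, dimnames = list(labels, NULL))

# dist() measures distances between rows; single linkage merges the two
# clusters whose closest members are nearest to each other
toy_clusters <- hclust(dist(m), method = "single")

# Cut the tree into three groups and list the labels in each
groups <- cutree(toy_clusters, k = 3)
print(split(names(groups), groups))
```

With a single numeric column, the widest gaps fall between the top-level propositions, so the three groups line up with statements 1, 2 and 3 and their sub-statements.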

Dendrograms galore

From these clusters, we can represent the book as dendrograms, which are used in phylogenetics to represent evolutionary splits and genetic relationships in a tree.

``````
plot(clusters, labels = clusters$labels)
``````

``````
d <- as.dendrogram(clusters)
plot(d, horiz = TRUE, type = "triangle")
``````

``````
library(ape)
plot(as.phylo(clusters), type = "fan")
``````

The diagrams above show how our clusters have correctly grouped together the hierarchical statements of the Tractatus.

Building on Mike Bostock’s Tree of Life, and helped by Jason Davies’ work parsing the Newick text file format (standard in tree representations) in JavaScript, I re-implemented the above with `d3-jetpack` and ES6: https://bl.ocks.org/basilesimon/66db4338c15099f6e8d62f236db2ef2d.
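For the curious, one way to get from an `hclust` result to a Newick string is through `ape`, which is already loaded above. This is only a sketch: the post does not say how the Newick file was actually produced, and the toy labels and the names `toy_clusters`, `newick` and `tree` are made up for the illustration.

```r
library(ape)

# Toy stand-in for `clusters`: single-linkage clustering of a few numeric labels
labels <- c("1", "1.1", "1.11", "2", "2.01")
m <- matrix(as.numeric(labels), ncol = 1, dimnames = list(labels, NULL))
toy_clusters <- hclust(dist(m), method = "single")

# as.phylo() converts the hclust object; write.tree() serialises it as Newick
newick <- write.tree(as.phylo(toy_clusters))
print(newick)

# Round trip: read.tree() parses the string back into a phylo object
tree <- read.tree(text = newick)
print(tree$tip.label)
```

Passing a `file` argument to `write.tree()` writes the string to disk instead, which is the form a JavaScript Newick parser would typically load.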