Tractatus Logico (Phylo)sophicus
Over the Christmas holidays, I read “Maths Meets Myths: Quantitative Approaches to Ancient Narratives,” from the Springer Understanding Complex Systems collection.
The authors present their application of “hard” science techniques to datasets coming from the humanities – mostly large corpora of texts, legends and myths.
One paper in particular uses bioinformatics and phylogenetics to study the spread of a popular folk tale: Little Red Riding Hood. The story that I knew from Perrault and Grimm has patterns that are also found in African and East Asian tales.
The Tractatus Logico-Philosophicus viewed as a phylogenetic tree
Inspired by this, I’ve had a look at Wittgenstein’s Tractatus Logico-Philosophicus (available on Project Gutenberg), which is presented as hierarchically numbered statements and sub-statements.
We start by scraping the book into a dataframe with one row per statement:
library(rvest)

# Fetch the page and select the element that wraps the numbered statements
page <- read_html("http://www.tractatuslogico-philosophicus.com/")
root <- page %>% html_node("#root")

# Each <li> is one statement; its number is read from the data-name attribute
items <- root %>% html_nodes("li")
df <- data.frame(
  label = items %>% html_attr("data-name"),
  content = items %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)
We then run our cluster analysis based on the distances between the rows of df, hoping that the hierarchical numbering of statements will yield something interesting.
We adopt the single method, described like so:
The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy.
clusters <- hclust(dist(df), method = "single")
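To make this step a little more concrete, here is a toy sketch in base R. It assumes the statement numbers can be read as plain decimals and uses that number as the only feature. The eight labels and the names toy, m, toy_clusters and groups are made up for illustration and do not come from the scraped data.

```r
# Toy subset of statement numbers (hypothetical, for illustration only)
labels <- c("1", "1.1", "1.11", "1.12", "2", "2.01", "2.011", "3")
toy <- data.frame(label = labels, stringsAsFactors = FALSE)

# One numeric feature per statement; the statement numbers become the
# row names so that they are carried through as leaf labels.
m <- matrix(as.numeric(toy$label), ncol = 1,
            dimnames = list(toy$label, NULL))

# dist() measures distances between rows; single linkage then merges
# the closest neighbours first.
toy_clusters <- hclust(dist(m), method = "single")

# Cut the tree into three groups to see which statements end up together
groups <- cutree(toy_clusters, k = 3)
print(groups)
```

With these numbers, the three groups line up with the leading digit of each statement, which is the kind of grouping I was hoping for. Reading the labels as decimals is a simplification: it orders the statements sensibly but does not encode the parent/child relationships directly.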
Dendrograms galore
From these clusters, we can represent the book as dendrograms, which are used in phylogenetics to represent evolutionary splits and genetic relationships in a tree.
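# Classic dendrogram of the clustering
plot(clusters, labels = clusters$labels)
# Horizontal layout with triangular branches
d <- as.dendrogram(clusters)
plot(d, horiz = TRUE, type = "triangle")
# Fan layout, via ape's phylo class
library(ape)
plot(as.phylo(clusters), type = "fan")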
The diagrams above show how our clusters have correctly grouped together the hierarchical statements of the Tractatus.
Building on Mike Bostock’s Tree of Life, itself helped by Jason Davies’ work on parsing the Newick text format (a standard in tree representations) in JavaScript, I re-implemented the above with d3-jetpack and ES6: https://bl.ocks.org/basilesimon/66db4338c15099f6e8d62f236db2ef2d.
The resulting chart is at the top of this page.
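For readers wondering how a clustering from R could end up as Newick text, here is a minimal sketch in base R. It is not necessarily how the file behind the chart was produced (the ape package also offers write.tree for this). The helper to_newick and the three-point example are hypothetical, and branch lengths are left out to keep things short.

```r
# Turn an hclust object into a Newick string without branch lengths.
# In hc$merge, negative entries are single observations and positive
# entries point back to an earlier merge step (row number).
to_newick <- function(hc) {
  node <- function(i) {
    if (i < 0) {
      return(hc$labels[-i])  # leaf: look up its label
    }
    paste0("(", node(hc$merge[i, 1]), ",", node(hc$merge[i, 2]), ")")
  }
  # The last row of merge is the root of the tree
  paste0(node(nrow(hc$merge)), ";")
}

# Tiny made-up example: three statement numbers read as decimals
pts <- matrix(c(1, 1.1, 3), ncol = 1,
              dimnames = list(c("1", "1.1", "3"), NULL))
hc <- hclust(dist(pts), method = "single")
newick <- to_newick(hc)
print(newick)
```

The nested parentheses mirror the merge order: statements 1 and 1.1 are joined first, and 3 is attached at the root. A string like this is what a JavaScript Newick parser can turn back into a tree for d3.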
I love how simple the result looks and how little we end up knowing about the book itself. The only thing I’ll leave you with is the final, chapter-seven put-down of this book about language, facts and truths of the world:
What we cannot speak about we must pass over in silence.
Precisely what I didn’t do in this blog about phylogenetics and a book I never finished.