I was recently talked into taking on the role of the negligent MediaWiki administrator for the esoteric programming languages wiki. As a consequence, I have now a local copy of the XML dump of the wiki contents (that anyone can download). To celebrate Easter in the traditional manner, then, I've spent some time computing statistics of the link graph of the wiki.

### Background

For the following statistics, vertices of the graph are all articles in the main, Category: and Esolang: namespaces, while links from `A` to `B` are the (directed) edges.
Multiple links with the same target have been generally considered multiple edges; i.e., the link graph is a *multidigraph*.

The XML dump used was dated `2014-04-08T10:07:14Z`.
There are some known imperfections in the XML dump parsing: redirect pages have been considered separate vertices in their own right (with a single link to the redirect target), and only links in the raw wikitext source have been counted.
(E.g., links hidden inside transcluded templates as well as outgoing links from the automatically generated category pages have been ignored.)
In cases where it makes more sense, analysis has been restricted to the "main" strongly connected subgraph.

### General Statistics

There were 1781 vertices and 10815 edges, or 10143 edges when collapsing multiple edges with the same head and tail.

The maximum outdegree was 858 ("Language list"), and the maximum indegree was 759 ("Category:Languages").
For full lists sorted by the out- and indegree, see `outdegree.txt` and `indegree.txt`.
Maximum edge multiplicty was 17, achieved by the 17 links from Brainbool to Brainfuck.

The graph diameter (largest value of the length of the shortest directed path between any pair of vertices) was 14.
This distance was achieved by the 24 pairs of vertices listed in `diameter.txt`.

The largest weakly connected component had 1692 vertices; the remaining 89 vertices were divided in 1 component of size 3, 5 components of size 2 and 76 components of size 1. The components of size 2 and 3 are mostly redirect pages.

The largest strongly connected component had 1303 vertices; the remaining 478 vertices were divided in 4 components of size 2 and 470 components of size 1. The four strongly connected components of size 2 are indicated in the table below.

# | Pages |
---|---|

1 | SELECT. |

SELECT./Hello World | |

2 | Preposterous Programming |

Preposterous Programming Language | |

3 | Postfix notation |

Prefix notation | |

4 | Esolang:Community portal |

The Esoteric File Archive |

### Closeness Centrality

Looking only at the primary strongly connected component, it takes an average of 3.6990 clicks to get from one page to another. Some pages, however, are more central than others. The following table shows the top and bottom 10 pages ordered by the average number of clicks it takes to get to an arbitrary article, when starting from the mentioned page. (Many of the bottom ten are, quite naturally, redirect pages.)

This analysis was inspired by Six Degrees of Wikipedia.

Rank | Clicks | Page |
---|---|---|

1 | 1.6224 | Language list |

2 | 2.2955 | Main Page |

3 | 2.3147 | List of ideas |

4 | 2.3691 | Joke language list |

5 | 2.4198 | Esoteric programming language |

6 | 2.4428 | Esolang:Featured languages/Candidates |

7 | 2.6201 | Category:Languages |

8 | 2.7183 | Hello world program in esoteric languages |

9 | 2.9747 | Brainfuck |

10 | 3.0276 | Popular problem |

⋮ | ||

1294 | 5.3860 | Pedro Gimeno |

1295 | 5.4505 | Funciton/Functions |

1296 | 5.4574 | Migol |

1297 | 5.4597 | Migol 11/IO Functions |

1298 | 5.4820 | Recursor |

1299 | 5.6876 | ByteByte |

1300 | 6.0169 | Amelia |

1301 | 6.0783 | Church integer |

1302 | 6.2072 | BytePushCore |

1303 | 6.2072 | PUSHEM |

The full list of articles ranked by closeness is at `center.txt`.

### Markov Chain Distribution

Looking again at the strongly connected component, we can build a Markov chain model, with transition probabilities
\[ p(x_t \mid x_{t-1}) = \frac{\left| L(x_{t-1}, x_t) \right|}{\sum_x \left| L_(x_{t-1}, x) \right|}, \]
where \(\left| L(x, y) \right|\) is the number of links from \(x\) to \(y\).
The *stationary distribution* of the model (it might not exist, or there might be several; we will ignore these possibilities) corresponds to a (normalized) left eigenvector of the transition matrix \(\left[\mathbf{P}\right]_{ij} = p(x_t=j \mid x_{t-1}=i)\) with an eigenvalue of 1.
See e.g. Wikipedia for further details.

In any case, intuitively the stationary distribution denotes the relative frequency of encountering a particular article, when clicking links randomly forever.
This is quite close to the well-known PageRank algorithm, though the latter has some further bells and whistles, such as the damping factor.
Top ten pages in this sense (highest probability in the stationary distribution) are given in the table below; full list can be found at `markov.txt`.

Rank | Freq. | Page |
---|---|---|

1 | 0.0868 | Language list |

2 | 0.0847 | Category:Languages |

3 | 0.0542 | Esoteric programming language |

4 | 0.0351 | Brainfuck |

5 | 0.0339 | Turing-complete |

6 | 0.0202 | Turing machine |

7 | 0.0195 | Category:Implemented |

8 | 0.0190 | Computational class |

9 | 0.0153 | Befunge |

10 | 0.0131 | Category:Concepts |

### Shortest Path Finder

The following form can[*] be used to find shortest path between any two articles that are part of the link graph. Do note that this is not going to stay up to date with edits in the wiki. It will (try to) load the full link graph on first click, so a small delay is normal. With luck, that will also add page title completion to the input boxes, assuming HTML5 <datalist> support.

[*] Assuming a browser happy with quickly hacked-together JavaScript, exact spelling of titles, etc.

### Force-Directed Graph Visualization

Courtesy of `d3.js`, here is also a force-directed graph layout visualization of the strongly connected component.
It can be pretty slow, and it will be rather complex and messy.
Vertices are colored based on their namespace (blue for main, orange for Category: and dark red for Esolang:), and hovering over a node should both show its name as a tooltip as well as highlight its immediate neighbors.