On Compressing Trees with Prüfer Codes

by Nimrod Priell

Imagine you’re a lumberjack, and you want to ship a whole forest to some far away place. You’d want to make sure you pack it as tight as you can. No, wait. Imagine you’re a software engineer and you want to transfer a huge forest (just a bunch of trees, or rooted undirected acyclic graphs, that is) across the Internet. How would you do it efficiently?

Let’s make it clear that we’re looking for a way to transfer the tree structure. That is separate from transferring the contents of the nodes. If the trees you’re representing are, for example, social network influences (e.g tweets emitted from a certain person and retweeted by his followers, and theirs, and so on), you can keep the names of the persons involved, the tweet and the time it was retweeted at in a separate message. Then you map each node to a number from 1 to the size of your tree (say, N), and our question is how to transfer these numbers in a way that the other side can understand that node number 3 is a child of node number 11, and 14 is a child of 3, for example.

So let’s start with one tree. Some trivial ideas come to mind, like writing down a node, then the number of children it has, then the identifiers for the child nodes, their number of children, and so forth. You must make a decision to rebuild the tree either depth-first or breadth-first. That’s a fine representation. But it turns out to be a little wasteful. You’re transferring every node (say, out of N nodes), and then you’re also writing down the number of children that each node has. So that’s 2N pieces of information, each of which can take N different values: The number of nodes is comparable to the number of children in the worst case (a tree with 1 level, all nodes being leaves whose parent is the root), so you can’t go much lower than that. You can come up with various tricks to optimize this trivial representation, but we should ask another question instead.

Counting trees

How much can we compress a general tree structure? The question is really, “how many trees are there?”. If we know how many trees there are, and we find a representation whose size is comparable to the number of trees, then we know we have the minimal representation: We can’t go smaller than that, because if there are less words (meaning possible instances of our representation) than there are trees, there will be no way of telling apart two trees that are represented with the same word. If however we end up sending more data than there are trees, then we’re not optimal. What we’re looking for, then, is a bijection between trees and possible words in our representation.

If you’re familiar with Graph Theory, you know that Cayley’s formula for trees says there are NN-2 possible trees with N nodes. If we can have a language with NN-2 words, each representing a different tree, we’ve gotten as lean as we can get. The 2N pieces of information from before have N possible values for each piece (since both the node identifiers themselves and the number of children can be any number from 1 to N*). They represent a space of N2N options, then – much, much larger than the space of trees. Can we find a code that represents the tree using N-2 words with N options each? Think about it – that means using even less than the identifiers of all nodes** – since there are N nodes, but we transfer just N-2 of them, in some order, and get our tree back.

This bijection exists (in fact, several useful variations do), and it’s called Prüfer codes. In a Prüfer code, we assume we both know what N is (or I can just send it as my first word). I then give you N-2 numbers from 1 to N and they tell you how to build the tree. How do I do it? I take a tree, say, (a) from the figure below.

Trees on 5 nodes

Now I take the largest leaf, write down its parent, and remove it. The largest leaf is 4 (5 is not a leaf. 2 and 3 are leaves smaller than 4). I write down its parent’s identifier, “1”, and remove 4 and the edge connecting it to 1 from the tree. The next largest leaf is 3. (Only 2 and 3 are leaves now). I write down its parent, “5”, and remove 3 and the node connecting it to 5. Now only 2 is a leaf, whose parent is 1. Writing “1” down again, we have “1,5,1” so far. The tree is down to just 5 and 1. That’s it. The tree is a connected graph, so the must be an edge between them, and we’re done: Our code for this tree is “1,5,1”. The number of rooted trees, those with a distinguishable root (like I drew them above) is NN-1 – since for every tree I can arbitrarily decide which of the nodes is the root and it’ll make it into a rooted tree: So there are N options for each of the NN-2 trees, for a grand total of NN-1. To transfer a rooted tree, I could’ve then either specifically stated that 5 is my root (say, by specifying the root as the first word, something like “5:1,5,1”) or alternatively I could’ve continued one more step with the Prüfer sequence: Since I have only 2 nodes remaining, taking the parent of the largest leaf is always equivalent to specifying the root. So that’s how you get the Prüfer code for a tree: The code for trees (b) and (c) turns out to be “4,4,1,2” and “2,2,2,2” respectively – Work it to see you follow the idea.

Note that these trees are not ordered: That is, we don’t have a discernible “right” and “left” child, so not all algorithms can benefit from this representation.

The other side, receiving this transmission, now has to recover the tree from the sequence of numbers. I’ll let you work this one out – I actually think it’s a hard but decent interview question to ask someone after having explained the encoding side (maybe after you checked he can FizzBuzz). It can be done in O(n log(n)) time, and more efficient variations of the code allow decoding in O(n sqrt(log n)) time [1].


What’s huge about it is that it’s a bijection. That means it’s not only useful for encoding efficiently, but that (for example) it’s a great way to generate trees uniformly at random. Want a random 1,000,000 node tree to run your algorithm on? Great! Generate 999,998 numbers from 1 to 1,000,000 at random, and run them through Prüfer sequence decoding to get the tree.

Also note that in many cases, it’s still okay if your graph is cyclic: You can create a tree view of that graph by picking a node and doing BFS or DFS from that node.

What’s also surprisingly nice about Prüfer sequences is that they lend themselves very naturally to rooted forests, as well. A trivial extension is specifying you have R trees and there are N total nodes in the network. It can be proved that there are RNN-R-1 possible trees in the forest (in fact, the proof involves just demonstrating the bijection with Prüfer codes), and indeed following the exact same algorithm, keeping in mind to not delete the root nodes from the graph, and adding a symbol for “end of transmission” (or sending the number of words in the message at its beginning), we can transfer the whole forest. In fact, it becomes a very efficient way to transfer sparse forests: A forest with 1,000,000 trees that have no children is just transferred as “N=1000000, End” – Because all of the nodes are root nodes. If each tree has only 1 node, you end up transferring only the root nodes, 500,000 in number (assuming N=1,000,000).

For trees with a large branching factor, the sequence is sometimes also compressible with a general compression algorithm, since you are likely to repeat every parent node multiple times.

(*) Technically the number of children can only go up to N-1. But that hardly saves us anything
(**) I’m cheating again. Because transferring the names of all nodes is a permutation on N, it means transferring N! of data, whereas transferring NN-2 of data allows us to repeat the same term twice; And it is generally larger because ln(n!) = n*ln(n) – n + o(ln(n)) whereas ln(n(n-2)) = n*ln(n) – 2*ln(n).