A Merkle tree library written in Java.
An easy-to-understand, fast, tamper-proof API for building, navigating Merkle tree structures. Support for Merkle tree proofs (of existence).
I am assuming you are already familiar with the basic concept, how a parent node’s hash is derived from hashing the concatenation of the hashes of its children. If not, take a quick look in wikipedia. Here we dive a bit more into the tree’s structure.
For the most part a Merkle tree looks like a balanced binary tree. Nodes at the lowest level in this tree constitute the leaf items from which the tree is deterministically derived and built. Every adjacent pair of nodes at this level join to form a parent node at the level above: and the same pairing rule applies at successive levels above, until there is but one node remaining at the highest level.
So far, so good, but what node does a last node at a level pair with if the number of nodes at that level is odd? Instead of a normative description, let’s do it by example.
A Merkle tree is structurally defined by the number of items in its leaves. Suppose our tree has 89 items. It’s useful to picture the leaf count in binary, here
The tree above is in what I’m calling the pre-build stage. Excluding the root node, there are 3 unpaired nodes (marked blue)
in the pre-built tree. Note the one-to-one correspondence between
1 bits in leaf count’s binary representation and the unpaired nodes.
The tree is completed by successively joining unpaired nodes at different levels and forming a parent node at one level above its left child. The process begins at the bottom and works its way to the top of the tree. A parent node constructed at this stage I’m calling a carry (marked amber). The built tree in our example looks like this.
Note that carries (plural for carry) can themselves be unpaired (at their own respective levels); however as the tree is built from the bottom up, there is always at least one remaining unpaired node at a level above (since the root node at pre-build is always unpaired) with which it can pair to build a parent.
A parent node’s data in a Merkle tree is a hash of its children’s data. Order matters: it’s a hash of the concatenation of the left child with the right child. This implementation prepends a leaf node’s data with
0 [byte] and an branch (internal) node’s data with a
1 per the recommendation by the Certificate Transparency to thwart the “2nd pre-image attack”. This is unnecessary here, since that attack only works if the height of the tree is not known and is altogether in a different context (blockchains, double spending, etc.). Still, we employ it here a) so it’ll hopefully be compatible with how other libraries do it, and b) it helps catch implementation bugs. (See
Tree.java if you need to edit these pre-pended paddings out.)
There are only a few classes in this API.
bytearray) can be anything (even empty). In most applications, it’ll be a hash or signature of something else and will be consequently fixed-width. The hashing algorithm for the internal nodes is configurable.
With the exception of
Builder all classes in this API are immutable and safe under concurrent access. (
Builder too is thread safe, but unlike the other classes, it blocks.)
There’s a good amount javadoc comment in the source. (Useful in IDEs like Eclipse.)
The repo for this project is located here.
Total Number of Nodes
Propositon. Let T( n ) be the number of nodes in a Merkle tree with n (≥ 2) leaves. Then
T( n ) = 2n - 1
Sketch of Proof.
1bits that map to the same total number of carries.