Merkle tree library

A Merkle tree library written in Java.

An easy-to-understand, fast, tamper-proof API for building, navigating Merkle tree structures. Support for Merkle tree proofs (of existence).

I am assuming you are already familiar with the basic concept, how a parent node’s hash is derived from hashing the concatenation of the hashes of its children. If not, take a quick look in wikipedia. Here we dive a bit more into the tree’s *structure*.

For the most part a Merkle tree looks like a balanced binary tree. Nodes at the lowest level in this tree constitute the leaf items from which the tree is deterministically derived and built. Every adjacent pair of nodes at this level join to form a parent node at the level above: and the same pairing rule applies at successive levels above, until there is but one node remaining at the highest level.

So far, so good, but what node does a last node at a level pair with if the number of nodes at that level is odd? Instead of a normative description, let’s do it by example.

A Merkle tree is structurally defined by the number of items in its leaves. Suppose our tree has 89 items. It’s useful to picture the leaf count in binary, here `1011001`

.

The tree above is in what I’m calling the *pre-build* stage. Excluding the root node, there are 3 unpaired nodes (marked blue)
in the pre-built tree. Note the one-to-one correspondence between `1`

bits in leaf count’s binary representation and the unpaired nodes.

The tree is completed by successively joining unpaired nodes at different levels and forming a parent node at one level above its left child. The process begins at the bottom and works its way to the top of the tree. A parent node constructed at this stage I’m calling a *carry* (marked amber). The built tree in our example looks like this.

Note that carries (plural for carry) can themselves be unpaired (at their own respective levels); however as the tree is built from the bottom up, there is always at least one remaining unpaired node at a level above (since the root node at pre-build is always unpaired) with which it can pair to build a parent.

A parent node’s *data* in a Merkle tree is a hash of its children’s data. Order matters: it’s a hash of the concatenation of the left child with the right child. This implementation prepends a leaf node’s data with
a `0`

[byte] and an branch (internal) node’s data with a `1`

per the recommendation by the Certificate Transparency to thwart the “2nd pre-image attack”. This is unnecessary here, since that attack only works if the height of the tree is not known and is altogether in a different context (blockchains, double spending, etc.). Still, we employ it here a) so it’ll hopefully be compatible with how other libraries do it, and b) it helps catch implementation bugs. (See `Tree.java`

if you need to edit these pre-pended paddings out.)

There are only a few classes in this API.

- Builder - As the name suggests you build your tree with this, adding an item at a time. The item (which is a
`byte[]`

array) can be anything (even empty). In most applications, it’ll be a hash or signature of something else and will be consequently fixed-width. The hashing algorithm for the internal nodes is configurable. - Tree - Encapsulates the tree and provides random access to its parts. Provides tree navigation thru the following class..
- Node - A node in the tree. Instances support navigating to parent, siblings, and children–as well as random access.
- Proof - Encapsules a minimal object “proving” the membership of an item as a leaf in the tree.
- TreeIndex - This under the hood class exposes the structure of the tree (as discussed in the section above). You might find it useful.

With the exception of `Builder`

all classes in this API are immutable and safe under concurrent access. (`Builder`

too is thread safe, but unlike the other classes, it blocks.)

There’s a good amount javadoc comment in the source. (Useful in IDEs like Eclipse.)

The repo for this project is located here.

**Total Number of Nodes**

*Propositon.* Let **T(** *n* **)** be the number of nodes in a Merkle tree with *n* (≥ 2) leaves.
Then

T(n)= 2n- 1

*Sketch of Proof.*

- Show that
**T(***n*+ 1**)**-**T(***n***)**> 0 by inspecting their respective carries. Aside: there are homomorphisms (factor groups of*n*) involving equivalent arrangements of`1`

bits that map to the same total number of carries. **T(***n***)**is odd because every parent node has an even number of children.- Suppose the number of leaves is a power of two, i.e.
*n*= 2^{k}. Then the statement is true for**T(***n***)**=**T(**2^{k}**)**= 2^{k+1}- 1 = 2*n*- 1 . - Applying the pigeon hole principle,
*n*ranging from 2^{k}to 2^{k+1}constrained by (1) and (2), conclude that**T(***n*+ 1**)**-**T(***n***)**= 2 and since**T(**2**)**= 3, the proposition must be true. ∎