Merkle Trees in the Blockchain

16 November 2021

As I’ve been trying to learn about Blockchains and how they work, I came across Merkle Trees which have become one of my favorites about the Bitcoin Blockchain. When you view a block in a blockchain, like Bitcoin’s, you’ll notice a Merkle Root in the Block’s Header. Below is my explanation of a Merkle Tree, Merkle Root, Merkle Proof, and the reasoning for their existence in a blockchain.

Prerequisites:

Hashing
General Understanding of a Blockchain

What are Merkle Trees, and why can’t I plant them?

Simply put, a Merkle Tree is a type of Binary Tree Data structure, but where each node of the Tree is a Hash. So let’s help explain this by tying it to the Bitcoin Blockchain.

Here is an array of Transactions in a Block. These transaction hashes are the leaves of the Merkle Tree. Our Transaction is highlighted in blue; we’ll use this Transaction in a later example. Let’s walk through how the Merkle Tree works

The algorithm works by hashing pairs of hashes in an array.

We can see here that when hashing together, 32kl3h43 with n7c81293, we get 70nc1298. Because we’re hashing, our resulting value is deterministic. 70nc1298 is then appended to our Tree, and we repeat this step of hashing two other hashes, with the next pair in the array, v123m079, and 9n709x12 to get n987n1c2.

Then once the first layer is done, we move up to the next layer and continue hashing pairs until we produce a single final hash at the top of our Tree.

That final Hash is called the Merkle Root. This is the Hash placed in the Block’s Header.

Pretty simple so far, right? Now what is a Merkle Proof?

Well, let’s say we wanted to prove that our blue Transaction, 32kl3h43, is in this Block, but we don’t want to download all the transactions in the Block. We obtain a Merkle Proof ( I’ll explain where we get this soon ), which is just a collection of Hashes that, when hashing into a Merkle Tree, will produce the exact Merkle Root that’s in the Block’s Header. Okay, let’s look at the diagram.

The Merkle Proof that we get will be [ n7c81293 , n987n1c2 ]. We then run the same algorithm to create a Merkle Tree, hashing pairs of hashes. And then, we will obtain the same Merkle Root that was in the Block’s Header. Because of this Merkle Proof, we feel confident that our Transaction was in this Block. But more importantly, we did not have to know about all of the transactions in the Block to feel so assured about the validity of our Transaction.

Excellent explanation Patrick, but why have it in a blockchain?

The simple answer is to reduce the size of the blockchain. How does that work?

Blockchains can get very large. The Bitcoin Blockchain is over 350GB as of this year. And because of this, you can imagine a mobile device or another less performant device not being able to participate in the blockchain due to memory limitations.

To help lower the size of a chain, the Merkle Root of the transactions is added to every Block’s Header. A Simple Payment Verification(SPV), or a Light/Thin Node, only needs to download the Block Headers. This way, an SPV can verify that a transaction is legitimate by getting a Merkle Proof from a Full Node in the network. And then, once our SPV validates the Block’s Merkle Root with the Merkle Proof, our SPV can sleep tight. Since hashing is deterministic, any dishonest Node attempting to modify a transaction would be snuffed out and not accepted.

Merkle Trees are really fun; they help enable the global truth essential for an open blockchain. And they’re not just limited to Blockchains; for example, Git uses it to be aware of file changes. I hope you found this data structure as fun as I have, and if you’re interested in learning about this area, here are some links:

The Bitcoin Paper, section 8: https://bitcoin.org/bitcoin.pdf

Patricia Trees: https://eth.wiki/fundamentals/patricia-tree

Verkle Trees: https://vitalik.ca/general/2021/06/18/verkle.html