Crypto Primitives
Pedersen Hash Function/Commitment
The Pedersen hash functions are nothing more than fixing an elliptic curve point G
, and then
doing
hash(x) = xG
for a given x
, which is an element of a finite field. The Pedersen parameters
are just the sampling of the G
. Different values of G
give rise to a different Pedersen Hash.
Note that there's a very important difference between popular hash functions like Keccak
/SHA3
and Pedersen
: the input of SHA3
is just an array of bytes of arbitrary length, and its output a fixed byte array. On the other hand, Pedersen
's input has to be a finite field element (and therefore cannot be of arbitrary length) and its output is an elliptic curve element. Indeed, this is the entire point of it; it's meant to be used in the context of ZK-SNARKS/circuits, where computation is done on elliptic curves over finite fields.
The above poses a problem for Pedersen
however. When using it as the underlying hash function for a Merkle Tree, one needs to take a pair of leaves or nodes and hash both to produce the next node in the tree. When using something like SHA3
, what we do is hash the concatenation of the leaves/nodes, so we can use the same hash function we use to hash leaves to hash pairs of inner nodes. In the Pedersen
case we can't do this, because the length of its inputs is fixed; concatenating inputs will not work.
Because of this, Pedersen
is usually divided into categories called OneToOne
, TwoToOne
, FourToOne
, etc. OneToOne
takes one input and produces one output, so it's used for hashing leaves; TwoToOne
takes two inputs into one output and it's used for hashing pairs of leaves/nodes, and so on.
With this in mind, let's take a look at the MerkleTree
struct defined in Arkworks
:
#![allow(unused)] fn main() { pub struct MerkleTree<P: Config> { /// stores the non-leaf nodes in level order. The first element is the root node. /// The ith nodes (starting at 1st) children are at indices `2*i`, `2*i+1` non_leaf_nodes: Vec<TwoToOneDigest<P>>, /// store the hash of leaf nodes from left to right leaf_nodes: Vec<LeafDigest<P>>, /// Store the two-to-one hash parameters two_to_one_hash_param: TwoToOneParam<P>, /// Store the leaf hash parameters leaf_hash_param: LeafParam<P>, /// Stores the height of the MerkleTree height: usize, } }
Here, the leaf_nodes
are of type LeafDigest
, which are different from the non_leaf_nodes
, which are TwoToOneDigest
. These are generic types that represent the output of a OneToOne
and TwoToOne
ZK-Friendly hash function (like Pedersen
, but it could be another one), respectively.
Additionally, there are leaf_hash_param
and two_to_one_hash_param
fields, containing the parameters mentioned above for the OneToOne
and TwoToOne
hash functions, respectively.
If you're wondering, the generic type Config
just contains the types of the two hash functions used:
#![allow(unused)] fn main() { pub trait Config { type LeafHash: CRH; type TwoToOneHash: TwoToOneCRH; } }
CRH
stands for Collision Resistant Hash
.
Pedersen Commitment
So far we have only talked about the Pedersen
hash function, not the commitment. Even though one usually thinks of all hash functions as commitments themselves, cryptographers do not consider the above Pedersen
hash function as one. The reason is that a commitment must satisfy two properties:
-
It must be
binding
. The commiter must not be able to open the commitment at a different value than they originally commited to. Let's go through a concrete example with hash functions to explain what this means. Let's say Alice and Bob go through a guessing game, where Alice chooses a number between 1 and 10 and Bob has to guess it.To make sure Alice isn't cheating, she first hashes the number and gives the hash to Bob. Thanks to this, when the game ends and Alice reveals the number
x
she chose, Bob can check whether Alice is lying or not by hashing it. This relies on one assumption, namely, that the hash function will not give the same result for two different values ofx
(this is known ascollision resistance
). -
It must be
hiding
. A commitment to some data should not reveal anything about it.
The Pedersen
hash above is binding
, as it's collision resistant, but it is not hiding
. This is because the hash of the data reveals something about it, the hash itself. This might seem like a minor thing but it's actually a big deal.
Let's say we're working with a private blockchain, where a user's balance has to live on the blockchain for auditability, but needs to be private. One way to do it would be to store hashes of user balances. This way, users commit to their balance on-chain without revealing it. If we used the Pedersen
hash to do this, however, we would run into a problem. People could precompute the hashes of common values and then check for them on-chain.
For instance, we could compute the hash of 0
, obtaining hash(0)
. Then we would immediately know which users have zero balance, by just scanning for balances with the hash(0)
value.
To fix this, the Pedersen
commitment is introduced. It works in the same way as the hash, but in addition to doing xG
, we sample another public random curve point H
and then, to compute the commitment of x
, we sample a random number r
. Then, the pedersen commitment is
commit(x) = xG + rH
This is now hiding, as the random value r
makes the result different each time we commit. In the blockchain example above, the 0
balance does not always give the same pedersen commitment, as different people will sample a different value for r
. People can no longer precompute the commitment of 0
, because there's no such thing as the one commitment of 0
. The important bit here is that, in order for this to work, people should always use a different value for r
, which is sometimes referred to as a nonce
.