- zero-knowledge-proofs key-derivation commitments verifiability verifiable-random-function
- Updated Sun, 18 Sep 2022 13:48:42 GMT

End-to-end encrypted web services (like cryptpad) often include a 128-bit seed in the hash part of the URL (that is not sent to the server), to derive both an identifier that is sent to the server, and the encryption key that is used on client-side only. This seed allows shorter URLs than storing both the identifier and the encryption key directly.

However, while non-cryptographic identifiers are usually generated by the server, the seed needs to be generated exclusively on the client-side, so the encryption key remains secret. Therefore, modified clients can choose arbitrary seeds, like a readable name when encoded in the URL. Is there a cryptographic scheme to prevent arbitrary seeds? Something like this:

- The server sends entropy E to the client.
- The client generates the seed S based on E and extra entropy E'. The server cannot infer S. S should be short (~ 128 bit). S is based on E only to ensure the client cannot choose S freely.
- The client derives the identifier I (128 bit) and the encryption key K (128 bit) from S deterministically.
- The client reveals I to the server and proves that I is indirectly based on E without revealing E', S, or K. Then, the server can allow creating a resource with identifier I.

For the proof, multiple round trips between server and client would be acceptable.

Using Ed25519:

The group size $\ell$ generated by the base point $G$ on the curve is approximately $2^{252}$. As an abbreviation, references below to "252-bit" numbers refer to scalars less than $\ell$.

The client has 128-bit entropy $e$, and picks a uniformly random 252-bit scalar $b$.

The client uses $b$ as a blinding factor to derive the Pedersen commitment $C=eG+bH$. $H$ is a second well-known base point on the curve, picked using a 'nothing-up-my-sleeve' technique such that the discrete log of $H$ w.r.t. $G$ is unknowable.

The client sends $C$ to the server, and the server picks random 128-bit entropy $e'$. The server sends $e'$ to the client.

The client calculates the 128-bit seed $s=(e+e')\ mod\ 2^{128}$. As long as either $e$ or $e'$ is uniformly randomly distributed, $s$ will be uniformly randomly distributed.

The client calculates the commitment: $C'=C+e'G$. Depending on the values of $e$ and $e'$, and thus depending on whether the $mod\ 2^{128}$ operation caused $2^{128}$ to be subtracted from $e+e'$, either $C'$ or $C'-2^{128}G$ will be a commitment to the value $s$.

The server also calculates the same value $C'$ for itself.

If $2^{128}$ was not subtracted during the $mod\ 2^{128}$ operation, this will mean that $C'=C+e'G==eG+bH+e'G==(e+e')G+bH==sG+bH$.

Otherwise, $C'-2^{128}G$ will be a commitment to the value $s$.

At this point, the server could allow both $H_{128}(C')$ and $H_{128}(C'-2^{128}G)$ as the identifier, where $H_{128}(\texttt{input})$ means to hash the input (using a cryptographically secure hash such as SHA512) and return the first 128 bits of the result.

However, that would require the client to remember not only $s$, but also the blinding factor $b$.

To solve this problem, the client now calculates the commitment $C''=sG+H_{252}(s)H$.

$H_{252}(\texttt{input})$ means to hash the input using a cryptographically secure hash and return a 252-bit scalar as the output.

The client creates a proof that $C''$ is a commitment to the same value $s$ as either $C'$ or $C'-2^{128}G$. This proof will allow the server to accept $H_{128}(C'')$ as an identifier because it knows it is based on the same value as $C'$, which itself is based on the server's entropy $e'$.

If we do not care about disclosing whether the $mod\ 2^{128}$ operation caused $2^{128}$ to be subtracted from $e+e'$, we could just inform the server about whether the subtraction occurred or not. Then, the client would provide a signature $\sigma$ (such as a simple Schnorr signature) proving knowledge of the private key $(H_{252}(s)-b)\ mod\ \ell$ for either the public key $C''-C'$ or the public key $C''-C'-2^{128}G$ on the base point $H$.

The reason this signature proves that the commitments are to the same value is that unless $C''-C'$ or $C''-C'-2^{128}G$ causes the values on $G$ to cancel each other out, the discrete log w.r.t $H$ is unknowable.

We can avoid disclosing whether the $mod\ 2^{128}$ operation caused $2^{128}$ to be subtracted from $e+e'$, by using a ring signature. We would prove that we know the private key on the base point $H$ for either $C''-C'$ or $C''-C'-2^{128}G$, but the ring signature will not disclose which.

The client sends $C''$ and the signature or ring signature to the server.

The server verifies the signature, and then records in its database that the identifier $H_{128}(C'')$ will be allowed.

The client now only needs to remember the seed $s$, which will be the hash part of the URL used to access the encrypted resource. The client will reconstruct the identifier $H_{128}(sG+H_{252}(s)H)$, and will encrypt data using a symmetric key derived from $s$ (using a derivation technique such as HKDF-extract).

- +0 – I appreciate that this scheme also prevents offline brute forcing of the client entropy. However, it should be noted that s is not uniformly distributed, because it is the sum of two uniformly distributed variables. Is the security better or equal to a 127 bit uniformly random key? It would be easy to add some more bits of entropy if needed. — Jul 19, 2022 at 10:32
- +0 – @lukasl that's a really good point. I have asked the question here: crypto.stackexchange.com/questions/101077/… — Jul 19, 2022 at 14:44
- +0 – @lukasl I realised that it's possible to do $(e+e')\ mod\ 2^{128}$ and preserve the uniform distribution of $s$. I've updated the answer. — Jul 19, 2022 at 17:52