- provable-security stream-cipher random-number-generator semantic-security pseudo-random-generator
- Updated Sat, 21 May 2022 11:14:18 GMT

I think I have some sense of what Perfect Security means, and even Semantic Security, but I am struggling with randomness, so I'm going to ask a question about CSPRG's (Cryptographically Secure PRG's) by going through the first two.

The basic idea that leads to the notion of **Perfect Security**, as I understand it, is that given a ciphertext $c$, and a message $m$, there are a certain number of keys that might turn $m$ into $c$. If someone shows me $c,$ and tells me that keys are chosen uniformly in some space, then a message $m$ that has many keys that will encrypt it into $c$ is far more likely to be the original than a message $m'$ that has few keys that will turn it into $c$. In this way, some probabilistic information might be leaked about the nature of the original message. Perfect secrecy rules out this leak by saying that given a $c$, the number of keys must always be the same regardless of $m$.

**One time "Semantic Security"** codifies this as a game. I give you $m$ and $m'$, you give me $c$, and I try to guess whether it's more likely that you encrypted $m$ or $m'$. If lots of keys would change $m$ into $c$, but only few would change it into $m'$, then I have an advantage in my guess of which one you encrypted.

So now let's suppose we have a **function $G$** that **takes a seed and gives** me **a key**, and I want to use this as a stream cypher, calculating $c = G(s) \oplus m$. Let's play the semantic security game with $m$, $m'$. Once given $c$, the attacker now knows that the key was either $m \oplus c$ or $m' \oplus c$. So the question the attacker will ask himself is:

Are there more seeds $s$, such that $G(s) = m \oplus c$? Or, $G(s) = m' \oplus c$?

If they can answer this question, then they might be able to get an advantage. The question I want to answer now is: **what characteristics does $G$ have to have so that the attacker can not get an advantage?**

Well, suppose it was $m$ that was encrypted, but the attacker doesn't yet know this. Then:

$$s = G^{-1}(m \oplus c)$$

At the same time, the attacker is going to be inspecting $G^{-1}(m' \oplus c)$ to see if it's possible that this could be a seed. Since the key space is the message space is the ciphertext space, and all are much larger than the seed space, $m' \oplus c$ is going to range over things that are not mapped to by $G$. So there are choices of $m'$, even a lot of choices, for which $G^{-1}(m' \oplus c) = \emptyset$. Therefore, we can not hope to find, as we did in the case of perfect security, that the number of seeds is invariant as we change the message.

The only other option I can see is that we must hope that there is no effective means of inverting the function $G$. If there is, then we're lost, as the attacker has an effective way of finding out whether they have a seed or not; and for a lot of choices of $m'$ s/he won't. That will give the attacker a very large advantage.

So, my question is **how we relate this requirement for the lack of an effective inverse, into the usual definition of a Cryptographically Secure PRG?** And **indeed, what is the definition of a CSPRG?** Dan Boneh describes it as a kind of "computational indistiguishability" of two measures (the uniform measure on the key space and the push forward of the uniform measure on the seed space). And he claims that by Yao it's equivalent to a notion of "unpredictability". Could someone hash these three different things together for me? Finally, in this question, D.W. mentions an "asymptotic" (and not very good) definition of a PRG vs. a "concrete security" definition. Could he, or someone, clarify what these are?

And also, does anyone have the reference to Yao?

**First**, on the difference between perfect security and semantic security. Both definitions concern confidentiality, so let us first define what confidentiality means.

Note first that an adversary as some a priori knowledge of the message. We can capture that by e.g. having the adversary choose two messages and then flipping a fair coin to decide which one to encrypt.

Next, note that the adversary wants to learn something from the ciphertext, his a posteriori knowledge of the message. We capture that by having the adversary guess which message was encrypted.

If the adversary cannot guess correctly which message was encrypted with probability significantly different from $1/2$, then we have confidentiality.

That was the definition of *confidentiality*.

(More complex definitions are possible using a probability space and a function from the message space to 0 and 1, but we ignore that since it isn't important for the present discussion.)

*Perfect security* is about confidentiality against arbitrary adversaries. Now we can prove many theorems about perfect security, and one of them talks about the number of keys taking a given message to a given ciphertext (as you describe).

*Semantic security* is about confidentiality against computationally bounded adversaries. Unlike for perfect security, there aren't so many theorems we can prove about semantic security, which means that we know less about what is required to achieve semantic security.

So the difference between perfect and semantic security is just about which adversaries we consider, nothing else. The two notions are often presented in a superficially different way, however, making it look like they are different in other respects.

Regarding computationally bounded adversaries. The traditional approach (based on computational complexity theory) is to speak about families of cryptosystems with increasing key lengths, and let the adversary's effort be bounded by some polynomial in the key length. The concrete security approach says that we have a concrete cryptosystem with a fixed key length, and let the adversary's effort be bounded by a fixed number of computational steps (probably relative to some fixed universal Turing machine or something, there are some foundational problems here).

**Second**, on pseudo-random generators. A *pseudo-random generator* is a function that "expands" a short string into a long string. As above, we define first a security goal similar to confidentiality, and then we can consider different adversaries against that goal.

A short string is sampled and expanded into a long string. A second long string is sampled. The adversary gets either the first or the second long string, decided by a fair coin flip.

If the adversary cannot guess correctly which of the two strings he got with probability significantly different from $1/2$, then we have security.

(This is equivalent to Boneh's notion of indistinguishable probability distributions.)

*Perfect security* considers arbitrary adversaries. Using the arguments you outline in your question, it is fairly easy to show that perfect security for a pseudo-random generator is impossible.

*Computational security* considers computationally bounded adversaries. Obviously, here there's no proof of impossibility. Instead, we have good reason to believe that this is possible to achieve for useful values of "computationally bounded".

Since perfect security is impossible, there's no point in going on about "computational" all the time, so we usually ignore it and talk about "secure pseudo-random generators". Since pseudo-random generators are used in many fields of science, we often add "cryptographically" in front to say clearly that we are talking about cryptography here, and not (say) statistics.

A pseudo-random generator is a function $G$ from the set of short strings to the set of long strings. If you can invert the function (or find preimages if it is not injective), then the generator is obviously not secure, as you observe.

**Third**, on *unpredictability*. It is clear that if I sample a long string and show you a prefix, then you cannot say anything about the remainder of the string, specifically not about the next symbol.

Suppose that if you are given a prefix of a pseudo-random generator's output, you can predict the next symbol of the output. In the setting described above, where you either get the generator's output or a random string, you can recognize the output of the pseudo-random generator.

Here's how you recognize the output: You get a long string. Momentarily, you forget all but a prefix of the string and pretend that it is a prefix of something output by the generator (which it may be). Now you use your ability to predict the next symbol. Then you remember the entire long string, specifically the next symbol. If you predicted it correctly, then probably the long string was output by the generator. If your prediction was incorrect, then probably the long string was random.

Since you can recognize the generator's output, the generator is not secure. It follows that if you have a secure generator, it is unpredictable. Which is nice.

(The other direction is more complicated. Basically, the idea is that if given a prefix, you cannot say anything about what the next symbol will be, then you cannot distinguish the generator's output from the generator's output with the final symbol randomized. But you cannot distinguish the latter from the generator's output with the final two symbols randomized. Etc. Since indistinguishability is transitive, it follows that you cannot distinguish the generator's output from random strings.)