- cryptanalysis random-number-generator reference-request lfsr
- Updated Sat, 25 Jun 2022 10:01:52 GMT

Looking at the use of Linear Feedback Shift Registers in cryptographic algorithms, I have learned that the Berlekamp-Massey algorithm can be used to find the (shortest) LFSR that generates a given sequence.

But I am not sure if I have really understood things correctly.

Since an LFSR function always returns a copy of its (current) state, one only has to collect enough outputs to be able to construct an LFSR that produces the same binary sequence like the attacked LFSR produces. In a sense this could be interpreted as "collect enough samples/output" to be able to create a function that allows to reconstruct the same binary sequence. (Hope that's correct so far?)

Now, what I am unsure about:

**correct LFSR output sampling for Berlekamp-Massey**To be able to successfully apply the Berlekamp-Massey algorithm, does it matter if the collected samples/output are collected

*"in sequence"*, or is it enough to do*"random sampling"*?Differently asked (to ensure you get what I'm trying to ask), which one of the following is correct:

- Start by fetching a random output and continue collecting several follow-up outputs until enough outputs/samples have been collected. (what I called
*"in sequence"*) - Skipping/missing a few outputs during sampling does not represent a problem, because it does not have to be a series of follow-up outputs; as long as the total number of collected random samples/outputs suffices. (what I called
*"random sampling"*)

- Start by fetching a random output and continue collecting several follow-up outputs until enough outputs/samples have been collected. (what I called
**expected info to be able to calculate the minimum number of needed samples**I've seen some formulas which seemed to point at the calculation, but I do not yet understand them well enough to deduct an answer to the following: Can the minimum number of to-be-collected samples be deducted from a single LFSR sample/output, or would one need more than a single sample/output to calculate the the minimum number of to-be-collected samples/outputs (which would be needed to successfully apply the Berlekamp-Massey algorithm)?

Please don't misinterpret my questions. I'm "doing my research" and I have been reading a whole slew of academic papers that I found in relation to Berlekamp-Massey, but that doesn't mean I'm confident in the fact that I am interpreting those papers correctly. Also, search engines aren't always the best place to find good references. If you know about a specific paper that I should definitely be reading (in relation to Berlekamp-Massey and related LFSR attacks/breaks), I would be grateful if you'ld point me to it.

Berlekamp-Massey is designed for the situation where you have observed $2n$ *consecutive* output bits from a $n$-bit LFSR. It doesn't work if the observed bits are scattered randomly, at random non-contiguous offsets in the stream.

Information-theoretically, a minimum of $2n$ bits of output are needed to reconstruct the LFSR. Intuitively, this is because there are $2n$ bits of unknowns: the unknown $n$-bit state of the LFSR, and the unknown $n$ bits on the taps (the feedback polynomial). Therefore, Berlekamp-Massey exactly meets the known lower bounds, in the case where you get to observe consecutive output bits.

Of course, you could also have used linear algebra to solve for these $2n$ unknowns, but Berlekamp-Massey is faster than solving a system of linear equations ($O(n^2)$ instead of $O(n^3)$, or something like that).

The number of observations you need depends only on the size of the LFSR, not on the output bits. If you do know the size of the LFSR, you don't need to see any output bits to predict how much output will be needed. If you don't know the size of the LFSR, seeing a few output bits doesn't help you predict the size of the LFSR and doesn't help you predict how much output will be needed.

One cool thing about Berlekamp-Massey, though, is that you don't need to know in advance what the size of the LFSR is (what $n$ is). You can just start running Berlekamp-Massey on the output stream, and it will generate candidates for the LFSR initial-state and feedback polynomial. After each bit of output, it will update its candidate. Once you've observed $2n$ bits of output, you are guaranteed that its candidate is correct. If you don't know what $n$ is, you can take each candidate and test it against the next few output bits; this will let you recognize when Berlekamp-Massey has finally found the right candidate, i.e., when you've observed enough output stream (it'll let you recognize when you've hit the $2n$ limit, even though you don't know $n$ a priori).

External links referenced by this document: