Information Security
confidentiality programming timing-attack side-channel
Updated Fri, 08 Jul 2022 00:30:50 GMT

Timing Safe String Comparison - Avoiding Length Leak

Let's say that we're building a generic timing-safe comparison function for general purpose use. Making it so that it is safe when both strings are equal length is pretty well known. However, what I'm not sure about is how we can make it safe if the strings are different lengths (arbitrary).

One string will be deemed as "known", and the other as "unknown". We'll assume that an attacker only has control of the "unknown" value. Ideally, this function should leak no information about the length of the known string.

A trivial implementation, such as:

// returns 1 is same, 0 otherwise
int timing_safe_compare(char *known, size_t known_len, char *unknown, size_t unknown_len) {
    // Safe since all strings **will** be null terminated
    size_t mod_len = known_len + 1;
    size_t i;
    int result = 0;
    result = known_len - unknown_len;
    for (i = 0; i < unknown_len; i++) {
        result |= known[i % mod_len] ^ unknown[i];
    return result == 0 ? 1 : 0;

The problem here though is that there may be cache information leak.

For example, a word size in x64 is 64 bits. So we can fit 8 characters in a single register. If the known value is a string that's 7 characters or less (since we add 1 to the known_len), the comparison never requires another load operation for the known string, even though the unknown string will.

In other words, if the size of the unknown string differs from the known string by one or more word boundaries, the total amount of "work" being done may change.

My first instinct would be to only compare strings of equal sizes, but then length information would be leaked (as execution would follow several different branches).

So, this leaves two questions:

  1. Is this small difference enough to be concerned about?

  2. Can this sort of difference be prevented without leaking information about the known size at all?


Being able to process strings of arbitrary length without leaking information on their length seems to be very hard (i.e. I don't see how to do it) because of caches. A very long string, by definition, will take a lot of room, and thus reading the string will incur interaction with the caches. Accessing the string from RAM will trigger cache misses, and also evict other data elements from the caches, impacting the future behaviour of the application code. A cache miss costs dozens or even hundreds of clock cycles: it is at least ten times more visible, from the outside, than a branch misprediction. If you worry about branches, then you should worry a lot more about caches.

However, we can cheat with padding. Assume that you could arrange for the two strings you want to compare to be written at the start of two big, equal-sized buffers full of zeros; also, we suppose that a byte of value 0 cannot appear in a normal string (e.g. these are C strings). Then all you need is to make a leak-free comparison of the two buffers, who have the same length. The buffer length will leak, but it is a fixed, constant and publicly known parameter, so that's not a problem.

This does not solves the problem; it moves it. Now, you have to make sure that whatever produced the strings could write them in the buffers without leaking size information. Generally speaking, you no longer have strings. You have binary values of a given fixed length that you copy with a big memcpy(); these values just happen to have a string interpretation in which the bytes are considered to be characters, up to the first byte of value zero.

From a higher point of view, having a "safe string comparison function" is like bringing a bucket aboard the Titanic. If your code is handling secret data, then everything you do with the data is potentially subject to timing attacks. In general, your application can be of two sorts:

  • If the only secret part is a single cryptographic element and everything else is public, then using a few leak-free primitives makes sense, and will improve overall security. A classic example is a Certification Authority, where the only secret part is the CA private key; as long as the signature algorithm does not leak secrets, the whole system is robust against timing attacks. Similarly, a Web site which does password-based authentication but otherwise contains only public data will be fine.

  • If the secrecy is spread throughout the system, such as a Web site which does password-based authentication to give access to some confidential data, then concentrating on the string comparison misses the point. The whole server code must be made leak-free, and that is a considerably more difficult endeavour (and we don't really know how to do it).

In any case, trying to protect any given piece of code against side-channel attacks becomes harder when the language is more "high-level". A language such as PHP, with its automatic memory management (the garbage collector) and string management (string are values just like integers) will not help at all. That's the reason why low-level primitives implemented in C (such as a leak-free string comparison function) must be provided, but the issue is much larger and encompasses a lot of PHP code as well.

Comments (1)

  • +1 – So, would something as an analog to the above (even with its potential length leak) be sufficient and beneficial in your opinion? Seems to be the general trend here. — Feb 05, 2014 at 20:36  

External Links

External links referenced by this document: