Cryptography
encryption aes gcm file-encryption scrypt
Updated Wed, 08 Jun 2022 01:42:54 GMT

# Is this correct/incorrect password check for a file encryption scheme secure?

I'm writing an AES file encryption program, and I'd like a way to tell whether the user has entered the correct password without decrypting the entire file and waiting for GCM to tell me the tag is invalid.

My process is as follows:

• Get the user to enter a password ($$p$$), generate a salt/nonce/IV ($$n$$), and use scrypt to derive two keys (the first and second halves of the 32-byte output): $$k_1, k_2 = Scrypt(salt:n, keylen:32, n:2^{16}, r:8, p:1).derive(p)$$

• Encrypt the data of a file with $$\text{AES-GCM-128(}k_1, \text{iv/nonce=n})$$ and empty associated data.

• Write the encrypted data to the file so that the contents of the file are $$n || gcmtag || data$$

Would it be secure if I instead wrote the following to the file: $$n || gcmtag || k_2 || data$$?

That way I can load $$n$$ from the file, take the password the user entered, derive the keys, and check whether the derived $$k_2$$ equals the $$k_2$$ loaded from the file.
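The check described above can be sketched with Python's standard library. This is a minimal illustration, not the asker's actual program: the function names `derive_keys` and `password_matches` are hypothetical, the password is assumed to be UTF-8 encoded, and the 12-byte salt in the usage note is an assumption based on the salt doubling as the GCM nonce. The AES-GCM step itself is left out.

```python
import hashlib
import hmac


def derive_keys(password: str, salt: bytes, n: int = 2**16, r: int = 8, p: int = 1):
    """Derive 32 bytes with scrypt and split them into k1 and k2 (16 bytes each)."""
    okm = hashlib.scrypt(
        password.encode("utf-8"),  # assumption: passwords are encoded as UTF-8
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=256 * n * r,  # scrypt needs roughly 128*n*r bytes; allow headroom
        dklen=32,
    )
    return okm[:16], okm[16:]  # k1 encrypts the file, k2 is the stored check value


def password_matches(password: str, salt: bytes, stored_k2: bytes, **params) -> bool:
    """Re-derive k2 from the entered password and compare it to the stored copy."""
    _, k2 = derive_keys(password, salt, **params)
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(k2, stored_k2)
```

When writing a file, one would call `derive_keys(password, os.urandom(12))` and store `k2` next to the salt; when reading, `password_matches(entered, salt, stored_k2)` decides whether to proceed to decryption with `k1`.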

## Solution

Your $$k_2$$ value functions in effectively the same way as conventional password verification, where you store a salted hash of each user's password. It therefore lets an adversary test password guesses, but:

1. So does the authenticated GCM $$(c, tag)$$ pair;
2. The memory-hard scrypt function is your main line of defense against this attack anyway.
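To put a rough number on that defense: scrypt's working memory is dominated by a term of about 128 · N · r bytes. The short calculation below applies that approximation to the parameters in the question; the helper name `scrypt_memory_bytes` is hypothetical, and the smaller term that depends on the parallelism parameter is ignored.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate scrypt working memory: the dominant term is 128 * N * r bytes."""
    return 128 * n * r


# Parameters from the question: N = 2^16, r = 8.
mem = scrypt_memory_bytes(2**16, 8)
print(mem // 2**20, "MiB per password guess")  # prints "64 MiB per password guess"
```

Every guess an attacker tests, whether against $$k_2$$ or against the GCM tag, has to pay this cost first.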

An alternative to consider very strongly: instead of encrypting the whole file in one GCM encryption call, split it into chunks that are encrypted separately, using a construction that protects against reordering, deletion, and truncation. Study these examples:

The main reason is that you can then encrypt and decrypt very large files with a fixed memory footprint, yet abort decryption as soon as you hit an inauthentic chunk. Secondarily, it also indirectly tackles your problem: if the user enters the wrong password, decryption will fail on the first chunk.
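One common way to get that protection is to bind each chunk's position, and whether it is the final chunk, into its nonce. The sketch below shows only that framing, in the spirit of STREAM-style constructions; it is not taken from the examples referred to above. The 7-byte prefix, 4-byte counter, 1-byte flag layout, the 64 KiB chunk size, and the names `chunk_nonce` and `iter_chunks` are all assumptions for illustration. Each yielded pair would be passed to an AEAD such as AES-GCM with $$k_1$$.

```python
import struct

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def chunk_nonce(prefix: bytes, index: int, is_last: bool) -> bytes:
    """Build a 12-byte nonce: 7-byte random prefix || 4-byte counter || last-chunk flag."""
    if len(prefix) != 7:
        raise ValueError("prefix must be 7 bytes")
    # struct raises an error if index does not fit in 32 bits, so nonces never wrap.
    return prefix + struct.pack(">IB", index, 1 if is_last else 0)


def iter_chunks(stream, prefix: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (nonce, plaintext_chunk) pairs while holding at most two chunks in memory."""
    index = 0
    current = stream.read(chunk_size)
    while True:
        following = stream.read(chunk_size)  # look ahead to detect the final chunk
        is_last = not following
        yield chunk_nonce(prefix, index, is_last), current
        if is_last:
            return
        current = following
        index += 1
```

Because the counter is part of the nonce, a reordered or deleted chunk fails authentication, and because only the final chunk carries the flag, a truncated file is detected when no flagged chunk is found. An empty input still yields one flagged chunk, so even an empty file is authenticated.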

Potential downsides are:

• Encryption produces larger files by some percentage of the plaintext size (and not just a fixed overhead like your proposal);
• If an encrypted file is tampered with somewhere after the first chunk, you may output a prefix of the plaintext before you abort, which could be a problem in some applications (or not).
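The size overhead in the first point can be estimated with simple arithmetic, assuming each chunk carries its own 16-byte GCM tag and nothing else. The chunk sizes below are hypothetical choices and the helper name is illustrative.

```python
def chunked_overhead_percent(chunk_size: int, tag_len: int = 16) -> float:
    """Per-chunk tag overhead as a percentage of the plaintext size."""
    return 100.0 * tag_len / chunk_size


print(f"{chunked_overhead_percent(64 * 1024):.4f}%")  # 64 KiB chunks: prints 0.0244%
print(f"{chunked_overhead_percent(4 * 1024):.4f}%")   # 4 KiB chunks: prints 0.3906%
```

Under these assumptions the overhead is small for reasonable chunk sizes, though it does grow with the file rather than staying fixed.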