Please criticize the (homebrew?) mode described below; to point out a single major defect/uncertainty or the link to (the analysis of) an equivalent construction is quite enough. Assume the performance is ignored in favor of security.
This scheme looks like something that emerges from time to time. Here is yet another place where anyone may discover its description and quickly comprehend why it should not be applied, now or perhaps ever.
There is a system built as part of a f.o.s.s. project, oriented towards a single user (or, more precisely, any group of users with the same rights) who has a given file and places it on publicly accessible storage in encrypted form (this is one of the basic operations of the system). For the sake of simplicity, let's suppose the file name is already a pseudo-random 256-bit string (that's another basic operation). The requirements are:
- no one but the user should be able to learn the original content of the file or draw plausible conclusions about it, for as long as possible; the exceptions are the time of its last modification (accurate to within a few hours), its size, and information derived from these
- the user should be able to detect any corruption/modification of the encrypted form
The following notations are used:
- FILE = (f1;f2;...) — the finite string of 8-bit bytes
- KEY — the 32-byte string, known only to the user
- SALT — the 32-byte string, known to the user, also may be known to everyone in the universe, although «random» for each new file
- SHA256(m) — standard SHA-256 hash of the byte string/message m
- H(m) = SHA256(SHA256($0^{512}$ || m)) — the SHA-256 modification described in subsection 5.4.2 of Cryptography Engineering (Ferguson, Schneier, Kohno, 2010), intended to counteract the length extension and partial-message collision problems. $0^{512}$ is 512 zero bits (64 zero bytes), m is the actual message, and A||B means the concatenation of the (bit)strings A and B
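For concreteness, here is a minimal Python sketch of H as defined in the notation above. It assumes only the standard `hashlib` module; the constant name `ZERO_BLOCK` is my own, and this is an illustration, not the project's actual C++ code.

```python
import hashlib

# 0^512: one full SHA-256 input block (64 bytes) of zero bits.
ZERO_BLOCK = bytes(64)

def H(m: bytes) -> bytes:
    """H(m) = SHA256(SHA256(0^512 || m)), per the notation list above."""
    inner = hashlib.sha256(ZERO_BLOCK + m).digest()
    return hashlib.sha256(inner).digest()
```

The output is always 32 bytes. The outer hash is what blocks length extension: an attacker who sees H(m) does not learn the internal state of the inner hash.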
The goal is to translate the FILE into the encrypted form EN-FILE and then decrypt it along with integrity verification. The encryption is done in this way:
EN-FILE is placed on the storage. There it may or may not become corrupted; after some time it becomes EN-FILE'. Then it is taken back to the (appropriately isolated) user machine, and the decryption is done locally «in reverse order» (with more items, due to the checks):
Note that the decryption may be done in two passes, the first pass being a check one, where nothing is written to local drive and each block (except for n', n'+1, n'+2) in memory is replaced by the next one as soon as it has been input to the hash.
Implemented in C++, it actually works: files are encrypted and decrypted, and manual ciphertext corruptions are detected. What's in question, obviously, is its security in an adversarial environment.
This scheme is vulnerable to a "truncation attack", which allows an attacker to forge new ciphertexts (EN-FILEs).
Here's how this works. Assume that the attacker controls a section of the plaintext and can predict (with reasonable probability) the plaintext prior to that section. In other words, a value $A \| B \| C$ is encrypted, where $A$ is predictable and $B$ is attacker-controlled. The attacker can choose some arbitrary value $X$, determine the padding string $P$ needed to pad $A \| X$, and compute $\operatorname{D-HASH} = H(A \| X \| P)$. The attacker submits $B = X \| P \| \operatorname{D-HASH}$, which will cause $A \| X \| P \| \operatorname{D-HASH} \| C$ to be encrypted.
Assume the attacker can now see and modify the resulting EN-FILE. The attacker knows the number of ciphertext blocks corresponding to $A \| X \| P \| \operatorname{D-HASH}$, which we will call $N$. The first $N$ blocks of EN-FILE themselves form a valid EN-FILE: if these $N$ blocks were submitted to the decryption algorithm, they would be successfully decrypted as $A \| X$. This is a break of the scheme, since authenticated encryption must prevent ciphertext forgeries.
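The attacker's side of this can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the 16-byte block size, the `pad` rule (a 0x80 byte followed by zeros), and the helper name `craft_B` are hypothetical stand-ins, and the real scheme's padding would have to be substituted. The point is only that $B$ and $N$ are computable without KEY.

```python
import hashlib

BLOCK = 16  # assumed cipher block size in bytes (hypothetical)

def H(m: bytes) -> bytes:
    """H(m) = SHA256(SHA256(0^512 || m)), as in the question's notation."""
    return hashlib.sha256(hashlib.sha256(bytes(64) + m).digest()).digest()

def pad(m: bytes) -> bytes:
    """Hypothetical padding rule: return the string P (0x80, then zeros)
    that brings len(m || P) up to a multiple of BLOCK."""
    n = BLOCK - (len(m) % BLOCK)
    return b"\x80" + bytes(n - 1)

def craft_B(A: bytes, X: bytes):
    """Build the attacker-controlled section B = X || P || D-HASH and the
    number N of blocks covering A || X || P || D-HASH. No key is needed."""
    P = pad(A + X)
    d_hash = H(A + X + P)
    B = X + P + d_hash
    N = len(A + B) // BLOCK
    return B, N

# Usage: A is the predictable prefix, X the content the attacker wants accepted.
B, N = craft_B(b"known header: ", b"forged")
```

After the user encrypts $A \| B \| C$, the attacker keeps only the blocks of EN-FILE that correspond to these $N$ plaintext blocks (plus any leading blocks the real format places before them, which this sketch does not model).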
This scheme is also vulnerable to a padding oracle attack, since the padding is checked before D-HASH, which can allow an attacker to extract parts of (or information about) the plaintext.