I have to encrypt big files, ranging in size from about 500 MB to several gigabytes.
I would like to use AES/GCM/NoPadding as provided by Java 1.8, since it gives me authentication and encryption in one step.
I would also like to use the handy CipherInputStream/CipherOutputStream classes, because I can chain them with the GZIP I/O streams to compress the data a little before encryption.
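A minimal sketch of that encryption chain, assuming a random 96-bit IV stored in front of the output, a 128-bit tag, and a hypothetical class name GcmGzipEncrypt. The GZIP stream is wrapped inside the cipher stream so the data is compressed before it is encrypted (ciphertext does not compress).

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.security.SecureRandom;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class GcmGzipEncrypt {
    private static final int IV_LEN = 12;    // 96-bit IV, the size GCM is designed for
    private static final int TAG_BITS = 128; // full-length authentication tag

    private GcmGzipEncrypt() {}

    /** Compresses in with GZIP, encrypts with AES/GCM and writes IV || ciphertext || tag to out (closes out). */
    public static void encrypt(SecretKey key, InputStream in, OutputStream out) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);    // fresh random IV per file: never reuse one under the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        out.write(iv);                       // the IV is not secret, so store it in front of the ciphertext
        // GZIP sits inside the cipher stream, so data is compressed first and then encrypted.
        try (OutputStream gz = new GZIPOutputStream(new CipherOutputStream(out, cipher))) {
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = in.read(buf)) != -1; ) {
                gz.write(buf, 0, n);
            }
        } // closing the chain runs cipher.doFinal(), which appends the 16-byte tag, then closes out
    }
}
```

Decryption mirrors this (GZIPInputStream wrapped around the decrypting stream), which is exactly where the question about the tag sitting at the end comes in.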
However, I read that Java's implementation appends the authentication tag to the end of the stream. That means that for a large file, if I used CipherInputStream to decrypt it, it would not be able to tell whether the contents had been tampered with until it reaches the end of the stream, right?
If that is the case, wouldn't this mode be problematic for what I'm trying to accomplish? Since the file can't be decrypted in memory, it has to be decrypted somewhere on the file system, and until the failure is detected an attacker has a window of time in which to see the plaintext.
Is this a potential threat and a real concern I should be worrying about, or is there something about the algorithm or the cipher streams that I'm missing that prevents this from happening?
There is nothing in the GCM cipher that prevents its use in streaming mode. You should, however, not use the plaintext produced during decryption for anything security-relevant until you have verified the authentication tag.
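One way to follow this advice when the file does not fit in memory is to decrypt into a temporary file and move it to its final name only after doFinal() has verified the tag. This is a sketch under assumptions: the class name is hypothetical and the file layout is assumed to be IV (12 bytes) || ciphertext || 16-byte tag. The temporary file still holds plaintext, so it must live where the attacker cannot read it; on POSIX systems Files.createTempFile typically creates it with owner-only permissions. Also note that, as far as I know, the JDK 8 SunJCE provider buffers the entire ciphertext inside update() when decrypting GCM and returns the plaintext only from doFinal(), so with that provider this is memory-bound for multi-gigabyte files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class GcmVerifyThenRelease {
    private GcmVerifyThenRelease() {}

    /**
     * Decrypts a file laid out as IV (12 bytes) || ciphertext || tag (16 bytes) into target.
     * Plaintext goes to a temporary file that is moved to target only after the tag verifies;
     * on AEADBadTagException (or any other error) the temporary file is deleted.
     */
    public static void decrypt(SecretKey key, Path encrypted, Path target) throws Exception {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as target, so the final rename can be atomic on the same file system.
        Path tmp = Files.createTempFile(dir, "decrypt-", ".part");
        try {
            try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(encrypted)));
                 OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
                byte[] iv = new byte[12];
                in.readFully(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
                byte[] buf = new byte[64 * 1024];
                for (int n; (n = in.read(buf)) != -1; ) {
                    byte[] part = cipher.update(buf, 0, n); // may be null or empty if the provider buffers
                    if (part != null) {
                        out.write(part);
                    }
                }
                out.write(cipher.doFinal()); // throws AEADBadTagException if anything was modified
            }
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE); // release only verified plaintext
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```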
The authentication tag is not there to prevent you from decrypting the ciphertext; it is there to provide integrity and authenticity. You should never decrypt where an attacker can see the plaintext. If possible, you should even try to make it hard for an attacker to perform side-channel analysis.
Note that GCM is limited to encrypting about 68 GB ($2^{39} - 256$ bits) of data for a single IV. The number of invocations under a single key (with randomly generated IVs) is limited to $2^{32}$, but you are well advised to stay far away from those limits. Also note that repeating an IV for two separate encryptions under the same key is catastrophic for GCM.
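For reference, the per-IV bound converts to bytes and 16-byte AES blocks as follows:

```latex
\begin{aligned}
\text{per-IV limit} &= 2^{39} - 256 \ \text{bits} = \frac{2^{39} - 256}{8} \ \text{bytes} = 2^{36} - 32 \ \text{bytes} \\
&= 68\,719\,476\,704 \ \text{bytes} \approx 68.7 \ \text{GB} \approx 64 \ \text{GiB} \\
&= \frac{2^{39} - 256}{128} \ \text{AES blocks} = 2^{32} - 2 \ \text{blocks of 16 bytes}
\end{aligned}
```

The $2^{32} - 2$ follows from GCM's 32-bit block counter: with a 96-bit IV, counter value 1 is used to encrypt the tag and the payload uses values 2 through $2^{32} - 1$.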
CipherInputStream in general is horrible. I would suggest reprogramming it yourself using Cipher directly, together with memory-mapped files and ByteBuffer. The Java implementation (where the tag is automatically put at the end) combined with CipherInputStream makes for a horrible buffering mess.
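A minimal sketch of driving Cipher directly over a memory-mapped input with ByteBuffer, shown for the encryption direction, where the JDK's GCM produces ciphertext from update() as it goes. The class name, the 1 MiB slice size and the IV || ciphertext || tag layout are assumptions for illustration. Because a single FileChannel.map() call is limited to Integer.MAX_VALUE bytes, the file is mapped slice by slice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class MappedGcmEncrypt {
    private static final int SLICE = 1 << 20; // 1 MiB per mapping; a multiple of the 16-byte AES block

    private MappedGcmEncrypt() {}

    /** Encrypts src into dst as IV (12 bytes) || ciphertext || tag (16 bytes), reading src through memory maps. */
    public static void encrypt(SecretKey key, Path src, Path dst) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            writeFully(out, ByteBuffer.wrap(iv));
            // Sized for one full slice plus the provider's worst-case extra bytes.
            ByteBuffer outBuf = ByteBuffer.allocate(cipher.getOutputSize(SLICE));
            long size = in.size();
            for (long pos = 0; pos < size; pos += SLICE) {
                MappedByteBuffer slice = in.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(SLICE, size - pos));
                outBuf.clear();
                cipher.update(slice, outBuf); // when encrypting, ciphertext is produced as we go
                outBuf.flip();
                writeFully(out, outBuf);
            }
            ByteBuffer last = ByteBuffer.allocate(cipher.getOutputSize(0));
            cipher.doFinal(ByteBuffer.allocate(0), last); // any remaining bytes plus the 16-byte tag
            last.flip();
            writeFully(out, last);
        }
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

The decryption direction has the same shape, but with a provider that buffers GCM ciphertext until doFinal() the output only appears at the end.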
I'm rewriting the Bouncy Castle implementation, and I see a code and complexity reduction of about 30% when I separate the tag from the decryption; it also makes it possible to decrypt each byte separately. In other words, it restores the online properties of the underlying CTR mode.
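To illustrate the online property: with a 96-bit IV, the GCM payload is plain CTR mode whose first counter block is IV || 00000002 (counter value 1 is reserved for the tag). So, once the tag is split off, the ciphertext can be decrypted incrementally with the JDK's AES/CTR/NoPadding. This is a sketch with a hypothetical helper name, and its output is unauthenticated: nothing here checks the tag, which must still be verified separately (for instance by also running a GCM decryption over the same data) before the plaintext is trusted.

```java
import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public final class GcmAsCtr {
    private GcmAsCtr() {}

    /**
     * Returns a CTR cipher producing the keystream GCM uses for its payload, assuming a 96-bit GCM IV.
     * WARNING: plaintext from this cipher is UNAUTHENTICATED until the GCM tag is verified separately.
     */
    public static Cipher ctrDecryptorFor(SecretKey key, byte[] gcmIv96) throws Exception {
        if (gcmIv96.length != 12) {
            throw new IllegalArgumentException("expects a 12-byte GCM IV");
        }
        // First payload counter block: IV || 0x00000002 (big-endian 32-bit counter).
        byte[] counterBlock = ByteBuffer.allocate(16).put(gcmIv96).putInt(2).array();
        Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
        ctr.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(counterBlock));
        return ctr;
    }
}
```

The JDK's CTR mode increments the whole 128-bit counter block while GCM increments only the low 32 bits; the two agree as long as the payload stays within the $2^{32} - 2$ block limit for a single IV.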
With Java 8, however, you may want to stick to the built-in implementation, as GCM may be sped up using intrinsics (for the server VM on recent Intel processors). Note that, according to archie below, this functionality is not yet present.
CipherInputStream. This is brand-new functionality and is still somewhat in flux. (Nov 19, 2014 at 23:20)