Cryptography
aes file-encryption gcm
Updated Sat, 21 May 2022 21:31:41 GMT

Encryption of big files in Java with AES/GCM


I have to encrypt big files, ranging in size from 500 MB to several gigabytes.

I would like to use AES/GCM/NoPadding as provided by Java 1.8 since that gives me automatic authentication and encryption.

I would like to use the handy cipher input/output streams (CipherInputStream and CipherOutputStream) because I can chain them with the GZIP I/O streams to compress the data a little before encryption.

However, I have read that the Java implementation appends the authentication tag at the end of the stream. That means that for a long file, if I were using CipherInputStream to decrypt it, it won't be able to tell whether the contents have been tampered with until it reaches the end of the stream, right?

If that is the case, wouldn't it be problematic to use that mode of operation for what I'm trying to accomplish? Since the file can't be decrypted in memory, it will have to be decrypted to somewhere in the filesystem before the failure is detected, leaving some time for an attacker to see the plaintext.

Is this a potential threat and a real concern that I should be worrying about, or is there something about the algorithm or the cipher streams that I'm missing that prevents this from occurring?




Solution

There is nothing in the GCM cipher that prevents its use in streaming mode. You should, however, not use the plaintext produced during decryption for anything that requires security before you have verified the authentication tag.
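As a rough sketch of "don't use the plaintext before the tag has been verified": the hypothetical helper below decrypts into a temporary file and only moves it to its final location after `doFinal` has accepted the tag. The class name, method name and the `privateDir` parameter are made up for illustration. It assumes the input contains the ciphertext followed by a 128-bit tag (the layout the JDK produces), that the IV is supplied separately, and that `privateDir` is a directory nobody else can read. Note that, as pointed out in the comments, the JDK provider may hold back the plaintext in memory until `doFinal`, so for very large files a different provider or a chunked format may be needed.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.GeneralSecurityException;

// Hypothetical helper, not part of any library.
final class VerifiedGcmDecryptor {

    /**
     * Decrypts source (ciphertext followed by a 128-bit tag) into a temporary
     * file inside privateDir and moves it to target only if the tag verifies.
     */
    static void decryptThenRelease(SecretKey key, byte[] iv, Path source,
                                   Path privateDir, Path target)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));

        Path tmp = Files.createTempFile(privateDir, "gcm-", ".part");
        boolean released = false;
        try {
            try (InputStream in = Files.newInputStream(source);
                 OutputStream out = Files.newOutputStream(tmp)) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    // update may return null when the provider buffers internally
                    byte[] part = cipher.update(buf, 0, n);
                    if (part != null) {
                        out.write(part);
                    }
                }
                // Throws AEADBadTagException if the data or the tag was modified.
                out.write(cipher.doFinal());
            }
            // Only reached when the tag has been verified.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            released = true;
        } finally {
            if (!released) {
                // Never leave unauthenticated plaintext behind.
                Files.deleteIfExists(tmp);
            }
        }
    }
}
```

The caller only ever sees `target` after authentication succeeded; on failure the exception propagates and the partial file is deleted.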

The authentication tag is not to prevent you from decrypting the ciphertext. It is there to provide for integrity and authenticity. You should never decrypt where an attacker can see the plaintext. If possible, you should even try and make it hard for an attacker to perform side channel analysis.

Note that GCM is limited to encrypting about 68 GB ($2^{39} - 256$ bits, i.e. $2^{36} - 32$ bytes) of data for a single IV. The number of invocations with a single key is limited to $2^{32}$, but you are advised to stay well away from those limits. Note that repeating the IV for two separate encryption invocations with the same key is a catastrophic event for GCM.
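To make the arithmetic concrete: $2^{39} - 256$ bits is $2^{36} - 32 = 68{,}719{,}476{,}704$ bytes. A minimal sketch of how an application could enforce that bound and generate a fresh random IV per file follows. The class and method names are hypothetical, and the 12-byte IV length is an assumption (it is the size commonly used with GCM).

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of any library.
final class GcmLimits {

    /** (2^39 - 256) bits expressed in bytes: 2^36 - 32. */
    static final long MAX_PLAINTEXT_BYTES = (1L << 36) - 32;

    /** Assumed IV length: 96 bits. */
    static final int IV_BYTES = 12;

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a new random IV; call this once per encryption, never reuse the result. */
    static byte[] freshIv() {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        return iv;
    }

    /** Rejects inputs that cannot be encrypted under a single IV. */
    static void checkPlaintextSize(long bytes) {
        if (bytes < 0 || bytes > MAX_PLAINTEXT_BYTES) {
            throw new IllegalArgumentException(
                    "plaintext of " + bytes + " bytes exceeds the GCM per-IV limit");
        }
    }
}
```

Calling `GcmLimits.checkPlaintextSize(Files.size(path))` before encrypting, and `GcmLimits.freshIv()` for every file, keeps each file within the per-IV limit. Files above the limit would have to be split into separately encrypted parts.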


CipherInputStream in general is horrible. I would suggest reimplementing the streaming yourself using Cipher, memory-mapped files and ByteBuffer directly. The Java implementation (where the tag is automatically placed at the end) combined with CipherInputStream makes for a horrible buffering mess.
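A minimal sketch of what working with `Cipher` and `ByteBuffer` directly could look like on the encryption side. It uses a plain `FileChannel` with heap buffers rather than memory-mapped files to keep it short; a `MappedByteBuffer` could be passed to `Cipher.update` in the same way. The class name and the 1 MiB chunk size are arbitrary choices, and the output is assumed to be the ciphertext followed by the 128-bit tag, with the IV stored elsewhere by the caller.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;

// Hypothetical helper, not part of any library.
final class GcmFileEncryptor {

    private static final int CHUNK = 1 << 20; // 1 MiB per read, arbitrary

    /** Writes ciphertext followed by the tag to encrypted; the IV is not written. */
    static void encrypt(SecretKey key, byte[] iv, Path plain, Path encrypted)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

        try (FileChannel in = FileChannel.open(plain, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(encrypted,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer src = ByteBuffer.allocate(CHUNK);
            // getOutputSize accounts for internally buffered bytes plus the tag.
            ByteBuffer dst = ByteBuffer.allocate(cipher.getOutputSize(CHUNK));

            while (in.read(src) != -1) {
                src.flip();
                cipher.update(src, dst); // encrypts whatever was read
                drain(dst, out);
                src.clear();
            }
            src.flip(); // now empty: doFinal only flushes the remainder and the tag
            cipher.doFinal(src, dst);
            drain(dst, out);
        }
    }

    /** Writes the buffer's content to the channel and resets it for reuse. */
    private static void drain(ByteBuffer buf, FileChannel out) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            out.write(buf);
        }
        buf.clear();
    }
}
```

Encryption is online in this form: each chunk is written out as soon as it has been processed. The decryption direction is where the tag handling and the provider's internal buffering become the real problem.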

I'm rewriting the Bouncy Castle implementation and I see a code and complexity reduction of about 30% when I separate the tag from the decryption; it also makes it possible to decrypt each byte separately. In other words, it restores the online properties of the underlying CTR cipher.

With Java 8, however, you may want to stick to the built-in implementation, as GCM may be sped up using intrinsics (for the server VM on the latest Intel processors). Note that according to archie's comment below, this functionality is not yet present.





Comments (5)

  • +0 – Beware that the tag size matters with GCM. You could think of EAX or HMAC based authentication if you don't want to look into the security of GCM. — Nov 19, 2014 at 10:36  
  • +3 – Bear in mind that the JDK 8 GCM implementation has crippled performance due to a naive (no HW or even tables) multiplier implementation. It also buffers plaintext during decryption in memory, so is unusable for large files. Hardware acceleration and/or other multiplier improvements are apparently in the works, but not there as of 1.8.0_25. — Nov 19, 2014 at 21:16  
  • +0 – @archie Thanks for the warning. Note that you can always place the BC provider on top of the list of providers and have that take over, or make the provider configurable. I'll take another look at GCM as in Java 8, also because of a few updates regarding bugs with CipherInputStream. This is brand new functionality and is still a bit in movement. — Nov 19, 2014 at 23:20  
  • +0 – @owlstead When you say that GCM is bound to ~68 GB, and that keys should be changed, my understanding is that it means to 68 GB in total, not necessarily on the same file. Is that the correct understanding? — Nov 21, 2014 at 00:13  
  • +1 – @alejo: ~68 GB per IV ... so basically it's a per file limitation since one should always use a new truly random IV per file — Jun 02, 2016 at 18:22