Cryptography
hash key-derivation bcrypt scrypt argon2
Updated Sat, 04 Jun 2022 21:31:15 GMT

Looking at hash output – is Base64 encoding in any way better than HEX encoding?


I was wondering why most "normal/unsafe" crypto hashes like SHA-256, SHA-512, Whirlpool, RIPEMD-160, MD5, etc. are HEX encoded.

But most "secure" crypto hashes (KDFs) like bcrypt and scrypt are Base64 encoded. Why?

I heard somewhere that Base64 shortens the string by about 20%. Isn't that extremely bad for passwords hashed over many iterations, and doesn't it make them less collision resistant?

And if Base64 is really for some reason more secure, then why does Argon2 output HEX encoding?




Solution

The algorithms themselves just output binary data (i.e. bytes), as you can see if you read their specifications. It is the implementations in APIs and applications that encode that output as hexadecimal and/or base64.
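A minimal Python sketch of that separation, assuming only the standard library: `hashlib` returns raw bytes, and hex or base64 is applied afterwards as a separate step. PBKDF2 stands in for bcrypt/scrypt here only because it ships with `hashlib`; the password, salt and iteration count are arbitrary placeholders. Because both encodings decode back to exactly the same bytes, the choice of encoding cannot change the collision resistance of the hash.

```python
import base64
import hashlib

# A plain hash and a KDF both return raw bytes; encoding is a separate step.
digest = hashlib.sha256(b"hello").digest()  # 32 raw bytes
# Placeholder password, salt and iteration count, for illustration only.
key = hashlib.pbkdf2_hmac("sha256", b"password", b"salt1234", 100_000)  # 32 bytes

for label, raw in (("sha256", digest), ("pbkdf2", key)):
    as_hex = raw.hex()
    as_b64 = base64.b64encode(raw).decode("ascii")
    # Both encodings are lossless: decoding yields exactly the original bytes.
    assert bytes.fromhex(as_hex) == raw
    assert base64.b64decode(as_b64) == raw
    print(label, len(raw), "bytes")
    print("  hex:   ", as_hex)
    print("  base64:", as_b64)
```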

Sometimes there are also ad hoc standards or common practices that specify a certain output format. This is for instance the case for the output of the bcrypt password hashing algorithm. In that case it's not just the hash that is displayed, but also the type of algorithm, the cost factor (which determines the number of iterations) and, of course, the salt.
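As a rough illustration, the bcrypt string can be split into those fields with a regular expression. This is a sketch assuming the common `$<version>$<cost>$<salt><hash>` layout with a 22-character salt and a 31-character hash written in bcrypt's own base64 alphabet (`./A-Za-z0-9`); `parse_bcrypt` is a hypothetical helper, and it only parses the text, it does not verify any password.

```python
import re

# $<version>$<two-digit cost>$<22-char salt><31-char hash>
BCRYPT_RE = re.compile(
    r"^\$(?P<version>2[abxy]?)\$(?P<cost>\d{2})\$"
    r"(?P<salt>[./A-Za-z0-9]{22})(?P<hash>[./A-Za-z0-9]{31})$"
)

def parse_bcrypt(s: str) -> dict:
    """Split a bcrypt-format string into its fields (hypothetical helper)."""
    m = BCRYPT_RE.match(s)
    if m is None:
        raise ValueError("not a bcrypt-format string")
    fields = m.groupdict()
    fields["cost"] = int(fields["cost"])
    # The cost is a base-2 logarithm: 2**cost rounds of key expansion.
    fields["rounds"] = 2 ** fields["cost"]
    return fields

# A sample string in that format.
sample = "$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"
print(parse_bcrypt(sample))
```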

Base64 is more efficient than hex, while hex is easier for humans to digest. Both the value of each byte and the number of bytes are simply easier to see in hex; for instance, the number of stored bytes is just half the number of displayed hex digits. However, for textual formats or larger hash values, base64 may be chosen for its efficiency (~33% overhead for base64 vs. 100% for hex).
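The overhead figures follow from simple arithmetic: hex spends one character per 4 bits (2 characters per byte), while padded base64 spends 4 characters per 3-byte group. The sketch below, assuming Python's standard `base64` and `hashlib` modules, checks those formulas against the real encoders. For SHA-256, the 32 digest bytes become 64 hex characters or 44 base64 characters (43 without the trailing `=` padding); the hash itself still contains the same 256 bits either way.

```python
import base64
import hashlib

def encoded_lengths(n_bytes: int) -> tuple[int, int]:
    """Characters needed for n_bytes of binary data: (hex, padded base64)."""
    hex_chars = 2 * n_bytes                 # one hex digit per 4 bits
    b64_chars = 4 * ((n_bytes + 2) // 3)    # 4 characters per 3-byte group
    return hex_chars, b64_chars

for name in ("md5", "sha1", "sha256", "sha512"):
    raw = hashlib.new(name, b"example").digest()
    hex_chars, b64_chars = encoded_lengths(len(raw))
    # Cross-check the formulas against the actual encoders.
    assert hex_chars == len(raw.hex())
    assert b64_chars == len(base64.b64encode(raw))
    print(f"{name}: {len(raw)} bytes -> {hex_chars} hex chars, {b64_chars} base64 chars")
```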

The command line utilities md5sum, sha1sum and their successors (sha256sum etc.) have always output hex, so applications that want to remain compatible with them are more likely to output hex as well.
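For example, an application that wants its output to match GNU `sha256sum` could format lines like the minimal sketch below: lowercase hex digest, two spaces (text mode), then the file name. `sha256sum_line` is a hypothetical helper, and it ignores the escaping that `sha256sum` applies to unusual file names.

```python
import hashlib

def sha256sum_line(path: str, chunk_size: int = 65536) -> str:
    """Format a file's SHA-256 like sha256sum's default text-mode output."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # Lowercase hex, two spaces, then the file name.
    return f"{h.hexdigest()}  {path}"
```

Calling `sha256sum_line` on a file should then produce the same text as running `sha256sum` on it, so the two outputs can be compared directly.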


Note that I've changed the case of the terms "Base64" and "HEX" in this answer to lowercase to be compatible with RFC 4648: The Base16, Base32, and Base64 Data Encodings, which tries to standardize these encodings. It only uses the uppercase variant in the title. "Hex" is an abbreviation, not an acronym, so writing it in all uppercase does not make sense.

Personally I prefer all uppercase for hexadecimal digits: people recognize the upper part of letters and digits more easily, so it makes sense to use uppercase as the default (on all my old computers the characters were also uppercase, as they still are in most debuggers).
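The case of hex digits is purely presentational. In Python, for instance, `base64.b16encode` follows RFC 4648 Base16 and emits uppercase, while `bytes.hex()` emits lowercase (as md5sum and friends do); both decode to the same bytes. A short sketch, assuming only the standard library:

```python
import base64

raw = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# RFC 4648 "Base16" uses the uppercase alphabet 0-9A-F.
upper = base64.b16encode(raw).decode("ascii")   # "DEADBEEF"
# bytes.hex() produces lowercase hex digits.
lower = raw.hex()                               # "deadbeef"

# Case does not change the value: both spellings decode to the same bytes.
assert bytes.fromhex(upper) == bytes.fromhex(lower) == raw
# b16decode is strict about case unless casefold=True is passed.
assert base64.b16decode(upper) == raw
assert base64.b16decode(lower, casefold=True) == raw
```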





Comments (5)

  • +0 – Certain kinds of hashes are as likely to be validated by human eyeballs as by machines, while others are intended to be compared solely by machines. If a hash will be processed solely by machines, base64 or base85 would likely be a more efficient choice than hex, but some humans may find it easier to look at two hex strings and judge whether they "seem" identical than to do the same with base64 [if someone replaced a file with a malicious one chosen so that the first 10%, last 10%, and middle 10% of the hex hash matched even though the rest were random garbage, some people might not notice... — Jul 28, 2017 at 23:11  
  • +0 – ...but if people who receive a file randomly pick part of the hex signature to validate, phony files would likely get caught by at least some users, who could then sound the alarm for everyone else.] — Jul 28, 2017 at 23:12  
  • +1 – Also worth noting: if you are putting encoded values into URLs, HEX is URL-safe, whereas base64 is not, because it uses the / and + characters. There are URL-safe versions of base64 that substitute _ for / and - for +. — Jul 29, 2017 at 17:40  
  • +0 – Right, as specified here. — Jul 29, 2017 at 17:42  
  • +0 – It's also easier for programmers to debug crypto code when the "correct" answers are in hex, which corresponds directly to the byte array in question. — Jul 30, 2017 at 00:47  
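The URL-safe variant mentioned in the comments is the "base64url" alphabet from RFC 4648, section 5, which Python exposes as `base64.urlsafe_b64encode`. The sketch below wraps it in two hypothetical helpers; stripping the `=` padding is an assumption (many URL formats omit it), so the decoder adds it back before decoding.

```python
import base64
import hashlib

def to_base64url(data: bytes) -> str:
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    # Stripping '=' padding is an assumption: many URL formats omit it.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def from_base64url(text: str) -> bytes:
    # Restore the padding removed above so the decoder accepts the input.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

# Bytes chosen so that standard base64 needs both '+' and '/'.
sample = b"\xfb\xff\xfe"
print(base64.b64encode(sample))   # b'+//+'
print(to_base64url(sample))       # -__-

digest = hashlib.sha256(b"hello").digest()
token = to_base64url(digest)      # 43 characters, no '+', '/' or '='
assert from_base64url(token) == digest
```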


External Links

External links referenced by this document: