Unix & Linux
command-line command sort
Updated Mon, 12 Sep 2022 04:21:13 GMT

"sort -h" not working correctly


I tried sorting this file:

1000000911M
1G
2G
1.5G
100M

but using sort -h file I got:

100M
1000000911M
1G
1.5G
2G

Is there a problem with sort? If so, how can I avoid it without explicitly expanding the M and G suffixes?




Solution

This is documented in the man page for sort:

-h, --human-numeric-sort, --sort=human-numeric
         Sort by numerical value, but take into account the SI suffix, if present.
         Sort first by numeric sign (negative, zero, or positive); then by SI
         suffix (either empty, or `k' or `K', or one of `MGTPEZY', in that order);
         and finally by numeric value.  The SI suffix must immediately follow the
         number.  For example, '12345K' sorts before '1M', because M is "larger"
         than K.  This sort option is useful for sorting the output of a single
         invocation of 'df' command with -h or -H options (human-readable).

The relevant part is the fourth sentence:

For example, '12345K' sorts before '1M', because M is "larger" than K.

This is what you observed: your 1000000911M line sorts before the 1G, 1.5G and 2G lines because sort -h compares the suffix first, and M ranks below G regardless of the number in front of it.
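To see the suffix-first rule in isolation, you can feed sort -h the two values from the man page example. This is only an illustrative sketch; the variable name out is my own choice.

```shell
# The man page's example: 12345K is numerically larger than 1M (12345 * 1024 > 1024 * 1024),
# but K ranks below M, so sort -h emits it first.
out=$(printf '%s\n' 1M 12345K | LC_ALL=C sort -h)
printf '%s\n' "$out"
```

With a sort that follows the rule quoted above, 12345K is printed before 1M.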

In ordinary practice, software that generates output with these kinds of suffixes switches to a larger suffix rather than printing so many significant digits, so inputs like yours rarely occur.
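If your input really does contain such values, one possible workaround is to sort on the expanded byte counts while keeping the original lines. The sketch below assumes GNU coreutils (for numfmt) and that the suffixes are powers of 1024; the file names sizes.txt and bytes.txt and the variable sorted are hypothetical.

```shell
# Sample input from the question.
printf '%s\n' 1000000911M 1G 2G 1.5G 100M > sizes.txt

# Expand each value to a plain byte count (M = 1024^2, G = 1024^3).
LC_ALL=C numfmt --from=iec < sizes.txt > bytes.txt

# Put the byte count in front of each original line, sort numerically
# on that first column, then drop it again.
sorted=$(paste bytes.txt sizes.txt | LC_ALL=C sort -n -k1,1 | cut -f2)
printf '%s\n' "$sorted"

rm -f bytes.txt
```

Sorting on the expanded value orders the lines by true magnitude, so 1000000911M ends up last, and the original spelling of each line is preserved.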





Comments (2)

  • +2 – An example like numfmt --from=iec --to=iec <infile | sort -h might help. — Jul 25, 2022 at 14:36  
  • +0 – @QuartzCristal Tip of the hat for the numfmt pointer. — Aug 14, 2022 at 08:36