python text-files line-count
Updated Tue, 05 Jul 2022 19:33:16 GMT

How to get line count of a large file cheaply in Python?


How do I get a line count of a large file in the most memory- and time-efficient manner?

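def file_len(filename):
    # Start at -1 so an empty file returns 0 instead of raising
    # UnboundLocalError (the loop body never runs, so i is never bound).
    i = -1
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1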
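A slightly more compact sketch of the same idea (not from the original post) uses `sum()` over a generator, so no line is kept in memory. Opening the file in binary mode is an assumption added here: it skips text decoding, and line iteration still splits on `b"\n"`.

```python
def file_len(filename):
    # Count one per line yielded by the file iterator; nothing is stored.
    # An empty file yields no lines, so the sum is simply 0.
    with open(filename, "rb") as f:
        return sum(1 for _ in f)
```

Like the loop above, this counts a final line even if it has no trailing newline.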



Solution

You can't get any better than that.

After all, any solution will have to read the entire file, count how many \n characters it contains, and return that result.
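Counting `\n` directly can be sketched by reading the file in fixed-size binary chunks and letting `bytes.count` (implemented in C) do the scanning, so Python loops once per chunk instead of once per line. The function name `count_newlines` and the 1 MiB default `chunk_size` are illustrative assumptions, not from the original answer.

```python
def count_newlines(filename, chunk_size=1 << 20):
    """Count newline bytes by reading fixed-size binary chunks.

    Memory use is bounded by chunk_size; the per-byte scan happens
    inside bytes.count rather than in a Python-level loop.
    """
    count = 0
    with open(filename, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count
```

This counts newline characters, as `wc -l` does, so a last line without a trailing newline is not counted; in that case the result is one less than `file_len` above.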

Is there a way to do that without reading the entire file? I'm not sure there is. The best solution will always be I/O-bound; the best you can do is make sure you don't use unnecessary memory, and it looks like you have that covered.





Comments (5)

  • +8 – Exactly, even wc reads through the whole file, but it's written in C and probably pretty optimized. — May 10, 2009 at 10:38  
  • +7 – As far as I understand, Python's file I/O is done through C as well. docs.python.org/library/stdtypes.html#file-objects — May 10, 2009 at 10:41  
  • +0 – @Tomalak That's a red herring. While python and wc might be issuing the same syscalls, python has opcode dispatch overhead that wc doesn't have. — Jan 11, 2013 at 22:53  
  • +4 – You can approximate a line count by sampling. It can be thousands of times faster. See: documentroot.com/2011/02/… — Jun 14, 2016 at 20:30  
  • +5 – Other answers seem to indicate this categorical answer is wrong, and should therefore be deleted rather than kept as accepted. — Jan 25, 2017 at 13:59
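The sampling idea from the comments can be sketched as follows: read a few blocks at random offsets, measure the newline density, and scale by the file size. This is an approximation under the assumption that line lengths are roughly uniform across the file; it is not necessarily the method in the linked article, and `estimate_line_count` and its parameters are hypothetical.

```python
import os
import random


def estimate_line_count(filename, samples=32, sample_size=64 * 1024, seed=None):
    """Estimate newline count from random byte samples (approximate).

    Assumes newline density is roughly uniform across the file;
    skewed files (e.g. a long header) will give biased estimates.
    """
    size = os.path.getsize(filename)
    if size <= samples * sample_size:
        # Small file: sampling would read about as much as an exact count.
        with open(filename, "rb") as f:
            return f.read().count(b"\n")
    rng = random.Random(seed)
    newlines = 0
    sampled = 0
    with open(filename, "rb") as f:
        for _ in range(samples):
            # Pick a random offset that leaves room for a full block.
            f.seek(rng.randrange(0, size - sample_size + 1))
            block = f.read(sample_size)
            newlines += block.count(b"\n")
            sampled += len(block)
    # Scale the observed newline density up to the whole file.
    return round(newlines * size / sampled)
```

Only `samples * sample_size` bytes are read regardless of file size, which is where the speedup comes from; the trade-off is that the result is an estimate, not an exact count.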