System Administration & Network Administration
filesystems lvm ext4 btrfs zfsonlinux
Updated Sun, 07 Aug 2022 01:38:44 GMT

Which filesystem for large LVM of disks (8 TB)?


I have a Linux server with many 2 TB disks, all currently combined in an LVM resulting in about 10 TB of space. I use all this space on a single ext4 partition, and currently have about 8.8 TB of data.

The problem is, I often get errors on these disks, and even though I replace them as soon as errors appear (that is, I copy the old disk to a new one with dd, then put the new one in the server), I often end up with about 100 MB of corrupted data. That makes e2fsck go crazy every time, and it often takes a week to get the ext4 filesystem back into a sane state.
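
For reference, my replacement procedure is roughly the following (a sketch only; the device names are examples):

    # Copy a failing disk onto its replacement (device names are examples).
    # conv=noerror,sync makes dd keep going past read errors and pad the
    # unreadable blocks with zeros -- which is presumably where my ~100 MB
    # of corrupted data comes from.
    dd if=/dev/sdX of=/dev/sdY bs=64K conv=noerror,sync

    # ...followed by the week-long fsck on the LVM volume.
    e2fsck -f /dev/vg0/data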

So the question is: what filesystem would you recommend for my LVM? Or what would you recommend I do instead (I don't really need the LVM)?

Profile of my filesystem:

  • many folders of varying total sizes (some totalling 2 TB, some only 100 MB)
  • almost 200,000 files of varying sizes (roughly 3/4 of them around 10 MB, 1/4 between 100 MB and 4 GB; I can't currently get better statistics, as my ext4 partition has been completely wrecked for several days)
  • many reads but few writes
  • and I need fault tolerance (I stopped using mdadm RAID because it can't cope with even ONE error on a whole disk, and my disks do sometimes fail; I replace them as soon as I can, but in the meantime corrupted data can end up on my filesystem)

The major problem is failing disks; I can afford to lose some files, but I can't afford to lose everything at once.

If I continue to use ext4, I've heard it would be best to make several smaller filesystems and "merge" them somehow, but I don't know how to do that.
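
I imagine something like the following, though I'm not sure it's the right approach (the volume and mount point names are made up):

    # Guess at the "several smaller filesystems" idea: one LV + ext4 per
    # data set, so a bad disk only wrecks one of them, and e2fsck runs
    # on 2 TB at a time instead of 10 TB (names are made up).
    lvcreate -L 2T -n vol1 vg0
    mkfs.ext4 /dev/vg0/vol1
    mkdir -p /srv/data/vol1
    mount /dev/vg0/vol1 /srv/data/vol1
    # ...repeat for vol2, vol3, ...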

I've heard btrfs would be nice, but I can't find any clue as to how it handles losing part of a disk (or a whole disk) when the data is NOT replicated (mkfs.btrfs -d single?).
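
For clarity, this is the kind of setup I mean (a sketch only; the device names are examples):

    # btrfs across several devices: data NOT replicated, metadata mirrored.
    mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc /dev/sdd
    mount /dev/sdb /mnt/data

    # What I can't find out: if /dev/sdc dies, which files are lost, and
    # can the filesystem still be mounted and checked?
    btrfs scrub start /mnt/data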

Any advice on the question is welcome; thanks in advance!




Solution

It's not a filesystem problem, it's the disks' physical limitations. Here's some data:

SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 in 10^14 bits read. That means you should expect roughly one unrecoverable read error per ~12 TB read, even when every disk is working within spec.
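
To see where the ~12 TB figure comes from:

    1 URE per 10^14 bits read
    10^14 bits = 1.25 * 10^13 bytes ~ 12.5 TB
    => reading about 12.5 TB, you should expect one unrecoverable error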

This means that with no RAID you will lose data even if no drive fails - RAID is your only option.

If you choose RAID5 (usable capacity n-1 disks, where n = the number of disks), it's still not enough. With a 10 TB RAID5 made of 6 x 2 TB HDDs, you have about a 20% chance of one drive failing per year, and when a single disk does fail, the URE rate gives you only about a 50% chance of rebuilding the RAID5 successfully and recovering 100% of your data.
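
The ~50% figure follows directly from the URE rate: rebuilding after one failure means reading all 10 TB on the five surviving disks without hitting a single unrecoverable error.

    10 TB = 10^13 bytes = 8 * 10^13 bits read during the rebuild
    P(no URE) = (1 - 10^-14)^(8 * 10^13) ~ e^(-0.8) ~ 0.45

So a RAID5 rebuild at this capacity is roughly a coin flip.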

Basically, with today's high disk capacities and a relatively high URE rate, you need RAID6 to be safe even against a single disk failure.
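
For example, a RAID6 array over the six disks would look something like this (a sketch, not a drop-in recipe; device names are examples):

    # RAID6 over six 2 TB disks: any two can fail, ~8 TB usable (n-2).
    mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
    mkfs.ext4 /dev/md0

    # Scrub regularly so latent UREs are found and repaired from parity
    # *before* a rebuild depends on every remaining sector being readable.
    echo check > /sys/block/md0/md/sync_action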

Read this: http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162





Comments (5)

  • +3 – Wait, URE means Unrecoverable Read Error but this doesn't mean that the disk actually HAS the error. The next read may (and probably will) return the correct bit. The OS will probably just re-read the sector and obtain the correct data. You also forgot to talk about S.M.A.R.T.: before a sector is permanently damaged, S.M.A.R.T. will try to read/write data from/to it. If it detects too many failures, S.M.A.R.T. simply moves the content of the sector in another place and marks the sector as BAD and nobody will be able to write onto it again. — Oct 09, 2012 at 13:17  
  • +0 – So, you are simply suggesting buying tons of disks without asking WHY his disks are so faulty. It could be a heat problem, it could be a faulty SATA controller, it could be bad SATA connectors, etc. etc. etc. — Oct 09, 2012 at 13:20
  • +0 – @Avio What I'm saying is that with 10TB of data you will have read errors due to hard disk limitations, even if all disks, SATA controller, SATA connectors etc are in perfect condition and working according to specs. I am also saying that even if you decide to use RAID to mitigate that you should go with RAID6 because disk capacity + URE make even RAID5 not reliable enough. Even single drive failure on RAID5 has a high (50% FFS!) data loss chance. — Oct 09, 2012 at 13:28  
  • +1 – @Avio U in URE stands for Unrecoverable as in gone for good. — Oct 09, 2012 at 13:37  
  • +0 – It can be the filesystem's problem: if you use a copy-on-write filesystem like btrfs or zfs, you can very likely recover a previous version of the file, so you only lose the last change to it (if it was ever changed). — Oct 12, 2012 at 14:21


External Links

External links referenced by this document:

  • http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162