Sanity testing drives

Thursday, 12 August 2010

Recently I came across a situation where moving a lot of data around on a machine with a 3Ware RAID card ultimately killed the machine.

Testing the hardware for this kind of failure in advance means testing both:

  • The individual drives which make up the RAID array.
  • The filesystem which is layered on top of it.

The former can be done with badblocks, etc. The latter requires a simple tool to create a bunch of huge files with "random" contents, then later verify they have the contents you expected.

With that in mind:

dt --files=1000  --size=100M [--no-delete|--delete]

This:

  • Creates, in turn, 1000 files.
  • Each created file will be 100MB in size.
  • Each created file will have random contents written to it, and be closed.
  • Once closed the file will be re-opened and the MD5sum computed
    • Both in my code and by calling /usr/bin/md5sum.
    • If these sums mis-match, indicating a data-error, we abort.
  • Otherwise we delete the file and move on.
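For the curious, the loop above can be sketched in a few lines of Python. This is only an illustration and not the code in the repository: the function names, the `dt-NNNNNN.bin` file names and the 1 MiB chunk size are all made up, and the sketch also remembers the checksum of what it wrote, which is an extra the list above doesn't mention.

```python
import hashlib
import os
import shutil
import subprocess

CHUNK = 1024 * 1024  # assumed chunk size: write and read in 1 MiB pieces


def write_random_file(path, size):
    """Write `size` random bytes to `path`; return the MD5 of what was written."""
    digest = hashlib.md5()
    remaining = size
    with open(path, "wb") as fh:
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            fh.write(block)
            digest.update(block)
            remaining -= len(block)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data out before we close
    return digest.hexdigest()


def md5_of_file(path):
    """Re-open the file and compute its MD5 in our own code."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def md5sum_external(path):
    """Compute the MD5 by calling the external md5sum binary."""
    result = subprocess.run(["md5sum", path], check=True,
                            capture_output=True, text=True)
    return result.stdout.split()[0]


def run_test(directory, files, size, delete=True):
    """Create `files` files of `size` bytes each, verifying every one."""
    for i in range(files):
        path = os.path.join(directory, f"dt-{i:06d}.bin")  # hypothetical name
        expected = write_random_file(path, size)
        sums = {md5_of_file(path)}
        if shutil.which("md5sum"):  # skip the external check if it is missing
            sums.add(md5sum_external(path))
        if sums != {expected}:
            raise SystemExit(f"checksum mismatch on {path}: data error")
        if delete:
            os.remove(path)
    return files
```

Calling `run_test(".", 1000, 100 * 1024 * 1024)` would roughly correspond to the command line above, with `delete=False` standing in for "--no-delete".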

Adding "--no-delete" and "--files=100000" allows you to continue testing until your drive is full and you've tested every part of the filesystem.
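A fill-the-drive run can be pictured as a loop which keeps going until the filesystem reports that it is out of space. The sketch below is Python, purely for illustration: `fill_until_full` and the `write_one` callback are hypothetical names, and I'm assuming a full disk shows up as an `OSError` with `ENOSPC`, which is how Python reports it on Linux.

```python
import errno


def fill_until_full(write_one, max_files=100000):
    """Call write_one(i) for i = 0, 1, 2, ... until the disk is full.

    `write_one` is a hypothetical callback which creates and verifies
    file number i. Returns the number of files completed before ENOSPC.
    """
    for i in range(max_files):
        try:
            write_one(i)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                return i  # the drive filled up part-way through file i
            raise  # anything else is a real error, so let it propagate
    return max_files
```

The last file will be incomplete when the drive fills, so it is not worth verifying; everything written before it has already been checked.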

Trivial toy, or possibly useful to sanity-check a filesystem? You decide. Or just:

hg clone http://dt.repository.steve.org.uk/

(dt == disk test)

ObQuote: "Stand back boy! This calls for divine intervention!" - "Brain Dead"


 

Comments On This Entry

[gravitar] Nux

Submitted at 10:22:52 on 13 August 2010

At my former job we used to stress test all servers before releasing to customers. We used to test them with "stress" (just apt-get install it) which I think can also generate big files. Just FYI; maybe it helps (someone).

[gravitar] Nick J

Submitted at 02:35:04 on 16 August 2010

Just wondering if the ideal place for this tool is the System Rescue CD (since I tend to use that to sanity check hardware before trusting it). Being able to boot from a special-purpose CD, before installing/configuring any software, to test the hardware works as expected before trusting it with anything, is quite nice.

I guess this would ideally work like so:
* run badblocks on each drive
* Create RAID array (using something like mdadm for software raid array, or reboot & enter BIOS for hardware RAID)
* create new filesystem on RAID array from bootable ISO.
* Mount RAID array
* Run this tool on that mounted filesystem.

So, just thinking aloud, I guess that would require:
* getting a binary of dt added to sysrescuecd (with all the external dependencies met by whatever ships on that CD).
* Maybe a parameter something like "--fill-filesystem" (instead of fudging with "--files=100000" or whatever), that would fill the filesystem, and potentially would then ideally clean up after itself by removing the created files if all tests were passed & the whole filesystem was tested and found to be good.
* Maybe an optional parameter to say which directory to use for the test, if not the current directory (in the case of having booted off an ISO image, the default current directory would almost certainly not be the one that needs testing).

Anyway, just some quick thoughts after reading your post, please ignore if they're not helpful to you!

[author] Steve Kemp

Submitted at 22:53:45 on 16 August 2010

Nick; your thoughts are very good, and your "ideally working" scenario is almost exactly what I had in mind.

The only difference for me is that I'd netboot a system via PXE and have it load a remote filesystem over NFS. No need to get the binary added to RescueCD - just add it to /usr/local/bin in my environment and suddenly my systems all get it. Without even needing a reboot!

I like the idea of a --directory and --full/--fill command line options and will add them shortly.

For the moment I just use "--files=1000000 --size=1024Mb" which is bigger than my drives. Sure the last one will fail, but it does exercise the disk before then.

[gravitar] Kamil Kisiel

Submitted at 00:40:04 on 18 August 2010

Cool little program Steve. We do much the same thing using some bash scripts that dd from /dev/urandom and compare the hash using md5sum.

 
