This is a tale of woe on CentOS 6.
Resetting drives. iuCRC errors. Bad blocks in ZFS. These are not things you want to see from your brand new $5000 storage machine.
We just finished setting up the first storage server for my group at work. I ran badblocks on the drives as an acceptance test, and started noticing that the HBA, an LSI 9207-8i, would randomly freeze I/O, complaining that the drives had timed out an operation, then eventually resend the command. Reducing the NCQ depth and increasing the command timeout did not help. The errors were intermittent, but they showed up on every drive. Badblocks passed with no reallocated sectors, so the disk platters should be fine.
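For reference, on Linux the NCQ depth and the SCSI command timeout of a disk are normally exposed as the sysfs attributes `/sys/block/<dev>/device/queue_depth` and `/sys/block/<dev>/device/timeout`. Here is a minimal sketch of adjusting them from Python. The function name `set_disk_tunables` and the `sysfs_root` parameter are made up for illustration (the latter only so the code can be tried against a scratch directory), and the values in the usage note are examples, not the ones used on this server.

```python
from pathlib import Path


def set_disk_tunables(device, queue_depth=None, timeout_s=None, sysfs_root="/sys"):
    """Optionally write the queue depth and command timeout (in seconds)
    for one SCSI disk, then return the values read back from sysfs.

    `device` is a kernel name such as "sda". Writing needs root.
    `sysfs_root` is a hypothetical knob for testing; leave it at "/sys"
    on a real machine.
    """
    dev_dir = Path(sysfs_root) / "block" / device / "device"
    requested = {"queue_depth": queue_depth, "timeout": timeout_s}
    current = {}
    for name, value in requested.items():
        attr = dev_dir / name
        if value is not None:
            attr.write_text(f"{value}\n")
        # Read back what is actually in effect; on real hardware the
        # driver may clamp or reject the request.
        current[name] = int(attr.read_text().strip())
    return current
```

As root, something like `set_disk_tunables("sda", queue_depth=1, timeout_s=60)` would effectively disable NCQ on that disk and double the usual 30 second default timeout. These settings do not survive a reboot.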
Turns out, the combination of mpt2sas driver version 16 and P20 firmware on the HBA does not work. I had to downgrade the firmware to P16, at which point things seemed to start working again. Downgrading required booting into the UEFI shell provided by the motherboard, and was actually not a bad experience. I like the idea of a recovery shell built in – no more digging around trying to find a live DOS USB image that works.
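If you want to check whether a machine is running this combination, here is a rough sketch. It assumes the driver version is readable from `/sys/module/mpt2sas/version`, that each HBA's firmware version appears as a `version_fw` attribute under `/sys/class/scsi_host/host*/`, and that the firmware phase (P16, P20) is the leading number of a dotted version string such as `20.00.07.00`. The function names and the `sysfs_root` parameter are hypothetical, and the example version strings are made up.

```python
from pathlib import Path


def major(version):
    """Leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def is_known_bad(driver_version, firmware_version):
    """True for the pairing described above: driver 16.x with phase 20 firmware."""
    return major(driver_version) == 16 and major(firmware_version) == 20


def read_versions(sysfs_root="/sys"):
    """Return (driver_version, [firmware_version, ...]) as read from sysfs.

    The paths are assumptions about what mpt2sas exposes; hosts that are
    not driven by mpt2sas have no version_fw file and are skipped.
    """
    root = Path(sysfs_root)
    driver = (root / "module" / "mpt2sas" / "version").read_text().strip()
    firmware = [
        path.read_text().strip()
        for path in sorted(root.glob("class/scsi_host/host*/version_fw"))
    ]
    return driver, firmware
```

A quick check would then be `driver, fws = read_versions()` followed by `any(is_known_bad(driver, fw) for fw in fws)`.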
This also demonstrates that ZFS might be a good burn-in tool… it detected silent data corruption that the HBA likely caused.