[ome-users] Storage issues

Roger Leigh rleigh at codelibre.net
Tue Sep 4 20:17:30 BST 2018


On 04/09/18 19:49, Benjamin Schmid wrote:
> Dear all,
> 
> This question is not really related to OMERO, but maybe some of you have 
> come across this before:
> 
> We have a Thecus storage system (N16000pro) that's configured as a RAID 
> 6 and connected via iSCSI to a machine that runs an OMERO server. The 
> Thecus system provides two LUNs (it's 2 because the maximum size of a 
> LUN is 16 TB). They show up on the server as 2 partitions, /dev/sdc1 and 
> /dev/sdd1. LVM2 is used to combine the 2 partitions into one logical 
> volume (/dev/vg0/lv0).
> 
> When I tried to reboot the server today, the logical volume wasn't 
> mounted and syslog shows lots of scary error messages:
> ---
> Sep  4 13:50:43 romulus kernel: [   20.964830] sd 8:0:0:1: [sdc] tag#0 
> FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Sep  4 13:50:43 romulus kernel: [   20.964842] sd 8:0:0:1: [sdc] tag#0 
> Sense Key : Not Ready [current]
> Sep  4 13:50:43 romulus kernel: [   20.964847] sd 8:0:0:1: [sdc] tag#0 
> Add. Sense: Logical unit communication failure

For all these errors, it looks like the sdc iSCSI target device is the 
source.  It's not an LVM problem, it's the underlying physical volume 
(PV) device not responding.  For that reason, I'd be wary of running 
fsck again until you resolve the underlying (virtual) hardware problem, 
particularly when the logical volume spans both targets--you wouldn't 
want it to destructively modify anything when one of them isn't 
responding properly.
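
To confirm that the failures really are confined to one device, you could 
tally the kernel messages per disk.  A minimal sketch, assuming the syslog 
format quoted above with each message on a single line; the function name 
count_scsi_failures is made up for illustration, and the log path in the 
comment is an assumption (Debian/Ubuntu style):

```shell
#!/bin/sh
# Count kernel "FAILED Result" lines per SCSI disk in a syslog-style stream.
# Reads the log on stdin and prints "<device> <count>" for each disk seen.
count_scsi_failures() {
    # Keep only the failure lines and extract the [sdX] device name
    sed -n '/FAILED Result/s/.*\[\(sd[a-z][a-z]*\)].*/\1/p' |
        sort | uniq -c | awk '{print $2, $1}'
}

# Example (log location is an assumption, adjust for your distribution):
#   count_scsi_failures < /var/log/syslog
```

If only sdc shows up in the output, that points at that one target (or the 
path to it) rather than at LVM or the filesystem.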

The cause isn't clear, and will need some investigation.  Things to 
check would include:

- the iSCSI configuration on the OMERO server machine
- the network (e.g. is it dropping packets or badly contended, leading 
to communication failure or timeouts?  Is anything else saturating the 
network?  Is the patch cable faulty?)
- the storage system itself.  I'm not familiar with Thecus, but can you 
force a parity check of the whole array, and/or read the content of the 
exported iSCSI target to verify it is all readable, e.g. with dd?  Is 
there a management front-end to check the status and verify the parity?
   [if it's using Linux mdraid, "cat /proc/mdstat" to check the status, 
and "echo check > /sys/block/mdN/md/sync_action" to force a parity check 
of the whole array.  Both should be non-destructive.  But it would be 
wise to check the Thecus documentation before doing anything further.]
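
The dd read test mentioned above could look something like this.  It is 
only a sketch: the function name read_check is made up, and the device 
names in the comment are taken from your report and may differ after a 
reboot.  It only reads from the source and discards the data, so it 
should not modify anything:

```shell
#!/bin/sh
# Read-only check: read every block of a device (or file) and discard it.
# Nothing is written to the source.  dd exits non-zero if a read fails,
# and that exit status is returned to the caller.
read_check() {
    dd if="$1" of=/dev/null bs=1048576
}

# Example, run as root on the OMERO server (device names are assumptions):
#   read_check /dev/sdc && echo "sdc readable" || echo "sdc read FAILED"
#   read_check /dev/sdd && echo "sdd readable" || echo "sdd read FAILED"
```

Reading a full 16 TB LUN over iSCSI will take a long time, so it may be 
worth running it under screen or similar.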


Regards,
Roger

