[ome-users] Storage issues
Roger Leigh
rleigh at codelibre.net
Tue Sep 4 20:17:30 BST 2018
On 04/09/18 19:49, Benjamin Schmid wrote:
> Dear all,
>
> This question is not really related to OMERO, but maybe some of you have
> come across this before:
>
> We have a Thecus storage system (N16000pro) that's configured as a RAID
> 6 and connected via iSCSI to a machine that runs an OMERO server. The
> Thecus system provides two LUNs (it's 2 because the maximum size of a
> LUN is 16 TB). They show up on the server as 2 partitions, /dev/sdc1 and
> /dev/sdd1. LVM2 is used to combine the 2 partitions into one logical
> volume (/dev/vg0/lv0).
>
> When I tried to reboot the server today, the logical volume wasn't
> mounted and syslog shows lots of scary error messages:
> ---
> Sep 4 13:50:43 romulus kernel: [ 20.964830] sd 8:0:0:1: [sdc] tag#0
> FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Sep 4 13:50:43 romulus kernel: [ 20.964842] sd 8:0:0:1: [sdc] tag#0
> Sense Key : Not Ready [current]
> Sep 4 13:50:43 romulus kernel: [ 20.964847] sd 8:0:0:1: [sdc] tag#0
> Add. Sense: Logical unit communication failure
For all these errors, it looks like the sdc iSCSI target device is the
source. It's not an LVM problem; the underlying physical volume
(PV) device is not responding. For that reason, I'd be wary of running
fsck again until you resolve the underlying (virtual) hardware problem,
particularly when you're striping over the two targets--you wouldn't
want it to destructively modify anything when it's not working properly.
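
To confirm that every error really does point at the same device, a
rough sketch like the one below tallies kernel SCSI error lines per
device. It assumes each syslog entry is a single unwrapped line in the
format quoted above; the function name count_sd_errors is made up for
illustration.

```python
import re
from collections import Counter

# Matches kernel SCSI disk error lines such as:
#   "... kernel: [ 20.964830] sd 8:0:0:1: [sdc] tag#0 FAILED Result: ..."
# Assumption: each log entry is one unwrapped line, as in a raw syslog file.
SD_ERROR = re.compile(
    r"\bsd \d+:\d+:\d+:\d+: \[(sd[a-z]+)\].*?(?:FAILED|Sense Key|Add\. Sense)"
)


def count_sd_errors(lines):
    """Return a Counter mapping device name (e.g. 'sdc') to error line count."""
    counts = Counter()
    for line in lines:
        match = SD_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open log file, e.g. count_sd_errors(open("/var/log/syslog",
errors="replace")), should show whether anything other than sdc is
reporting failures. The log path is an assumption and varies between
distributions.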
The cause isn't clear, and will need some investigation. Things to
check would include:
- the iSCSI configuration on the OMERO server machine
- the network (e.g. is it dropping packets or badly contended, leading
to communication failure or timeouts? Is anything else saturating the
network? Is the patch cable faulty?)
- the storage system itself; I'm not familiar with Thecus, but can you
force a parity check of the whole array, and/or read the content of the
exported iSCSI target to verify it is all readable, e.g. with dd? Is
there a management front-end to check the status and verify the parity?
[if it's using Linux mdraid, "cat /proc/mdstat" to check the status,
and "echo check > /sys/block/mdN/md/sync_action" to force a parity check
of the whole array. Both should be non-destructive. But it would be
wise to check the Thecus documentation before doing anything further.]
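
If you'd rather have a list of unreadable regions than have dd stop at
the first error, the read test can be sketched in Python as below. This
is only an illustration, not a tested tool: the function name
scan_readable and the 1 MiB chunk size are arbitrary choices. It opens
the device read-only and never writes to it.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and read-only.

    Returns (bytes_read, failed_offsets), where failed_offsets lists the
    start offset of every chunk whose read raised an I/O error.
    """
    failed_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Record the failing chunk and carry on with the next one.
                failed_offsets.append(offset)
                offset += length
                continue
            if not data:
                break  # unexpected end of device
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, failed_offsets
```

For example, scan_readable("/dev/sdc") run as root would walk the whole
LUN; an empty failed_offsets list means every chunk was readable at the
time of the scan. On a 16 TB target this will take many hours and adds
load to the network and the array.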
Regards,
Roger