Ceph OSD Recovery After Power Failure: SAN Switch Was Dead the Whole Time
A power outage knocked my Ceph cluster from 15 healthy OSDs down to 4. Recovery took days of debugging: heartbeat cascades, a ceph.conf misconfiguration, and a dead SAN switch hiding behind NO-CARRIER flags on every node.