McGarrah Technical Blog

USB Drive SMART Updates: Fast-Track to the GRUB Solution


New USB drives arrived for my Ceph cluster, and they’re not reporting SMART data. Again. After solving this problem in my October 2025 article, I need to update the configuration with new device IDs and share the lessons learned from running this solution across my entire cluster.

The bottom line: This was absolutely the right decision. SMART monitoring has already caught failing drives before they damaged data, and the performance trade-off is negligible compared to the stability benefits.

The Context: Ceph Storage Reality

My AlteredCarbon cluster runs 69 TiB of Ceph storage across 6 nodes. The Seagate USB drives aren’t just backup storage; they’re critical infrastructure.

SMART data is table stakes for this setup. When a drive starts failing in a Ceph cluster, you need to know immediately, not when it’s too late.

The Problem: New Drives, Same Issue

Five new Seagate USB drives arrived for cluster expansion, and predictably, none report SMART data:

root@edgar:~# smartctl -d sat -a /dev/sdd
Read Device Identity failed: scsi error unsupported field in scsi command
SMART support is: Ambiguous
A mandatory SMART command failed: exiting.

The original solution from October 2025 covered three methods. After months of production use, GRUB boot parameters proved most reliable across all cluster scenarios.

The Fast-Track Solution

Skip the experimentation—go straight to what works:

Updated Device Coverage

The new drives introduced additional device IDs that need quirks:

# Updated comprehensive configuration
cat > /etc/default/grub.d/usb-quirks.cfg << 'EOF'
# USB Storage Quirks for Seagate SMART Monitoring
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usb_storage.quirks=0bc2:2038:,0bc2:2344:,0bc2:ab83:,0bc2:ab9a:,0bc2:ac25:,0bc2:ac2b:,0bc2:ac35:,0bc2:ac41:"
EOF

# Apply and reboot
proxmox-boot-tool refresh
reboot
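
As the ID list grows, hand-editing one long comma-separated string gets error-prone. The following is a minimal sketch of generating the value instead; `build_quirks` is a hypothetical helper name, not something from the original setup, and it only prints the line rather than writing the file.

```shell
#!/bin/sh
# Hypothetical helper: join USB vendor:product IDs into a usb_storage.quirks value.
# Each ID gets a trailing ":" (empty flags field), matching the config above.
build_quirks() {
    out=""
    for id in "$@"; do
        out="${out:+$out,}${id}:"
    done
    printf '%s\n' "$out"
}

# Print the line that would go into usb-quirks.cfg (nothing is written to disk).
quirks=$(build_quirks 0bc2:2038 0bc2:2344 0bc2:ab83)
printf 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usb_storage.quirks=%s"\n' "$quirks"
```

Passing all eight IDs from the configuration above should reproduce the same quirks string, which makes it easy to diff against the file already on disk.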

Device ID Mapping

| Device ID | Model Series | Cluster Nodes | Purchase Date |
|-----------|--------------|---------------|---------------|
| 0bc2:2038 | Newer models | edgar | Late 2025 |
| 0bc2:2344 | Expansion Portable | kovacs | 2024 |
| 0bc2:ab83 | Recent drives | poe | Late 2025 |
| 0bc2:ab9a | Recent drives | edgar | Late 2025 |
| 0bc2:ac25 | Backup Plus | Various | 2024-2025 |
| 0bc2:ac2b | BUP Portable | poe, kovacs | 2024 |
| 0bc2:ac35 | Newer models | edgar | Late 2025 |
| 0bc2:ac41 | One Touch HDD | harlan, kovacs | 2024 |

Verification

# Confirm quirks loaded
cat /sys/module/usb_storage/parameters/quirks

# Test SMART access
smartctl -d sat -H /dev/sdX  # Health check
smartctl -d sat -a /dev/sdX  # Full report
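
Reading the sysfs string by eye works for one or two IDs, but a scripted comparison is less likely to miss one. Here is a rough sketch, assuming the quirks string is a comma-separated list of `vid:pid:flags` entries; `missing_quirks` is a hypothetical helper, and the three IDs passed at the end are just the ones listed for edgar.

```shell
#!/bin/sh
# Hypothetical helper: print each expected vendor:product ID that is absent
# from a usb_storage quirks string (comma-separated "vid:pid:flags" entries).
missing_quirks() {
    loaded=$1
    shift
    for id in "$@"; do
        case ",$loaded," in
            *",$id:"*) ;;                 # entry present
            *) printf '%s\n' "$id" ;;     # entry missing
        esac
    done
}

# On a live node, the loaded string comes from sysfs; fall back to empty if absent.
loaded=$(cat /sys/module/usb_storage/parameters/quirks 2>/dev/null || true)
missing_quirks "$loaded" 0bc2:2038 0bc2:ab9a 0bc2:ac35
```

No output means every listed ID is present in the loaded quirks; any ID printed still needs to be added to the GRUB configuration.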

Why GRUB Works Best

After testing all three methods from the original article, GRUB boot parameters proved superior.

The other two methods (runtime quirks and modprobe config) had reliability issues in production.

Real-World Results

After 4+ months running this configuration across all cluster nodes:

Success Stories

Performance Reality

Yes, disabling UAS reduces USB performance by 10-30%. For Ceph OSDs and backup storage, this trade-off is absolutely worth it:

Before: Fast transfers, blind to drive health
After:  Slightly slower transfers, complete visibility
Result: Zero surprise failures, proactive maintenance

The math is simple: a slightly slower Ceph cluster beats data loss every time.

Cluster Deployment

For multi-node deployment, use shared storage:

# Store config on CephFS
cp /etc/default/grub.d/usb-quirks.cfg /mnt/pve/cephfs/configs/

# Deploy to all nodes
for node in harlan kovacs poe edgar tanaka quell; do
    # Refresh the boot config only if the copy succeeded
    scp /mnt/pve/cephfs/configs/usb-quirks.cfg "root@${node}:/etc/default/grub.d/" &&
        ssh "root@${node}" "proxmox-boot-tool refresh"
done

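# Coordinate reboots (maintain quorum)
# With one vote per node, a 6-node cluster needs 4 votes for quorum,
# so reboot at most 2 nodes at a time (one at a time is safer)
# and let Ceph recover between reboots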
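
As a sanity check on how many nodes can be offline at once, the majority arithmetic can be sketched as below. This assumes one vote per node and no external quorum device, and it covers only the cluster vote count; Ceph monitor quorum and replica placement need to be considered separately. Both function names are hypothetical.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority rule: floor(n/2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Largest number of nodes that can be offline while a majority remains.
max_nodes_down() {
    echo $(( $1 - $(quorum_votes "$1") ))
}

nodes=6
echo "nodes=$nodes quorum=$(quorum_votes "$nodes") max_down=$(max_nodes_down "$nodes")"
# prints: nodes=6 quorum=4 max_down=2
```

For this six-node cluster the result is a hard ceiling of two nodes down at once, which is why rebooting one node at a time is the safer default.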

Troubleshooting

If you’re still getting SMART errors:

# 1. Verify quirks loaded
cat /sys/module/usb_storage/parameters/quirks

# 2. Find your device ID
lsusb | grep -i seagate

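# 3. Add the missing device ID to the quirks list inside GRUB_CMDLINE_LINUX
#    (appending a bare line to the file would break it, since it is sourced as shell)
sed -i 's/\(usb_storage\.quirks=[^"]*\)"/\1,0bc2:XXXX:"/' /etc/default/grub.d/usb-quirks.cfg
proxmox-boot-tool refresh   # then reboot for the new quirk to take effect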

# 4. Try permissive mode
smartctl -d sat -T permissive -a /dev/sdX
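
To pull just the device IDs out of `lsusb` (step 2) in a form that can be compared with the quirks list, something like the following may help. It is a sketch that assumes the usual `Bus ... Device ...: ID vid:pid ...` line layout; `seagate_ids` is a hypothetical helper, and the sample lines are illustrative rather than captured from the cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `lsusb` output on stdin and print the
# vendor:product ID of every device from vendor 0bc2 (Seagate), one per line.
seagate_ids() {
    awk '$5 == "ID" && $6 ~ /^0bc2:/ { print $6 }' | sort -u
}

# Illustrative sample input; on a real node use: lsusb | seagate_ids
seagate_ids <<'EOF'
Bus 002 Device 003: ID 0bc2:ab9a Seagate RSS LLC Expansion
Bus 001 Device 002: ID 8087:0024 Intel Corp. Hub
Bus 002 Device 004: ID 0bc2:2038 Seagate RSS LLC Portable
EOF
```

Matching on the vendor ID rather than the word "seagate" avoids missing drives whose description string is blank or branded differently.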

Integration with Monitoring

With SMART data available, integrate into your monitoring stack:

# Health check in scripts
if ! smartctl -d sat -H /dev/sdX | grep -q "PASSED"; then
    echo "WARNING: Drive health check failed"
    # Send alert, skip backup, etc.
fi

# Extract key attributes (e.g. as input for a Prometheus exporter)
smartctl -d sat -a /dev/sdX | grep -E "(Temperature|Reallocated|Power_On_Hours)"
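
The grep above only filters lines; getting them into Prometheus needs one more step. Below is a minimal sketch that turns the attribute table into Prometheus text-format samples, assuming the standard ten-column `smartctl -A` layout with the raw value in the last column. The function name `smart_to_prom`, the `smart_*` metric names, and the sample values are all illustrative assumptions, not an existing exporter's schema.

```shell
#!/bin/sh
# Hypothetical helper: convert `smartctl -A` attribute-table lines (stdin)
# into Prometheus text-format samples. Metric names are illustrative.
smart_to_prom() {
    awk -v dev="$1" '
        $2 ~ /^(Temperature_Celsius|Reallocated_Sector_Ct|Power_On_Hours)$/ {
            # Column 10 is RAW_VALUE; adding 0 keeps only the leading number.
            printf "smart_%s{device=\"%s\"} %d\n", tolower($2), dev, $10 + 0
        }'
}

# Illustrative sample rows; on a real node use:
#   smartctl -d sat -A /dev/sdX | smart_to_prom /dev/sdX
smart_to_prom /dev/sdd <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (0 20 0 0 0)
EOF
```

Redirecting the output into a `.prom` file in a node exporter textfile-collector directory is one way to get these values scraped; the directory path depends on how the exporter is configured.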

Conclusion

The GRUB method has provided reliable USB SMART monitoring across my Proxmox cluster through months of production use.

For Ceph environments, SMART monitoring isn’t optional—it’s essential. This solution ensures you catch drive problems before they become data disasters.

Categories: technical, hardware, proxmox