Useful scripts

Rpi under-voltage

I use this script to check whether any of my Raspberry Pi 4s has hit under-voltage. Mainly, I'm still trying to find out what causes the USB drive holding the OS to switch to read-only mode.

I'm getting this in the log:

Dec 27 23:10:21 cube05 kernel: [1225968.218593] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Dec 27 23:10:21 cube05 kernel: [1225968.218604] xhci_hcd 0000:01:00.0: USBSTS:
Dec 27 23:10:21 cube05 kernel: [1225968.234615] xhci_hcd 0000:01:00.0: Host halt failed, -110
Dec 27 23:10:21 cube05 kernel: [1225968.234620] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
Dec 27 23:10:21 cube05 kernel: [1225968.242480] xhci_hcd 0000:01:00.0: HC died; cleaning up
Dec 27 23:10:21 cube05 kernel: [1225968.247911] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Dec 27 23:10:21 cube05 kernel: [1225968.247921] xhci_hcd 0000:01:00.0: USBSTS:
Dec 27 23:10:21 cube05 kernel: [1225968.248063] usb 1-1: USB disconnect, device number 2
Dec 27 23:10:21 cube05 kernel: [1225968.263931] xhci_hcd 0000:01:00.0: Host halt failed, -110
Dec 27 23:10:21 cube05 kernel: [1225968.264496] usb 2-1: USB disconnect, device number 2
Dec 27 23:10:21 cube05 kernel: [1225968.278711] blk_update_request: I/O error, dev sda, sector 3251264 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0

There can be several causes for this: a faulty USB drive, a faulty Raspberry Pi, or driver issues, but most often it is under-voltage, where the power supply's voltage drops under heavy load and the Raspberry Pi flips out...

Script

Author: https://gist.github.com/aallan/0b03f5dcc65756dde6045c6e96c26459

#!/bin/bash

#Flag Bits
UNDERVOLTED=0x1
CAPPED=0x2
THROTTLED=0x4
SOFT_TEMPLIMIT=0x8
HAS_UNDERVOLTED=0x10000
HAS_CAPPED=0x20000
HAS_THROTTLED=0x40000
HAS_SOFT_TEMPLIMIT=0x80000


#Text Colors
GREEN=$(tput setaf 2)
RED=$(tput setaf 1)
NC=$(tput sgr0)

#Output Strings
GOOD="${GREEN}NO${NC}"
BAD="${RED}YES${NC}"

#Get Status, extract hex
STATUS=$(vcgencmd get_throttled)
STATUS=${STATUS#*=}

echo -n "Status: "
(($STATUS!=0)) && echo "${RED}${STATUS}${NC}" || echo "${GREEN}${STATUS}${NC}"

echo "Undervolted:"
echo -n "   Now: "
((($STATUS&UNDERVOLTED)!=0)) && echo "${BAD}" || echo "${GOOD}"
echo -n "   Run: "
((($STATUS&HAS_UNDERVOLTED)!=0)) && echo "${BAD}" || echo "${GOOD}"

echo "Throttled:"
echo -n "   Now: "
((($STATUS&THROTTLED)!=0)) && echo "${BAD}" || echo "${GOOD}"
echo -n "   Run: "
((($STATUS&HAS_THROTTLED)!=0)) && echo "${BAD}" || echo "${GOOD}"

echo "Frequency Capped:"
echo -n "   Now: "
((($STATUS&CAPPED)!=0)) && echo "${BAD}" || echo "${GOOD}"
echo -n "   Run: "
((($STATUS&HAS_CAPPED)!=0)) && echo "${BAD}" || echo "${GOOD}"

echo "Softlimit:"
echo -n "   Now: "
((($STATUS&SOFT_TEMPLIMIT)!=0)) && echo "${BAD}" || echo "${GOOD}"
echo -n "   Run: "
((($STATUS&HAS_SOFT_TEMPLIMIT)!=0)) && echo "${BAD}" || echo "${GOOD}"
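
To make the bit tests above easier to follow, here is a minimal sketch that decodes a value passed as an argument instead of calling vcgencmd, so it can be tried on any machine. The describe_flags helper and its labels are my own illustration, not part of the original gist, and 0x50005 is just a made-up sample value.

```shell
#!/bin/bash
# Sketch only: describe_flags is a hypothetical helper, not part of the gist.
# It takes a get_throttled value (e.g. 0x50005) and prints the flags that are set.
describe_flags() {
    local status="$1" flags=""
    # Low bits: the condition is active right now.
    [ $(( $status & 0x1 )) -ne 0 ] && flags="$flags undervolted-now"
    [ $(( $status & 0x2 )) -ne 0 ] && flags="$flags capped-now"
    [ $(( $status & 0x4 )) -ne 0 ] && flags="$flags throttled-now"
    [ $(( $status & 0x8 )) -ne 0 ] && flags="$flags softlimit-now"
    # High bits: the condition has occurred at some point (the "Run" lines above).
    [ $(( $status & 0x10000 )) -ne 0 ] && flags="$flags undervolted-since-boot"
    [ $(( $status & 0x20000 )) -ne 0 ] && flags="$flags capped-since-boot"
    [ $(( $status & 0x40000 )) -ne 0 ] && flags="$flags throttled-since-boot"
    [ $(( $status & 0x80000 )) -ne 0 ] && flags="$flags softlimit-since-boot"
    # Strip the leading space and print; prints an empty line for 0x0.
    echo "${flags# }"
}
```

For example, describe_flags 0x50005 prints "undervolted-now throttled-now undervolted-since-boot throttled-since-boot", because 0x50005 is 0x40000 + 0x10000 + 0x4 + 0x1.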

Usage

I just created a simple voltage.sh on my control01 node and used Ansible to distribute it to all the nodes.

ansible cube -b -m copy -a "src=/home/ubuntu/voltage.sh dest=/home/ubuntu/voltage.sh"

And then use Ansible to run it on all nodes.

ansible cube -b -m shell -a "bash /home/ubuntu/voltage.sh"

Result

control01 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube02 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube01 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
control02 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
control03 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube03 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube04 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube05 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO
cube06 | CHANGED | rc=0 >>
Status: 0x0
Undervolted:
   Now: NO
   Run: NO
Throttled:
   Now: NO
   Run: NO
Frequency Capped:
   Now: NO
   Run: NO
Softlimit:
   Now: NO
   Run: NO

As you can see, no under-voltage or other limits were hit, even though node 05 (cube05) failed in this manner two days ago. The next step is to run some read-write tests, and maybe a CPU load test, and then have this script check again.
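
A rough sketch of that next step could look like the following. As far as I understand the vcgencmd documentation, the 0x10000 to 0x80000 bits stay set once a condition has occurred, so comparing the status before and after a load should be enough to catch a dip that happened in between. The run_and_check name is hypothetical, and the load commands in the comments are only examples I have not tuned for this cluster.

```shell
#!/bin/bash
# Sketch only: run_and_check is a hypothetical helper.
# It reads get_throttled, runs the given load command, reads get_throttled
# again, and returns non-zero if the status changed during the load.
run_and_check() {
    local before after
    before=$(vcgencmd get_throttled); before=${before#*=}
    "$@"    # the load to apply, passed in as a normal command line
    after=$(vcgencmd get_throttled); after=${after#*=}
    echo "before=$before after=$after"
    [ "$before" = "$after" ]
}

# Example loads (assumptions, adjust sizes and paths before using):
#   run_and_check dd if=/dev/zero of=/home/ubuntu/ddtest bs=1M count=512 conv=fsync
#   run_and_check timeout 60 sh -c 'while :; do :; done'
```

The first example writes 512 MiB to the USB drive and the second keeps one CPU core busy for 60 seconds; remember to delete the test file afterwards.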

A possible improvement would be to log directly to syslog, since all my logs are collected on control01, and that still works when the OS switches to read-only mode and I can't log in anymore.
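
A minimal sketch of that idea, assuming the standard logger command is available on the nodes and that the existing syslog forwarding to control01 picks up whatever it writes. The log_throttled function name and the rpi-voltage tag are hypothetical, chosen only for this example.

```shell
#!/bin/bash
# Sketch only: log_throttled and the "rpi-voltage" tag are hypothetical names.
# Sends the raw get_throttled value to syslog, as a warning when any flag is set.
log_throttled() {
    local status priority
    status=$(vcgencmd get_throttled) || return 1
    status=${status#*=}    # keep only the hex value after "throttled="
    if [ $(( $status )) -ne 0 ]; then
        priority=user.warning
    else
        priority=user.info
    fi
    logger -t rpi-voltage -p "$priority" "get_throttled=$status"
}
```

Calling log_throttled from a cron job or a systemd timer on each node would leave a trail on control01 that can be lined up with the xhci_hcd errors, even after the node itself has gone read-only.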