Docker service auto-restart on bad health

There is a known intermittent bug in docker that breaks some of
its functions, such as downloading images [1].

The root cause is still under investigation, but most likely
starting docker.service occasionally fails to create all the
subfolders it requires in /var/lib/docker. Restarting the service
works around the problem.

This change adds an ExecStartPost step to docker.service that waits
briefly and then checks docker health by testing for the
/var/lib/docker/tmp subfolder. If the check fails, the service is
restarted. The required subfolders are normally created almost
immediately after start, so a short wait (2 seconds) is enough.
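The one-line check in the unit can be read as the standalone shell
function below. This is a minimal sketch, assuming you want to
exercise the logic outside systemd: the function name
docker_tmp_health_check and the DOCKER_ROOT, WAIT_SECS and
RESTART_CMD overrides are hypothetical. Their defaults mirror the
ExecStartPost line in this change ('sleep 2', a test for
/var/lib/docker/tmp, '/bin/systemctl restart docker.service' and
'|| true').

```shell
# Hypothetical helper, not part of the change: same logic as the
# ExecStartPost line, with overridable (assumed) knobs for testing.
docker_tmp_health_check() {
    local root="${DOCKER_ROOT:-/var/lib/docker}"
    local wait_secs="${WAIT_SECS:-2}"
    local restart_cmd="${RESTART_CMD:-/bin/systemctl restart docker.service}"

    # Give dockerd a moment to create its subfolders.
    sleep "$wait_secs"

    # Like the unit, only the tmp subfolder is used as the health signal.
    if [ ! -d "$root/tmp" ]; then
        echo "docker health check failed: $root/tmp missing, restarting" >&2
        # Unquoted on purpose so the command string is split into words.
        $restart_cmd || true
    fi

    # Always succeed, matching the '|| true' in the unit.
    return 0
}
```

For example, running it with DOCKER_ROOT pointing at a scratch
directory that lacks tmp, WAIT_SECS=0 and RESTART_CMD='echo restart'
runs the stand-in command instead of systemctl. Note that in the
one-line unit form, '|| true' matters mostly for the healthy case:
when the directory exists, '[ ! -d ... ]' returns 1, and without
'|| true' the ExecStartPost line would exit non-zero and systemd
would treat the start as failed.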

pmon's tolerance is also increased slightly (startuptime and
interval raised from 5 to 7 seconds) so that the repair mechanism
gets a couple of retries before pmon steps in.


Test Plan:

PASS With pmon turned off and the wait set to 10 secs,
     deleted /var/lib/docker/tmp and restarted docker.
     Then deleted the /var/lib/docker/tmp dir again during the
     'sleep 10' and observed that an automatic '/bin/systemctl
     restart docker.service' was triggered, docker restarted and
     /var/lib/docker/tmp was recreated successfully.
PASS With the pmon service up and using the proposed time
     intervals, restarted the docker service successfully, with
     no interference between the two mechanisms.

PASS Completed the following operations:
     - AIO-SX install/bootstrap/unlock
     - lock/unlock
     - sudo reboot
     with the following results:
     - /var/lib/docker has all sub-directories
     - applications applied
     - docker service running
     - pulled hello-world image
     - no alarms
     - no 'download failed' error messages in daemon.log
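Two of the result checks above can be scripted roughly as follows.
This is a hypothetical sketch, not part of the change: the function
names are illustrative, the subdirectories to check are passed in by
the caller (only tmp is confirmed by this change), and the default
log path /var/log/daemon.log is an assumption.

```shell
# Hypothetical post-test helpers; names and defaults are illustrative.

# Print and fail on any expected subdirectory missing under ROOT.
# Usage: check_docker_dirs ROOT SUBDIR...
check_docker_dirs() {
    local root="$1"
    shift
    local missing=0 d
    for d in "$@"; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Fail if the log contains the 'download failed' message.
# Usage: check_no_download_failures [LOGFILE]
# The default path is an assumption about where daemon.log lives.
check_no_download_failures() {
    local log="${1:-/var/log/daemon.log}"
    # An unreadable log must not be reported as a pass.
    if [ ! -r "$log" ]; then
        echo "cannot read $log"
        return 2
    fi
    if grep -q 'download failed' "$log"; then
        echo "found 'download failed' in $log"
        return 1
    fi
    return 0
}
```

For instance, check_docker_dirs /var/lib/docker tmp verifies the
subfolder this change relies on. The remaining items map onto
standard commands such as 'systemctl is-active docker' and
'docker pull hello-world'.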

Partial-Bug: 1999182

Signed-off-by: Leonardo Fagundes Luz Serrano <>
Change-Id: Ide2d214ea3c7efb3f2a24327c11ae55f90d5a9ce
Leonardo Fagundes Luz Serrano 3 months ago
parent a8357be883
commit e78e42fb38

@@ -2,6 +2,7 @@
 ExecStart=/usr/sbin/dockerd -H fd:// --bridge=none $DOCKER_OPTS
 ExecStartPost=/bin/bash -c 'echo $MAINPID > /var/run/;'
+ExecStartPost=/bin/bash -c 'sleep 2 && [ ! -d '/var/lib/docker/tmp' ] && /bin/systemctl restart docker.service || true'
 ExecStopPost=/bin/rm -f /var/run/
 # pmond monitors docker service

@@ -10,7 +10,7 @@ pidfile = /var/run/
 style = lsb ; lsb
 severity = critical ; minor, major, critical
 restarts = 3 ; restarts before error assertion
-startuptime = 5 ; seconds to wait after process start
-interval = 5 ; number of seconds to wait between restarts
+startuptime = 7 ; seconds to wait after process start
+interval = 7 ; number of seconds to wait between restarts
 debounce = 20 ; number of seconds to wait before degrade clear
 subfunction = last-config ; run monitor only after last config is run