What's Chris been doing?

Successes and failures at inf.ed.ac.uk

Posts Tagged ‘time travel

Dr. Who eat your heart out

with one comment

More suspend weirdness – but this looks like another new case. (This project begins to seem like an epidemic of power management bugs.)

This time it’s the test 755 that’s affected. It was hung when I came in this morning: it went to sleep at 00:21 but failed to wake up again at 03:54. It wasn’t totally hung, it seemed to be stuck in a sort of partially up mode where some things were working OK but you couldn’t login to it. I’ve had a stunningly dozy day, my brain seems to have spent most of the day in low power mode, possibly because of the hot weather – so I’ve only just thought to consult the machine’s /var/log/pm/suspend.log (I got the machine going again this morning.)
And it looks interesting. Firstly here’s what you get when you do ls -l on it:

-rw-r--r-- 1 root lcfg 1565 Jun 27  1922 suspend.log

1922? And here are the contents:

Fri May 29 00:21:02 BST 2009: running suspend hooks.
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/00clear =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/05led =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/15_915resolution =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/20video =====
kernel.acpi_video_flags = 0
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/49bluetooth =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/50modules =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/55battery =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/60sysfont =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/65alsa =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/94cpufreq =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/95led =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/98lcfgsleep =====
===== Fri May 29 00:21:08 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/99video =====
Fri May 29 00:21:08 BST 2009: done running suspend hooks.
Tue Jun 27 01:26:46 BST 1922: running resume hooks.
===== Tue Jun 27 01:26:46 BST 1922: running hook: /usr/lib/pm-utils/sleep.d/99video =====
===== Tue Jun 27 01:26:46 BST 1922: running hook: /usr/lib/pm-utils/sleep.d/98lcfgsleep =====

Here’s that 98lcfgsleep script in full:

#!/bin/bash

. /usr/lib/pm-utils/functions

case "$1" in
        hibernate|suspend)
                
                /usr/bin/om ntp stop

                /bin/echo "`/bin/date` : Suspend" >> /var/lcfg/log/sleep.activity

                /bin/grep `/bin/date +%d/%m/%y` /var/lcfg/log/sleep | /bin/grep `/bin/date +%H:%M` | /bin/mail -s "Falling asleep" [Chris's mail address]

                sleep 5

                ;;
        thaw|resume)
                /bin/touch /var/tmp/lcfg-sleep-waketime
                
                /usr/bin/om amd restart

                /usr/bin/om ntp start

                /bin/echo "`/bin/date` : Resume" >> /var/lcfg/log/sleep.activity

                /bin/mail -s "Waking up" [Chris's mail address] < /dev/null

                ;;
        *)
                ;;
esac

exit $?

I got the falling asleep mail, and the ntp and sleep.activity log files were both updated at the right time, so all of the suspend-time stuff appears to have happened properly.
The resume-time stuff doesn’t though, and somewhere here seems to be where the machine got stuck for seven hours until I power-cycled it. The “touch” command seems to have happened – but look, here’s a familiar date:

-rw-r--r-- 1 root root 0 Jun 27  1922 /var/tmp/lcfg-sleep-waketime

Here are the last three entries in the amd log:

28/05/09 23:52:32: >> restart
27/06/22 01:26:46: >> restart
29/05/09 10:33:56: >> start

I suppose that means just that it did in fact restart but at some point between 23:52 yesterday and 10:33 today.

There’s no sign of any entry for this resume-time start in the ntp log. The most recent entry in the sleep.activity log is from the suspend script, and the file is dated 00:21 today, when the machine suspended, so that didn’t appear to run. I also received no mail at resume time.

So it looks like the machine went down in 2009 and came up in 1922, which made ntp hang indefinitely on startup.

Written by Chris Cooke

May 29, 2009 at 4:51 pm