What's Chris been doing?

Successes and failures at inf.ed.ac.uk

Posts Tagged ‘ntp

another hang, and a bios upgrade attempt

leave a comment »

Following George’s comment that it looked to him like amd that was hanging rather than ntp, I’ve switched round the order of the resume hooks so that ntp should start up before amd. That way hopefully amd will start up with a sensible date/time. I’ve not seen the amd component failing to start up before and I haven’t seen the clock going haywire before either, so I’m hoping that the one was the cause of the other. We’ll see if it happens again.

Meanwhile, the very next time the test 755 went to sleep it also didn’t wake up. This time it didn’t seem to be hung in the same way: the last time it happened the client component was managing to report back to the LCFG server, and I could ping the machine, but this time the machine’s LCFG status page reported

Last acknowledgement: 29/05/09 17:50:24

and ping returned “destination unreachable”.

Saving the log files and rebooting, this is what /var/log/pm/suspend.log says this time:

Fri May 29 18:00:03 BST 2009: running suspend hooks.
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/00clear =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/05led =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/15_915resolution =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/20video =====
kernel.acpi_video_flags = 0
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/49bluetooth =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/50modules =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/55battery =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/60sysfont =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/65alsa =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/94cpufreq =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/95led =====
===== Fri May 29 18:00:03 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/98lcfgsleep =====
===== Fri May 29 18:00:09 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/99video =====
Fri May 29 18:00:09 BST 2009: done running suspend hooks.

So all the suspend hooks ran successfully, but then – it hung? When I got to it this morning the machine wasn’t sleeping, as the power button wasn’t flashing, so it’s not simply a case of it not waking up on cue. The power button was solidly on, which is the case when the machine is either hung or running normally. And it wasn’t running normally…

Meanwhile I’ve been trying to upgrade the bios on the test 755. I’ve been trying the method described in the Dell Linux wiki but when I reboot it says that it can’t find the bios.hdr file I’ve loaded in memory.

I think I need to do a “warm boot” rather than the usual (apparently) “cold” one.
The instructions tell you to add “reboot=bios” to the end of your kernel command line, then reboot, then try loading your bios.hdr again then rebooting (when the actual upgrade will then happen). Tried that, get the “can’t find your bios.hdr in memory” message.

Perhaps I should have edited the kernel command line rather than appended to it? Maybe I’ll try that tomorrow.

Written by Chris Cooke

June 1, 2009 at 4:58 pm

Posted in Uncategorized

Tagged with , , , ,

Dr. Who eat your heart out

with one comment

More suspend weirdness – but this looks like another new case. (This project begins to seem like an epidemic of power management bugs.)

This time it’s the test 755 that’s affected. It was hung when I came in this morning: it went to sleep at 00:21 but failed to wake up again at 03:54. It wasn’t totally hung, it seemed to be stuck in a sort of partially up mode where some things were working OK but you couldn’t login to it. I’ve had a stunningly dozy day, my brain seems to have spent most of the day in low power mode, possibly because of the hot weather – so I’ve only just thought to consult the machine’s /var/log/pm/suspend.log (I got the machine going again this morning.)
And it looks interesting. Firstly here’s what you get when you do ls -l on it:

-rw-r--r-- 1 root lcfg 1565 Jun 27  1922 suspend.log

1922? And here are the contents:

Fri May 29 00:21:02 BST 2009: running suspend hooks.
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/00clear =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/05led =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/15_915resolution =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/20video =====
kernel.acpi_video_flags = 0
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/49bluetooth =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/50modules =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/55battery =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/60sysfont =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/65alsa =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/94cpufreq =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/95led =====
===== Fri May 29 00:21:02 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/98lcfgsleep =====
===== Fri May 29 00:21:08 BST 2009: running hook: /usr/lib/pm-utils/sleep.d/99video =====
Fri May 29 00:21:08 BST 2009: done running suspend hooks.
Tue Jun 27 01:26:46 BST 1922: running resume hooks.
===== Tue Jun 27 01:26:46 BST 1922: running hook: /usr/lib/pm-utils/sleep.d/99video =====
===== Tue Jun 27 01:26:46 BST 1922: running hook: /usr/lib/pm-utils/sleep.d/98lcfgsleep =====

Here’s that 98lcfgsleep script in full:

#!/bin/bash

. /usr/lib/pm-utils/functions

case "$1" in
        hibernate|suspend)
                
                /usr/bin/om ntp stop

                /bin/echo "`/bin/date` : Suspend" >> /var/lcfg/log/sleep.activity

                /bin/grep `/bin/date +%d/%m/%y` /var/lcfg/log/sleep | /bin/grep `/bin/date +%H:%M` | /bin/mail -s "Falling asleep" [Chris's mail address]

                sleep 5

                ;;
        thaw|resume)
                /bin/touch /var/tmp/lcfg-sleep-waketime
                
                /usr/bin/om amd restart

                /usr/bin/om ntp start

                /bin/echo "`/bin/date` : Resume" >> /var/lcfg/log/sleep.activity

                /bin/mail -s "Waking up" [Chris's mail address] < /dev/null

                ;;
        *)
                ;;
esac

exit $?

I got the falling asleep mail, and the ntp and sleep.activity log files were both updated at the right time, so all of the suspend-time stuff appears to have happened properly.
The resume-time stuff doesn’t though, and somewhere here seems to be where the machine got stuck for seven hours until I power-cycled it. The “touch” command seems to have happened – but look, here’s a familiar date:

-rw-r--r-- 1 root root 0 Jun 27  1922 /var/tmp/lcfg-sleep-waketime

Here are the last three entries in the amd log:

28/05/09 23:52:32: >> restart
27/06/22 01:26:46: >> restart
29/05/09 10:33:56: >> start

I suppose that means just that it did in fact restart but at some point between 23:52 yesterday and 10:33 today.

There’s no sign of any entry for this resume-time start in the ntp log. The most recent entry in the sleep.activity log is from the suspend script, and the file is dated 00:21 today, when the machine suspended, so that didn’t appear to run. I also received no mail at resume time.

So it looks like the machine went down in 2009 and came up in 1922, which made ntp hang indefinitely on startup.

Written by Chris Cooke

May 29, 2009 at 4:51 pm