What's Chris been doing?

Successes and failures at inf.ed.ac.uk

Posts Tagged ‘intel’

GX745 gdm hangs: it’s the intel driver


We’ve found out, to some extent anyway, why our Dell GX745s have sometimes been freezing when gdm starts the login screen, and why, mysteriously, only those on the DICE develop release were affected while those on the stable release were not. It has to do with the X video driver: Stephen noticed that lcfg/hw/dell_optiplex_gx745.h still contained:

#ifndef LCFG_RELEASE_DEVELOP
#include <lcfg/options/video_i810.h>
#endif

So the machines on the develop release were using the intel driver, and the others were using i810. It seems that the intel driver is bad news on 745s, at least with DVI cables. The develop 745s I know about are all on DVI cables because of an earlier problem with VGA cables, and they were on the intel driver because of an earlier problem with the i810 driver.

This could have left us in a bit of a fix, but a retest of sleep on the GX745 has revealed that

  • with i810 we now need a completely different sleep quirk from before, and
  • so far it seems to work perfectly.

I’ve therefore changed the sleep defaults for the GX745 to make it sleep only if the i810 driver is in use and to use the new quirk.

I’d better retest at least a 755 and a 7900 as well, and possibly explore how the intel driver on a 745 copes with VGA cables.

Written by Chris Cooke

August 10, 2009 at 7:17 pm

Posted in Uncategorized


intel at last


Quick recap: automatic sleep is working happily with SL5.3 on 745s and 755s if they use the intel video driver. I also want to get the 745 working when it uses the i810 video driver. (The 755 with i810 doesn’t resume reliably.)

So, this morning’s experiments so far:

  • Revive a 745 from near death and get it up and running as a healthy-looking DICE machine.
  • With the intel video driver, sleep it using what I’ve found to be the correct sleep command: /usr/sbin/pm-suspend --quirk-vbemode-restore
  • Yes, it resumes cleanly.
  • Log in: I don’t see the “Resume Problem” error. Good, this is as expected. Log out again.
  • Switch to i810.
  • Sleep with the same command.
  • It doesn’t resume. Reboot the machine.
  • Try again without quirks: /usr/sbin/pm-suspend
  • This resumes cleanly.
  • Log in, and as expected I see the “Resume Problem” error message. Good.

If this behaviour (different quirks for different video drivers on the same model) is representative, it leaves me with the irritating problem of needing different sleep commands for the same model depending on which video driver it’s using. It’s irritating because it doesn’t fit the current idea of setting the exact suspend command per model in the sleep.h defaults header. Also, that header is currently included before the header which sets the video driver, so the sleep defaults header has no way of knowing which video driver is in use. So I’ll have to do what I did for the video driver check and get the component itself to decide on the fly which command to use in which circumstance; things will have to be reorganised *again*, and further complicated. Gah. I’m getting a bit fed up with redesigning the software to get round bugs in other people’s code.
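Deciding on the fly could amount to little more than a lookup from driver name to command. The sketch below assumes the component can already tell which driver is in use; the helper name suspend_cmd_for_driver is hypothetical (it is not part of lcfg-sleep), and the mapping simply encodes this morning’s GX745 results, so it is not a general rule for other models.

```shell
#!/bin/sh
# Hypothetical helper: map an X video driver name to the pm-suspend
# command that worked on a GX745 under SL5.3 in the tests above.
# Prints the command, or returns non-zero for an untested driver so
# the caller can choose not to sleep the machine at all.
suspend_cmd_for_driver() {
    case "$1" in
        intel) echo "/usr/sbin/pm-suspend --quirk-vbemode-restore" ;;
        i810)  echo "/usr/sbin/pm-suspend" ;;
        *)     return 1 ;;
    esac
}

# Example use (the driver name is assumed to come from elsewhere):
# if cmd=$(suspend_cmd_for_driver intel); then $cmd; fi
```

Returning failure for unknown drivers keeps the cautious default: a model/driver combination that hasn’t been tested simply doesn’t get put to sleep.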

But anyway, I can at least now test suppressing the error message. This can be done with gconf, and one way to alter gconf settings is with the command-line tool gconftool-2. The gconftool-2 man page mentions the --type or -t option, which specifies the type of the data you’re setting a preference key to, but then doesn’t mention it in its list of options. It has some similar-looking options, --list-type, --car-type and --cdr-type, but none of them work with -s or --set, the option you use to set a value. And if you use -s without specifying a type, it tells you “Must specify a type when setting a value”. Luckily --type does turn out to exist; it’s just not listed on the man page. So this is the first command I found that runs without error and should stop the message you get when you log in after what the system thinks was an imperfect suspend and resume:

gconftool-2 -s /apps/gnome-power-manager/notify_hal_error -t bool false

You can check that you’ve changed the value by examining it before and afterwards using -g or --get:

gconftool-2 -g /apps/gnome-power-manager/notify_hal_error

In this case it’ll print out “true” or “false”.
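The before-and-after check can be wrapped in a small function. This is only a sketch: set_gconf_bool is a hypothetical name, and the GCONFTOOL variable exists purely so the tool can be swapped out (for testing, say); it defaults to gconftool-2 and uses only the -g, -s and -t options described above.

```shell
#!/bin/sh
# Which tool to call; override GCONFTOOL to substitute a stand-in.
GCONFTOOL=${GCONFTOOL:-gconftool-2}

# Hypothetical helper: set a boolean gconf key, printing the value
# before and after, and succeed only if the new value reads back.
set_gconf_bool() {
    key=$1
    value=$2    # "true" or "false"
    before=$("$GCONFTOOL" -g "$key")
    echo "before: $before"
    # -s needs an explicit type, hence -t bool
    "$GCONFTOOL" -s "$key" -t bool "$value" || return 1
    after=$("$GCONFTOOL" -g "$key")
    echo "after: $after"
    [ "$after" = "$value" ]
}

# Example use:
# set_gconf_bool /apps/gnome-power-manager/notify_hal_error false
```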

So, after doing this on the test machine, I repeat the suspend (with pm-suspend) and resume. This time it doesn’t resume cleanly.
Blast.
Is this because I’ve changed that GNOME setting? Surely not. I’m assuming that sleep and resume on 745/i810/5.3 is just unreliable: it sometimes works and sometimes doesn’t. Maybe I’ll go back later, undo things and try again, but for now I’ll have to limit sleep support to the machines using the intel driver.

Later: I switched the same machine back to the intel driver and then left it. When I went back to it an hour or two later it had gone to sleep, but had hung. So it hangs with previously reliable suspend commands under both the i810 and intel drivers. I’d say there’s something wrong with that machine. Right enough, when I revived it earlier today it had filesystem damage; I repaired that with fsck, but perhaps that wasn’t enough. I’ve now initiated a complete reinstall, with fresh filesystems, to see whether that restores the expected behaviour.

In the meantime I tried a different tack: find out why we can’t move to the intel driver, and try to shift that barrier. My memory was that we had stuck with the i810 driver because it was the only one which worked with our old and creaky version of Webots, which is needed for teaching. Stephen confirms this. I talked to Graham, Mr. Webots, and it turns out that he now has authorisation to move us up to Webots version 6, which doesn’t exhibit any of the bad behaviour of the elderly version we have. Hooray! He’s optimistic that Webots 6 will work with the intel driver on 5.3 745s and 755s, but he’ll test it to check. In the meantime, he points out, we can switch the 5.3 machines to the intel driver anyway, as no 5.3 machines are used for teaching yet; Stephen adds that Webots won’t be needed until September at the earliest. Excellent!

So I’ve altered the dell_optiplex_gx745.h and dell_optiplex_755.h headers to exclude develop machines from the inclusion of lcfg/options/video_i810.h. This seems to have the desired effect on a test 5.3 745: /etc/X11/xorg.conf is rebuilt with no mention of the “i810” driver and one mention of “intel”. My lcfg-sleep test pool has thus gone up from 3-4 machines to at least 30-40. Excellent. Perhaps it’s about time I figured out how to monitor their sleep patterns, then. For now I’ve changed the sleep.ng_logrotate resource in dice/options/sleep.h to have the logs mailed to me until I figure out something more satisfactory. With 30-40 machines each mailing me two log files once a week, that shouldn’t be too bad.
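Rather than counting mentions of “i810” and “intel”, a check could pull the driver name out of the Device section itself, since InputDevice sections carry Driver lines of their own (“kbd”, “mouse”). A sketch, assuming the usual xorg.conf layout with keywords written as Section, EndSection and Driver; video_driver is a hypothetical helper, not an LCFG tool.

```shell
#!/bin/sh
# Hypothetical helper: print the driver named in the Device section(s)
# of an xorg.conf file (default /etc/X11/xorg.conf). Driver lines in
# other sections, such as InputDevice, are ignored.
video_driver() {
    conf=${1:-/etc/X11/xorg.conf}
    awk '
        /^[ \t]*Section[ \t]+"Device"/ { in_dev = 1; next }
        /^[ \t]*EndSection/            { in_dev = 0; next }
        in_dev && $1 == "Driver" {
            gsub(/"/, "", $2)   # strip the quotes round the name
            print $2
        }
    ' "$conf"
}

# Example use:
# [ "$(video_driver)" = intel ] && echo "this machine uses intel"
```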

A quick inspection of the sleep log on a random test machine revealed that the component still hadn’t started, so I’ve gone round all of the test machines and started it. On most it hadn’t been running but started successfully when I told it to; some machines were down or unavailable, and half a dozen or so were already running it.

Written by Chris Cooke

May 28, 2009 at 1:46 pm

Posted in Uncategorized
