Stability
I’ve come back today from a week’s holiday to find that – to my amazement – the sleep test machine has successfully suspended and resumed a full 48 times without any problems at all. This is incredible considering that this is the machine that can hardly go a night or two without getting into some sort of sticky hang-type situation. Ten nights, three sleeps a night and more at the weekends. Am I just lucky or was this due to something being different? OK, this is what was different:
- I wasn’t here. I normally use the machine remotely for shell access during the day. I can’t see a remote login session causing suspend problems, particularly since there’s no session on the go during the night, which is when the machine does its sleep thing anyway (I have sleep disabled during working hours).
- The client component wasn’t running. This follows the problem we spotted a week or two ago whereby something else would grab the client component’s port before the client component got to it. (Understandably, as Simon pointed out.) I had started the machine up without the client component as I looked into this problem and had forgotten to start the component before going on holiday. So the machine has been running in ultra-stable mode with no profile changes and no RPM changes. This seems to suit the power management stuff down to the ground. I wonder why…?
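One way to check for the port-grabbing problem before starting the client component is to try binding its port yourself. Here is a minimal Python sketch. It assumes the component listens on a TCP port. The post doesn't say which port, so you supply the number, and the function name is illustrative only:

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Hypothetical helper for illustration. No SO_REUSEADDR is set, so a port
    still in TIME_WAIT from a recently closed connection may also be reported
    as busy.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except PermissionError:
            # Ports below 1024 need root; that is not the same as "in use".
            raise
        except OSError:
            # Typically EADDRINUSE: something else already holds the port.
            return False
        return True
```

This only tells you whether the port is taken, not what has taken it. On Linux, running ss -ltnp as root lists listening sockets together with their owning processes.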
I think I’ll leave it another week or so with the client component running this time, and we’ll see whether the stability continues.
The run of good luck means that I haven’t yet had a chance to try out Simon’s diagnostic suggestions from his comment on my previous entry, but no doubt I shall get to try them out soon enough.
In other news, I’ve solved a couple of problems that were plaguing me before the holiday. Both solutions were really stupid and probably show how much I needed the holiday:
- I’d been trying to get a new multipath SAN partition up on one of the web servers. When I tried to make a filesystem on the new partition I’d get “this partition is busy” errors. The solution: I was using the partition’s sd entry in /dev when I should have been using its entry with a big long name in /dev/mpath. It was the system’s own multipath code which was keeping the partition “busy”. The new partition is now happily mounted and filling up with data.
- The sleep test machines had been configured to mail me the sleep log files every week when logrotate ran. They did this, but they mailed me stuff which was a month out of date. Who wants that…? Turns out that this is the rather odd default behaviour of logrotate: it mails you the logs it’s about to delete (the ones that fall off the end of the weekly conveyor belt) rather than the most recent logs. Adding the logrotate keyword mailfirst to the logrotate recipe has hopefully cured this.
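The conveyor-belt behaviour is easier to see with a toy model. The Python below is a sketch, not logrotate’s own code. It assumes a weekly recipe with rotate 4. The post doesn’t give the actual rotate count, but four weekly copies would match “a month out of date”. It contrasts logrotate’s default, maillast, which mails the about-to-expire file, with mailfirst, which mails the just-rotated file. The function names are hypothetical:

```python
def rotate_once(kept, current, rotate, mailfirst=False):
    """Model one logrotate run (toy model, not logrotate's implementation).

    kept:    rotated copies, newest first (kept[0] plays the role of log.1)
    current: contents of the live log being rotated now
    rotate:  number of rotated copies to keep, as in a 'rotate N' directive
    Returns (new_kept, mailed), where mailed is None if nothing is sent.
    """
    shifted = [current] + kept      # every copy moves one step along the belt
    expired = shifted[rotate:]      # copies that fall off the end get deleted
    new_kept = shifted[:rotate]
    if mailfirst:
        mailed = current            # mailfirst: mail the just-rotated log
    else:
        # Default (maillast): mail the log about to be deleted, if any.
        mailed = expired[0] if expired else None
    return new_kept, mailed


def simulate(weeks, rotate, mailfirst):
    """Run one rotation per week and record what would be mailed each time."""
    kept, mailed = [], []
    for week in range(1, weeks + 1):
        kept, m = rotate_once(kept, f"week {week}", rotate, mailfirst)
        mailed.append(m)
    return mailed


print(simulate(6, 4, mailfirst=False))
print(simulate(6, 4, mailfirst=True))
```

With rotate 4, the default run mails nothing for the first four weeks. In week 5 it starts mailing week 1’s log. The mailfirst variant mails each week’s log as soon as it is rotated. In the real recipe, that switch is just the mailfirst line alongside the existing mail directive.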