Posts Tagged ‘cron’
It’s been a while since I said what was happening with the Sleep project, if I ever did, so here’s a catch-up.
It’s an LCFG component which will run on DICE desktop machines. The idea is to save money by saving electricity. The component will decide whether or not it’s safe to put the machine into a sleep state, and if it is, send the machine to sleep.
The component will attempt to make sure that the machine doesn’t miss any cron jobs, so before sleeping the machine it’ll calculate when the machine should wake for the next cron job, and set the machine’s wake time appropriately.
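Here’s a rough sketch of that calculation. It’s in Python rather than the component’s Perl, and it assumes that something else has already worked out the next run time of every cron job. The function name `choose_wake_time` and the two-minute `lead` margin are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def choose_wake_time(
    next_runs: Iterable[datetime],
    now: datetime,
    lead: timedelta = timedelta(minutes=2),  # hypothetical safety margin
) -> Optional[datetime]:
    """Return when to wake so that the earliest upcoming cron job isn't missed.

    Returns None if no job is due in the future (nothing to wake for).
    """
    upcoming = [t for t in next_runs if t > now]
    if not upcoming:
        return None
    # Wake a little before the earliest job, but never in the past.
    return max(min(upcoming) - lead, now)
```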
At the moment the sleep component successfully handles the cron aspect of the job, sets the wake time, and puts the machine to sleep. It doesn’t yet judge whether or when putting the machine to sleep might be a good idea.
It has these resources:
- cronfiles: a list of files where cron jobs might be lurking, for instance /etc/crontab;
- crondirs: a list of directories in which to look for more cron files, for instance /var/spool/cron;
- suspendcommand;
- waketimefile: where the system looks for a wake time (this varies according to kernel version).
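A sketch of how the first two resources might be combined into the list of files to examine, in Python. It assumes the resources arrive as plain lists of paths and that each directory holds crontab files directly (no recursion); `collect_cron_files` is a hypothetical name, not the component’s actual code.

```python
import os
from typing import Iterable, List


def collect_cron_files(cronfiles: Iterable[str], crondirs: Iterable[str]) -> List[str]:
    """Return every existing cron file named directly or found in a cron directory."""
    found = []
    for path in cronfiles:
        if os.path.isfile(path):
            found.append(path)
    for directory in crondirs:
        if not os.path.isdir(directory):
            continue  # a missing directory just means no jobs there
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if os.path.isfile(full):  # skip subdirectories (no recursion)
                found.append(full)
    return found
```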
It’s written in Perl, which I’ve studiously avoided for years but which turns out to be far more manageable than before now that I know about O’Reilly’s Perl Cookbook. It’s so much clearer and more helpful for the impatient would-be Perl programmer than the desperately irritating Programming Perl, with its acres of intrusive jokes and its hundreds of irrelevant and outlandish clever edge cases which I really don’t want to know about when I just want a reminder and a clear explanation of how to do something, dammit.
Hmm. I’ve just realised that although it handles cron jobs successfully, the component doesn’t know about “at” jobs at all.
I’ve been looking for a way of comparing the current time with the times of upcoming cron jobs. It looks as if you can do this using modules from the Perl DateTime project. This fabulous collection of modules lets you represent times and dates in pretty much any way you want (including “the little hand is pointing to the twelve and the big hand is pointing to the nine” using DateTime::Format::Baby) and manipulate them in all sorts of ways. You can declare durations, you can do arithmetical operations on dates and durations to get other dates or durations, you can declare time spans and find out whether other dates are in them. One thing which really impressed me is that you can declare sets of dates or timespans then use set operations (union, complement, intersection) between the sets of dates or timespans.
Anyway, the project has a handy module, DateTime::Event::Cron, which understands times and dates expressed in crontab format. That sounds perfect.
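To show what “understanding crontab format” involves, here’s a deliberately naive stand-in written in Python. It is not DateTime::Event::Cron’s API. It handles the five standard time fields with `*`, lists, ranges and `/` steps, ignores month and day names and shortcuts like `@daily`, and finds the next run by stepping forward a minute at a time. It works on naive local times, so it shares cron’s blind spot about summer time. `next_cron_run` and `parse_field` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Set

# (low, high) bounds for minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one crontab field such as '*/15', '1-5' or '0,30' into a set."""
    values: Set[int] = set()
    for part in field.split(","):
        step = 1
        has_step = "/" in part
        if has_step:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(part)
            end = high if has_step else start  # 'N/step' means N upwards
        values.update(range(start, end + 1, step))
    return values


def next_cron_run(spec: str, after: datetime, horizon_days: int = 366) -> datetime:
    """Return the first whole minute strictly after `after` that matches `spec`."""
    fields = spec.split()
    if len(fields) != 5:
        raise ValueError("expected five crontab time fields")
    minutes, hours, mdays, months, wdays = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    wdays = {d % 7 for d in wdays}  # cron accepts 7 as well as 0 for Sunday
    # If both day fields are restricted (neither starts with '*'),
    # cron runs the job when EITHER of them matches.
    either_day = not fields[2].startswith("*") and not fields[4].startswith("*")

    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    end = after + timedelta(days=horizon_days)
    while t <= end:
        cron_wday = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        dom_ok, dow_ok = t.day in mdays, cron_wday in wdays
        day_ok = (dom_ok or dow_ok) if either_day else (dom_ok and dow_ok)
        if day_ok and t.month in months and t.hour in hours and t.minute in minutes:
            return t
        t += timedelta(minutes=1)
    raise ValueError("no matching time within the horizon")
```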
However, after reading the docs I’ve started worrying about details.
Cron isn’t aware of summer time changes – that is, it doesn’t know about them in advance. Instead it reacts when it spots that the time has changed under it.
man cron says:
Local time changes of less than three hours, such as those caused by the start or end of Daylight Saving Time, are handled specially. This only applies to jobs that run at a specific time and jobs that are run with a granularity greater than one hour. Jobs that run more frequently are scheduled normally.
If time has moved forward, those jobs that would have run in the interval that has been skipped will be run immediately. Conversely, if time has moved backward, care is taken to avoid running jobs twice.
Time changes of more than 3 hours are considered to be corrections to the clock or timezone, and the new time is used immediately.
DateTime can cope with summertime clock changes when it’s told to use a particular timezone. However, if you try to specify a date that doesn’t exist in that timezone, such as during the lost hour in spring, DateTime will fall over with a fatal error.
Since cron is quite happy to have jobs scheduled during the missing hour, we won’t be able to simply take all the cron times and shove them into DateTime. We’ll have to filter them first. (Unless I’m just not familiar enough with Perl: is a fatal error a big deal? Could my code easily trap it and carry on?)
The DateTime man page recommends getting round summer time problems by using UTC for all the calculations and using the local time zone just for input and presentation. But this wouldn’t help in our case, as the cron times are effectively expressed in the local time zone to start with and may be invalid, in which case the attempt to convert them to UTC will make DateTime fall over with a fatal error.
I’d rather not have a fatal error every year.
One option is simply to look out for the days on which summer time changes. Wikipedia says that “since 1996 European Summer Time has been observed from the last Sunday in March to the last Sunday in October.” One could put special checks for those dates in the code. DateTime makes it easy to represent dates like “the last Sunday in March”.
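Here’s a minimal sketch of those special checks in Python, assuming the EU rule that clocks change at 01:00 UTC on the last Sunday of March and of October. Under that rule the skipped spring hour and the repeated autumn hour both start at local hour 1 plus the zone’s standard offset from UTC (01:00 in the UK, 02:00 in Central Europe). The function names and the `std_offset_hours` parameter are hypothetical.

```python
import calendar
from datetime import date, datetime


def last_sunday(year: int, month: int) -> date:
    """Return the date of the last Sunday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # date.weekday(): Monday=0 ... Sunday=6, so this is days back to Sunday.
    return date(year, month, last_day - (d.weekday() + 1) % 7)


def dst_status(local: datetime, std_offset_hours: int = 0) -> str:
    """Classify a naive local wall-clock time under the (assumed) EU rule.

    Returns 'gap' if the time doesn't exist (clocks spring forward),
    'repeated' if it happens twice (clocks fall back), else 'normal'.
    """
    change_hour = 1 + std_offset_hours  # 01:00 UTC in local standard time
    if local.hour != change_hour:
        return "normal"
    if local.date() == last_sunday(local.year, 3):
        return "gap"
    if local.date() == last_sunday(local.year, 10):
        return "repeated"
    return "normal"
```

A cron time classed as `"gap"` could then be filtered out, or moved to the end of the gap (roughly what cron itself does by running skipped jobs as soon as the clock jumps), before it ever reaches DateTime.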
All this assumes specific use of the local time zone. I’m not yet clear on how using the default “floating” time zone might change things. However, I suspect that it’ll just behave as cron does, and be taken by surprise by summer time changes. That would be ideal, except that it’d mean its predictions of how long it will be until the next cron job is due could be totally wrong a couple of times a year.
Today I talked over the problem of sleeping and cron jobs with Alastair and Stephen. Some helpful points were raised, and we came up with a basic behaviour which the system should have.
- Don’t forget to take users’ cron jobs into account. Users might want things run at particular times.
- More power cycles (especially disk spin-ups) will shorten the machine’s life, so don’t do it too often. (So, for instance, if we were waking up periodically to check for Condor jobs to run, we could perhaps wake up say every half dozen hours instead of hourly.)
- If running things at wake-up time: if the user has woken us up, back off and wait a bit. Don’t set an avalanche of maintenance jobs going immediately; let the user use the machine’s resources.
- Surely current distributions used on laptops must take some account of missed cron jobs? Check the PM hook scripts on e.g. Ubuntu to see what happens there. (I’ve just had another look on Ubuntu and I see that I missed the ACPI hook scripts: there are a lot of them! Ubuntu starts anacron when it wakes and the machine’s on AC power; but there’s nothing which checks your actual cron.)
- An incidental point: does the SL5 kernel support Condor’s checking for recent USB keyboard/mouse activity?
- The cron component could spot (e.g. at wake-up) which jobs it had missed and decide to run some of them late.
- Alternatively we could get the machine to wake up in time to run cron jobs.
- Remember to allow for “at” too. For instance, the autoreboot component uses it!
- We could check cron.hourly/daily/etc. at wake-up time.
- Could parse crontabs to find out what times things will be run.
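The “spot which jobs it had missed” idea boils down to an interval check. A sketch in Python, assuming we record when the machine went to sleep and have each job’s scheduled times for that period; `missed_jobs` and the job names in the example are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def missed_jobs(
    schedule: Dict[str, List[datetime]], slept_at: datetime, woke_at: datetime
) -> Dict[str, int]:
    """Count, per job, the scheduled runs that fell while the machine slept."""
    missed = {}
    for job, times in schedule.items():
        count = sum(1 for t in times if slept_at <= t < woke_at)
        if count:
            missed[job] = count
    return missed
```

Counting rather than just flagging leaves the caller to decide whether, say, an hourly job missed three times needs running once or three times.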
At this point we decided that the simplest and cleanest thing to do seemed to be:
- Something needs to parse the crontabs to find out when things are due to run, so that we can calculate, for instance, how soon from now something is due to run. It’s difficult to see how we could do without this. But there must be a Perl module somewhere which does this for you; it can’t be that hard.
- “Is anything due to run in the next X minutes” could then be one of the questions we ask when deciding whether it’s currently a good time to put the machine to sleep.
- We could also then simply wake up in time to run every cron job. Don’t bother to distinguish between “important” and “unimportant” jobs. Just wake up for every one, and if there are so many that the machine never gets to sleep, let’s tidy the cron jobs – rather than trying to tip-toe through the forest of jobs in some complicated selective way.
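The “is anything due to run in the next X minutes” question is small enough to sketch directly, in Python. It assumes a list of upcoming run times produced by whatever parses the crontabs; `job_due_soon`, `may_sleep`, the `user_idle` flag and the 15-minute window are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def job_due_soon(next_runs: Iterable[datetime], now: datetime, window: timedelta) -> bool:
    """True if any job is due between `now` and `now + window` inclusive."""
    horizon = now + window
    return any(now <= t <= horizon for t in next_runs)


def may_sleep(
    next_runs: Iterable[datetime],
    now: datetime,
    user_idle: bool,  # hypothetical: supplied by some other check
    window: timedelta = timedelta(minutes=15),  # hypothetical default
) -> bool:
    """One of several checks: don't sleep if in use or if a job is imminent."""
    return user_idle and not job_due_soon(next_runs, now, window)
```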
Previously I had vaguely envisaged some sort of semi-intelligent free-wheeling behaviour whereby a machine might, for instance, wake up to check Condor, find that it was night time, and decide to run its nightly maintenance jobs then, rather than running them at the same fixed time every night.
However we don’t need to have this sort of behaviour, so let’s get the sleep system up and running without it for the moment.
Just a brief follow-up to yesterday’s post. After I wrote it I was reminded of some advice of Simon’s: don’t add LCFG bells and whistles unnecessarily. Use existing standard mechanisms where you can.
In this case that would mean requiring any of our components and software which want to run at wake-up or sleepy-time (there must be more technical terms for these, but I like these) to drop a wee script of their own into the /etc/pm/hooks directory. Forget, for instance, all this stuff about a special component running at wake-up time; it’s an attempt to simplify things, but it’ll just complicate them instead.
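A wee hook script might look something like this. It’s a sketch in Python (any executable will do), assuming pm-utils’ convention of calling each hook with `suspend` or `hibernate` before sleeping and `resume` or `thaw` after waking; that’s worth checking against the pm-utils version actually installed. The actions it returns are hypothetical placeholders for whatever a component needs to do.

```python
#!/usr/bin/env python3
"""Hypothetical pm-utils hook: one component's own script in /etc/pm/hooks."""
import sys

BEFORE_SLEEP = {"suspend", "hibernate"}
AFTER_WAKE = {"resume", "thaw"}


def handle(event: str) -> str:
    """Decide what this (hypothetical) component does for a pm-utils event."""
    if event in BEFORE_SLEEP:
        return "prepare-for-sleep"  # placeholder: e.g. stop a daemon
    if event in AFTER_WAKE:
        return "after-wake"  # placeholder: e.g. restart it, check missed work
    return "ignore"  # unknown events are not our business


def main(argv) -> int:
    event = argv[1] if len(argv) > 1 else ""
    action = handle(event)
    if action != "ignore":
        print(f"{argv[0]}: {event} -> {action}")
    return 0  # report success to pm-utils


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```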
When the sleep component is up and running, a DICE machine might spend most of its time sleeping, only waking up every few hours to do essential things for a few minutes before going back to sleep again. Then again, it might only go to sleep at night. Or it might be used so heavily (for example by Condor) that it rarely gets a chance to sleep from one week to the next.
I’ve been trying to figure out how best to manage cron jobs on such a machine.
We have to manage these things:
- The scripts in the distro-provided directories (such as /etc/cron.monthly) need to be run. At the moment the scripts in these directories are run by cron at specific times, during which our machine might be asleep.
- Also, those scripts need to be run the right number of times. We don’t want to run them too often.
- LCFG-configured periodic jobs (the current cron.* resources) need to be run too. For example, the boot component is run once per night, and the openldap component runs once per hour.
You can see what a typical DICE desktop runs from cron on a typical day here:
It’d be nice to have clean, simple code managing all of this, so we’d make use of standard ways of doing things, and we’d have more small, simple components. It’d also be nice to have as little duplication as possible in the LCFG resources (it seems confusing to need both cron resources and when-I’m-awake resources for doing periodic tasks). And it’d be nice to have one software system just take care of all of this for us and give us a fairly simple interface, so that we didn’t need to worry about the interaction between (say) the maximum sleep period before waking and the minimum time between successive runs of a periodic task. It’d be nice to minimise the change required from the current system too: we have hundreds of cron resources in the header files. And it’d be nice to have clear, simple, defined roles for the different components of the system, with clear and obvious interactions between them.
But all this seems horribly contradictory. If you simplify things in one way you seem to complicate them in others. For example, if you try to keep the existing cron resources, you force the cron component to do things grossly different from what it was designed for: rather than just making a few config files, it’d need to monitor whether or not cron had done things, or how long the machine had been sleeping. And since you’d also need to add some extra resources to cover asynchronous operation (for instance, whether or not it was acceptable to run a missed-due-to-sleep job an hour after it was missed), the resources get complicated and need changing anyway. The cron component turns into a hideous monster.
Perhaps it could be replaced by one unified “tasks runner” which Just Takes Care Of It for you – you tell it to run things so many times per time period and the component then figures out what to do, and runs as a cron-type daemon, keeping timestamps and goodness knows what. This sounds a bit like the hideous monster cron component above, except possibly cleaner and simpler as it would be designed from scratch. But this seems to chuck “do it the standard way” and “make use of standard software and components” right out of the window.
Maybe we could keep the cron resources, but have a second “asynchronous cron” component inherit their values? It could peek to see when or whether things have been run or not; it could deduce when to run things based on the times given in the cron resources; and so on. Sounds complicated to write. It’d have to understand cron times and be able to judge when or whether to run them. Presumably it’d have to wait until cron had failed to run something then it’d step in afterwards and run it instead. But how long afterwards would be acceptable? The more you think about this the more disgusting it sounds.
Should the component which manages sleep also be in charge of kicking off periodic jobs? If it knows when it’s going to send the machine to sleep and when it’s going to tell the machine to wake up again, it’ll be able to use that information to figure out when to run periodic tasks. But it sounds as if this would be rather confusing to write.
Should we have a component completely separate from cron and from the sleep component? Call it awake, for instance.
The awake component would run when the machine woke up. I envisage a machine with Condor waking up at least every few hours, and a machine without Condor waking up at least once a day, so awake would be able to run things reliably, say once a day. But that’d depend on the component managing sleep being told to make sure that the machine woke up often enough. Is it OK to leave that sort of coordination task to the sysadmin to manage by hand? Sensible defaults could be set anyway.
But wouldn’t that need to run things at some other time as well as whenever the machine woke? Say we have a machine that’s so popular with Condor that it never gets a chance to sleep. If it never sleeps, it’ll never wake, and if it never wakes, it’ll never run our stuff, like the nightly run of the boot component for instance.
We need some sort of coordination between running things at wake-up time and running things regularly from cron; something which makes sure that things are run often enough but not too often.
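One common way to get “often enough but not too often” is a timestamp guard, like the timestamps anacron keeps, which both cron and a wake-up hook could call. A minimal sketch in Python; `run_if_due`, the stamp file and the interval are all hypothetical.

```python
import os
import time
from typing import Callable, Optional


def run_if_due(
    task: Callable[[], None],
    stamp_file: str,  # hypothetical per-task timestamp file
    min_interval: float,  # seconds that must pass between runs
    now: Optional[float] = None,
) -> bool:
    """Run `task` unless it already ran within `min_interval` seconds.

    Whichever caller (cron or a wake-up hook) gets there first does the
    work and updates the timestamp; later callers skip it.
    """
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_file)
    except FileNotFoundError:
        last = None  # never run before
    if last is not None and now - last < min_interval:
        return False
    task()
    # Record the run time as the stamp file's modification time.
    with open(stamp_file, "a"):
        pass
    os.utime(stamp_file, (now, now))
    return True
```

As written, two callers arriving at the same moment could both run the task; a real version would want a lock (for instance `fcntl.flock` on the stamp file) around the check and the run.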
I’ve spent the last day or so dreaming up ever more baroque ways in which this shouldn’t be done, and failing to come up with a simple-sounding solution. I’m sure there were several other ideas more ghastly than the ones above.
The last couple do seem to have the most potential, though.
Help? Any ideas or observations?
Oh, and PS: I forgot to say earlier that anacron would be really useful, except that its greatest frequency is daily. It has no hourly or twice-daily option, for instance.