Posts Tagged ‘TiBS’
It’s change time for my development projects. The TiBS LCFG project is now complete, at least complete enough to be going on with; further developments will be tackled at some point in the Further Improvements to TiBS Component project. The next two weeks of my development time will be spent on the Server Hardware Interaction project; after that I’ll concentrate on the port of LCFG to Fedora 12. (These links are to our devproj project development site, for which you’ll need to be an authenticated School of Informatics user; if you don’t have an Informatics account you can make your own using iFriend.)
Server Hardware Interaction is a rag-bag of things which our servers could do with. First on the list is some sort of monitoring of the ambient temperature, so the machines can shut themselves down cleanly when it gets too hot (we still haven’t got the bugs fully out of our shiny new air conditioning plant). Currently the machines carry on running until each server reaches the pre-set critical point for its motherboard, at which point power is cut; that saves the hardware from harm but doesn’t do the data much good. A clean shutdown would be preferable. This’ll give us a safe fall-back procedure; we can then put cleverer stuff in front of that if we like – for example, shutting down less important servers in a sacrificial manner when the temperature starts to rise too much. Next on the list is RAID monitoring – we need our Nagios monitoring system to alert us when a disk in a machine’s RAID array has failed. Lower down the list is the issue of keeping the firmware and BIOS versions of our controllers and hardware in general up to date, automatically or otherwise, to help avoid problems.
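To make the fall-back concrete, here’s a minimal sketch of the kind of watchdog I have in mind, assuming a Linux hwmon sensor file and an invented 45°C threshold – real machines expose different sensor paths, and sensible limits vary:

```perl
#!/usr/bin/perl
# Minimal ambient-temperature watchdog sketch. The hwmon path and the
# threshold are assumptions; check what your hardware actually exposes.
use strict;
use warnings;

my $sensor   = '/sys/class/hwmon/hwmon0/temp1_input';   # millidegrees C
my $max_temp = 45_000;                                   # 45.0 C

open my $fh, '<', $sensor or die "cannot read $sensor: $!";
chomp( my $millideg = <$fh> );
close $fh;

if ( $millideg > $max_temp ) {
    # Shut down cleanly rather than waiting for the motherboard to cut power.
    system( '/sbin/shutdown', '-h', 'now', 'ambient temperature too high' );
}
```

Run from cron every minute or so, something along these lines would give each server the basic self-preservation described above; the cleverer sacrificial-shutdown logic could then sit in front of it.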
Our servers are mostly Dells, and Dell’s OMSA (OpenManage Server Administrator) is designed to deal with a lot of this stuff automatically, but our understanding is that it takes rather more control of the machine than we’re comfortable with; we’ll probably take a look at it and see if we can use bits of it, though.
Finally, it’s been a long time since my last entry here. That time was mostly spent on a long winter holiday in southern India. I’ve started putting photos up on the web from that holiday; you can see them at my photo page. At the time of writing it has pictures of Mysore, of rural Karnataka and Tamil Nadu from the train, and of rangoli – devotional decorations which are drawn on the ground every morning in front of houses.
The lcfg-tibs component now has a twin sibling called lcfg-tibsconf. The latter will replace the former.
The reason is bizarre: when the LCFG tibs component stops, it tries to stop the TiBS software. TiBS software is stopped by calling a shell script called “stoptibs”. One of the things which “stoptibs” does is to “kill” every process on the system called tibs. Including my tibs LCFG component. Top marks for style! Thanks to Stephen Quinney for working out why my component was mysteriously disappearing instead of stopping. So, anyway, I’ve got round the problem for now by renaming lcfg-tibs to lcfg-tibsconf. The stoptibs script doesn’t currently try to kill anything called tibsconf – I’ve checked…
Yesterday we deployed the lcfg-tibs component on our main TiBS backup server. Things seem to have gone smoothly; the software is now installed via RPM packages; the config files are now mostly generated from LCFG resources; and configuration changes are held back until TiBS is idle.
This is phase 1 of the LCFG TiBS component. Phase 2 will automatically generate the list of non-AFS backups from “someone please back me up” resources in the LCFG profiles of DICE machines. Some Nagios monitoring is also desirable! However more development may have to wait a while: other more urgent projects are elbowing their way ahead of this one in the development queue.
The LCFG TiBS component and its accompanying RPMs are not available for distribution outside the School of Informatics because TiBS is proprietary commercial software; but if you’ve also bought this software and you want to use the LCFG component to automate its configuration, let us know, maybe we can share the work.
- The current TiBS installation and how to control it.
- The initial project ideas
- The official project page (devproj project 122).
- A list of which TiBS configuration files are handled by the LCFG component.
One thing that did not go smoothly was my attempt to get the component to stop TiBS when the component stopped. TiBS is stopped with the command stoptibs, which can be found on our backup server in /usr/tibs/bin. It’s a shell script. It’s short, but I won’t post it here as it’s not freely redistributable. All of my attempts to call it, with backticks or eval or whatever other wacky way I came across on Google, result in the component immediately terminating as soon as stoptibs has run, so the component never officially stops. So far I’m baffled as to what’s wrong here. Is this an elementary Perl boob on my part? A bug somewhere in LCFG?
lcfg-tibs 1.1.0 is now out. Not much change: it now makes the symlink /etc/tibs.conf when it configures on a TiBS server, with the symlink pointing to the tibs.conf configuration file.
The configure idea in my last post could be made simpler.
There’s no need for a two-stage configure process at all. All we need is for the configure method to be called regularly. (I’m assuming that LCFG locking will automatically prevent simultaneous calls, but I’d better check. The documentation says that locking is automatic for “certain methods”.)
When the configure method is called, it first cooks up some new config files from its current resources and diffs those with the existing config files – in other words it finds out whether it needs to do anything. If not, it exits.
If config files do need to be changed, it can then look to see if TiBS is currently busy (tibstat) – if it is, it exits. If not, it stops TiBS, changes the files, then starts TiBS again as discussed previously.
The two checks could equally well happen the other way round; whichever is quicker.
We could then call the configure method from cron, say every few minutes between 9am and 9pm.
Much simpler than the previous idea; no need for custom methods or for passing information from one run of the component to another.
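For the record, here’s a sketch of how that merged configure method might look (in Perl, since that’s what LCFG components are written in). The file locations, and the assumption that tibstat’s output mentions “running” while a backup is active, are mine; stoptibs, runtibs and tibstat are the real TiBS commands.

```perl
#!/usr/bin/perl
# Sketch of the single-stage configure method. Paths are hypothetical.
use strict;
use warnings;
use File::Compare qw(compare);
use File::Copy    qw(move);

# Hypothetical map of freshly generated config files to in-service copies.
my %staged = (
    '/var/lcfg/tmp/tibs/tibs.conf'  => '/usr/tibs/conf/tibs.conf',
    '/var/lcfg/tmp/tibs/labels.txt' => '/usr/tibs/conf/labels.txt',
);

# Has anything actually changed? If not, there's nothing to do.
my @changed = grep { compare( $_, $staged{$_} ) != 0 } keys %staged;
exit 0 unless @changed;

# Only touch the files when TiBS is idle (assumed tibstat output format).
my $status = `/usr/tibs/bin/tibstat`;
exit 0 if $status =~ /running/i;

# Stop TiBS, swap the changed files in, start it again.
system('/usr/tibs/bin/stoptibs') == 0 or die "stoptibs failed\n";
move( $_, $staged{$_} ) or die "move $_: $!" for @changed;
system('/usr/tibs/bin/runtibs') == 0 or die "runtibs failed\n";
```

The cron side is then just a matter of calling the component’s configure method every few minutes during the day.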
lcfg-tibs in its current state will manage most TiBS config files. It’ll manage all the files that are normally managed by editing. There are other bits of config it doesn’t handle – those bits managed by running TiBS commands like hostadd and hostdel.
The contents of the config files are managed with LCFG resources. When a tibs LCFG resource changes, the tibs component will get a configure call and it will change the relevant config file.
This is the problem with the current lcfg-tibs: it changes the config file as soon as it sees the resource value change. This is bad because TiBS config files should be changed only when there isn’t a backup running – in fact, for the config files you change by editing, preferably when TiBS is completely stopped too.
There’s another problem with the current lcfg-tibs: it doesn’t start TiBS when it starts and doesn’t stop TiBS when it stops. The TiBS software has to be started and stopped manually. This is different from normal LCFG procedure and not really advisable.
What we can do to fix it
At the moment, when lcfg-tibs gets a configure call it makes new versions of the config files without installing them into the right place. It then diffs them with the in-service config files, and if there’s a difference it replaces the in-service file with the new version.
To ensure that config files are only changed when TiBS isn’t running a backup, we can make this a bit more sophisticated.
We’ll need a two-stage process. This is the first stage:
- lcfg-tibs gets a configure call.
- it makes new versions of the config files.
- it diffs these new versions with the in-service config files.
- if there’s a config file change, it raises a flag to say so.
- it makes and saves somewhere a list of which config files have to change.
That’s the end of the first stage of the process.
The second stage of the process runs independently of the first. It can be kicked off either by a human or from a regular cron job (running, say, every few minutes from 9am to 9pm), but either way the component runs with a custom method; call it changeconfig.
When changeconfig runs (there’s a sketch of this after the list):
- it looks to see if there’s a flag raised to say that there’s a config file change (or more than one; the number of changes doesn’t matter).
- if there isn’t, it exits.
- otherwise there’s a config change pending. Now the component checks to see whether TiBS is currently running a backup or not. It can do that with tibstat.
- if there’s a backup running, the component exits from the changeconfig run. (Another changeconfig will be along in a few minutes, and that one might have better luck.)
- if TiBS is quiescent, we stoptibs. The TiBS manual says that this is preferable but not necessary when changing config files, but in this case we’re doing it to get a lock on TiBS, to prevent a backup from starting while we’re changing the config files.
- make the config file changes.
- start TiBS again with runtibs. If a backup has attempted to start in the meantime while TiBS has been stopped, it will automagically start after a runtibs.
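For what it’s worth, here’s a sketch of how changeconfig could go about this, assuming the first stage has recorded the pending changes in a state file; the file’s location and its one-pair-per-line format are inventions for illustration.

```perl
#!/usr/bin/perl
# Sketch of the changeconfig method: consume the list of pending config
# file changes written by the configure stage. Paths and format hypothetical.
use strict;
use warnings;
use File::Copy qw(move);

my $pending = '/var/lcfg/tmp/tibs.pending';   # the "flag", plus the list

# No flag raised? Nothing to do.
exit 0 unless -e $pending;

# Backup in progress? Try again on the next cron run.
my $status = `/usr/tibs/bin/tibstat`;         # assumed output format
exit 0 if $status =~ /running/i;

# Stop TiBS to lock out new backups, install the pending files, restart.
system('/usr/tibs/bin/stoptibs') == 0 or die "stoptibs failed\n";
open my $fh, '<', $pending or die "$pending: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $new, $live ) = split /\t/, $line;   # "newfile<TAB>livefile"
    next unless defined $live;
    move( $new, $live ) or die "move $new: $!";
}
close $fh;
unlink $pending;                              # lower the flag
system('/usr/tibs/bin/runtibs') == 0 or die "runtibs failed\n";
```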
However this raises another question
Giving the component the power to runtibs begs the question: why don’t we get the component to runtibs when the component starts, and stoptibs when the component stops? There doesn’t seem to be a reason not to do this, and it does seem to be the intuitive behaviour.
When and how to hostadd and hostdel
The maintenance of non-AFS backup clients is done with hostadd and hostdel commands. A later version of lcfg-tibs will do this for you; for now the human has to do it.
hostadd and hostdel have to be run when TiBS is quiescent but running; they cannot be run once TiBS has been stopped. In an environment where the component stops and starts TiBS by itself, when will it be safe to run them?
1. Check with tibstat that there isn’t a backup running.
2. Do it during the day when a backup isn’t likely to start.
3. To ensure safety you could possibly om client stop to prevent new tibs resource values from getting to the tibs component; then do your resource changes; then om client start.
A better form of locking would be desirable though; we don’t want to run the risk of someone forgetting to restart the client component after having done a backup client change. This needs more thought.
lcfg-tibs has gone from “development only” to version 1. Version 1.0.1 in fact. This is to mark its upgrade to supporting TiBS 2402, the version we use on our backup server. It now seems to generate a load of configuration files identical to the hand-maintained ones currently on the backup server. Except for comments, anyway. So, this version is ready for deployment on the backup server. Pleasantly enough I finished this the day before I’d said I would on the project plan. Hooray!
Next: start on version 2, which will hold a list of non-AFS backup clients, compare it with TiBS’ own list, and order TiBS to add or remove backup clients so that the two match.
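The comparison itself is straightforward set reconciliation. A sketch with invented host names, and without guessing at hostadd’s and hostdel’s actual options:

```perl
#!/usr/bin/perl
# Sketch: work out which non-AFS backup clients to add and remove.
use strict;
use warnings;

my @wanted  = qw(hostA hostB hostC);   # from the LCFG resources (hypothetical)
my @current = qw(hostB hostC hostD);   # parsed from TiBS' group files

my %wanted  = map { $_ => 1 } @wanted;
my %current = map { $_ => 1 } @current;

my @to_add    = grep { !$current{$_} } @wanted;
my @to_remove = grep { !$wanted{$_} } @current;

print "would run hostadd for: @to_add\n";      # hostA
print "would run hostdel for: @to_remove\n";   # hostD
```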
Now I’m back from my holiday I’d better try to decipher my scribbled notes from the TiBS meeting we had just before I left:
Linux and Solaris group files: everything, including defaults, is managed using “hostadd” (and “hostdel”).
The “vicep” group is managed by hand, so I’ll make the component generate/manage this file.
The list of TiBS configuration files is therefore now finished! **confetti and party squeaks**
- We’ll get the 2402 tarball from Teradactyl.
- Then I’ll adapt the TiBS RPMs and component for it. This will probably be fiddly but not difficult.
- Also I’ll split the TiBS headers somehow to make both stable and development versions – possibly just with a #define.
- Once all that’s done we’ll be able to deploy the component on the server. It will take over the management of hand-edited config files, and the software will be owned by RPMs. Other aspects of TiBS will be managed as at present.
Note 1: one thing I didn’t think to mention is licences – I’d better do a TiBS licence RPM. This should be easy enough and could be installed independently of anything else.
Note 2: when installing the TiBS RPMs we’d better be ultra-careful about preserving the various TiBS state files!
After making the headers support both development and stable use I’ll be able to continue development on the component.
- First up is getting it to maintain the group files – a.k.a. the lists of non-AFS backups – that is, getting the component to take over the “hostadd”/“hostdel” duties. I wondered about the possibility, or advisability, of cutting corners in comparing the TiBS LCFG resources with the current state of TiBS, specifically for specifying non-AFS backups. We reckoned that there was no reasonable alternative to doing a thorough and clean job: looking at TiBS’ group files, figuring out what’s actually in TiBS’ current list of non-AFS backups, and comparing that with what the LCFG resources say should be in the list.
- Next, we’ll introduce a spanning map and a client backup component of some kind. The backup component will inherit details of local partitions from the fstab component and feed information into a spanning map which the TiBS server will subscribe to and use to build its group files.
- For safety’s sake the backup component should default to backing up all [local] partitions [with a filesystem!], on the grounds that “opt out” is much safer backup practice than “opt in”. (See the sketch after this list.)
- The backup component will need to specify a type of backup for each partition – perhaps by choosing a keyword from a definable list of them.
- The partitions and backup type keywords will be published to a spanning map.
- The TiBS server will take information from the spanning map and translate the backup type keywords into TiBS settings.
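Here’s the promised sketch of the opt-out default. Reading /etc/fstab directly and the “daily” keyword are stand-ins for illustration; the real client component would inherit its partition list from the fstab component and publish the result to the spanning map.

```perl
#!/usr/bin/perl
# Sketch: default every local partition with a filesystem to a backup type.
use strict;
use warnings;

my %skip_fs = map { $_ => 1 } qw(swap proc sysfs tmpfs devpts nfs);

open my $fstab, '<', '/etc/fstab' or die "/etc/fstab: $!";
while ( my $line = <$fstab> ) {
    next if $line =~ /^\s*(#|$)/;               # comments and blank lines
    my ( $dev, $mnt, $type ) = split ' ', $line;
    next unless defined $type;
    next if $skip_fs{$type};                    # not a local filesystem
    # Default backup type; resources would let individual partitions opt out.
    print "$mnt -> daily\n";
}
close $fstab;
```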
The code of the tibs component is now 29% of its former size, and it’s a whole lot healthier looking too. Instead of having a separate function to make each of a dozen (and counting) configuration files, we now just reuse the one function a dozen times. Which is what functions are for, after all. I’d been meaning to do this for a while. Feeling pleased with self. The function also now only replaces an existing config file where it needs to – it does a diff of the new and (any) existing config files to see if anything’s changed – and it produces more readable debug output too.
lcfg-tibs now makes a bumper bundle of config files: the latest additions, for TiBS servers, are labels.txt, ThisFull.txt, ThisDaily.txt and the AFS group’s omit file.
This completes the harvest of the low-hanging config file fruit; the remaining config files will have to be tackled with something more complex than LCFG::Template::Substitute. Here’s the updated list of TiBS config files showing what’s been done and what’s still to be tackled.
Judging by a quick shufti round the LCFG server, this component has smashed the previous record for the number of resources used by a single component: from 44 (ffox) to 120 or so (tibs). This could with some difficulty be looked on proudly as a worthy achievement.
lcfg-tibs can now also maintain groups.txt and afsgroups.txt TiBS config files. This feels like slow progress, but at least I am working my way through the list.
I’m still trying to finalise the list of which TiBS configuration files are currently hand-maintained, which are automatically generated, and which we want to have maintained (somehow or other) via LCFG. Here’s the latest effort.
After an afternoon of furious hacking lcfg-tibs now draws a distinction between (TiBS) servers and clients, and for clients it now generates a tibs.ini file, for which the settings can be changed using LCFG resources. By default it creates exactly the same tibs.ini file as the install.sh script does.
I’ve replaced the tibs RPM with the tibs-server RPM. The former put files in /opt, from where it was intended that the tibs LCFG component would install them into /usr/tibs. However, it was pointed out to me that it would be helpful for the “production” TiBS files installed by the RPM to be “owned” by the RPM, so that for instance one could use rpm -qf to find out which files belonged to which RPM. Good point, so that’s how it’s now done. The tibs component needed to be rejigged slightly to match: the configuration files which it rebuilds have been excluded from the %files list of the tibs-server RPM, so it doesn’t install dummy copies any more, and it therefore no longer makes sense for the component to only regenerate files where they already exist. It now just goes ahead and generates each of the required configuration files – the ones I’ve done so far, anyway.
Oh, and now that we’re going to need separate RPMs for server and clients, it no longer makes sense for a profile to include tibs.h directly, so some CPP directives now throw an error unless tibs-server.h or tibs-client.h have been included.
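The guard is the usual CPP trick – something of roughly this shape in tibs.h, with the macro name invented for this sketch and defined by tibs-server.h and tibs-client.h before they include tibs.h:

```c
/* Hypothetical sketch of the guard in tibs.h. */
#ifndef TIBS_VARIANT_SELECTED
#error Do not include tibs.h directly; include tibs-server.h or tibs-client.h
#endif
```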
Today I made the type-checking stricter for those lcfg-tibs resources which configure settings in tibs.conf – so that for instance tibs.tibsatliinitialize will only accept a value of 0 or 1.
But that didn’t take long; most of the day was spent crawling back and forth through the TiBS Manual trying to get to grips with TiBS groups, media pools and classes: how they all relate to each other, how they’re all maintained, in what ways they all change when AFS comes into the equation, and how our own TiBS groups, media pools and classes are configured.
I’ve been working on an LCFG component to control our use of TiBS. This is a quick note of how I’ve been getting on.
The idea is that the component will automate the configuration and running of TiBS as far as possible. One aim is the standard LCFG one of making it possible to, within reason, throw the main backup server off the roof of the building, then substitute a new machine and have LCFG configure it up to the state of its predecessor. Another is to automate the day to day running of TiBS to enable us humans to get on with something more productive than nannying the backup system all day.
As I see it there are three main things that the TiBS LCFG component needs to do:
- control the initial full installation of the software.
- control the software’s day to day configuration files post-installation.
- run TiBS commands for the system administrator – some of the commands have ludicrous numbers of options and we can surely make life simpler here, either with further automation or by at least providing a component method which supplies most of the options itself.
The software is installed using a vendor supplied shell script. Although this uses a template configuration file and substitutes a couple of important values into it to form the real configuration file, which sounds promising from an LCFG point of view, it doesn’t follow through on this method very much, preferring instead to just dump the resulting file in place then instruct the system admin to edit it appropriately for the site. I want the whole configuration file to be configurable via LCFG, so what to do here? The first approach was a long delicate detour around the TiBS install script: I made a template for the template, which my component then substituted LCFG resources into to make the template file for the install process, which then produced the real file. For post-installation resource changes the component painstakingly mirrored the install script’s own actions: substituting in the same values and adding the same extra lines which were constructed in the same way.
I implemented this, and it all worked. Except – it then occurred to me that my reasons for doing this rather than taking the other approach I’d considered (using the standard LCFG resource substitution method to transform my own template into my own version of the configuration file, which would completely replace the file generated by the TiBS install process) weren’t really valid after all. I had decided that closely mirroring the install script would produce results which were more likely to be exactly like those produced by the install script – so we wouldn’t have problems caused by odd configuration file entries. I also thought it was best to let the software’s own install script do as much of the work as possible. But once I’d finished my code which accomplished this, and looked at the several screensful, I realised two things: firstly, the LCFG template approach is perfectly valid if you just take the time to get the template right in the first place. Secondly, almost all the code I’d written and got working could be thrown away and replaced by one call to LCFG::Template::Substitute and a few debug messages. Much simpler! And importantly, far more maintainable for whoever might take this software over from me in the future.
So that’s the path I eventually took: I threw away my carefully crafted eccentric code and let the LCFG::Template perl module take the strain. This seems to have been a success: I haven’t yet found anything wrong with the resulting config file. It also arguably has another advantage: when making the template config file I took the opportunity to strip out all the helpful comments, so that future sys admins will be less tempted to edit LCFG-controlled config files with a text editor rather than by changing LCFG resources! Don’t worry, the helpful comments are all now in the man/pod file.
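For anyone unfamiliar with the approach, the idea is roughly this – a stand-in sketch, not the real LCFG::Template API or marker syntax, which I won’t reproduce from memory: each marker in the template is replaced by the value of the matching LCFG resource, and the result is the finished config file.

```perl
#!/usr/bin/perl
# Stand-in for what LCFG::Template::Substitute does for us: expand
# <%name%> markers (placeholder syntax assumed) from resource values.
use strict;
use warnings;

my %resources = (    # values that would come from LCFG resources
    tibshome => '/usr/tibs',
    server   => 'backup.example.org',
);

open my $in,  '<', 'tibs.conf.tmpl' or die "template: $!";
open my $out, '>', 'tibs.conf.new'  or die "output: $!";
while ( my $line = <$in> ) {
    $line =~ s{<%(\w+)%>}{ $resources{$1} // die "no resource '$1'\n" }ge;
    print {$out} $line;
}
close $out;
```

Putting tibs.conf.new live is then just the diff-and-install dance described further up the page.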
Having conquered the main TiBS configuration file I’ve been picking off the other much smaller ones one by one in the same way, and I should soon have them all under LCFG control, although not yet the versions on our actual TiBS server. After that, automatically running the install script should be easy enough – just supply a few options and a bit of input to it, then run Configure afterwards to remake the configuration files as above.
We also want to have the component deal correctly with (non-AFS) Linux TiBS clients, as well as with the server. We don’t have any such clients yet – unless you count the main server, which is a client of itself – so I’ll be setting one up then figuring out what to get the component to do to reproduce the setup. Since by this time the component will be able to both automatically run the install process and make configuration files I’m not expecting that to be too hard. Then after that I expect to work on the method(s) which will provide a simplified interface to any TiBS commands which can’t be automated.
And then, I imagine, we’ll stand back and reassess.
Lastly, one thing I’m not clear about at the moment is when to introduce the component to the actual TiBS server and actually start using it. That may become clearer later?
The LCFGification (© Craig) of the TiBS backup software is under way. To help things along I’ve added some comments to Craig’s plan.