Running daily tasks

When we replaced the boot component with systemd in EL7 we lost the ability to schedule daily tasks using the boot.run resources. That functionality is now provided by the specialist runner component. I have some plans for how we can make this new component a lot more useful, particularly for machines which we want to sleep whenever idle, but for now it is a more or less straight replacement for the boot component run method. It has been running for a few weeks on our EL7 machines, happily managing the daily updaterpms task. Documentation is now available on the LCFG wiki with examples of how it can be used. I also took the chance to document the old boot component run method, since it lacked a basic user's guide.

Posted in Uncategorized | Leave a comment

Logging under EL7

In the good old days, prior to EL7 and systemd, the syslog daemon (and, later, the rsyslog daemon) would listen on the unix socket /dev/log for messages sent from daemons. The syslog daemon would then decide on which messages to record on the console, to local text files or to a remote syslog host.

With the introduction of systemd, an additional logging daemon has been added to the mix – journald. Journald provides much of the functionality of syslog – eg listening for messages from daemons – but it also adds the ability to receive :-

  • Structured system log messages via the native Journal API
  • Standard output and standard error of system services
  • Audit records, via the audit subsystem

Journald stores messages in structured, indexed binary journals rather than in text files. The authors argue that this makes it easier to query local logs. Whether that is true or not, one definite advantage of journald is that it can create per-user journal files. Many daemons (eg sshd, mate-session, gnome-keyring-daemon) log user-specific information to /dev/log. This was previously unavailable to the user as the syslog file was protected. Per-user journal files allow individual users to read log entries specific to their account.

The syslog daemon has not disappeared from the scene. Journald does not, yet, have the ability to forward messages to remote logging hosts.  Under EL7, journald listens on the /dev/log socket, stores messages in its journals and then passes on the messages to syslog to process. Syslog can then forward messages to remote logging hosts.

For LCFG, we have decided to change the syslog configuration as little as possible. Log messages will continue to be written to text files in /var/lcfg/log, in addition to the journald journals. This gives us some time to become used to querying the journald journals. We may decide, in the future, to drop logging to text files.
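
As a starting point for querying the journals, here is a small shell sketch wrapping a few common journalctl invocations. The function names are made up for illustration. The journalctl options used (-u, -b, -p, --since, --no-pager and the _UID= field match) are standard ones, but treat this as a sketch rather than a recommended interface.

```shell
#!/bin/sh
# Hypothetical helper functions around journalctl.

# Entries for one systemd unit since a given time (default: today).
journal_for_unit() {
    journalctl --no-pager -u "$1" --since "${2:-today}"
}

# Messages of priority "err" or worse from the current boot.
journal_errors_this_boot() {
    journalctl --no-pager -b -p err
}

# Entries logged by processes running as the calling user.
journal_for_me() {
    journalctl --no-pager _UID="$(id -u)"
}
```

For example, journal_for_unit sshd.service yesterday would list everything sshd has logged since yesterday.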

internet-online.target

Stock EL7 (and SL7) has a target called network-online.target. This is used to delay services until the network interface is configured and online. Whilst this is suitable for a scenario where the network gateway is statically configured or obtained via DHCP, it is not suitable for systems which run router discovery (eg rdisc). On such systems, network-online.target is reached before any network gateway is configured, so services which require access to off-wire resources (eg updaterpms) can fail to start correctly.

To solve this, an extra target called internet-online.target has been added. This target is only reached once network-online.target and any services required to provide full internet access have been started. Services that require off-wire resources to start correctly should depend on this target (using ‘After=’) instead of network-online.target.
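
A minimal sketch of how a service might be tied to the new target, using a systemd drop-in file. The unit name example-updater.service and the DROPIN_ROOT variable are hypothetical, and the Wants= line is an assumption: After= only orders the service behind the target, so Wants= is included in case nothing else pulls the target in. By default the file is written to a scratch directory rather than /etc/systemd/system.

```shell
#!/bin/sh
# Sketch: order a (hypothetical) service after internet-online.target.
unit="example-updater.service"            # hypothetical unit name
root="${DROPIN_ROOT:-$(mktemp -d)}"       # e.g. /etc/systemd/system on a real host
dropin="$root/$unit.d/internet-online.conf"

mkdir -p "$root/$unit.d"
cat > "$dropin" <<'EOF'
[Unit]
# Assumption: pull the target in as well as ordering after it.
Wants=internet-online.target
After=internet-online.target
EOF

echo "wrote $dropin"
```

On a real system, systemctl daemon-reload would be needed afterwards for the drop-in to take effect.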

New installer feature

One of the problems we’ve encountered with systemd is the handling of interaction with the user on the console. In particular, we have always had the kerberos component run the kdcregister tool after the final reboot of the install process. This tool asks the user to authenticate with their admin principal and then uses that to add the hostclient for the machine into the keytab. This is still possible when booting with systemd, but with lots of components starting simultaneously and all writing to the console it becomes very difficult to spot that an interactive prompt is waiting for user input! The solution was to move this registration step from after the final reboot to before the end of the initial install phase.

Similarly we want to keep using the same SSH host keys even after a reinstall. Consequently we needed to run our local script to restore the SSH host keys from our wallet repository. This has to happen before the first time the ssh daemon is started or it will generate new keys which are not overwritten by our scripts. Previously this was quite simple as the ssh daemon was managed directly by the LCFG component but it is now managed by systemd. Again, the simplest solution is to move this step to the end of the initial install phase.

One thing that has become clear with the move to systemd on EL7 is that we have lots of this type of Informatics-specific configuration which has to be done during the install process. If we move all that to the LCFG installer then we necessarily have to add lots of local scripts and packages to the installer ISO image (referred to as the installroot). This is not really manageable and requires other sites to carry lots of software which they don’t need. It’s also not very flexible: if we decide to add new scripts to the install process we end up having to rebuild the ISOs and update the PXE installer.

An alternative solution is to use the software installed onto the machine in the first phase (referred to as the installbase). The list of packages for the installbase comes from an LCFG profile and, although there is a default profile, each site typically has its own. This is much more dynamic and flexible: we can add any local packages we like to our local installbase profiles and change them whenever necessary.

To actually use the software recently installed on the machine at this stage requires a chroot call as the new root partition is mounted as /root. Rather than hardcode all this into the existing install component I added a new baseinstall component. This is designed to work just like the main install component but uses software from the installbase. It does various things to sanitise the calling environment (e.g. setting the PATH variable and tweaking the tty settings). Commands are called in a full shell so that it’s possible to use all the features you might need. For example:

!baseinstall.installmethods     mADD(kdcr)
baseinstall.imethod_kdcr        %oneshot% /usr/bin/kdcregister_wrapper -f -a -r <%kerberos.realm%> -s kdc.inf.ed.ac.uk hostclient/$(hostname)

As with the install component, the %oneshot% indicates a script is to be called and otherwise it’s assumed to be an LCFG component which should be passed to om.
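
The general idea of running a command from the installbase can be sketched in shell as follows. This is not the baseinstall component's actual code: the function name, the PATH value and the use of env -i to clean the environment are assumptions made for illustration. Only the /root mount point and the use of chroot with a full shell come from the description above.

```shell
#!/bin/sh
# Sketch: run a command inside the freshly installed system.
NEWROOT="${NEWROOT:-/root}"   # where the installer mounts the new root partition

run_in_installbase() {
    # env -i starts from an empty environment; PATH is then set explicitly.
    # The command string is handed to a full shell so that pipes and
    # substitutions such as $(hostname) are evaluated inside the chroot.
    chroot "$NEWROOT" /usr/bin/env -i \
        PATH=/usr/sbin:/usr/bin:/sbin:/bin \
        /bin/sh -c "$1"
}
```

Called as root, run_in_installbase 'echo "installing on $(hostname)"' would run the echo with the tools and libraries from the installbase rather than those in the installroot.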

Another advantage of this approach is that the code does not need to be modified to deal with an alternative root. Previously many LCFG components have had to have an extra install method which deals with the new config files being stored in /root. With the baseinstall component the standard configure method can be called without modifications.

By default the baseinstall component is included only on EL7 but it has no methods. This should work on SL6 but so far it has only really been tested on EL7.

openafs client changes

As part of the move to EL7 we are trying to use systemd where possible to manage daemons rather than doing so through LCFG components. One service we want to move over to this model is openafs. Sadly the existing openafs component wasn’t really playing along and the changes necessary were likely to be extensive and incompatible with what was required to manage the daemon via upstart on SL6.

We’ve had a longstanding desire to make the management of openafs much simpler by splitting the client and server parts of this component into two separate components. With this in mind I have split out the client functionality into a new openafs_client component. As it shares some templates it’s still part of the lcfg-openafs source package but the files are in a separate sub-package so that we do not need to install it on servers which are not clients.

This new component does not make any attempt to start or stop the daemon when the component is started or stopped. That is now left to the init-system. It will still restart the daemon when appropriate to apply changes.

At the same time I took the chance to make the configuration changes more dynamic by using more of the functions provided by the AFS::Command::FS Perl module. This module helpfully provides a much more complete set of functions compared to the old AFS module. There should now be much less need to reboot to apply changes to a running openafs client configuration.

The new component also tries very hard not to Fail if problems occur. It will now log errors and may not complete the configuration, but the component will still be in a started state and ready to receive and apply further changes which will rectify the situation. This is a problem we see with many components; we are slowly working through them and restructuring the code to ensure they start at boot time whenever possible.

Service function improvements

In May 2014 version 1.6.0 of lcfg-ngeneric was released which provided a new Service function for LCFG components (see this blog article for more details). The aim of this new feature was to make it possible to call daemon methods (e.g. stop, start, restart) in a platform-independent manner.

Originally the functionality in the Shell and Perl versions was implemented separately. This led to there being some inconsistencies in behaviour (e.g. only the Perl version supported setting timeouts). It also meant that adding new features was more difficult than necessary since everything had to be implemented twice.

Recently (version 1.15.0 of lcfg-ngeneric) we have revisited this code and split out the Perl implementation from the LCFG::Component module into a separate LCFG::Utils::Service module. This new module can be used in an entirely standalone manner from any Perl module or script. The new module has been designed to work in a similar way to the LCFG::Om::Command module. Here are a few examples:

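use strict;
use warnings;
use LCFG::Utils::Service;

# List context: capture the status along with stdout and stderr
my ( $status, $stdout, $stderr ) =
    LCFG::Utils::Service::Run( "crond", "status" );

# Scalar context: only the status is kept
my $stop_status =
    LCFG::Utils::Service::Run( "sshd", "stop" );

# Options are passed as a hash reference, here a timeout of 20
LCFG::Utils::Service::Run( "httpd", "start",
                           { timeout => 20 } );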

Alongside this new module we have added a wrapper script, lcfg-service, which can be used from any script or the command line to call the daemon methods in a platform-independent way. This script is now used for the Service function in ngeneric. It supports setting a timeout, exits with the status returned by the daemon method and also helpfully passes through any output to stderr or stdout so it can be captured in the calling script (for example, to redirect into a component log file).

To avoid some problems with timeout handling (see Bug#57277) we have raised the minimum required version of the IPC::Run module to 0.91.

The actual API of the Service method in ngeneric is unchanged by the addition of this module and script. However, it’s worth noting that a couple of small changes in behaviour were made. There is now a default timeout of 10 minutes; in nearly all cases this will be more than adequate and it should help avoid machines becoming hung. Also, for systemd, when a stop action is called the --no-block option is added to the arguments list. This avoids deadlock problems when a machine is being shut down, which we otherwise see when both systemd and an LCFG component request that a service be stopped. This has, in particular, affected the sshd service and the associated openssh LCFG component.
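
To illustrate the two behaviour changes for the systemd case, here is a rough shell sketch. It is not the lcfg-service implementation: the function names are hypothetical and the coreutils timeout command stands in for the real timeout handling. Only the 10 minute default and the use of --no-block for stop actions are taken from the text.

```shell
#!/bin/sh
# Sketch only: mimic the described defaults using systemctl and timeout(1).
DEFAULT_TIMEOUT=600   # 10 minutes, in seconds

# Print the systemctl arguments for a daemon action.
service_args() {
    service="$1"
    action="$2"
    if [ "$action" = "stop" ]; then
        # Queue the stop job and return immediately rather than waiting.
        echo "--no-block $action $service.service"
    else
        echo "$action $service.service"
    fi
}

# Run the action, giving up after the timeout (default 10 minutes).
run_service() {
    # Word splitting of the argument string is intentional here.
    # shellcheck disable=SC2046
    timeout "${3:-$DEFAULT_TIMEOUT}" systemctl $(service_args "$1" "$2")
}
```

With this, run_service sshd stop would execute systemctl --no-block stop sshd.service, while run_service httpd start 20 would start httpd with a 20 second limit.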

Dynamic image generation for login screens

Having decided to use LightDM as the default display manager for EL7, the next problem was to control the appearance of the greeter or login screen. On SL6 we do this using custom KDM themes distributed by RPM. For EL7 we’ve provided a way of configuring the greeter appearance with LCFG.

The new lcfg-webpic component combines text and image resources with an HTML and CSS template into a single image. This can then be used as the background image of a LightDM greeter screen using the lightdm.greeterbg resource.

The image is made using the very useful PhantomJS scriptable headless web browser. For more details of webpic see the LoginScreens page on the DICE wiki and the lcfg-webpic source and man page.

Posted in lightdm | Leave a comment

grub2 – sanity prevails

Back in April 2014 I wrote a new LCFG component to configure the grub2 bootloader. At the time I blogged about the problems with restricting edit access for menu items. The issue was that once you had a list of “super users” the access to BOTH editing and booting menu items was completely restricted to those users. There was no way to allow normal users to boot a particular item without also giving them the ability to edit the menu items (which we really do not want to do…).

Thankfully it appears that sometime since I last looked the situation has vastly improved and sanity has prevailed. Now the behaviour is that when there are super-users specified the editing and booting of menu items is restricted to those users except where a menu item is marked as unrestricted.

For the LCFG component this is as simple as this:

!grub2.users_lcfg_kernel mSET(unrestricted)

In the case of the standard lcfg kernel item that’s now the default behaviour so normal users will always be able to boot that item.

At the same time I also took the chance to slightly improve how the list of super users is specified in the grub configuration so that it is now applied to all menu items not just those managed by the LCFG component.
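
In grub configuration terms, the behaviour described above looks roughly like the output of the following sketch, written in the style of an /etc/grub.d script. The user name admin, the password hash placeholder and the menu entry titles are all hypothetical, and this is not the output of the LCFG grub2 component.

```shell
#!/bin/sh
# Sketch of an /etc/grub.d style script: it prints grub.cfg text on stdout.
emit_entries() {
    cat <<'EOF'
set superusers="admin"
password_pbkdf2 admin <hash generated by grub2-mkpasswd-pbkdf2>

# Anyone may boot this entry; only superusers may edit it.
menuentry 'Standard kernel' --unrestricted {
    # linux and initrd lines go here
}

# Booting or editing this entry requires superuser authentication.
menuentry 'Rescue' {
    # linux and initrd lines go here
}
EOF
}

emit_entries
```

The first entry corresponds to a menu item marked as unrestricted; the second shows the default when super users are set and no exception is made.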

Local firefox configuration

For various reasons we need to apply some local configuration to firefox on DICE machines. In particular we set the negotiate_auth options so that we get single-sign-on to our various Cosign protected web services. Recent testing of our SL7 desktop environment revealed that although the necessary configuration files were being generated by the LCFG ffox component the settings were not having any effect. A little investigation revealed this was because the main configuration file (all-lcfg.js) was being created in the top-level /usr/lib64/firefox/ directory when it actually needed to be in /usr/lib64/firefox/defaults/preferences/. This was a bit puzzling as it worked fine on SL6 but then it was noticed that the versions of the component and schema were different. A little bit of further checking revealed that although we had schema 2 we did not have the latest version (1.6.2) so the necessary preferences_path resource, which is used to specify the particular sub-directory, did not exist. Once the schema had been updated on our LCFG servers everything began working correctly and our local configuration now has the desired effect.
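
For illustration, the kind of file involved might be generated like this. The preference names network.negotiate-auth.trusted-uris and network.negotiate-auth.delegation-uris are the standard Firefox ones, but the example.org value and the PREFS_DIR variable are placeholders, not our real settings, and the ffox component generates the real file from LCFG resources rather than from a script like this.

```shell
#!/bin/sh
# Sketch: write a site-wide preferences file into the directory Firefox reads.
# On a real EL7 host PREFS_DIR would be /usr/lib64/firefox/defaults/preferences
prefs_dir="${PREFS_DIR:-$(mktemp -d)}"
prefs_file="$prefs_dir/all-lcfg.js"

cat > "$prefs_file" <<'EOF'
// Sites allowed to use Negotiate (Kerberos/GSSAPI) authentication.
pref("network.negotiate-auth.trusted-uris", "example.org");
// Sites to which credentials may be delegated.
pref("network.negotiate-auth.delegation-uris", "example.org");
EOF

echo "wrote $prefs_file"
```

The point of the story above is the directory: the same file placed one level too high, in /usr/lib64/firefox/, is silently ignored.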

jabber GSSAPI SASL problems

A number of Computing Officers in Informatics are now using SL7 for their standard desktop. This is good because it means various bugs are being flushed out of the woodwork. A recent issue we have found is that the pidgin messaging client could not talk to our local jabber server. We have GSSAPI support enabled so that we can have single-sign-on. That all works fine on SL6 but the client would not work at all on SL7. It could talk fine to other jabber servers where simple username/password was used though. The error message we have been seeing is:

SASL error: SASL(-7): invalid parameter supplied: Parameter error in client.c near line 961

There was a lot of head-scratching over this problem. After much investigation by various people we pinned it down to an issue with the cyrus-sasl package in SL7 (2.1.26-17.el7). It seems to be related to Red Hat bug #984079 and also bug 3480 in the cyrus bug tracker. The latest version of the package (2.1.26-20) taken from Fedora was built for SL7 and that works perfectly. The differences between the two versions of the package are not huge; it appears that the relevant patch is cyrus-sasl-2.1.26-revert-upstream-080e51c7fa0421eb2f0210d34cf0ac48a228b1e9.patch. I suspect we could just apply this single patch to the SRPM from SL7 but we probably want the other patches anyway so we’ll just run with the rebuilt version from Fedora for now.
