LCFG Profile Security Project

April 18, 2018

Having completed the work to add support for GSSAPI auth to the client for fetching profiles, I’ve now moved on to the LCFG installer. Currently the installer attempts to fetch the LCFG profile for the machine just prior to the (I)nstall, (D)ebug, (S)hell, (P)atchup, (R)eboot prompt. That fetching is done by calling the client component install method, which in turn calls rdxprof in one-shot mode. Having previously ported the client component to the Perl LCFG::Component framework I had hoped this would “just work”, but it turned out that a number of bootstrapping issues were only being avoided previously due to many hardwired paths in the shell ngeneric code. The Perl framework takes a different approach and prefers to use the LCFG sysinfo resources wherever possible. This improves platform independence and maintainability but presents a bootstrapping problem at the first stage of the install, when we have not yet downloaded any profile and thus have no sysinfo resources. I wasn’t keen on performing major surgery on the Perl component framework so I decided that the simplest solution to this problem was to get the installer to call rdxprof directly. With this change the installer worked again but still required support for Kerberos authentication.

Adding support for Kerberos authentication has been done in a fairly simple way. I’ve added support for two new install kernel command line options: lcfg.kauth and lcfg.realm. When the lcfg.kauth option is specified the user is prompted to enter their principal name and the kinit program is run to do the authentication. The user may specify the full principal name; if the realm is not specified then either the value of the lcfg.realm option or the upper-cased domain name is used (e.g. @LCFG.ORG). If the authentication fails then the user is prompted to re-enter the principal name (which defaults to the previously entered string) and password. Once the Kerberos authentication has succeeded the credentials will be automatically used by rdxprof when required for fetching the LCFG profile.
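
The realm defaulting described above can be sketched in a few lines of Perl. This is only an illustration and not the installer's actual code: the full_principal function is a hypothetical name, and I am assuming here that a value given via lcfg.realm takes precedence over the upper-cased domain name.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the installer.
# $realm_opt is the value of lcfg.realm (may be undef),
# $domain is the DNS domain of the machine being installed.
sub full_principal {
    my ( $name, $realm_opt, $domain ) = @_;

    # A name which already includes a realm is used unchanged
    return $name if $name =~ /\@/;

    # Assumption: lcfg.realm wins, otherwise upper-case the domain
    my $realm = ( defined $realm_opt && length $realm_opt )
              ? $realm_opt
              : uc $domain;

    return $name . '@' . $realm;
}

print full_principal( 'fred', undef, 'lcfg.org' ), "\n";   # prints fred@LCFG.ORG
```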


Easy GSSAPI authentication

March 29, 2018

We have many services which are protected with GSSAPI authentication. When accessing these services in some automated fashion from a script (e.g. for a cronjob) it is typically necessary to use a keytab file and do a “kinit” or equivalent. Often we use the k5start tool to do that, either as a one-off or running in the background as a daemon to manage a credentials cache. Alternatively, in various Perl scripts we do something similar using the Authen::Krb5 module. Kenny MacDonald in IS recently pointed me at a fairly new feature of the GSSAPI libraries which means that most of the time this is no longer necessary. Instead it is just a case of setting two environment variables – KRB5_CLIENT_KTNAME for the path to the keytab file and KRB5CCNAME for the credentials cache. The GSSAPI library will then do the work of maintaining the credentials cache. This works nicely with the Perl LWP framework, for example:

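use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Keytab to authenticate with and the credentials cache to maintain
$ENV{KRB5_CLIENT_KTNAME} = '/etc/foo.keytab';
$ENV{KRB5CCNAME} = '/var/tmp/foo.ccache';

my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new( GET => "https://www.example.org/auth_site/" );
my $response = $ua->request($req);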

Note that for this to succeed the LWP::Authen::Negotiate and LWP::Protocol::https modules must be installed. The principal used is apparently the first encountered in the keytab file. There does not appear to be any way to control that selection, which means keytabs containing multiple principals may be problematic.


LCFG Profile Security Project

March 28, 2018

This week I have been working on providing a way to configure the LCFG client profile fetcher via client component resources. In particular some sites will need to be able to specify SSL options (e.g. ca_path, verify_hostname) and also be able to set parameters for the authentication modules (e.g. gssapi might need the keytab file path to be specified). By default profile fetching will work for most sites without any additional configuration. Furthermore, as this configuration is most easily expressed in terms of list and hash data structures, I’ve decided to only support setting these parameters via a configuration file. Although it is currently configured entirely through the command line, the rdxprof daemon supports loading configuration from a YAML file. I’ve altered the SetOptions method so that when it encounters a fetch entry in the configuration data hash it will pass this through to the LCFG::Client::ProfileFetcher instance via a configure method which knows how to handle the various options.

The current LCFG client component is written in bash which makes generating a config file in YAML more tricky than I would like. As we have a longstanding plan to rewrite all the core LCFG components into Perl this seemed like a good opportunity to get on with that job. I’ve previously been putting off this particular rewrite since the component is rather old and very complex. It manages the starting, stopping and signalling of the rdxprof daemon and as such it has a lot of code for handling PID files and checking for the liveness of processes. Given that we no longer support platforms such as SL6 and older this situation can be massively improved by switching to systemd for the management of rdxprof. I’ve introduced /usr/lib/systemd/system/rdxprof.service and /etc/sysconfig/rdxprof files which can be used by the component to control the daemon. To properly verify that the rdxprof daemon has successfully started the component creates a null callback and waits for it to be processed. I’ve moved the implementation of that into the LCFG::Client module itself so that the details are nicely hidden behind an API.

This is all implemented in perl-LCFG-Client version 4.3.4 and lcfg-client version 4.0.3. To make it easier to test I’ve added a dice/options/lcfg-client.h header. If the DICE_OPTIONS_LCFG_CLIENT_GSSAPI macro is defined then a new keytab will be created and the LCFG client will use it for authentication. The LCFG server is not yet quite ready for me to enable the use of gssapi but hopefully will be in the next couple of days.

Enabling gssapi for an LCFG client will be done something like this:


!kerberos.keys mADD(lcfg)
kerberos.keytab_lcfg /etc/lcfg/client.keytab
kerberos.keytabuid_lcfg root
kerberos.keytabgid_lcfg lcfg

!client.url mSET(https://lcfg1.inf.ed.ac.uk/profiles https://lcfg2.inf.ed.ac.uk/profiles)

!client.fetch_auth mSET(gssapi)
!client.fetch_params_gssapi mSET(keytab)
!client.fetch_param_gssapi_keytab mSET(<%kerberos.keytab_lcfg%>)


LCFG Profile Security Project

March 21, 2018

After improving support for Apache authentication in the LCFG server I have moved onto the client this week. The bulk of the work has been focused on the creation of a new LCFG::Client::Fetcher module which encapsulates all the details associated with fetching XML profiles from various sources. As well as improving the authentication support I am taking the chance to overhaul a chunk of code which has not seen much love in either of the v3 or v4 projects. One particular issue is that currently the handling of the list of profile sources is spread around the client libraries, which means that even a small change can involve locating and altering many separate small pieces of code. This general work also includes adding support for IPv6 and enhancing SSL security, as well as making the code much more maintainable.

One big change in approach I’ve made is that the lists of local file and remote web server sources are now handled in a unified way where previously they were dealt with completely separately. The new Fetcher module has a single list of source objects (either LCFG::Client::Fetch::Source::File or LCFG::Client::Fetch::Source::Remote) which come from the value of the client.url resource. One advantage here is that it is now trivial to add an entirely new type of source (e.g. rsync or ldap); anything with an LWP::Protocol module is a possibility. When configured to use both local files and remote sources the client has always preferred local files where possible. This behaviour is retained by using a priority system, with file sources being guaranteed to have a higher default priority than any remote source.
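
As a rough illustration of the priority idea, the following sketch orders a mixed list of sources so that file sources come first. The plain hashes, the by_priority function, the example locations and the priority numbers are all invented for this example; the real source modules are objects and will differ in detail.

```perl
use strict;
use warnings;

# Hypothetical source records with invented priority values;
# the only property that matters is that file > remote.
my @sources = (
    { type => 'remote', location => 'https://lcfg1.example.org/profiles', priority => 10 },
    { type => 'file',   location => '/var/lcfg/conf/profile',             priority => 100 },
    { type => 'remote', location => 'https://lcfg2.example.org/profiles', priority => 10 },
);

# Return the sources with the highest priority first
sub by_priority {
    my @list   = @_;
    my @sorted = sort { $b->{priority} <=> $a->{priority} } @list;
    return @sorted;
}

for my $source ( by_priority(@sources) ) {
    print "$source->{type}: $source->{location}\n";
}
```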

The other part of recent development work is the addition of support for different authentication mechanisms. This is supported via modules in the LCFG::Client::Fetch::Auth namespace; currently we have modules for basic (username/password) and gssapi authentication. As with the new source modules this approach means it is easy to support alternative mechanisms, including site-specific needs which might not be appropriate for merging into the upstream code base. Before making a request the Fetcher will call the relevant authentication module to initialise the environment. I am also working on supporting multiple mechanisms so that if one fails the next will be tried until one succeeds.
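
The “try each mechanism until one succeeds” behaviour could look something like the sketch below. It is not the Fetcher code: first_working_auth is a hypothetical function and each mechanism is represented by a simple code reference which returns true when it has managed to initialise the environment.

```perl
use strict;
use warnings;

# Hypothetical helper: try the named mechanisms in order and return
# the name of the first one which initialises successfully.
sub first_working_auth {
    my ( $order, $mechs ) = @_;

    for my $name (@$order) {
        my $init = $mechs->{$name} or next;    # skip unknown mechanisms

        # A mechanism which dies is treated the same as one which fails
        my $ok = eval { $init->() };
        return $name if $ok;
    }

    return;    # nothing worked
}

# Stand-ins for real modules: gssapi fails here, basic succeeds
my %mechs = (
    gssapi => sub { die "no keytab\n" },
    basic  => sub { 1 },
);

my $used = first_working_auth( [ 'gssapi', 'basic' ], \%mechs );
print defined $used ? "using $used\n" : "no mechanism available\n";
```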

Most of the code for the client is now in place and I am working on documentation for the various new modules. Once that is done I need to consider how the necessary authentication information can make it from LCFG resources into the rdxprof application via the LCFG client component. Although I would rather not make such a big change it might be that I finally need to bite the bullet and rewrite the client component from bash into Perl.


LCFG Profile Security Project

March 13, 2018

I have recently begun work on the Review Security of LCFG Profile Access project. So far I have mostly been considering the various aspects of the project with the aim being to produce a list of ideas which can be discussed at some future Development Meeting.

The first aspect of the project I have looked at in more depth is the LCFG server, which has support for generating Apache .htaccess files. These can be used to limit access to each individual LCFG profile when fetched over http/https. We have traditionally supported both http and https protocols and relied on IP addresses to limit access, but we would like to move over to https-only along with using GSSAPI authentication; the LCFG client would then use a keytab to get the necessary credentials. To help with this change I have introduced a new schema (4) for the profile component and made some modifications to the LCFG server code which make it easier to use the Apache mod_auth_gssapi module. In particular there is a new auth_tmpl_$ resource which allows the selection of a different template (e.g. the apache_gssapi.tt template which is provided in the package) which more closely meets local requirements. There are also auth_vars_$ and auth_val_$_$ resources which can be used to specify any additional information that is required. For example:

!profile.version_profile mSET(4) /* not yet the default */
!profile.auth          mADD(ssl)
!profile.auth_tmpl_ssl mSET(apache_gssapi.tt)
!profile.acl_ssl       mSET(host/<%profile.node%>.<%profile.domain%>@<%kerberos.realm%>)
!profile.acl_ssl       mADD(@admin)
!profile.auth_vars_ssl mADD(groupfile)
!profile.auth_val_ssl_groupfile mSET(/etc/httpd/conf.d/lcfgadmins.group)

which results in the LCFG server generating the following .htaccess file:

AuthType GSSAPI
AuthName "lcfg@foo.inf.ed.ac.uk"
GssapiBasicAuth Off
GssapiBasicAuthMech krb5
GssapiSSLonly On
GssapiCredStore "keytab:/etc/httpd.keytab"
AuthGroupFile "/etc/httpd/conf.d/lcfgadmins.group"
<RequireAny>
  Require user "host/foo.inf.ed.ac.uk@INF.ED.AC.UK"
  Require group "admin"
</RequireAny>

The profile.acl_ssl resource holds a list of users and groups (which have an ‘@’ prefix). In a real deployment it might make more sense to use an lcfg/ principal rather than host/. The groupfile support is provided by the mod_authz_groupfile module which needs to be loaded.
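
To make the user/group convention concrete, here is a small Perl sketch of how entries in the ACL list map onto the Require lines shown above. The server does this work through its templates, so this is only an illustration of the rule and require_lines is a made-up function name.

```perl
use strict;
use warnings;

# Hypothetical helper: entries with an '@' prefix are groups,
# everything else is treated as a user (principal) name.
sub require_lines {
    my @acl = @_;

    my @lines;
    for my $entry (@acl) {
        if ( $entry =~ /^\@(.+)$/ ) {
            push @lines, qq{Require group "$1"};
        }
        else {
            push @lines, qq{Require user "$entry"};
        }
    }

    return @lines;
}

print "  $_\n" for require_lines( 'host/foo.inf.ed.ac.uk@INF.ED.AC.UK', '@admin' );
```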

I have tested this with curl and it works as required. The LCFG client doesn’t currently have support for doing a kinit (or launching something like k5start in the background) prior to fetching the profile so it isn’t yet possible to actively use this authentication method.


Remote Desktop Project

February 28, 2018

This week I’ve been preparing the new staff XRDP service for user testing. It now has a QuoVadis SSL certificate and I’ve been attempting to resolve an issue with some clients presenting a warning dialogue about not trusting the certificate. According to this bug report it is necessary to include the whole trust chain in the certificate file. I’ve tried appending the contents of the .chain file without success, and it’s not clear if I am missing a part of the chain. I’ll continue investigating, but if we can’t easily resolve the issue we could just document what users should expect to see.

As Chris had access to a Windows machine he has managed to generate a .bmp image file for the login screen logo which actually displays correctly. I have no idea why the various Linux applications generated bad images but I’m not going to worry too much. This gives us a much more official-looking Informatics login screen which should reassure users. The image has been packaged up in an xrdp-logo-inf RPM.

I’ve also been investigating rate-limiting new connections using iptables. The standard dice iptables configuration is rather complicated so I need to speak to George about the best way to go about this.

To ensure the xrdp service only gets started once the machine is ready to handle connections I’ve modified the systemd config so that it waits for the LCFG stable target to be reached.

I’ve noticed that all the xrdp logs are being sent to the serial console. Even with just a single user that’s flooding our console logs so I’d like to get that stopped. It’s already going to local file and syslog so no more logging is really required. SEE don’t see the same problem so I wonder if it’s related to our Informatics syslog configuration.

The user documentation is now close to being complete, we even have some information on how to access the XRDP service from Android devices.


Remote Desktop Project

February 21, 2018

This week I’ve been working on the configuration for an XRDP server for Informatics staff. This will be publicised as a prototype service; the plan is to hold off replacing the NX service until Semester 2 is completed at the end of May, which avoids the potential for any disruption to teaching. The prototype service will be installed on some spare hardware which has 2 x 2.6GHz CPU, 36GB RAM and 146GB disk space. That’s not huge but should be sufficient for multiple users to be logged in simultaneously. As the staff service is likely to only ever be based on a single server I’ve decided to simplify the config by dropping the haproxy frontend, which will now only be used on the multi-host general service. To protect against DoS attacks iptables will be used to do rate-limiting. If I can work out how to get the xrdp software to log the IP address for failed logins I will also investigate using fail2ban to add firewall rules. Most of the user documentation on computing.help is now ready; I just need to add some instructions and screenshots for the Remmina client on Linux.


User management improvements

November 23, 2017

Management of local users and groups (i.e. those in /etc/passwd and /etc/group) is done using the LCFG auth component. One feature that has always been lacking is the ability to create a home directory where necessary and populate it from a skeleton directory (typically this is /etc/skel). The result of this feature being missing is that it is necessary to add a whole bunch of additional file component resources to create the home directory and that still doesn’t provide support for a skeleton directory.

Recently I needed something along those lines so I’ve taken the chance to add a couple of new resources – create_home_$ and skel_dir_$. When the create_home resource is set to true for a user the home directory will be created by the component and the permissions set appropriately. By default the directory will be populated from /etc/skel but it could be anything. This means it is now possible to set up a machine with a set of identically initialised local users.

For example:

auth.pw_name_cephadmin           cephadmin
auth.pw_uid_cephadmin            755
auth.pw_gid_cephadmin            755
auth.pw_gecos_cephadmin          Ceph Admin User
auth.pw_dir_cephadmin            /var/lib/cephadmin
auth.pw_shell_cephadmin          /bin/bash
auth.create_home_cephadmin       yes /* Ensure home directory exists */

auth.gr_name_cephadmin           cephadmin
auth.gr_gid_cephadmin            755
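
For anyone curious what “create the home directory and populate it from a skeleton” involves, here is a rough Perl sketch of the idea using only core modules. It is not the auth component's implementation: create_home is a hypothetical function, the 0700 mode is my assumption, and setting the ownership of the new files (which a real implementation needs to do) is left out.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: returns 1 if the directory was created,
# 0 if it already existed (in which case nothing is touched).
sub create_home {
    my ( $home, $skel ) = @_;

    return 0 if -d $home;

    make_path( $home, { mode => 0700 } );    # mode is an assumption
    return 1 if !defined $skel || !-d $skel;

    # Walk the skeleton top-down, recreating directories and copying files
    find( { no_chdir => 1, wanted => sub {
        my $rel = File::Spec->abs2rel( $File::Find::name, $skel );
        return if $rel eq '.';

        my $dest = File::Spec->catfile( $home, $rel );
        if ( -d $File::Find::name ) {
            make_path($dest);
        }
        else {
            copy( $File::Find::name, $dest ) or die "copy failed: $!\n";
        }
    } }, $skel );

    return 1;
}

# Example call (not run here): create_home( '/var/lib/cephadmin', '/etc/skel' );
```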

LCFG Core: resource types

November 21, 2017

The recent round of LCFG client testing using real LCFG profiles from both Informatics and the wider community has shown that the code is now in very good shape and we’re close to being able to deploy to a larger group of machines. One issue that this testing has uncovered is related to how the type of a resource is specified in a schema. A type in the LCFG world really just controls what regular expression is used to validate the resource value. Various type annotations can be used (e.g. %integer, %boolean or %string) to limit the permitted values. If there is no annotation the resource is assumed to be a tag list, and this has clearly caught out a few component authors. For example:

@foo %integer
foo

@bar %boolean
bar

@baz
baz

@quux sub1_$ sub2_$
quux
sub1_$
sub2_$

Both of the last two examples (baz and quux) are tag lists; the first just does not have any associated sub-resources.

The compiler should not allow anything but valid tag names (which match /^[a-zA-Z0-9_]+$/) in a tag list resource but due to some inadequacies it currently permits pretty much anything. The new core code is a lot stricter and thus the v4 client will refuse to accept a profile if it contains invalid tag lists. Bugs have been filed against a few components (bug#1016 and bug#1017). It’s very satisfying to see the new code helping us improve the quality of our configurations.
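
The tag name check is easy to reproduce if you want to test a value by hand. The sketch below uses the regular expression quoted above; invalid_tags is a hypothetical helper and I am assuming the tag list value is whitespace-separated.

```perl
use strict;
use warnings;

# Hypothetical helper: return any entries in a tag list value
# which are not valid tag names.
sub invalid_tags {
    my ($value) = @_;
    my @bad = grep { !/^[a-zA-Z0-9_]+$/ } split ' ', $value;
    return @bad;
}

my @bad = invalid_tags('eth0 eth1 eth0:1');
print "invalid tags: @bad\n" if @bad;    # prints invalid tags: eth0:1
```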


yum cache and disk space

November 15, 2017

At a recent LCFG Deployers meeting we discussed a problem with yum not fully cleaning the cache directory even when the yum clean all command is used. This turns out to be related to how the cache directory path is defined in /etc/yum.conf as /var/cache/yum/$basearch/$releasever. As the release version changes with each minor platform release (e.g. 7.3, 7.4) the old directories can become abandoned. At first this might seem like a trivial problem but these cache directories can be huge; we have seen instances where gigabytes of disk space have been used and cannot be simply reclaimed. To help fix this problem I’ve added a new purgecache method to the LCFG yum component. This takes a sledgehammer approach of just deleting everything in the /var/cache/yum/ directory. This can be run manually whenever required or called regularly using something like cron. In Informatics it is now configured to run weekly on a Sunday like this:

!cron.objects             mADD(yum_purge)
cron.object_yum_purge     yum
cron.method_yum_purge     purgecache
cron.run_yum_purge        AUTOMINS AUTOHOUR * * sun
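
The sledgehammer itself is simple enough to sketch in Perl with the core File::Path module. This is an illustration of the approach rather than the component's purgecache code, and purge_cache is a hypothetical function name. The keep_root option removes the contents of the directory while leaving the directory itself in place.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Hypothetical helper: delete everything below $dir but keep $dir.
# Returns 1 on success, 0 if the directory is missing or anything
# could not be removed.
sub purge_cache {
    my ($dir) = @_;

    return 0 if !-d $dir;

    remove_tree( $dir, { keep_root => 1, error => \my $errors } );
    warn "failed to remove some entries under $dir\n" if @$errors;

    return @$errors ? 0 : 1;
}

# Example call (not run here): purge_cache('/var/cache/yum');
```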