Simon's Musings

January 11, 2009

Cosign authenticated OpenID Identity Provider

Filed under: Uncategorized — sxw @ 11:25 pm

As part of the relocation of my motley collection of ‘not-a-service’ applications, I’ve moved and tidied up the cosign based OpenID identity provider. It’s now available at https://id.not-a-service.inf.ed.ac.uk/.

OpenID is a simple way to use a single digital identity across the entire internet. This experimental identity provider allows you to use your Informatics identity as that single identity.

The technology behind OpenID means that you can do this without disclosing any information to external sites which might compromise the security of your Informatics account. You still log in to our local systems (either when you log in to the DICE machine on your desk, or when you visit our web login site) and, needless to say, you should still never disclose your DICE username and password anywhere else.

Your OpenID is a URL, initially of the form http://id.not-a-service.inf.ed.ac.uk/uun, but by adding a simple bit of HTML you may use any URL whose contents you control (so, for example, you could use http://homepages.inf.ed.ac.uk/uun). Any site which displays the OpenID logo in its login field will accept this URL as your identity. You will then be redirected (if necessary) to https://weblogin.inf.ed.ac.uk to enter your username and password, and to our OpenID site to confirm that you’re prepared to divulge your identity.

As the name suggests, this is not a service. It’s not officially supported, and I can make no long term promises regarding its availability. But, please do try it out for ‘throwaway’ web accounts, and let me know if it proves of use.

Technical Details

The service is based around JanRain’s PHP OpenID library, with my enterprise authentication patch. Some crafty use of mod_rewrite and Apache access control directives forces redirection to cosign when authentication is required, whilst still allowing services to access the identity page. The OpenID provider in use is relatively old, and doesn’t support all of the latest bells and whistles.

All of the configuration is performed in the dice/options/openid.h header. The server itself is packaged in the php-openid-server RPM, with MySQL, X509 and Cosign being configured by their corresponding components. The web server is managed using Apacheconf, with an additional configuration file (for the SSL server) being provided through the file component. The templating of the OpenID server is also handled by the file component, with the Informatics-style header and body text being added from LCFG resources.

Unfortunately, the MySQL server database cannot be entirely configured through LCFG, as a password must be shared between the database and the web application. The web application configuration is created by LCFG as /etc/openid/config.php.tmpl, and must be copied into place (/etc/openid/config.php) once the database password has been filled in. Similarly, the database must be created, and the password assigned, manually when a new service is configured. Addressing this issue would require a substantial reworking of the MySQL component.
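For illustration, the fill-in-and-copy step could be scripted along these lines. This is a minimal sketch: the @PASSWORD@ placeholder name is an assumption of mine, not something the LCFG template actually contains.

    #!/usr/bin/env python
    # Hypothetical helper for the manual step above: fill the database
    # password into the LCFG-generated template and install it as the live
    # configuration. The @PASSWORD@ placeholder name is an assumption.
    import getpass
    import shutil

    TEMPLATE = "/etc/openid/config.php.tmpl"
    TARGET = "/etc/openid/config.php"

    password = getpass.getpass("Database password: ")
    with open(TEMPLATE) as f:
        config = f.read().replace("@PASSWORD@", password)
    with open(TARGET, "w") as f:
        f.write(config)
    shutil.copymode(TEMPLATE, TARGET)  # match the template's permissions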

When bringing up a new server, the database must be initialised by running (through om mysql runcommand):

   CREATE DATABASE openid_server;
   GRANT ALL PRIVILEGES ON openid_server.*
       TO 'user'@'host' IDENTIFIED BY 'password_here';
   FLUSH PRIVILEGES;

In our configuration, both the database and the server configuration file are part of the backed-up set, allowing restores to be simply a matter of copying the configuration into place and restoring the database.

January 6, 2009

LCFG components as proper objects

Filed under: Uncategorized — sxw @ 4:59 pm

Towards the end of last year, at the COs Christmas meal in the sadly-destroyed Khushi’s, Stephen challenged me to produce a Python environment for LCFG components. A relatively simple task, you might think, but he threw in the twist that it had to be properly object oriented. This makes it a much trickier prospect, and set me to thinking about what really object-oriented LCFG components might look like.

We’ve got the problem that LCFG was originally written in sh – not that great for structured programming, but the original LCFG design still managed to treat what we now know as components as ‘objects’ with methods such as ‘start’ and ‘stop’. The first problem with this is that the execution context wasn’t preserved between method invocations (components were just scripts which were called with the method name as an argument, so every time a method was run the script was invoked from scratch). A simple form of persistence was added, allowing methods to serialise selected object attributes into a file; these would be loaded, and the attributes reinitialised, when the script next ran. This was extended to resources, so that the resource set when ‘stop’ was called would always match that from when the component was initially started. The ‘configure’ method was added to provide a defined mechanism for transitioning between resource sets while the component is running.
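The pattern was roughly the following – a Python sketch of the idea, not the actual sh implementation; the file location and key=value format are invented for illustration:

    # Sketch of the sh framework's persistence: each method invocation is a
    # fresh process, so selected attributes are written out on exit and read
    # back in on the next invocation. File location and format are invented.
    import os

    STATE_FILE = "/var/lcfg/state/mycomponent"

    def save_attributes(attrs):
        with open(STATE_FILE, "w") as f:
            for key, value in attrs.items():
                f.write("%s=%s\n" % (key, value))

    def load_attributes():
        attrs = {}
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                for line in f:
                    key, _, value = line.rstrip("\n").partition("=")
                    attrs[key] = value
        return attrs

A flat format like this also shows where restrictions on names and values creep in: anything containing an ‘=’ or a newline needs special handling.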

All of these additional features were implemented in ‘sh’ – straining the flexibility of the shell to breaking point, and causing numerous restrictions on valid attribute and resource names and values. More recently, we made the jump to implementing some components in perl, but the perl environment is essentially a port of the existing sh one, and adds little in the way of new structure or abstraction.

This leaves a somewhat creaking component environment, which is hamstrung through its implementation language, need for backwards compatibility, and tight coupling with the LCFG client (which handles the interaction between the components and the profiles which the client collects from the LCFG server). If we take a step back from these realities, what would the ideal component framework look like?

Firstly, we need to preserve resource structure when passing sections of the profile to components. Both the current perl and sh frameworks use the collapsed ‘x_y_z’ list form, which makes it impossible to deal with structured resources in meaningful ways. We need to define a new object hierarchy which makes it possible to preserve the structure of the resources in the XML profile right the way through to the component code.
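As an illustration of the difference, here is a sketch of what preserving the structure might look like; the Resource class and parser are hypothetical, and real profiles are considerably more involved:

    # Sketch: keep the XML profile's nesting instead of collapsing it to
    # 'x_y_z' keys. The Resource class and parser are hypothetical.
    import xml.etree.ElementTree as ET

    class Resource:
        """A node in the resource tree: a value and/or nested children."""
        def __init__(self, value=None):
            self.value = value
            self.children = {}

        def __getitem__(self, name):
            return self.children[name]

    def parse_resources(element):
        node = Resource((element.text or "").strip() or None)
        for child in element:
            node.children[child.tag] = parse_resources(child)
        return node

    profile = ET.fromstring("<server><opts><timeout>30</timeout></opts></server>")
    resources = parse_resources(profile)
    print(resources["opts"]["timeout"].value)  # "30", not server_opts_timeout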

Secondly, we need to deal with the installation and removal cases. Currently, LCFG has no concept of component installation or removal. Installation can be detected within the component code as the first time the ‘start’ or ‘configure’ method is called on a machine, but there’s no generic mechanism for this. There’s currently no way of handling removal – a component never knows when it’s been removed from the boot.services list, and so has no way of telling which ‘stop’ invocation is its last. As a result, machines which transition between multiple roles often accumulate a large amount of detritus in their root partitions.

Thirdly, we need to define persistence more strictly. The current persistence definitions are ad hoc, partly because sh never gave a clear way of inferring attributes. I believe that our requirements for handling the installation case mean that we should split a component into two different objects. One, a Factory-style object, should persist from installation to removal: the install method should be that object’s constructor, and removal its destructor. All of the object’s attributes, and a copy of the resource set, should persist throughout the life of that Factory object, with a ‘configure’ mechanism available to deal with resource changes during its life. For some components, the Factory will be all that is required: components which don’t manage daemons, such as pam and sasl, have no meaningful concept of ‘start’ and ‘stop’. Other components will need instances which handle the lifecycle of a service. These instances would be created by calling a method of the Factory class, and would have ‘start’ as their constructor and ‘stop’ as their destructor. In this way, attributes can persist throughout the life of a daemon. In the future, when we support non-singleton components, it would be possible for the Factory to produce multiple concurrent instances.
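A minimal sketch of how this split might look in Python – all class and method names here are invented:

    # Sketch of the proposed split. The Factory persists from install to
    # removal; instances live from 'start' to 'stop'. All names invented.
    class ComponentFactory:
        def __init__(self, resources):     # the 'install' method
            self.resources = resources     # persists for the factory's life

        def configure(self, resources):    # the resource set has changed
            self.resources = resources

        def start(self):                   # create a running instance
            return ComponentInstance(self)

        def destroy(self):                 # the 'remove' method
            pass                           # tidy away state, caches, etc.

    class ComponentInstance:
        def __init__(self, factory):       # the 'start' method
            self.factory = factory
            self.pid = None                # attributes persist until 'stop'

        def stop(self):                    # the 'stop' method, as destructor
            pass                           # shut the daemon down

Non-singleton components then fall out naturally: the Factory simply returns more than one live instance at a time.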

Fourthly, we need to build things that can be inherited. As Stephen has noted elsewhere, LCFG desperately needs a way of allowing components to inherit from other components; however, handling the resource set implications of this will require server changes. But there’s an additional kind of inheritance that we should be interested in. Many of our components do similar tasks, and share large chunks of code. To date, the restrictions of our implementation language (for sh) and framework (for perl) have prevented us from inheriting useful superclasses. Whatever new framework we define should make it trivial to, for example, write a class which handles safely starting, stopping, and notifying a daemon, and which can then be inherited by all components requiring that functionality – as sketched below.
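A sketch of what such a superclass might look like; the API is invented, and a real implementation would need pidfiles, error handling and so on:

    # Sketch: a superclass which safely starts, stops and notifies a daemon,
    # inherited by any component that manages one. The API is invented.
    import signal
    import subprocess

    class DaemonComponent:
        command = None                   # subclasses name the daemon to run

        def start_daemon(self):
            self.process = subprocess.Popen(self.command)

        def notify_daemon(self):
            # the conventional 'reload your configuration' signal
            self.process.send_signal(signal.SIGHUP)

        def stop_daemon(self):
            self.process.terminate()
            self.process.wait()

    class HttpdComponent(DaemonComponent):
        command = ["/usr/sbin/httpd", "-DFOREGROUND"]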

So, that’s my ideal world. Comments?

September 29, 2008

Presenting iTalk : A web interface to our Jabber service

Filed under: Informatics — sxw @ 11:57 am

As this blog has suggested in the past, I’m in the process of moving things off duffus, and at the same time putting them on a more stable, LCFG-managed footing. The latest system to be moved is iTalk, a local installation of the JWChat application, which provides a web-based interface to the Informatics Jabber service. As the name http://italk.not-a-service.inf.ed.ac.uk/ suggests, this is not a production-quality service – but it should be usable by anyone who has access to our Jabber service. Please do try it out, and let me know how you get on.

An older post provides some interesting technical details about the configuration of this service.

September 19, 2008

Additional apacheconf features

Filed under: Uncategorized — sxw @ 5:39 pm

I’ve just added lcfg/options/apacheconf-proxy.h (and the corresponding dice/options/apacheconf-proxy.h), which pulls in the relevant Apache modules to provide HTTP proxying. This joins the existing LCFG-level apacheconf feature headers:

  • apacheconf-perl.h – adds support for mod_perl
  • apacheconf-php5.h – adds the PHP5 interpreter
  • apacheconf-python.h – adds support for mod_python
  • apacheconf-rewrite.h – adds mod_rewrite
  • apacheconf-ssl.h – adds SSL support
  • apacheconf-suexec.h – adds suexec support

There are also the following DICE-only features, which contain local binaries and configuration:

  • apacheconf-cosign.h
  • apacheconf-krb5.h
  • apacheconf-ldapauthz.h

April 4, 2008

Thoughts from the train 2: References and Mutations

Filed under: Uncategorized — sxw @ 10:49 am

Another thought from the train journey back from UKUUG. The real work here is Stephen’s; I’m just trying to jot down some background so we remember how we got there!

The LCFG compiler currently supports a number of data operations, which were developed independently and don’t necessarily fit together nicely. For the purposes of this discussion, these are:

Mutations – operations (such as mADD, mREMOVE, mEXTRA and mSET) which take an existing value and change it in a way that depends upon their parameters.

Early references – a reference to the value of another resource, evaluated as soon as it is encountered and set to the value the resource has at that point.

Late references – also a reference to the value of another resource, but evaluated once compilation is complete (and after all mutations have been applied), and so set to the final value of the resource.

Stephen suggested that we handle mutations by holding a list of mutations, rather than the current value, within the parse tree. Then all of the mutations are applied in the final linking step (which is also responsible for reference evaluation). This allows us to optimise our mutation handling, as well as permitting the production of more specific derivation information.

In order to handle early references, we need to store an additional piece of information. When an early reference is encountered, we must store both the resource being referenced, and the current depth of that resource’s mutation list. This means we can mimic the ‘early’ behaviour and still leave reference processing to the linker.
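A sketch of the bookkeeping, with an invented representation: each resource accumulates a list of mutations, and an early reference records how many of them had been seen at the point of reference, so the linker can replay just that prefix. mEXTRA is omitted for brevity.

    # Sketch: a resource holds a mutation list rather than a value; the
    # linker replays the list. An early reference snapshots the list's depth
    # at the point it was encountered.
    def apply_mutations(mutations, depth=None):
        value = []
        for op, arg in mutations[:depth]:  # depth=None replays everything
            if op == "mSET":
                value = [arg]
            elif op == "mADD":
                value = value + [arg]
            elif op == "mREMOVE":
                value = [v for v in value if v != arg]
        return value

    muts = [("mSET", "a"), ("mADD", "b"), ("mADD", "c")]
    early_depth = 2                        # reference seen after 2 mutations
    print(apply_mutations(muts, early_depth))  # early ref -> ['a', 'b']
    print(apply_mutations(muts))               # late ref  -> ['a', 'b', 'c']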

April 3, 2008

Thoughts from the train: LCFG Timestamps

Filed under: Uncategorized — sxw @ 1:30 pm

Whilst on the way back from the UKUUG conference (see the previous post for details), Paul, Stephen, Gihan and I had a long talk about some of the structural issues we’ve encountered in the LCFG server. Some of those thoughts will unfortunately be lost in the mists of memory, but I thought it was worth jotting down some notes from the very long chat we had about timestamps.

We have an LCFG architecture where a central source repository contains all of the data which the compiler uses to build profiles. Multiple machines (in our case two, but theoretically an unlimited number) pull data from the source repository and compile XML profiles from it, which they then serve to clients. Each XML profile must be accompanied by a unique timestamp, so that a client knows whether the XML profile it has just fetched is newer than the one it is currently using. An XML profile is created by compiling multiple source files, starting from a single per-XML-profile file which is also, confusingly, called a profile.

The problem is how this timestamp can be calculated in a robust fashion. The requirements for robustness are:

  • Any change in a profile’s source data should always result in an increase in the timestamp (the timestamp must be increasing)
  • The same XML profile must have an equivalent timestamp when served by multiple machines (it must be possible to compare XML profiles fetched from either server, and determine which is newer)
  • These guarantees must apply regardless of downtime on any of the LCFG servers, and allow for LCFG servers with radically different compilation speeds

In addition, we’ve historically imposed a number of constraints upon the solution:

  • There can be no direct communication between the multiple LCFG servers.
  • The LCFG servers cannot ‘talk back’ to the source repository
  • There is no guarantee that all sources come from the same location (there may be multiple SCMs, for example)
  • The LCFG servers cannot maintain state. This is a constraint that flows from our downtime guarantee – if the servers have state, and one goes down, there’s no guarantee that it will have the same state as the other one when it comes back up.

The Current Solution

Currently, we have a solution based on the timestamps of all of the source files that contribute to a particular profile. When a profile is compiled, its timestamp is set to that of the most recent source file that contributed to it. There are two problems with this system:

  • Deleting an entry from a spanning map doesn’t result in a change to the profile’s timestamp. If machine D publishes information into a spanning map that machine A subscribes to, and then machine D leaves the spanning map, there will be no change to the timestamp of A’s profile (in fact, it may go back in time). This is because the server does not maintain state: it never knows that D was in A’s profile, and so doesn’t know that the change in D’s configuration should affect A’s timestamp.
  • Timestamp correctness is critical. This interferes with SCM systems, and with using tools such as rsync to copy profile data around. Both SCMs and rsync set the timestamp of a file to its timestamp on the originating system. If other source files already have timestamps newer than these, then the files you have just copied in will not change the timestamp of the generated profile, even though their contents have changed, and the client won’t notice the changes.

CSNs not timestamps

During our discussion it became obvious that thinking of these identifiers as timestamps was counterproductive. We aren’t actually interested in the time that the profile was built (or edited, or …) at all; we just need an increasing number by which we can order the profiles we receive. This change sequence number (CSN) can be any object to which an ordering relation can be assigned.

Client Promiscuity

(I’m sure that heading will get me some odd search engine hits)

The requirement that the timestamp must be increasing is critical because of the server selection algorithm currently used by the client, under which the server a client uses may change with every request it makes – so a situation where timestamps are not in lockstep between all of the servers will cause the client’s state to flap repeatedly as it switches servers.

It would be possible to partially solve this problem by making clients faithful, and having them only switch servers when one goes down. This means that the timestamp problem shifts from being a constant one to one which only occurs occasionally. However, it obviously removes all of the load balancing characteristics of the current system and would have to be carefully analysed before deployment.

This change also isn’t sufficient to permit removal of the ‘timestamp equivalence between servers’ robustness guarantee. When a machine switches from server A to server B, it must be able to tell whether the profile server B is offering is newer than, older than, or the same as that on server A. However, there’s no guarantee that B has built all of the profiles that A built – it may have been running more slowly than A. So we still need a way of assigning an ordering on these occasions.

Include tree CSNs

We therefore started thinking about ways in which we could create CSNs using purely the data given to us from the source repository. We can’t use timestamps, as there’s no guaranteed correlation between a timestamp and a change. We decided that it was acceptable to require that every file within the repository have an increasing revision number associated with it, that that revision number be increased every time the file is changed, and that it be available to the LCFG compiler.

We then started thinking about mechanisms for composing these to produce CSNs. The issue here is that we have to be able to deal with deletion – a CSN which works by (for example) adding all of the revision numbers in the tree will fail in the face of a file being deleted – in this example the CSN would actually shoot backwards at that point.

Paul came up with the idea of modelling this as an inclusion relationship, where the revisions closer to the top of the tree are given more weight than those at the bottom. Given that removing a node from the tree requires modifying (and thus incrementing the revision of) the node above it, giving higher nodes more weight ensures that this deletion always results in a changed CSN. Hopefully an example will make this clearer.

(Figure: an example inclusion tree.) The figure shows the inclusion tree for machine A’s profile: A includes the headers a and b, which in turn include c, d and e. The number beside each node is the revision number of that file.

If we say that the weight of each level is 10 times that of the level below it, we could define a CSN for this tree that looks like 156 (the top level has a single node with a revision of 1, the second level has 2+3=5, and the bottom level 1+2+3=6). If we were to remove node b, we would do so by changing A (so it now has a revision of 2). Our new CSN is then 226 – which is larger, despite a section of the tree having been removed.

This scheme unfortunately falls down when the summed revisions at any level become larger than the weighting factor we’re applying. However, Gihan asked why we need to treat this as a number at all. Instead, why not represent it as 1.5.6 (using the ‘.’ as the level separator)? It’s trivial to define an ordering, and we have the ability to grow as large as we like at any given level.
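Using the numbers from the example above, a sketch of the computation and of the ordering on dotted CSNs:

    # Sketch: compute a per-level CSN from an inclusion tree, and compare
    # CSNs as tuples so that no level can overflow into the one above.
    def csn(tree):
        """tree is (revision, [subtrees]); returns per-level revision sums."""
        levels, nodes = [], [tree]
        while nodes:
            levels.append(sum(rev for rev, _ in nodes))
            nodes = [c for _, children in nodes for c in children]
        return tuple(levels)

    # The example: A(1) includes a(2) and b(3); below them sit c(1), d(2), e(3).
    before = (1, [(2, [(1, []), (2, [])]), (3, [(3, [])])])
    # Removing b means bumping A's revision to 2; a(2) now includes c, d and e.
    after = (2, [(2, [(1, []), (2, []), (3, [])])])
    print(csn(before))               # (1, 5, 6), i.e. "1.5.6"
    print(csn(after))                # (2, 2, 6), i.e. "2.2.6"
    print(csn(after) > csn(before))  # True: tuples compare lexicographically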

This scheme pretty convincingly solves the first problem – we are no longer reliant on timestamps, and we have a way of producing a unique CSN from the source data. However …

Spanning Maps

The second part of our dilemma rears its ugly head.

In addition to incorporating data gleaned from files it includes, a profile may also contain data produced from spanning maps. Each of these spanning map contributions comes from a machine, and so may be versioned in the same way as that machine – for example, if machine B has a CSN of 2.5.7, its spanning map contribution may be included as shown in the diagram below.

(Figure: the inclusion tree with spanning map contributions.) Note that we no longer have a revisioned entry for the node which contains these contributions. This is the crux of the problem.

Without a revision number on this node, we can’t deal with D disappearing. If we maintain the number locally, then we can’t keep our two servers in lock step.

The presence of spanning maps breaks what looks like an elegant CSN maintenance scheme. In fact, we came to the conclusion on the train that it is the very nature of the way that spanning maps work that means we can’t easily timestamp them. In the inclusion relationship, entries are pulled in from the top down (that is, file a includes files c and d). In order to remove file d, we have to modify file a, and that modification will always contribute to our new CSN.

The spanning map case is different, in that it may well be an entry in k (for example) that results in D’s inclusion. If D is no longer included due to a change in k, then k no longer takes part in the CSN computation, and so things break. It is the direction of this inclusion order, a fundamental part of the power of spanning maps, which makes composing any kind of CSN (be it from timestamps, or using this scheme) impossible.

Possible directions

The only way to work around this is to have a scheme where the revision of every file from which a profile may potentially be built (including deleted files) is included in the CSN of that profile. One way of achieving this is to make the source code repository create a unique CSN for every change in the repository. You get this for free if everything in the repository comes from an SCM system like SVN, but as soon as even one file comes from an external source, you need an external mechanism. This external mechanism must be applied at a common point in the process (that is, on the source server, rather than on the compilation machines).

March 15, 2008

New apacheconf and monitoring thoughts

Filed under: Uncategorized — sxw @ 5:05 pm

Yesterday, I shipped a new apacheconf component, with some significant changes to its monitoring support.

Apache is a complicated beast, with many different mechanisms for configuring it. Apacheconf doesn’t necessarily handle all of these different options, and sometimes workarounds are necessary. For example, Apache supports providing multiple ip:port combinations to a VirtualHost directive; apacheconf only supports providing one. For this reason, Neil had configured a service with two VirtualHosts, both with the same server name. Unfortunately, apacheconf assumed that all of the server names on a given host would be unique, and so built its Nagios service descriptions (which must be unique) from these server names. The upshot was that we ended up with a monitoring configuration that wouldn’t load.

I’ve made two changes to help mitigate this. Firstly, every apacheconf virtualhost now has a vhostnagiosmonitor directive, which can be set to false to disable monitoring for that virtual host. Secondly, the apacheconf translator now keeps a list of all of the service descriptions it has created, and adds uniquifiers to any duplicates (initially the IP address and, if that isn’t sufficient, a number).
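The uniquifying logic amounts to something like this – a sketch, not the component’s actual code:

    # Sketch of the duplicate handling: try the server name first, then
    # append the IP address, then a counter, until the description is unique.
    def unique_description(name, ip, seen):
        candidate = name
        if candidate in seen:
            candidate = "%s_%s" % (name, ip)
        counter = 1
        while candidate in seen:
            candidate = "%s_%s_%d" % (name, ip, counter)
            counter += 1
        seen.add(candidate)
        return candidate

    seen = set()
    print(unique_description("www.example.org", "192.0.2.1", seen))
    print(unique_description("www.example.org", "192.0.2.2", seen))  # IP suffix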

In addition to this, a new lcfg-monitor has shipped containing a number of bug fixes.

In the long run, we need to give lcfg-monitor the ability to take a list of machines and components for which monitoring is disabled – so that, if this happens again, we don’t end up having to rush to fix broken configurations, or components, just to keep monitoring running for everyone else.

January 29, 2008

Integrating cosign with web sites

Filed under: Uncategorized — sxw @ 12:48 pm

I’ve made a couple of changes over the last few days with a view to making it easier to integrate cosign authentication with web applications, and web sites in general. These are trivially available to sites which are built with the LCFG apacheconf and cosign components, and will be available in the next stable release.

Standard Logout Mechanism

Firstly, a standard logout CGI script is generated by the cosign component, as /var/www/cosign-logout/logout.cgi. Sites built with apacheconf can include the cosign-logout configuration fragment in their host definition to map this to the /logout URI on their site.

Cosign requires a site-local logout mechanism because of the way it uses cookies to record user authentication. When a user is authenticated to cosign and accessing your site, they have two cookies: one for your site, and one for the central cosign server. If your logout button only redirects to the central cosign logout page, the site cookie will continue to exist – so users will still be able to access your site for a brief period after they have logged out. Needless to say, this tends to confuse people.

The local logout CGI removes the local cookie, and then redirects the user to the central logout service. It should be linked (or redirected to) after your web application has performed whatever internal tidy-up it requires on logout (for example, it may have its own cookies to remove).
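In essence, the script does something like the following. This is a Python CGI sketch; the cookie name and the central logout URL below are placeholders, since both are site-specific:

    #!/usr/bin/env python
    # Sketch of a site-local logout CGI: expire this site's cosign service
    # cookie, then send the browser on to the central logout page. The
    # cookie name and logout URL below are placeholders, not real values.
    print("Status: 302 Found")
    print("Set-Cookie: cosign-myservice=; path=/; secure; "
          "expires=Thu, 01 Jan 1970 00:00:00 GMT")
    print("Location: https://weblogin.inf.ed.ac.uk/logout")
    print()  # a blank line ends the CGI headers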

Authorization

For some services, it is desirable to check a user’s entitlements before allowing them access. Until the new account management technology is available, it is only possible to give entitlements to local users, so the mechanism below cannot be used on services which allow access by iFriends.

Entitlements are accessible as LDAP groups, so they can be checked using LDAP authorization. To enable this for your web server, include dice/options/apacheconf-ldapauthz.h in the server’s profile. Then include the ldap-authz configuration fragment in the configuration of each site you wish to protect. The implementation details differ between the DICE Apache 1.3 build and the Fedora Apache 2.2 system, which unfortunately changes the final configuration steps.

Apache 1.3

Individual sections of the site may then be protected as follows:

<Location /my/secret/data>
    CosignProtected On
    AuthType Cosign
    Require group my/entitlement/name
</Location>

(my/entitlement/name is the entitlement that you want to restrict access to)

Apache 2.2

<Location /my/secret/data>
    CosignProtected On
    AuthType Cosign
    Require ldap-group cn=my/entitlement/name,ou=Capabilities,dc=inf,dc=ed,dc=ac,dc=uk
</Location>

(again, my/entitlement/name is the name of the entitlement you wish to restrict access to. Note that you must specify the full DN of the entitlement, rather than just the name)

January 28, 2008

OpenSSH cascading credentials

Filed under: work — sxw @ 11:44 am

Today I shipped, site-wide, the OpenSSH package with the cascading credentials support that we’ve been testing for the last year or so. It’ll appear in develop releases from tonight, and in the next stable release.

The cascading credential support isn’t enabled by this, however. Enabling it requires a configuration file change which LCFG can’t synchronise with the package update – so the configuration will be changed in a subsequent release cycle (next week’s, if all goes according to plan).

More details on cascading credentials are available in the second part of my SSH talk at last year’s AFS & Kerberos Best Practices Workshop. I need to make a public release of this patch, too.
