Wireless networking in the Forum

Wireless networking spectrum is a scarce resource.  In the 2.4GHz band (“802.11b/g”) there are essentially only three non-overlapping channels available.  Things are not quite as tight in the 5GHz band (“802.11a”), but there is by no means generous provision there either.  To make matters worse, these are not dedicated WiFi bands, but are shared with Bluetooth devices, microwave ovens, radar installations, baby alarms, and so on.  This makes provision of a reliable wireless service in the Forum difficult.
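The "three non-overlapping channels" figure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes channels 1 to 13 with centre frequencies of 2407 + 5n MHz and a 22 MHz channel width (the 802.11b figure), and the function names are our own.

```python
# Sketch: why only three 2.4GHz channels fit side by side.
# Assumptions: channels 1-13, centre frequency 2407 + 5*n MHz,
# and a 22 MHz wide signal (the 802.11b channel width).

CHANNEL_WIDTH_MHZ = 22


def centre_mhz(channel):
    """Return the centre frequency in MHz of a 2.4GHz channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("channel must be between 1 and 13")
    return 2407 + 5 * channel


def overlaps(a, b, width=CHANNEL_WIDTH_MHZ):
    """Two channels overlap if their centres are closer than one channel width."""
    return abs(centre_mhz(a) - centre_mhz(b)) < width


def non_overlapping_set(channels=range(1, 14)):
    """Greedily pick channels, lowest first, that overlap none already chosen."""
    chosen = []
    for ch in channels:
        if all(not overlaps(ch, other) for other in chosen):
            chosen.append(ch)
    return chosen


if __name__ == "__main__":
    print(non_overlapping_set())
```

Under these assumptions adjacent usable channels have to be at least five channel numbers apart, which is why neighbouring access points end up competing for so few slots.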

The problem is this: because of all the steel and concrete, we need to have quite a number of access points to cover as many nooks and crannies as we can.  However, there are open spaces too, and as a result many of the cells overlap with several neighbours, above and below as well as on the same floor.  The APs do their best to negotiate good channel allocations, trying to avoid each other and otherwise busy or noisy channels where possible, but don’t always succeed.

Any additional wireless activity is therefore likely to have a negative impact on the service for the rest of the users in the Forum.  It is therefore essential that anyone contemplating operating any kind of wireless equipment in the Forum discusses this with the support team beforehand.  Please submit a support request in the usual way.  (Note that the School’s policy on self-managed machines requires this.)  This includes both WiFi networking and any other use of the same wireless bands.

Please be aware that many pieces of equipment may have a wireless mode of operation, and in some cases this is even enabled by default.  If there is a corresponding “wired” mode, we would always recommend using that instead, as this is faster as well as being more secure and reliable.  In such cases, the wireless side should be turned off.  (Support can arrange for additional (wired) network ports to be enabled if necessary.)  Examples of these devices have included backup appliances and some multimedia equipment.

However, we have also noticed cases of phones and other similar portable appliances being operated in access-point mode.  If you have this enabled on yours for whatever reason, please remember to turn it off again.  Turn off Bluetooth as well, while you’re at it.  You’ll find your battery life is better too!

When we installed the wireless network in the Forum we gave priority to good coverage for meeting rooms.  Other than some issues on the ground floor, caused in part by rogue APs, we believe that the meeting rooms are reasonably well covered.  However, if you have any comments on the wireless service, in the meeting rooms or elsewhere in the Forum, we would be very pleased to hear from you.  We can’t promise to fix every problem, given the nature of the building, but any information would be useful.  Thank you!


Backing up users’ crontabs on DICE

I’d like to draw users’ attention to the following, recently posted on one of our CO blogs:

“The Research and Teaching unit recently noted that users’ crontabs (sometimes required, for example, on research servers to start or check on custom services) are not routinely backed up [...] the purpose of this post is simply to draw attention to the above, and ask a few general questions”

For more details, or to comment, please see the original blog entry: cron and on and on?.


Scientific Linux 6.3

The 3rd minor update to Scientific Linux 6 (which is based on RHEL6) is now ready for deployment to the Informatics SL6 DICE office machines. A minor update like this gives us the opportunity to update important software and fix any bugs which are not security issues (we apply security updates as soon as they are available) in a controlled manner.

In this upgrade the most noticeable change for users is likely to be the replacement of OpenOffice 3.2.1 with LibreOffice 3.4.5.2.

To complete this upgrade a delayed reboot will be scheduled for all DICE office desktops, with a delay of 5 days. Although the reboots are delayed, it would be greatly appreciated if people could manually reboot their machines at their earliest convenience; the delayed reboot would then be cancelled. The machines in the student labs were successfully upgraded to SL6.3 before the beginning of semester 1, so this upgrade only affects office desktop machines. Upgrades for individual servers will be scheduled over the next few weeks and users affected will be contacted as necessary.

SL6.3 was released on 8th August 2012 and since then it has been thoroughly tested in our DICE environment so we are confident that this update will not cause any issues for users.

Details of the package updates are available on the LCFG wiki. For further, in-depth information, there are also release notes from Scientific Linux and RHEL.

If you have any questions or problems with the upgrade please contact our User Support team through the support form.


Mac Network Settings

Some users have been seeing the error “Another device on the network is using your computer’s IP address”, and our network monitoring has detected a number of flip-flops involving Apple kit trying to use the same IP address. The culprit seems to be Apple’s ‘Internet Sharing’ software, which causes a machine to use addresses which have not been assigned to it. The problem seems to be that it hijacks the IP-to-MAC translation, so that packets for the intended destination go to it instead.
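To make the term "flip-flop" concrete, here is a minimal sketch of how one might spot them in a time-ordered list of (IP address, MAC address) sightings. It is not our monitoring code; the function name, the threshold and the sample addresses (taken from the documentation ranges) are all assumptions for illustration.

```python
from collections import defaultdict


def find_flip_flops(observations, min_changes=2):
    """Return {ip: sorted MACs} for addresses that keep changing hands.

    observations: iterable of (ip, mac) pairs in time order.
    min_changes:  how many times the MAC seen for an IP must change
                  before it is reported (hypothetical threshold; a single
                  change could just be a replaced network card).
    """
    last_mac = {}
    changes = defaultdict(int)
    macs_seen = defaultdict(set)
    for ip, mac in observations:
        mac = mac.lower()
        macs_seen[ip].add(mac)
        if ip in last_mac and last_mac[ip] != mac:
            changes[ip] += 1
        last_mac[ip] = mac
    return {ip: sorted(macs_seen[ip])
            for ip, count in changes.items() if count >= min_changes}


if __name__ == "__main__":
    sightings = [
        ("192.0.2.5", "00:00:5e:00:53:01"),
        ("192.0.2.5", "00:00:5e:00:53:02"),  # another machine answers
        ("192.0.2.5", "00:00:5e:00:53:01"),  # ...and back again
        ("192.0.2.6", "00:00:5e:00:53:03"),
    ]
    print(find_flip_flops(sightings))
```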

It looks to be related to the sleep proxy code. There’s quite a long discussion about this. In particular:

“macs running snow leopard will act as bonjour sleep proxy servers when they have internet sharing activated. this is likely the reason you’re seeing sporadic occurrences on your network rather than the entire 10.6 installed base.”

Princeton’s approach to the problem is to prohibit the use of such features on their campus network:

http://www.net.princeton.edu/mac/internet-sharing-x

It is therefore important that you have this internet sharing feature disabled on your Apple kit, and we may at some point configure our switches to block this kind of activity. There are other issues with this sharing feature (e.g. turning into a DHCP server) which would definitely impact on other users. In such a case, the offending kit would have to be removed from the network until the issue was resolved.


Informatics new Blogging Service

The blogs currently available on blog.inf – which has been not-a-service since its inception in 2009 – have moved to a new host, running the latest version of WordPress. This is now a supported service.

The upgrade and transfer process has obviously preserved all blog posts, comments, and uploaded files. However, there are probably a few default settings that blog admins might want to check – changing the tagline for a blog from the imported-default “Just another Informatics not-a-blog-service site” might be a good idea…

It will also be possible to import blog posts from other blog sites using WordPress and other blogging software (RSS feeds, Blogger, etc). These importers are available from your site’s Dashboard => Tools => Import page (but may also need to be enabled from your blog’s Plugins page).

You may also wish to change your default logout page, as the default doesn’t integrate very well with CoSign. You can do this by editing your settings (Dashboard => Settings => HTTP Authentication) and setting the “Logout URI” to “https://blog.inf.ed.ac.uk/logout?%site%” (where “%site%” is literal). This will log you out of WordPress and all other web applications, and take you back to your WordPress site.

 


homepages.inf.ed.ac.uk OS upgrade to SL6.2

The homepages web service – homepages.inf.ed.ac.uk – is overdue an
Operating System upgrade. It is still running the SL5.6 OS, whereas
DICE desktops are running at least SL6.2.

The plan is to upgrade the homepages.inf.ed.ac.uk service to SL6.2 on
the morning of Sunday 7th October. The switchover should only cause a
break in service of about 10 minutes.

Between now and then you can try out your current homepages content at
the URL:

http://homepagessl6.inf.ed.ac.uk/<username>/

i.e. just append “sl6” to the name “homepages”. This service is serving
exactly the same content as the current homepages, so any changes you
make in the file space /public/homepages/<username>/web/ will be
reflected at both URLs:

http://homepages.inf.ed.ac.uk/<username>/

http://homepagessl6.inf.ed.ac.uk/<username>/

Note that the service name homepagessl6.inf.ed.ac.uk is only temporary, to
give you a chance to test things. Once the upgrade has happened, the
existing name (homepages.inf.ed.ac.uk) will remain the normal name for the
service; it will simply be serving your content from an SL6 based
server.
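If you have a list of page URLs you want to check, the mapping from the live name to the temporary test name is easy to script. This is only a convenience sketch: the function name is our own and "jbloggs" is a made-up username.

```python
from urllib.parse import urlsplit, urlunsplit

LIVE_HOST = "homepages.inf.ed.ac.uk"
TEST_HOST = "homepagessl6.inf.ed.ac.uk"  # temporary name, only valid before the upgrade


def to_test_url(url):
    """Rewrite a live homepages URL so that it points at the SL6 test server."""
    parts = urlsplit(url)
    if parts.hostname != LIVE_HOST:
        raise ValueError("not a homepages URL: %s" % url)
    # Only the host changes; path, query string and fragment are kept as-is.
    return urlunsplit(parts._replace(netloc=TEST_HOST))


if __name__ == "__main__":
    # "jbloggs" is a hypothetical username.
    print(to_test_url("http://homepages.inf.ed.ac.uk/jbloggs/index.html"))
```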

Software changes

The upgrade in OS also brings upgrades to various packages (as it did
when desktops were upgraded from SL5 to SL6), probably the most
noteworthy are:

Package   Old version   New version
PHP       5.1.6         5.3.3
Python    2.4.3         2.6.6
Perl      5.8.8         5.10.1

One important change with PHP is that the default is to no longer
recognise the “short open tag”, i.e. just “<?” to mark the beginning of a
PHP code block. Instead you should use the recommended
“<?php”. If you have code that uses the short tag version, then you
can either update your code, or you can turn on short tags for your
pages by creating an .htaccess file in your web area that contains the
lines:

<IfModule mod_php5.c>
php_value short_open_tag 1
</IfModule>
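If you would rather update your code than re-enable short tags, a quick scan can show where they are used. The script below is a rough heuristic rather than a PHP parser: it assumes your PHP files end in .php, it will also flag a “<?” that merely appears inside a string, and all the names in it are our own. It deliberately reports “<?=” as well, since in PHP versions before 5.4 that form also depends on the short_open_tag setting.

```python
import re
from pathlib import Path

# Matches "<?" unless it is immediately followed by "php" or "xml".
# "<?=" is reported too: before PHP 5.4 it needs short_open_tag as well.
SHORT_TAG = re.compile(r"<\?(?!php\b|xml\b)", re.IGNORECASE)


def find_short_tags(source):
    """Return the 1-based numbers of the lines in source that use a short tag."""
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if SHORT_TAG.search(line)]


def scan_tree(root):
    """Yield (path, line_numbers) for each *.php file under root with short tags."""
    for path in sorted(Path(root).rglob("*.php")):
        lines = find_short_tags(path.read_text(errors="replace"))
        if lines:
            yield path, lines


if __name__ == "__main__":
    import sys
    # Pass the directory to scan as the first argument (defaults to ".").
    for path, lines in scan_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("%s: lines %s" % (path, ", ".join(map(str, lines))))
```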

Problems?

No major issues are expected, but if the timing of the change over is a problem, or you discover a major problem with the SL6 version of homepages, then please let me know via the support form.


Network security and robustness enhancements

New versions of network switch firmware often bring additional facilities along with bug-fixes.  We have recently been evaluating some security and robustness enhancements, with a view to rolling them out across the Informatics network.  In particular:

  • DHCP protection, which will prevent rogue machines purporting to be DHCP servers on the self-managed subnets.  We occasionally see this.  It can be very hard to track down, it is disruptive to everyone else on the subnet, and it is almost always caused by some portable piece of kit which has been configured for a home situation and then not reconfigured appropriately for the Informatics network.
  • ARP protection, which will prevent a machine from claiming an IP address which has not been allocated to it.  Such a claim usually results in a large drop in throughput for both machines involved, as packets are misdirected or dropped.  Again, this is often due to misconfiguration, though we have also seen an increase recently as a result of misfeatures in some of Apple’s protocols.
  • Dynamic IP lockdown, which will prevent a machine from using an IP address which has not been allocated to it.  There is almost never a good reason for this to happen.

None of these should have any effect on a machine which is properly configured and working normally.  We have been testing the mechanisms for several weeks now and they do appear to work as advertised, and we will therefore be rolling them out across the Informatics network as appropriate.
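For readers curious about the general idea behind these protections, the following is a much simplified model, not the switch firmware's actual logic. It assumes the switch keeps a table of which IP address was allocated to which MAC address on which port, and treats only designated "trusted" ports as places where a DHCP server may live. The class, method and port names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    mac: str
    port: str


class BindingTable:
    """Toy model of a switch's record of allocated addresses (an assumption)."""

    def __init__(self, trusted_ports=()):
        self.trusted_ports = set(trusted_ports)
        self._by_ip = {}

    def record_lease(self, ip, mac, port):
        """Remember that ip was allocated to the machine with mac on port."""
        self._by_ip[ip] = Binding(mac.lower(), port)

    def dhcp_server_message_allowed(self, port):
        """DHCP protection: server replies are only accepted from trusted ports."""
        return port in self.trusted_ports

    def arp_allowed(self, sender_ip, sender_mac, port):
        """ARP protection: the claimed IP must match the recorded allocation."""
        if port in self.trusted_ports:
            return True
        return self._by_ip.get(sender_ip) == Binding(sender_mac.lower(), port)


if __name__ == "__main__":
    table = BindingTable(trusted_ports={"uplink"})
    table.record_lease("192.0.2.10", "00:00:5e:00:53:01", "port7")
    print(table.arp_allowed("192.0.2.10", "00:00:5e:00:53:01", "port7"))
    print(table.arp_allowed("192.0.2.10", "00:00:5e:00:53:02", "port9"))
```

A properly configured machine only ever claims the address it was given, so in this model its traffic always passes, which matches the point above that normal machines should see no effect.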


SSH protocol 1 disabled

In order to improve the security of the DICE computing infrastructure we have decided to disable support for the SSH version 1 protocol. This is a very old protocol which is now considered to be obsolete. Due to a number of design flaws it is known to be insecure and is vulnerable to man-in-the-middle attacks.

We have examined our logs and noted that in the last 5 months this protocol has only been used very occasionally. We have identified a small number of users who may be affected by this change and have contacted them directly to offer advice and assistance with reconfiguring or upgrading their SSH clients.
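If you want to check whether your own OpenSSH client configuration still asks for version 1, look for “Protocol” lines that list “1”. The helper below is a small sketch for doing that over the text of a config file; it is deliberately simple (it treats everything after a “#” as a comment and ignores quoting), and the function name is our own.

```python
def protocol_lines_allowing_v1(config_text):
    """Return (line_number, line) pairs for Protocol directives that list version 1."""
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # simplification: drop comments
        if not line:
            continue
        # Keyword and argument may be separated by whitespace or by "=".
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "protocol":
            versions = [v.strip() for v in parts[1].split(",")]
            if "1" in versions:
                hits.append((number, raw.strip()))
    return hits


if __name__ == "__main__":
    import os
    path = os.path.expanduser("~/.ssh/config")
    if os.path.exists(path):
        with open(path) as handle:
            for number, line in protocol_lines_allowing_v1(handle.read()):
                print("line %d: %s" % (number, line))
```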


Prometheus – account management in Informatics

This article describes Prometheus, the account management system developed within Informatics (note that the choice of name comes from this Prometheus, and predates the recent film). More accurately, Prometheus could be described as an entity management system, as it manages machine details (and other things), as well as people. This article will only discuss the latter.

Prometheus was written to solve a few shortcomings with our existing systems, the major ones being:

  • The trade-off between centrally managed account databases and decentralised per-service databases – how we effectively manage these.
  • A varied collection of automated scripts, all written in different ways, in different places.
  • The need for sysadmins to do things manually (sysadmins like efficiency and error-reduction and hence don’t like doing things manually if they should be automated).
  • Being unable to effectively manage the entire account lifecycle – deletion/archival as well as creation/modification.
  • Various management issues – spotting/correcting inconsistencies/anomalies.
  • The need for better reporting.

A typical DICE user’s account, rather than being a single item, actually consists of four separate parts, all of which are required to make the account work…

  • Authentication in Informatics is done by Kerberos, so any user needs to have an entry in the Kerberos database, the KDC.
  • When a user logs into a DICE machine, the system needs to know who the user is, what groups they are in and other authorisation information. This comes from LDAP.

In Informatics, users have AFS home directories, so this requires…

  • An entry in the AFS user database.
  • An entry in the AFS volume database, specifying which machine hosts the user’s volume, where the backup is and so on.

These four separate (but integrated) parts provide a DICE user with a full account, but where does the information required to manage all of this come from? The answer is several different places – information comes from the Informatics school database, from central University feeds and from other local and central sources. The job of Prometheus is to take data from these places, act on it and populate our local databases accordingly, as illustrated in this (greatly simplified) diagram:

[Figure: Prometheus IO Overview - Simple]

So far we’ve discussed only managing the databases that are needed for a full DICE account – Prometheus can also manage anything else which requires user data. For example, services which maintain their own user databases could have those databases partially or wholly managed by Prometheus.

Prometheus makes heavy use of the roles and capabilities system used within DICE. Simply put, a capability is something that a user (or other entity) can do and a role is a collection of these capabilities (and other roles). A role should describe functions that a user performs or a position they hold (e.g. sysadmin, 1st year undergrad, visitor). An example better illustrates how roles and capabilities can be used in reality…

All current members of staff possess the ‘staff’ role – they get this automatically through the school database (and they have it automatically removed when they leave). One of the capabilities this currently provides is ‘login/staffssh/remote’. Prometheus populates an LDAP netgroup of this name with all users who possess this capability. The server staff.ssh.inf.ed.ac.uk has its access controls set (via LCFG) to grant access to all users within this netgroup.
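The staff example can be sketched in a few lines. This is an illustration of the idea only, not Prometheus code (which is written in Perl): it assumes a role is a set whose members are either capabilities or other roles, and apart from ‘staff’ and ‘login/staffssh/remote’ every name below is made up.

```python
def expand_role(role, roles, _seen=None):
    """Return every capability granted by role, following nested roles.

    roles maps a role name to the set of its members; any member that is
    not itself a key of roles is treated as a capability.
    """
    seen = set() if _seen is None else _seen
    if role in seen:  # guard against roles that include each other
        return set()
    seen.add(role)
    capabilities = set()
    for member in roles.get(role, ()):
        if member in roles:
            capabilities |= expand_role(member, roles, seen)
        else:
            capabilities.add(member)
    return capabilities


def netgroup_members(capability, user_roles, roles):
    """Return the sorted users who hold capability through any of their roles."""
    return sorted(
        user for user, held in user_roles.items()
        if any(capability in expand_role(role, roles) for role in held)
    )


if __name__ == "__main__":
    # Hypothetical data: only 'staff' and 'login/staffssh/remote' come from the text.
    roles = {
        "staff": {"login/staffssh/remote", "basic-account"},
        "visitor": {"basic-account"},
        "basic-account": {"example/capability"},
    }
    user_roles = {"alice": ["staff"], "bob": ["visitor"]}
    print(netgroup_members("login/staffssh/remote", user_roles, roles))
```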

This system is very powerful and allows us to automate many things. One example of this is account creation – once someone becomes a current member of staff, a student or a visitor, the school database allocates them the appropriate role. This in turn gives them the various capabilities required for a full DICE account. Prometheus acts on these capabilities and does all the work necessary to create a full account, without any intervention. Each year’s new student intake requires that we create hundreds of new accounts. This now all happens automatically.

Prometheus is written in Perl using Moose (an object-oriented framework for Perl). It was developed using our git/gerrit not-a-service. Prometheus’s own data is stored in an LDAP directory. Two key concepts are “datastores” and “conduits”. A datastore is a representation of something which Prometheus needs to talk to (some examples of this are: Prometheus’s own entity store, a KDC, IS’s uid server, the school database). A conduit effectively joins datastores together, so, for example, there is a conduit which queries the IS UID server (using its datastore) and puts the uids into the entity store. To populate another service with user data, as mentioned previously, it would need to firstly have a datastore written to add, modify and delete entries and then a conduit written to perform these operations based on data held within Prometheus. Conduits and datastores help to ensure that data flows through our systems automatically and into the correct places.
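As a rough picture of how a datastore and a conduit fit together, here is a toy version in Python (the real system is Perl with Moose, and its interfaces will differ). The in-memory datastore, the method names and the "make the target match the source" behaviour are all assumptions chosen to keep the example short.

```python
class DictDatastore:
    """In-memory stand-in for something Prometheus talks to (hypothetical API)."""

    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def get_all(self):
        return dict(self._entries)

    def add(self, key, value):
        self._entries[key] = value

    def modify(self, key, value):
        self._entries[key] = value

    def delete(self, key):
        del self._entries[key]


class Conduit:
    """Joins two datastores: brings target into line with source."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def run(self):
        """Apply adds, modifications and deletions; return a count of each."""
        wanted = self.source.get_all()
        current = self.target.get_all()
        counts = {"added": 0, "modified": 0, "deleted": 0}
        for key, value in wanted.items():
            if key not in current:
                self.target.add(key, value)
                counts["added"] += 1
            elif current[key] != value:
                self.target.modify(key, value)
                counts["modified"] += 1
        for key in current:
            if key not in wanted:
                self.target.delete(key)
                counts["deleted"] += 1
        return counts


if __name__ == "__main__":
    uid_server = DictDatastore({"alice": 1001, "bob": 1002})  # made-up data
    entity_store = DictDatastore({"bob": 9999, "carol": 1003})
    print(Conduit(uid_server, entity_store).run())
```

Keeping the add, modify and delete operations inside the datastore means a new service only needs those operations implemented before an existing style of conduit can feed it.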

Some of the future developments we have planned for Prometheus are:

  • Full account lifecycle management – this is in the pipeline and will provide us with a means of implementing grace periods and staged expiry/archival/deletion.
  • Better support for lightweight accounts – not everyone needs a full DICE account, and the system is flexible and fine-grained enough to manage this.
  • Users should be able to manage their own identities – all DICE users have a user@INF.ED.AC.UK Kerberos principal, and some have additional principals for, e.g., managing long-running jobs. It would be very nice if a user could manage these themselves.
  • Extend and enhance – sysadmins/users could write conduits and datastores for their own services.

Much more information on Prometheus can be found here.


SSH Server Compromise

As many users will have noticed, the Informatics SSH server ‘dunlin’ was unavailable from the morning of Thursday 26th July until the afternoon of Tuesday 31st July. This was because the root account on the system was compromised and an attempt was made to insert a rootkit into the kernel.

The configuration of this system meant that attempts to infiltrate the kernel were unsuccessful and we are confident that no passwords or other sensitive data were acquired by the attacker. The attack did cause the machine to crash, and our procedures for handling crashes led to us spotting the system compromise very quickly.

A thorough investigation of the incident was carried out, which allowed us to rapidly identify the account which had been used to gain access and get the password changed so that the attack could not continue against other servers. We were also able to identify the method by which privilege escalation was achieved. We have since applied a security fix to all DICE machines and they have been rebooted to ensure the same method cannot be used again.
