Many people view magnetic tape as a somewhat outdated medium these days, but for backing up large amounts of data over a long period of time (and Informatics has a lot of data to back up), it still offers by far the best value for money. Our old Overland Neo8000 has served us faithfully for the last 5 years, but the amount of data backed up in Informatics is ever increasing and the Neo8000 was reaching the limits of its capacity. In addition, the cost of maintaining the library had risen sharply. With this in mind, the decision was taken to procure a new tape library, and we have just brought a Spectra T680 library into service (the new library is the 2001ish black obelisk in the centre of the photo; the old library is on the left).

At first sight, there seems to be little difference between the two libraries. Both can hold approximately 500 tapes and have 6 tape drives installed. But the Spectra T680 drives are LTO6 compared to the Neo’s LTO4, which gives the new library a total native capacity of 1.2 Petabytes compared to 0.4 PB for the old library. In addition, the LTO6 drives are significantly faster, which will help greatly in fitting the daily backups into a 24 hour window.
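
As a rough check on those capacity figures, here is a minimal sketch of the arithmetic. It assumes exactly 500 slots and the standard native (uncompressed) cartridge capacities of 0.8 TB for LTO4 and 2.5 TB for LTO6; the function name is just for illustration.

```python
# Native (uncompressed) capacity per cartridge, in terabytes.
LTO4_TB = 0.8
LTO6_TB = 2.5

# "Approximately 500 tapes" in the text; 500 is an assumption for the sketch.
SLOTS = 500


def library_capacity_pb(slots, tb_per_tape):
    """Total native capacity of a library in petabytes (1 PB = 1000 TB)."""
    return slots * tb_per_tape / 1000


old_pb = library_capacity_pb(SLOTS, LTO4_TB)
new_pb = library_capacity_pb(SLOTS, LTO6_TB)

print(f"Neo8000 (LTO4): {old_pb:.2f} PB")
print(f"T680 (LTO6):    {new_pb:.2f} PB")
```

With exactly 500 slots this gives 0.4 PB and 1.25 PB, in line with the rounded figures quoted above; the difference comes entirely from the larger cartridges, not from the number of slots.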

Work involved in porting DICE to SL7

Porting DICE to a new version of the Red Hat platform is a non-trivial task. For a flavour of the work involved, you can read the final report for the base platform upgrade project. Note that this doesn’t include the additional work in updating all the teaching and software packages that we add to Red Hat.

Scientific Linux 6.6 Update

The 6th minor update to Scientific Linux 6 (which is based on RHEL6) is now ready for deployment to the Informatics SL6 DICE office and student lab machines. A minor update like this gives us the opportunity to update important software and fix any bugs which are not security issues (we apply security updates as soon as they are available) in a controlled manner.

To complete this upgrade a reboot is required. The student lab machines will be rebooted during the night of Thursday 18th June. A delayed reboot will be scheduled for all DICE office desktops, with a delay of 5 days. Although the reboots are delayed, it would be greatly appreciated if people could manually reboot their machines at their earliest convenience; the delayed reboot would then be cancelled. Upgrades for individual servers will be scheduled over the next few weeks and users affected will be contacted as necessary.

SL6.6 was released on 12th November 2014 and since then it has been thoroughly tested in our DICE environment so we are confident that this update will not cause any issues for users.

Details of the package updates are available on the LCFG wiki. For further, in-depth information, there are also release notes from Scientific Linux and RHEL.

HTTPS Everywhere and problems accessing www.inf.ed.ac.uk

We’ve recently been getting an increased number of support tickets about problems accessing Student Services pages. The common thread in most of these tickets is that the person involved is incorrectly trying to do so using an HTTPS URL.

In the case of the Student Services pages, HTTPS is used to authenticate you, and then only allows people with the appropriate authorisation to proceed (as the page authoring system kicks in). If the page is a publicly visible page, then viewing it via HTTP works just fine. For example, for students the first link below should work fine (making sure your browser shows an HTTP URL), but the second should give you an access denied (unless you do have access permission).

It appears that some people are using HTTPS because of browser plugins like the EFF’s HTTPS Everywhere. Unfortunately it ships with a configuration that assumes all HTTP www.inf.ed.ac.uk URLs are also accessible via HTTPS (which they are not).

We have submitted a patch to the EFF to remove this incorrect assumption, but until that is accepted and published, users of this plugin (or similar) should use whatever configuration mechanism it provides to exempt www.inf.ed.ac.uk from being forcibly redirected to HTTPS.
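
To make the idea of an exemption concrete, here is a small sketch of the behaviour we would like from such a plugin. This is not how HTTPS Everywhere is implemented; the function name and the exemption set are hypothetical, and it only illustrates "upgrade HTTP to HTTPS unless the host is exempt".

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts that must not be forced to HTTPS (illustrative exemption list).
EXEMPT_HOSTS = {"www.inf.ed.ac.uk"}


def maybe_upgrade(url):
    """Rewrite an http:// URL to https:// unless its host is exempt.

    URLs that are already HTTPS, or use another scheme, are returned
    unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname in EXEMPT_HOSTS:
        return url
    return urlunsplit(parts._replace(scheme="https"))


print(maybe_upgrade("http://www.inf.ed.ac.uk/student-services/"))
print(maybe_upgrade("http://example.org/page?x=1"))
```

The first URL is left on HTTP because its host is in the exemption set; the second is rewritten to HTTPS.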

I hope that helps explain some of the confusion/problems people have been having.

Neil

PS Obviously we’d like to be in a state where HTTPS does work for all Informatics sites, but that transition will be gradual and lengthy.

PPS I’d also be interested to hear about any similar plugins that people are using that do the same thing as the EFF plugin.

AT Decant

The decant from Appleton Tower to Forrest Hill and the Wilkie Building is now well underway.

The first move to Forrest Hill took place over the weekend of 23rd/24th May. The Graduate School were mostly up and running again before lunchtime on Monday 25th May. We spent most of the rest of the week installing machines and new monitors in the Drill Hall. Although the machines were operational by the end of the week, we felt it prudent to postpone opening the Drill Hall to students on Monday 1st June because of concerns about the air-conditioning in the Drill Hall and the fairly significant leak that occurred in the building late on Friday afternoon! The students will however be able to use the Drill Hall from Wednesday 3rd June.

The move from levels 6, 7 and 8 to the Wilkie Building over last weekend (30th/31st May) also went smoothly with no major issues except for a few offices being without power.

We will continue to dismantle the remaining labs in Appleton Tower in preparation for the second move to Forrest Hill which is scheduled for the weekend of 13th/14th June.

During week commencing 15th June, contractors will rebuild all the flip-desks in Forrest Hill. After that, we will start to install machines in these desks which will continue to impact on the availability of frontline support staff. We will however continue to man the support desk in the Forum as usual but you may find that we take slightly longer than usual to respond to your requests. Please bear with us!

Account Lifecycle

We have recently implemented automated processing of the final stages of a user account’s lifecycle within our account management system, ‘Prometheus’. [1] This is used to apply expiry (or ‘grace’) and suspension periods to an account, as described in our account closure policy. [2]

All user accounts have ‘roles’ and ‘entitlements’. These are used by our systems to grant access to services, e.g. possession of a particular entitlement would allow a user to log in to a specific machine. The introduction of full lifecycle management means that most roles and entitlements are preserved for a grace period once an account has expired.

At the beginning of the grace period, an automated mail will be sent to the user, indicating the expiry of their account. At the end of the grace period the account will be automatically disabled (and, subject to the suspension period, deleted).
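
The stages described above can be sketched as a simple date calculation. This is only an illustration, not the Prometheus implementation: the state names, the function and the two durations are assumptions, and the real periods are those set out in the account closure policy.

```python
from datetime import date, timedelta

# Hypothetical durations, for illustration only; the real values come
# from the account closure policy.
GRACE = timedelta(days=30)
SUSPENSION = timedelta(days=60)


def account_state(expiry, today, grace=GRACE, suspension=SUSPENSION):
    """Return the lifecycle stage of an account on a given day.

    active   -> before the expiry date
    grace    -> expired, but roles and entitlements mostly preserved
    disabled -> grace period over, account suspended
    deleted  -> suspension period over
    """
    if today < expiry:
        return "active"
    if today < expiry + grace:
        return "grace"
    if today < expiry + grace + suspension:
        return "disabled"
    return "deleted"


expiry = date(2015, 6, 1)
for day in (date(2015, 5, 31), date(2015, 6, 1), date(2015, 7, 1), date(2015, 8, 30)):
    print(day, account_state(expiry, day))
```

Because everything hangs off the expiry date, a wrong visit end date in the school database shifts every later stage, which is why keeping that information correct matters.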

Eligibility for a DICE account is determined by details in the school database, so it is important that this information is correct and up to date. This is particularly pertinent for short-term visitor accounts. If you are sponsoring a visitor, please ensure that Informatics HR are informed of any changes to visit dates.

Virtual Appleton Tower

As we prepare to decant from Appleton Tower to Forrest Hill and Wilkie Building, I thought it might be useful to describe how we’re networking the buildings and why we’ve done it the way we have.

In short, both FH and Wilkie will be operated as virtual floors of Appleton Tower.  This makes it straightforward to move machines from one site to another, as we can just configure the ports in a user’s new location to match what they had before.  Machines and VoIP phones can then just plug in and work, for the most part, so minimising downtime and reconfiguration.  It also simplifies network installation and management, as we have all the surrounding infrastructure services already in place.

Each site connects back to our Appleton Tower core over a pair of 10Gbps fibre links.  This allows for load-sharing, as we are collapsing three sets of uplinks into one for each new site.  It also gives us an additional measure of resilience, as it means that one switch failure, either in AT or FH or Wilkie, will not affect the service for all the others. Network diagrams for Forrest Hill are linked from here, and the Wilkie diagram is here.

The disadvantage of this scheme is that it introduces inter-building dependencies, which we normally try to avoid.  In this case, however, it was felt that this would be outweighed by the simplification of the decant process, given that we expect to be in the buildings for only one academic year.

Planned group space downtime

The disk array ifevo3 has a fault with its flash memory. Though it may be just about possible to replace the faulty memory without disrupting the file service, the recommendation is to shut down the disk array to do the work. So this is what we plan on doing.

The data on ifevo3 is nearly all group space, plus some of our system/backup data. While ifevo3 is down, the group space listed below will be unavailable.

To do this work without affecting other files and home directories served by the same servers, we need to unmount the affected partitions from those servers. This will mean a brief interruption to all files served by those servers, once before the work starts and once again after it is complete. These brief breaks should last no longer than 2 minutes; you may not even notice them at all.

We are planning to do the work on Tuesday 26th of May, between 9am and 10am. Please let us know now if this is going to cause you real problems.

The list of group areas that will be unavailable during the work is:

Remember that any web space served from these areas will also be unavailable.

Neil

Explanation of Yesterday directory

As happened today, “Yesterday” isn’t always yesterday! A brief explanation.

All AFS home directories (and some group areas) contain a Yesterday sub-directory. This directory contains a copy of your home directory from “Yesterday”, which can be useful if you accidentally delete a file.

The Yesterday directory is actually a by-product of our backup system. “Yesterday” usually means “from around 9pm”. But if the backups are running behind schedule, as can happen once a month when full backups are taken, then “Yesterday” could mean “from around 1am” or “11am” – it just depends.

This command line will tell you when your Yesterday was created:

/usr/sbin/vos exam $(fs lsmount ~/Yesterday | cut -f4 -d\' | tr -d \#) | grep Creation
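
If you would rather see how old your Yesterday is than read a raw timestamp, the following sketch parses the line printed by the command above. It assumes the line looks like the sample shown (the word Creation followed by a date in the usual C library style); the sample values and function name are made up for illustration, and an English/C locale is assumed for the day and month names.

```python
from datetime import datetime


def parse_creation(line):
    """Turn a 'Creation <date>' line into a datetime.

    Assumes a timestamp such as 'Mon Jun  1 21:03:12 2015'.
    """
    _, _, stamp = line.strip().partition("Creation")
    # Collapse the padding used for single-digit days.
    stamp = " ".join(stamp.split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


# Made-up sample line and "current" time, for illustration only.
sample = "    Creation    Mon Jun  1 21:03:12 2015"
now = datetime(2015, 6, 2, 9, 0, 0)

created = parse_creation(sample)
print("Yesterday was created", now - created, "ago")
```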


To recap, “Yesterday” is usually around 9pm the day before, but for operational reasons it could be as little as “a few seconds ago”.

Neil

SAN firmware update April 2015

Our two Dothill SAN storage boxes in the Informatics Forum are due a firmware update. Because of their built-in redundancy, the upgrades are supposed to be safe to do without affecting their operation. In fact we’ve already updated a similar device at KB (which we use for our off-site copy of users’ data) without issue.

However, given the potential disruption that would be caused if they did go off-line during the update (between them they have 100TB of storage), we will do the update this weekend: one on Saturday starting at 9am, the other on Sunday at 9am. The bulk of AFS file space should be considered at risk between 9am and 1pm on both days.

Home directories on the Appleton Tower servers (naga, cetus, gorgon and minotaur) are not at risk. Use the “homedir” command to see which server you are on, e.g.:

neilb> homedir
neilb (Neil Brown) : huldra/vicepa : /afs/inf.ed.ac.uk/user/n/neilb : free 234.2G (used 48%)

In this case my home directory is on “huldra”.
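
If you want to check this mechanically, here is a small sketch that picks the server name out of a line of “homedir” output and tests it against the Appleton Tower servers listed above. It assumes the output always has the same “ : ”-separated layout as the example; the function names are just for illustration.

```python
# Appleton Tower servers, which are not affected by the SAN update.
AT_SERVERS = {"naga", "cetus", "gorgon", "minotaur"}


def homedir_server(line):
    """Extract the server name from one line of homedir output.

    Assumes fields are separated by ' : ' and the second field is
    'server/partition', as in the example above.
    """
    fields = [field.strip() for field in line.split(" : ")]
    return fields[1].split("/")[0]


def at_risk(line):
    """True if the home directory is not on an Appleton Tower server."""
    return homedir_server(line) not in AT_SERVERS


sample = ("neilb (Neil Brown) : huldra/vicepa : "
          "/afs/inf.ed.ac.uk/user/n/neilb : free 234.2G (used 48%)")
print(homedir_server(sample), "at risk:", at_risk(sample))
```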

As I say, nothing should go wrong and the update should be invisible; we are just being cautious.

Neil