Replacement of failed SAN controller

We have received a replacement for the SAN controller that failed following the power cut of the 11th of November. We had to abort our plan to replace it yesterday (Monday 18th) after we discovered there’s not enough slack in the cables to manoeuvre the failed controller out and the replacement controller in. Unfortunately this now means we will have to turn off the SAN box to do the replacement.

This SAN box (ifevo4 as we call it) currently serves about 50TB of data, all of which will be unavailable while it is powered down to replace the controller. Given the disruption this may cause, we plan to do the work starting at 10:30am on Sunday 24th November.

To minimise the disruption, for the servers that have both local disk storage and SAN mounted storage, we will unmount the SAN storage and leave the local disk data available. This means that, apart from a couple of short breaks (a couple of minutes each), most home directories will remain available.

The rest of the SAN mounted data (mostly group space) will be unavailable for the duration of the controller swap, which should take between 30 minutes and an hour.

To check if your home directory is on local disk, run the “homedir” command. If it says your home directory is on either a /vicepa, /vicepb or /vicepc partition, then you will be fine (apart from the brief interruptions). eg in my case:

neilb> homedir
neilb (Neil Brown) : nessie/vicepc : /afs/inf.ed.ac.uk/user/n/neilb : free 162.2G (used 64%)

So I’m on server “nessie” and partition “/vicepc”, so should be fine. We realise that some users are still on SAN mounted space, and between now and Sunday, we’ll be moving who we can to local disk.
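
If you would rather script the check, the following is a minimal Python sketch. It assumes the homedir output always has the colon-separated layout of the example above (user : server/partition : path : free space), which is inferred from that single example. The function name is hypothetical and not part of any DICE tool.

```python
import re

# Partitions named in this announcement as being on local disk.
LOCAL_PARTITIONS = {"vicepa", "vicepb", "vicepc"}


def home_on_local_disk(homedir_line: str) -> bool:
    """Return True if a line of homedir output names a local-disk partition.

    Assumes the layout "user (Name) : server/partition : path : free ...",
    inferred from the one example in this post.
    """
    fields = [field.strip() for field in homedir_line.split(":")]
    if len(fields) < 2:
        raise ValueError("unrecognised homedir output")
    # The second field is expected to look like "nessie/vicepc".
    match = re.fullmatch(r"(?P<server>[^/\s]+)/(?P<partition>\S+)", fields[1])
    if match is None:
        raise ValueError("unrecognised homedir output")
    return match.group("partition") in LOCAL_PARTITIONS


example = "neilb (Neil Brown) : nessie/vicepc : /afs/inf.ed.ac.uk/user/n/neilb : free"
print(home_on_local_disk(example))
```

Anything other than the three partitions listed is treated as not local, so a False result only means "check with us", not that your data is definitely on the SAN.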

All other networked file space will be unavailable during the replacement, eg everything under /group or /afs/inf.ed.ac.uk/group.

If there are any major problems with the planned date and time, please get in touch as soon as possible. Bear in mind, though, that the longer we run on a single controller, the bigger the risk of further unplanned failures.

Neil
Services Unit

Posted in Uncategorized | Leave a comment

Forum power failure on 11th November

We suffered a power failure in the Informatics Forum on Monday 11th November 2013, starting at about 11am and ending at about 1pm. We still have no information from Estates and Buildings as to why this failure occurred.

Many users were surprised to find that whilst power to their desktops and the network wasn’t interrupted, many of the School’s servers shut down shortly after 11am.

Emergency power for servers based in the Forum is provided by a pair of UPSes. These are primarily intended to allow us to weather short (eg a few minutes) power interruptions and to cleanly shut down servers for longer interruptions. When both UPSes are fully functional, we have around 45 minutes of runtime on battery (given our current power load). Unfortunately, one of the UPSes has been out of action for a number of months, reducing our runtime on battery to 20 minutes.

Emergency power for offices and the network is provided by a single building UPS. This has a runtime of around 3 hours on battery, given our current power load. It is worth noting that the energy overhead of the building UPS is quite high, and consideration is being given to withdrawing it from service. No other University building has this level of cover for offices.

When power was reinstated at 1pm, the majority of services resumed reasonably quickly. However, the hardware failure of a disk controller in one of the storage arrays had a knock-on effect for a number of services – eg AFS. Power to some less critical services (eg the Hadoop cluster) wasn’t immediately restored, in case the power dropped again.

You can read our post mortem here.

New look EASE login page

As people using the University’s EASE web login page will have hopefully noticed, the look of https://www.ease.ed.ac.uk will be changing at 8am on the 5th of November.

The image below shows what to expect. There may be small differences, but generally it will look like the following:

New look EASE login page from November 5th 2013

So rest assured this change is legitimate and intended.

Neil

Matlab Total Academic Headcount Campus Agreement

On September 1st, 2013 the University of Edinburgh introduced a Total Academic Headcount (TAH) site licence for the Mathworks MATLAB® application and the MATLAB® Distributed Computing Server.

This new TAH license will replace all our existing School provided research and teaching licenses. In brief, it allows unrestricted concurrent access for all staff and students to MATLAB®, the 48 currently available toolboxes and MDCS. It also allows standalone remote installations (staff and PGR students only).

At the moment the new license only supports releases up to and including 2013a; 2013b will be available shortly. On DICE desktops we still have 2010b deployed, but we will be updating this in due course to 2013a and then to 2013b when that becomes available. The default licenses used on DICE are still the School ones. This will shortly change to being the centrally provided TAH one; however, you can already start using the new TAH licence by following the instructions below.

For those that are not aware, MATLAB® is a high-level language and interactive environment for numerical computation, visualisation, and programming. You can use it to analyze data, develop algorithms, and create models and applications. There are open source alternatives to MATLAB® that may also meet your needs, such as R and Octave; please take time to consider these as well.

For detailed information on using MATLAB® visit the MATLAB® TAH resource website:

http://www.mathworks.co.uk/academia/tah-support-program/campus.html

For information on what’s included in the UoE licence, acceptable use, support contacts, activation codes, and installation please visit the UoE Mathworks wiki.

https://www.wiki.ed.ac.uk/display/Mathworks/Home

If you are a member of staff or a PGR student and want to install MATLAB® on your own personal machine (to run using either the concurrent network license or standalone) please see the link above for downloading and activation instructions.

Using the new TAH license on DICE:

For now, to use the TAH license on DICE machines you need to set an environment variable before starting Matlab. The following command should work:

MLM_LICENSE_FILE=27009@129.215.97.100 matlab

Within Matlab you can check that the correct license has been applied by choosing the Help->Licensing->Update Current Licenses menu option, which should report the license as being the Total Academic Headcount Campus one.

Note that at the moment only the toolboxes installed on DICE under our existing license arrangements will work.
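
If you start Matlab from a script rather than a shell, the same environment-variable step can be sketched in Python as below. This is only an illustration: the function names are hypothetical, the licence server value is the one quoted in the command above, and it assumes the matlab command is on your PATH.

```python
import os
import subprocess

# port@host value quoted in this post for the TAH licence server.
TAH_LICENSE = "27009@129.215.97.100"


def tah_environment(base_env=None):
    """Return a copy of the environment with MLM_LICENSE_FILE set for TAH.

    The caller's environment (or os.environ) is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MLM_LICENSE_FILE"] = TAH_LICENSE
    return env


def launch_matlab():
    """Start Matlab with the TAH licence; assumes "matlab" is on PATH."""
    return subprocess.run(["matlab"], env=tah_environment(), check=False)
```

Calling launch_matlab() should then behave like the one-line shell command above, with the variable set only for that Matlab process.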

DICE Gridengine cluster retirement

With the upgrade to SL6.4 we have opted not to upgrade the gridengine based Beowulf cluster, and this service is now decommissioned. The hardware is now fairly elderly and ECDF/Eddie provides an equivalent service with better resources for high throughput computing. The recent splintering of the gridengine software, with Oracle bringing the standard release back under a closed source license, has also made any upgrade path considerably more risky.

The School has maintained a gridengine based Beowulf cluster since early 2002, initially as a collection of desktops (Dell GX240s) and workstations (Dell WS530s), and latterly as a number of rackmounted servers (Dell PE1425s) connected with dedicated ethernet and myrinet switches. We have used gridengine since very early on: the cluster was originally being developed using PBS just as MRJ were bought by Veridian and PBS was brought back under a closed source license. The supporting filesystem was originally on NFS, exported from a single server, and has grown into a 4.6TB GPFS filesystem served from a cluster of 5 nodes.

The hardware is also used for the School’s Hadoop cluster, and we’re currently looking into moving this on to newer and more powerful nodes at ECDF, at which point the nodes will be disposed of and we’ll make a not insignificant dent in the School’s electricity consumption.

Virtual DICE – please help us test it

We’re delighted to announce Virtual DICE. It’s like a DICE desktop, but running in a virtual machine. It’s currently in a testing phase – see below.

Everybody in the School of Informatics is welcome to use Virtual DICE, but you will need your own computer, which will need (at the last count) 35GB of free disk space, and a few free gigabytes of memory too. It will also need to be able to run virtualisation software – we’ve been using VirtualBox which runs on popular varieties of Linux and Windows and on many Macs.

It can be used whether or not a network is available.

Virtual DICE isn’t quite ready for prime time yet. We think it works, but we’d like it to be tested some more. If you’d like to try it out, please download and install it, then get in touch and tell us about your experiences. Tell us especially about anything which went wrong, and anything which you think it would be helpful to add to the Virtual DICE documentation pages. All feedback will be received gratefully, and providers of the most helpful feedback may also be given rewards in chocolate form.

To install it, download a Virtual DICE image file and import it into VirtualBox.

Apart from backing up your files, it should need virtually no maintenance or looking after. If it breaks, delete the virtual machine then install another one from the downloaded Virtual DICE image file. To update it, delete the virtual machine a few times a year and replace it using a newer image file; or run a command each week to update its software selection. To back up your files, copy them to your AFS home directory.

Find out more and get your copy at the Virtual DICE documentation pages.

Posted in News | Leave a comment

Introduction to AFS tutorial

If you are a new member of Informatics and use AFS (e.g. for access to your home directory or group file space) or would just like a refresher then we would like to invite you to attend our annual Introduction to AFS tutorial.

The session will be held on Monday 14th October, starting at 10:30am in room IF-4.31, and will last for approximately 1 hour.

It will mainly cover the essentials of file access management using AFS ACLs and groups. There will also be a guide to some useful commands, a summary of the main differences between AFS and other filesystems and a short guide to troubleshooting common problems.

New remote graphical login service

The Computing Team are pleased to announce the launch of a new graphical remote login service. This service provides access to your normal DICE desktop environment from any Linux, MacOSX or Windows based client.

This service uses the NX technology to handle remote X Windows connections, so you will need to download the appropriate client software for your system. Full details are provided on the Computing Help pages.

Please note that this is a beta release of the service and as such we expect that some minor issues will be encountered. Please use the Support Form if you experience any problems.

blog.inf.ed.ac.uk updated to WordPress 3.6.1

As announced on sys-announce, to address a security flaw in WordPress 3.6 and
earlier, blog.inf.ed.ac.uk has been updated to the current version of WordPress (3.6.1).

There will be some small changes, and you may notice pop-ups highlighting new options.

If you run your own WordPress installation within Informatics, then you must
upgrade your version too, as the exploit allows remote execution of code on our
servers. If we become aware of old WordPress installations on our servers, then
we will have to take steps to protect ourselves from this exploit, and disable
your WordPress installation.

See http://wordpress.org/news/2013/09/wordpress-3-6-1/ for more details on the
update and exploit.
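
If you look after several WordPress installations, a quick way to see which ones still need upgrading is to compare their version strings against 3.6.1. The Python below is a minimal sketch of that comparison, not a tool we provide. It assumes plain numeric release versions (eg "3.5.2", not beta or RC strings) and that the version is read from the $wp_version line in wp-includes/version.php. The function names are hypothetical.

```python
import re

# First release containing the fix, per this announcement.
FIXED_VERSION = (3, 6, 1)


def parse_version(text: str) -> tuple:
    """Turn "3.6" or "3.6.1" into a tuple of ints, padded to three parts."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(version: str) -> bool:
    """Return True for WordPress 3.6 and earlier, False for 3.6.1 onwards."""
    return parse_version(version) < FIXED_VERSION


def version_from_php(source: str) -> str:
    """Extract the version string from the text of wp-includes/version.php."""
    match = re.search(r"\$wp_version\s*=\s*'([^']+)'", source)
    if match is None:
        raise ValueError("no $wp_version assignment found")
    return match.group(1)


sample = "<?php\n$wp_version = '3.6';\n"
print(needs_upgrade(version_from_php(sample)))
```

Padding to three parts means "3.6" is compared as 3.6.0, so it correctly counts as older than 3.6.1.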

Neil

Temporary wireless network reconfiguration on Forum upper floors

To aid with a research project by ICSA’s Wireless & Mobile Networking (WiMo) Group, we have asked IS to reconfigure the wireless access points for the top three floors of the Forum temporarily.  For the month of September the 802.11a (5GHz) channels have been turned off on these floors’ APs, to minimise interference with the research project’s equipment.

We do not anticipate this causing problems for mobile devices, as 802.11g and 802.11n will remain available on the 2.4GHz channels as before.  However, please do contact Support in the usual way if you have wireless problems in these or other parts of the Forum.  Please contact the WiMo group directly with any questions about the research project.

The remaining floors of the Forum are not affected by this change.
