I think I’m done. There is now a final report https://wiki.inf.ed.ac.uk/DICE/FinalProjectReport364SchoolEdWeb and a catalogue entry https://wiki.inf.ed.ac.uk/DICE/ServiceCatalogueEntryInfWeb. I need to archive these blog pages and submit it for official sign off. It’s only been 4 years in the making!
As usual the last 10% of the project is taking 90% (of the real time) to complete. This post is a kick up my own backside to get this project finished.
So, even though the things to do are recorded on https://computing.projects.inf.ed.ac.uk/#364, as a note to myself, the things I need to do are:
- Proper disaster recovery instructions. Currently web.inf is being mirrored to KB, but this needs to be done better and documented. I think I’ll create a new VM at KB, rather than piggyback on the physical DR for www.inf.ed.ac.uk. Now done: https://wiki.inf.ed.ac.uk/DICE/WebInfEdAcUKDisasterRecovery
- Document the routine tasks. This basically means what to do when updates come out, as we no longer need to worry about our local patches: they’ve been incorporated into the official distribution, or equivalent functionality has been provided.
- I do need to provide some info on our Feature module, and what to check in our local config after upgrades.
- Lastly, write a final report.
I’ve added fail2ban to the SL7 version of our auth smtp service. None of the sendmail filters that come with the fail2ban RPM seemed like they’d do the trick for us, so I’ve just overridden the supplied filter.d/sendmail-auth.conf with a sendmail-auth.local containing just:
[Definition]
failregex = ^%(__prefix_line)s.*AUTH failure.*\[<HOST>\]( \(may be forged\))?$
Though that isn’t enough to get it to match, as the default log level (9) for sendmail doesn’t log auth failures. So we also have to run at log level 10.
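Since fail2ban filters are Python regular expressions, a filter like this can be sanity-checked with a few lines of Python before it goes live. This is only a rough sketch: it assumes the filter uses fail2ban's <HOST> tag inside the square brackets, the expansions of __prefix_line and <HOST> are simplified stand-ins rather than fail2ban's real ones, and the log lines are made up. The fail2ban-regex tool that ships with fail2ban is the proper check against a real log file.

```python
import re

# Simplified stand-ins (assumptions, not fail2ban's real expansions):
# __prefix_line normally matches the syslog prefix, and <HOST> matches
# a hostname or an IP address.
PREFIX_LINE = r"\S+ sendmail\[\d+\]: "
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

FAILREGEX = r"^%(__prefix_line)s.*AUTH failure.*\[<HOST>\]( \(may be forged\))?$"


def compile_failregex(template):
    """Expand the two placeholders and compile the result."""
    expanded = template.replace("%(__prefix_line)s", PREFIX_LINE)
    expanded = expanded.replace("<HOST>", HOST)
    return re.compile(expanded)


def banned_host(line):
    """Return the offending IP address if the line matches, else None."""
    match = compile_failregex(FAILREGEX).search(line)
    return match.group("host") if match else None


# Hypothetical log lines, with the timestamp already stripped.
SAMPLE_FAIL = (
    "mailhost sendmail[1234]: u6MABCDE001234: AUTH failure (LOGIN): "
    "authentication failure (-13), relay=bad.example.com [192.0.2.10] (may be forged)"
)
SAMPLE_OK = (
    "mailhost sendmail[1234]: u6MABCDE001234: from=<a@example.org>, "
    "relay=[192.0.2.11]"
)
```

Calling banned_host() on the first sample returns the relay address, while the second, which has no AUTH failure, returns None.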
Currently I’m using the local lcfg-hostsdeny and tcpwrappers template like sshd does, but we should probably look at using iptables instead.
In the few days it’s been running, 21 IP addresses have been banned.
We’ve been holding off using Cosign/EASE on our EdWeb distro site, until we had a clear solution to the issue of how to become the Drupal admin user (user=1). As soon as we turn on Cosign authentication, then we’ll only ever be able to be user UUN. Even if we created functional accounts (to use IS parlance), then due to our automatic authentication via browsers on DICE, it isn’t very convenient to become anyone other than our UUN. Also, on principle, we don’t have accounts in our authentication service that aren’t associated with an actual individual.
Asking around it sounded like there were two options people used:
- Just don’t sweat it, and use drush from the command line to do all your admin type duties.
- Give yourself (or a functional account) all the available Drupal permissions, so you can do everything.
Not being that fluent with drush, and given that our web editors wouldn’t have the necessary command line access to the server, the first option didn’t seem the best solution. The second option has problems too, as the EdWeb developers are a bit wary of this and are not making any guarantees (in fact more likely the opposite) that we wouldn’t be storing up problems if users regularly published with all permissions granted.
So what we’ve decided to do is something a bit like a blend of the two options above.
Create a new “admin user” role:
drush role-create 'admin user'
Give the existing EdWeb role “system administrator” a couple of extra permissions:
drush role-add-perm "system administrator" "administer permissions"
drush role-add-perm "system administrator" "administer users"
The “system administrator” role is already one that should only be given to a few select people who know what they are doing. Generally people will not have this role.
Then as one of those users who is a “system administrator”, via the web GUI, give the new “admin user” role all the other permissions with a couple of mouse clicks, excluding the “bypass …” permissions.
The above steps should only need to be done once to get things set up.
Now if any of the existing “system administrators” need to do something as the admin (user=1) person, then they can temporarily give themselves the “admin user” role, do what they need to do, and then remove the role from themselves once they are done. They should know not to create or modify content with the “admin user” role, but if they did, then hopefully by not having the “bypass …” permissions, things would be OK, but we shouldn’t rely on that.
As I write this, I might look at adding a block that only shows up when you have the “admin user” role, to remind you that you are (nearly) all powerful.
It’s a bit of a faff to have to add and then remove the “admin user” role, but luckily the times you need to be user=1 are fairly rare. And it would probably be just as much of a faff if we were to do things via functional accounts.
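For anyone who does have command line access, the add-then-remove dance could be wrapped up so the role always comes off again, even if something goes wrong in between. The following is only a sketch of that idea, not what we actually run: it assumes drush's user-add-role and user-remove-role commands are available, and the function names and the example username are made up.

```python
import subprocess
from contextlib import contextmanager


def run_drush(args):
    """Run a drush command, raising an error if it exits non-zero."""
    subprocess.run(["drush", *args], check=True)


@contextmanager
def temporary_role(username, role="admin user", runner=run_drush):
    """Grant the role on entry and always remove it again on exit."""
    runner(["user-add-role", role, username])
    try:
        yield
    finally:
        # Runs even if the work inside the "with" block raised an error.
        runner(["user-remove-role", role, username])


# Example use (hypothetical username):
#
#     with temporary_role("s1234567"):
#         input("Do the admin work in the browser, then press Enter... ")
```

The runner argument is only there so the wrapper can be exercised without a Drupal site to hand.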
The suggestion to not grant the “bypass …” permissions came from Mairi. In an email/slack post she said:
On the subject of the ‘bypass content access control’ permission, the problem is that because permissions are being bypassed, you wouldn’t necessarily know which hooks are firing & which aren’t. Bypassing permissions will just invisibly allow the user to do anything, with no indication of whether data integrity is OK until something goes wrong. For example, we believe it’s probably OK to publish as user 1, provided that user is configured with relevant group memberships; however, we advise against it because nothing will tell you, when logged in as user 1, that the group hooks are actually firing. Which leads to unforeseen consequences when content is inadvertently created/published without the correct permissions being in place. If you give another user those ‘bypass’ permissions, the same will apply – i.e. that user could be publishing content without the correct hooks firing.
She also pointed me at this article from Stanford https://drupaltraining.stanford.edu/node/13.
Back in July I attended IS’ first EdWeb code sprint. IS are trying out the idea to encourage more collaboration, and find an alternative source of resource to actually get EdWeb code development done.
I found it very useful, though for this first one everyone was learning. I did get my submission accepted back into their git; it was a rather minor change, but it was real outstanding work that needed done at some point.
There’s a new update, 1.12, which I’m about to try. It will be interesting to see if my code changes are in there.
Just a brief update on AFS for SL7.
This week all my changes have made it to stable, and there are now 1.2.x versions of the lcfg-openafs (server only) and lcfg-openafs-client components for SL7.
I spent a bit of time teasing the two components apart, so either can be installed in isolation (or together) on a machine. In hindsight it would have been cleaner to leave the “openafs” component behind in the SL6 world, and to have created a new openafs-server component like Stephen did for the openafs-client component. It would have made the various header files and schema files a cleaner split, but it’s done now.
Craig, the Unit and I now have our AFS volumes on gresley, a new SL7 AFS server. We need to move some more guinea-pigs, but it all seems fine.
Having a look at apacheconf-waklog.h on DICE SL7. This is actually the first SL7 web server stuff I’ve looked at. So first of all I thought I should try getting a minimum SL7 apacheconf.h web server going.
I commandeered circlevm9, a vanilla SL7 server.h VM. And added
After the profile was pushed and I ran updaterpms, om apacheconf start didn’t “just work”:
22/07/16 12:17:35: apache configuration has been modified
22/07/16 12:17:35: Syntax OK
22/07/16 12:17:35: Failed to reload httpd.service: Unit httpd.service is mas\
22/07/16 12:17:35: ** reload httpd: Fail
systemctl gave me a suggestion:
[circlevm9]root: systemctl status httpd
Loaded: masked (/etc/systemd/system/multi-user.target.wants httpd.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Warning: httpd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
So I tried that:
[circlevm9]root: systemctl daemon-reload
[circlevm9]root: systemctl status httpd
httpd.service – The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Now, after doing an ‘om apacheconf stop’, ‘om apacheconf start’ worked and left an httpd process running with /var/www/html/ as the docroot, but with all access denied. I’m presuming a reboot would have had a similar effect.
I then added a simple vhost to open up access to /var/www/html/ so that I could dump stuff in there and convince myself the basics worked.
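The useful clues in the systemctl output were the state on the Loaded: line and the daemon-reload warning. If this needed checking on more than one machine, something like the following rough sketch could pick those out of captured output. The function names are made up, and the sample text is just the output shown above.

```python
import re


def load_state(status_output):
    """Return the first word after 'Loaded:' (e.g. 'masked'), or None."""
    match = re.search(r"^\s*Loaded:\s+(\S+)", status_output, re.MULTILINE)
    return match.group(1) if match else None


def needs_daemon_reload(status_output):
    """True if systemctl is warning that the unit changed on disk."""
    return "systemctl daemon-reload" in status_output


# Output captured before and after running 'systemctl daemon-reload'.
BEFORE = """\
   Loaded: masked (/etc/systemd/system/multi-user.target.wants httpd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
Warning: httpd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
"""

AFTER = """\
httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
"""
```

For the first capture load_state() gives "masked" and needs_daemon_reload() is true; for the second the state is "loaded" and no reload is needed.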
!apacheconf.vhosts mADD(default)
apacheconf.vhostname_default _default_
apacheconf.vhostdocroot_default /var/www/html
apacheconf.vhostaccesslog_default /var/lcfg/log/apacheconf.access
apacheconf.vhosterrorlog_default /var/lcfg/log/apacheconf.error
!apacheconf.vhostverbatim_default mADD(stuff)
apacheconf.vhostline_default_stuff <Directory "<%apacheconf.vhostdocroot_default%>">¶\
Options Indexes FollowSymLinks¶\
Require all granted¶\
</Directory>
With that done, I was able to drop files into /var/www/html/ and they would be served. Equally I added some symlinks to other bits of the file system, and they were followed unless file permissions said otherwise. So a symlink to /afs/inf.ed.ac.uk/ showed the contents of publicly accessible stuff, but all other access was denied by ACLs.
So now I know that if I add apacheconf-waklog.h and get it working, and the symlinks to AFS then show more content, httpd will have obtained the necessary AFS PTS tokens.
The work on the OpenAFS server for SL7 has been a tricky one, and is still not fully resolved.
As a bit of background, in SL6 the single component, openafs, did both AFS client and server configuration for a host. With the switch to SL7, the MPU kindly decided to do the work for the AFS client on SL7 (and systemd), but this meant splitting the client side into a new component, openafs_client, and some corresponding header files.
When starting work on the server side, I did consider (and indeed started) a new openafs_server component. However, I then decided this was going to lead to a lot of work changing the majority of existing headers and resources to the new named component, so after some discussion with Stephen, decided to make the existing openafs component “server only” for SL7 onwards.
This too has led to some problems, as both SL6 and SL7 machines include the openafs.h headers, but they have different meanings on the different platforms.
Fortunately most of the openafs.h headers just concern themselves with installing the actual openafs RPMs on the system. So some #ifdef guards for SL6 or 7 now make sure the right bits of -client or -server are further included, depending on what is needed for the machine.
There are still problems. The lcfg-openafs-client RPM depends on the lcfg-openafs RPM, so the shared template file should be split. There is also a common /etc/sysconfig/openafs file between client and server, both using template toolkit to maintain its content. This also needs to be split so that the client and server use different files. For the moment, we just state that for SL7 a server cannot also be a client, which also means I need to make sure the localhome stuff works on SL7 servers.
Having got EdWeb working with EASE, I went back and tried it against our weblogin CoSign service, and that too works. My initial problems were probably the openldap issue that EASE had.
I’ve still not got a clear way to access a site as “admin” once EASE/Cosign is enabled, other than temporarily disabling the UoE LDAP and EASE modules, and reverting to the old way.
I’m looking at an “admin” site that uses basic auth, so I can sign in as “admin”, and though that has a slight success, it then fails with what looks like a failure to find “admin” in the LDAP and then extract an email address, and role information for him. I’m not sure if that should be considered a bug or not.
Also had a look at cron, as Kenny was having problems with the new scheduling on our test 1.10. A mail to UWS Tech suggests that using wget and the cron key is the way to do it, rather than using drush (which I’d been experimenting with).
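For reference, the wget approach just means fetching the site's cron URL with the cron key attached. The sketch below shows the same request from Python. It assumes the standard Drupal 7 form of the URL (cron.php with a cron_key parameter), and the site name and key used with it would be placeholders, not our real ones.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def cron_url(base_url, cron_key):
    """Build the cron URL, assuming Drupal 7's cron.php?cron_key=... form."""
    base = base_url.rstrip("/") + "/"
    return urljoin(base, "cron.php") + "?" + urlencode({"cron_key": cron_key})


def trigger_cron(base_url, cron_key, timeout=60):
    """Request the cron URL and return the HTTP status code."""
    with urlopen(cron_url(base_url, cron_key), timeout=timeout) as response:
        return response.status
```

A crontab entry running wget against the same URL would do the equivalent job without any Python.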
There have been a couple of updates since we last updated web.inf to 1.7, but each one has had issues which stopped us (and Maths) from upgrading.
But Mairi and co. have been very helpful in trying to solve the problems, and it now looks like we’re just about there, with the future 1.10 release addressing the problems, and including a Drupal security update.
The 1.10 release will mean that our only local patch will be for the local search. I’ve created a new version of the patch, as the page template has changed, and though it does work, it’s introduced some extra white space I’m not happy about. However, Kenny’s happy to just apply that as it is for now, rather than delay 1.10 (when it appears).
I’ve also been looking at this again, and have made some progress. webtest.inf is now EASE authenticated, and after some problems with ldap binding to the IS server, it now seems to work for users who already have a local EdWeb account on the site (ie me).
However, it isn’t working for people who don’t have a local account. The account is supposed to be created on the fly, but I’m getting error messages. Looking at those will be my next action. David McKain has given me some hopefully useful debugging options for LDAP in case LDAP is the issue:
(1) drush vset -y ldap_help_watchdog_detail 1
should turn logging on
(2) drush vdel -y ldap_help_watchdog_detail
should turn it off again
The log messages go into the Drupal watchdog table, aka the 'recent log messages' report. You can view these either within the GUI, or using drush commands. Do 'drush help | grep watchdog' to see the drush commands.