# URL shortener, part 1

I’ve written a URL shortener which can be used by anyone in the School of Informatics. For the short story skip to the summary.

The computing staff here in Informatics are required to set aside time for personal development. Constant technical change makes this a necessity. It also stops us from being too bored.
For a while I’ve been promising to spend some development time brushing up my web and database programming skills by writing a URL shortener and putting it into service. I’ve now put one together.

What’s a URL shortener? It’s a web service which gives a user a short URL to use in place of a URL which may be long and complicated. For example, the address bit.ly/1LfOYvn will take a web browser to blog.inf.ed.ac.uk/chris/url-shortener-part-1, the address of this blog post. goo.gl, tinyurl.com and t.co are other well-known URL shorteners.

How does it work? A little like this:

Web browser (to short.url web server):
Hello short.url. Can I have the web page at short.url/tufty please?

short.url web server (to itself):
Let’s see. tufty… is that one of my shortened URLs? If it is, it’ll be in my database.
short.url web server rummages in its database and finds a box labelled tufty. Inside there’s another web address.

short.url web server:
Hello Web browser. Your page has moved. Ask for it at this address: http://www.snh.gov.uk/about-scotlands-nature/species/mammals/land-mammals/squirrels/

That’s redirection. The site also needs to handle registration: it should allow a user to submit a web address and be given an equivalent shortened address for it, adding the new pair of URLs to its database.
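
The redirection conversation above can be sketched in a few lines. This is an illustration in Python rather than the Perl the service is actually written in, and it is not the site's code: the `SHORT_URLS` dictionary is a hypothetical stand-in for the database, `handle_request` is an invented name, and the 404 for unknown codes is an assumption about what a sensible server might do.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the database: short code -> original URL.
SHORT_URLS = {
    "tufty": "http://www.snh.gov.uk/about-scotlands-nature/species/"
             "mammals/land-mammals/squirrels/",
}


def handle_request(path, table=SHORT_URLS):
    """Return (status_line, headers) for a request path such as '/tufty'."""
    # Strip the slashes to get the bare code, e.g. '/tufty' -> 'tufty'.
    code = urlsplit(path).path.strip("/")
    target = table.get(code)
    if target is None:
        # Not one of our shortened URLs (assumed behaviour).
        return "404 Not Found", {"Content-Type": "text/plain"}
    # "Your page has moved. Ask for it at this address."
    return "301 Moved Permanently", {"Location": target}
```

Calling `handle_request("/tufty")` gives back a redirect status and a `Location` header holding the long address, which is all the browser needs to go and fetch the real page.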

What ingredients are needed? A basic list would include a web server, a database, a mechanism for URL redirection, and a way for users to register URLs with the site. While not strictly necessary, an authentication mechanism would come in handy if you think you might want to allow users to delete or change a shortened URL.

To get a first attempt at a URL shortener up and running I provided each of these things then glued them together.

So, a slew of decisions.

1. For the implementation language I used Perl, because one of my aims was to revise my knowledge of some Perl technologies I had briefly used a while ago. At a minimum I wanted to revisit CGI (for processing and preparing HTTP requests and responses) and DBI (for interacting with a database). Additionally, Perl is what I currently use for most jobs and I reckoned I’d have enough on my plate without also learning a new language.
2. Who would use this, ultimately? I’m in the business of providing computing facilities for the School of Informatics, so it seemed wise to limit the site such that only Informatics users could register URLs on it. It equally seemed obvious that users would probably want to share their URLs worldwide, so non-privileged access – just using the short URLs – would need to be open to everyone. Luckily it’s easy to separate out access-for-Informatics-only from access-for-everyone. We put the access-for-everyone (the redirection of short URLs to the original longer ones) on http, and the access-for-Informatics-only (the registration and general admin of short URLs) on https, protected by Cosign. This sort of separation is used in some other Informatics websites so we have suitable LCFG configuration for setting up the necessary virtual host declarations on the Apache web server.
3. I’ve touched on this one already – I used our supported DICE Linux platform with our LCFG configuration technology. This one wasn’t really a decision so much as a no-brainer. LCFG installs and configures everything necessary for the project – OS, the web server, the Perl modules and all, and it’ll keep it configured correctly through software and OS upgrades, reinstalls and so on.
4. Some kind of database would be needed to store the URLs and their shortened URL codes. I’m not a database expert but I’m told by those who are that PostgreSQL is easily the best solution available to us on DICE, and clearly superior to certain other popular free SQL solutions.
5. URL redirection is done by getting the CGI script to send an HTTP redirect response. There are several of these, but the basic choice seems to come down to 301 and 302. A 301 is a permanent redirect. Once a web browser gets it, it can remember it in its cache, so on subsequent visits it won’t need to revisit the site which issued the redirect. A 302 is a more temporary redirect: the browser is redirected to the new site, but subsequent visits will go through the same redirect procedure once again. If you want to use a URL shortener to gather information about people using a URL, the temporary redirect is the obvious choice, because every visit passes through the shortener. However the aim here, at least to start with, is to provide a simple URL redirection service, one which just redirects URLs; so I chose to use 301, the permanent redirect.
6. Obviously it would be good for a URL shortener to have a short domain name. For a wee personal project though I thought I’d stick with our normal DNS domain. It may not be the shortest out there but it’s not too prolix; it’s identified with the School of Informatics; and to add a selfish note, it’s in my fingers’ muscle memory.
7. Should we allow users to specify their own short URLs, or should we generate random ones? The first would be desirable, both would be best, but for simplicity I started off offering only random URLs.
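
Two of the decisions above, the database and the random short codes, fit together naturally, so here is a minimal sketch of how they might do so. It is written in Python with the standard `sqlite3` module standing in for PostgreSQL and DBI. The table layout, the six-character length, the alphabet and all the function names are assumptions made for illustration, not details taken from the real service.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set
CODE_LENGTH = 6                                   # assumed code length


def make_table(conn):
    """Create the (hypothetical) table of short codes and their targets."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " code TEXT PRIMARY KEY,"
        " target TEXT NOT NULL)"
    )


def random_code(length=CODE_LENGTH):
    """Generate a random short code from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def register(conn, target, max_tries=10):
    """Store target under a fresh random code and return that code."""
    for _ in range(max_tries):
        code = random_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (code, target) VALUES (?, ?)",
                    (code, target),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # code already taken; try another
    raise RuntimeError("could not find an unused short code")


def lookup(conn, code):
    """Return the original URL for code, or None if it is unknown."""
    row = conn.execute(
        "SELECT target FROM urls WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None
```

Making the code the primary key lets the database itself reject a collision, so the script only has to retry with a new random code instead of checking first and hoping nobody else registers the same one in between.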

Once these bits and pieces are in place, the remaining task is to write a CGI script which connects them together. It has to distinguish between an attempt to use a shortened URL (look up the shortened URL in the database to find its associated original URL then issue a redirect to this URL) and other visits. The other visits might be mere curiosity (give them a help message) or a desire to register a URL (give them a form which they can use to submit a URL to the site). The CGI also has to spot a submitted form response (and process it – generate a random short URL code, enter that and the original URL in the database, then give the user their new shortened URL).
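
The sorting-out of visits described above might look something like the following Python sketch. The rule for telling the cases apart is an assumption: anything with a path is treated as a short code, an anonymous visit to the front page gets the help message, and a logged-in visit gets either the form or the form processing. The function name, the `url` form field and the `user` parameter are all hypothetical.

```python
def classify_request(method, path, form, user=None):
    """Decide what kind of visit this is.

    method: HTTP method, e.g. "GET" or "POST"
    path:   request path, e.g. "/tufty" or "/"
    form:   dict of submitted form fields (empty if none)
    user:   authenticated username, or None for an anonymous visitor
    Returns an (action, argument) pair.
    """
    code = path.strip("/")
    if code:
        # Someone is trying to use a shortened URL.
        return ("redirect", code)
    if user is None:
        # Mere curiosity: show the help message.
        return ("help", None)
    if method == "POST" and form.get("url"):
        # A submitted form: register the URL it contains.
        return ("register", form["url"])
    # A logged-in visitor with nothing submitted yet: show the form.
    return ("form", None)
```

Keeping this decision in one small function, separate from the database and the HTML, makes each case easy to check on its own before any of it is wired up to a web server.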

And that, basically, was the first version of the URL shortener I put together. As soon as I tentatively showed it to a few colleagues, feature requests were made, so shortly afterwards it grew the ability to identify who was registering a URL (Cosign gives you this for free) and to remember this identity alongside each registered URL. This made it possible to give each authenticated user a list of their own registered URLs, and to provide the opportunity to delete each URL. It will also make it possible to let users edit their URL data, but that’ll be covered in a future post.
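
Remembering who registered each URL is mostly a matter of one extra column. The sketch below is again Python with `sqlite3` standing in for PostgreSQL, and the schema and function names are invented for illustration. It shows one way the per-user list and the delete option could work. It assumes the authenticated username has already been obtained, which in a CGI setting would typically mean reading the `REMOTE_USER` environment variable.

```python
import sqlite3


def make_table(conn):
    """Create a (hypothetical) URL table that also records the owner."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " code TEXT PRIMARY KEY,"
        " target TEXT NOT NULL,"
        " owner TEXT NOT NULL)"
    )


def add_url(conn, code, target, owner):
    """Record a short code, its target and the user who registered it."""
    with conn:
        conn.execute(
            "INSERT INTO urls (code, target, owner) VALUES (?, ?, ?)",
            (code, target, owner),
        )


def urls_for(conn, owner):
    """Return the (code, target) pairs registered by owner, sorted by code."""
    rows = conn.execute(
        "SELECT code, target FROM urls WHERE owner = ? ORDER BY code",
        (owner,),
    )
    return rows.fetchall()


def delete_url(conn, code, owner):
    """Delete code only if owner registered it; True if a row was removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM urls WHERE code = ? AND owner = ?",
            (code, owner),
        )
    return cur.rowcount == 1
```

Putting the owner in the `WHERE` clause of the delete means a user asking to remove somebody else's URL simply matches no rows, so there is no separate permission check to forget.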

Ladies and gentlemen, I give you i.inf.ed.ac.uk. Play with it if you like. From this point onwards its database shall be treated with respect, so any URLs that the service issues to you will be preserved. There are a few caveats:

It’s internal only.
Until the CGI is a bit more sophisticated and has passed a security audit it will only be available on the Informatics intranet. The firewall will prevent external users from accessing it. The aim is to open it up to outside users (for redirection only; registration of new URLs will remain open only to DICE accounts), but not yet.
It’s very simple.
Once you have a basic URL shortener, the number of ways in which it might be extended is rather startling. Extensions will be covered in a future post. Right now, it doesn’t have many fancy bells and whistles. You can add URLs, delete them again, and quote the shortened versions to other (School of Informatics) users. For the moment that’s about it.
Extensions are on the way.
The system will be extended as I get time to work on it.

Have fun, and let me know how you get on. And yes, I was a proud member of The Tufty Club.