Living With Minified URLs

There seem to have been a lot of posts recently about URL shorteners/minifiers, such as this or this, which linked back to On URL shorteners by delicious founder Joshua Schachter. I’m not sure if Brian Kelly has done a risk assessment post about it yet, though? ;-)

So what are we to do in the case of URL shorteners going down, or disappearing from the web?

How about this?

When you publish a page, do a lookup using the most popular URL shortener sites to grab the shortened URL for that page from those services, and hard code those URLs into the page as metadata:

<meta name="shortURL" content="http://tinyurl.com/dafntf" />
<meta name="shortURL" content="http://bit.ly/29m915" />

Then if a particular URL shortener service goes down, there’s a fallback position available in the form of using the web search engines to track down your page, as long as they index the page metadata?
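As a rough sketch of the publishing step, assuming the short URLs have already been looked up by some other means, the metadata lines could be generated like this. The function name `short_url_meta_tags` is made up for illustration, and `shortURL` is just the ad hoc meta name used above, not any kind of standard.

```python
from html import escape


def short_url_meta_tags(short_urls):
    """Return one <meta name="shortURL"> line per short URL.

    The short URLs are assumed to have been fetched already; this
    sketch only handles writing them into the page head.
    """
    return "\n".join(
        # Escape the URL so characters like & or " cannot break the attribute.
        '<meta name="shortURL" content="{}" />'.format(escape(url, quote=True))
        for url in short_urls
    )


tags = short_url_meta_tags(["http://tinyurl.com/dafntf", "http://bit.ly/29m915"])
print(tags)
```

The output would then be pasted, or templated, into the head of the page at publish time.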

PS it also strikes me that if a URL service were to go down, it’d be in e.g. Google’s interests to buy up their databases in the closing down fire sale…

PPS annotating every page would potentially overload the URL shortening services, I suspect, so I wonder: maybe page publishers should inject the metadata into a page only when they see incoming referrer traffic to that page from a URL shortening service? So for example, if the server sees incoming traffic to a page from is.gd, it grabs the is.gd short URL for the page and adds it to the page metadata? This is not a million miles away from a short URL trackback service? (Cf. also e.g. things like Tweetback.)
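A minimal sketch of the referrer check, assuming the Referer header really does carry the shortener's hostname (which may not always be the case). The host list and the function name `shortener_from_referrer` are illustrative assumptions, not a real registry or API.

```python
from urllib.parse import urlparse

# Hypothetical watch list of shortener hosts; extend or replace as needed.
SHORTENER_HOSTS = {"is.gd", "bit.ly", "tinyurl.com", "tr.im"}


def shortener_from_referrer(referrer):
    """Return the shortener host a referrer URL points at, or None."""
    if not referrer:
        return None
    # hostname is None for strings that are not absolute URLs.
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host if host in SHORTENER_HOSTS else None


print(shortener_from_referrer("http://is.gd/abc"))
print(shortener_from_referrer("http://example.com/page"))
```

A server could call this on each request and, on a match, go and fetch that service's short URL for the page once, then cache it in the page metadata.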

PPPS via Downes: Short URL Auto-Discovery (looks like it’s being offered as an RFC).

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

9 thoughts on “Living With Minified URLs”

  1. “How would you decide which of the many URL shorteners to index and include?”

    That’s where the trackback model comes in, maybe? The important thing is to capture a reference within a page to shortened URLs that are out there and being used as a proxy to point to a page, rather than capture every possible shortened URL.

  2. So for example, if the server sees incoming traffic to a page from is.gd, it grabs the is.gd short URL for the page and adds it to the page metadata?

    This would only work when going via a preview page as normally sites issue a 302 redirect which causes no information about the shortening service to be sent in the request.

  3. I am taking a new tack for my organization– I am running our own shortening service using some open source code I think was called phurl. It’s basic but I have my own database of URLs, and without too much effort can add some stats tracking– plus they are now “branded”.

    I am not too worried about the disappearing services, which would be a major annoyance but not an unsolvable problem. It would be helpful for said services to have an API to look up addresses so you can perhaps create your own backup. The ones with accounts seem preferable– bit.ly and tr.im at least have some means to track the ones you have made.

    But talk about services needing a business model? What works for URL shrinkers? A premium service to ensure they are not lost? Or is it in the mining of link patterns?

  4. This could be interesting too: “canonical” links…
    [ http://revcanonical.appspot.com/ ]

    ‘RevCanonical is url shortening with a twist. Instead of creating its own super short versions of links, it checks to see if the link owner has published a shortened version of the given page using HTML link element. If not, we just return the original URL. And you should bug the link owner about providing a better alternative.

    ‘RevCanonical searches the referenced resource for:

    * <link rev="canonical" href="…"> (i.e. “I am the canonical URL of that page over there”)
    * <link rel="alternate shorter" href="…"> (or truth be told any link rel including the string “alternate short*”)’
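The discovery rule quoted above can be sketched with Python's standard `html.parser`. This is only an illustration of the two matching rules as described, not RevCanonical's actual code; the class and function names and the example.org URLs are made up.

```python
from html.parser import HTMLParser


class ShortLinkFinder(HTMLParser):
    """Collect hrefs of <link> elements that advertise a short URL."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for bare attributes; treat as empty.
        a = {name: (value or "") for name, value in attrs}
        href = a.get("href")
        if not href:
            return
        rev_tokens = a.get("rev", "").lower().split()
        rel_value = a.get("rel", "").lower()
        # Rule 1: rev="canonical". Rule 2: rel containing "alternate short".
        if "canonical" in rev_tokens or "alternate short" in rel_value:
            self.short_urls.append(href)


def find_short_urls(html_text):
    """Return the short URLs a page publishes about itself, in document order."""
    finder = ShortLinkFinder()
    finder.feed(html_text)
    return finder.short_urls


page = (
    '<html><head>'
    '<link rev="canonical" href="http://example.org/s/1">'
    '<link rel="alternate shorter" href="http://example.org/s/2" />'
    '<link rel="stylesheet" href="style.css">'
    '</head></html>'
)
print(find_short_urls(page))
```

If the list comes back empty, the behaviour described in the quote would be to fall back to the original URL.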

  5. Sam Johnston’s post “rev=canonical considered harmful (complete with sensible solution)” [ http://samj.net/2009/04/revcanonical-considered-harmful.html ] makes a very valid point about abusing the /rev="canonical"/ attribute to point to a short URL.

    Just for the record, I don’t think that the short URL should necessarily be THE canonical URL or that rev=canonical should be used this way. I do think that /rev=short/ might be useful though (where “short” means “short canonical” I guess?)

    There may well be better ways of managing this as Sam Johnston points out, but this is a pragmatic solution for people who may be limited to customising the head element of their web docs as unparsed HTML.

