OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Infoskills’ Category

Discovered Custom Search Engines

Although Google manages to serve up pretty good results most of the time, sometimes it makes sense to give the search engine a hand by limiting the search to only provide results from a particular set of pages, or domains. So in this post I’ll describe a couple of “emergent” or “discovered” custom search engines that are available in tools you might already use.

(Custom search engines provide one way of achieving this, of course – set the limits within which you want results returned, et voila… But creating custom search engines, as such, is not necessarily something that would occur to most people.)

Let’s start with delicious, the social bookmarking service, in which users save bookmarked links annotated with one or more tags.

Did you know that there are now a range of tools within delicious that let you search over the titles and descriptions of different sets of bookmarks?

If you pick a particular user, the default Search these bookmarks search will just search over the title and description fields of the bookmarks saved by that user. If you further limit the view of the bookmarks to those tagged in a particular way by a particular user, then the Search these bookmarks search will be limited to just those bookmarks. In other words, Search these bookmarks is context sensitive to the user, tag or user’n’tag combination that is currently selected.

(Remember that the full text of the bookmarked pages is not being searched – only the bookmark title and description fields – which is one good reason why it makes sense to fill in a bit of description about every bookmark you make: it makes (re)discovery of links at a future time easier…)
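To make the "context sensitive" idea concrete, here is a minimal sketch of how such a search might behave. This is not delicious's actual implementation or API; the `Bookmark` class and the `search_these_bookmarks` function are hypothetical names made up for illustration. The point is simply that the user and tag selections narrow the collection first, and the query is then matched against the title and description fields only.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical bookmark record: only title and description are searchable."""
    user: str
    title: str
    description: str = ""
    tags: set = field(default_factory=set)


def search_these_bookmarks(bookmarks, query, user=None, tag=None):
    """Search title/description only, within the currently selected context."""
    terms = query.lower().split()
    hits = []
    for b in bookmarks:
        # Narrow to the selected user and/or tag first (the "context").
        if user is not None and b.user != user:
            continue
        if tag is not None and tag not in b.tags:
            continue
        # The bookmarked page itself is never searched, just these two fields.
        text = (b.title + " " + b.description).lower()
        if all(term in text for term in terms):
            hits.append(b)
    return hits
```

A bookmark saved with an empty description can only ever be found via its title, which is the practical argument for filling the description in.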

So where else do people create their own resource collections, or resource feeds? Google Reader, maybe?

And as it happens, another emergent, “auto-created” custom search engine can be found just there:

The Google Reader search provides a blogsearch facility that lets you limit your search to the content of the RSS feeds you subscribe to in a variety of ways: the content of all your feeds, the content of the items you’ve read, the content of feeds bundled in various folders, and so on.

So for example, you could bundle a set of RSS feeds together in a single folder, and then, as if by magic, you have a custom search engine that searches over just the contents of those feeds.

With Google’s “official” blogsearch tool no longer functioning as such (rather than just indexing feed content – that is, just actual blog posts – it appears to be indexing blog web pages, so you get contaminated results that may only be a “hit” because your query was matched by sidebar content or other blog website fluff), the Google Reader search tool goes back to basics…

…the only problem is, that so far as I can tell, there is no way to subscribe to the results of any of these searches, and there is no published (or community documented) API for the Google Reader search facility… (so if someone can watch the AJAX calls and produce one, I’d be really grateful :-)

(By the by, can you define filters on folders in Google Reader, a bit like iTunes Smart Playlists?)

See also: Search Hubs and Custom Search at ILI2007.

PS if you are looking for an effective blog search engine, Icerocket has been grabbing the buzz lately…

Written by Tony Hirst

January 20, 2009 at 1:44 pm

Posted in Infoskills, Search

A Couple of Twitter Search Tricks…

Just a quickie post, this one, to describe a couple of Twitter search tricks’n’tips (which is to say, this is an infoskills post, right?;-)

You can find the Twitter search tool at http://search.twitter.com. I actually call it in my browser using the keyword “tw” associated with a Firefox Keyword Search.

Link search: if you’re in the habit of searching social bookmarking sites such as delicious for useful links, whether by pivoting around particular tags or tag combinations, or by using the delicious search box, you might also be interested in searching for tweeted links. Here are a couple of ways of doing it…

The “official way”, using a Twitter advanced search form – just select the “contains Links” option.

This invokes a special search limit, filter:links, which you can also enter directly into the Twitter search box:

If for any reason that search limit isn’t working, here’s a workaround that makes use of Twitter search’s partial string matching capability:

Fan out: see which of your tweets have been retweeted by others (maybe;-)
This trick relies on a convention that has emerged in which Twitterers use a pattern along the lines of RT @username “the original tweet”.
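If you already have a list of tweet texts to hand (from a search feed, say), the RT convention is easy enough to spot with a regular expression. The following is just a sketch: `retweets_of` is a made-up helper, not part of any Twitter API, and it assumes retweets follow the RT @username pattern fairly literally.

```python
import re


def retweets_of(username, tweets):
    """Return the tweets that appear to retweet `username` via the RT convention."""
    # \b stops "START @name" matching, and stops "name" matching "name2".
    pattern = re.compile(
        r"\bRT\s*@" + re.escape(username) + r"\b",
        re.IGNORECASE,
    )
    return [t for t in tweets if pattern.search(t)]
```

Because this is only a convention, tweets that credit the original author some other way will slip through, hence the "maybe" above.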

(See also the ReTweetist service, which will plot which of your messages have been retweeted, as well as the most popular current retweets.)

Also remember that you can subscribe to RSS feeds of saved searches based on these query types:

Locale Based Searches
Want to know who’s recently been twittering (possibly) from nearby a particular location? Set the location options in the advanced search form, and run an otherwise empty query (i.e. no search terms in the search box).

So for example:

Now it used to be that you could search people’s biography or location strapline in Twitter, and find people to follow that way (that’s how I found several fellow Isle of Wight twitterers), but that doesn’t seem possible using the “Find People” service at the moment? (And I can’t check to make sure, because the “Find People” service is temporarily stressed (i.e. down) again…)

So here’s a Google hack way round finding Twitterers from a particular location – construct a query of the form:

http://www.google.com/search?q=location+wight+site%3Atwitter.com+-inurl%3Astatus+-intitle%3Awight

This works as follows – look for the search term, on twitter.com (site:twitter.com), but try not to return results from tweets (-inurl:status) or where part of the location appears in the user’s Twitter ID (-intitle:wight). If an individual’s page is indexed when there’s a tweet showing that contains the search term, then you may get the page returned as a result. But more likely you’ll only get results from pages where the search term is always present, such as when it’s part of a person’s bio… In a sense, this is a bit like indexing a fixed set of web search engine indexable, on-page, bio/location meta-data.
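For anyone who would rather generate these URLs than type them, here is a minimal sketch using Python's standard library. The function name `twitter_location_search_url` and its parameters are my own invention for illustration; the search limits themselves (site:, -inurl:, -intitle:) are the ones described above.

```python
from urllib.parse import urlencode


def twitter_location_search_url(search_terms, exclude_from_title):
    """Build the Google query URL described above for a given location."""
    query = " ".join([
        search_terms,                        # e.g. "location wight"
        "site:twitter.com",                  # only look on twitter.com
        "-inurl:status",                     # try to skip individual tweets
        "-intitle:" + exclude_from_title,    # skip IDs containing the place name
    ])
    # urlencode turns spaces into "+" and ":" into "%3A".
    return "http://www.google.com/search?" + urlencode({"q": query})


print(twitter_location_search_url("location wight", "wight"))
```

Called with "location wight" and "wight", this reproduces the example URL given above.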

[UPDATE: looking at the results preview, if we search for “Location Isle of Wight” we can probably filter the results even further:
“location isle of wight” site:twitter.com -inurl:status -intitle:wight

And as @daveyp suggests, we can also search for institutional allegiance within a profile, eg site:twitter.com -inurl:status -intitle:huddersfield location huddersfield university]

(You can do something similar to stalk people on MySpace.)

For more Twitter search tricks, check out the Twitter advanced search form, or have a creative play in Google;-)

Written by Tony Hirst

January 19, 2009 at 3:11 pm

Another Nail in the Coffin of “Google Ground Truth”?

So we all know that the Google web search engine famously (and not just apocryphally) returns different results from its different national representations (google.com, google.co.uk, google.cn, etc.)…

…and hopefully we all know that if you are signed in to Google when you run a search, the default settings are such that Google will record your search and search results click-thru behaviour using Google Web History, and then in turn potentially use this intelligence to tweak your personal search results…

…and depending on how much you’ve been paying attention, you may know that Google Search Wiki lets you “customize search by re-ranking, deleting, adding, and commenting on search results. With just a single click you can move the results you like to the top or add a new site. You can also write notes attached to a particular site and remove results that you don’t feel belong. These modifications will be shown to you every time you do the same search in the future.”

Well now it seems that Google is experimenting with Google Preferred Sites, which lets selected guinea pigs “set your Google Web Search preferences so that your search results match your unique tastes and needs. Fill in the sites you rely on the most, and results from your preferred sites will show up more often when they’re relevant to your search query” (see the official support page here: Preferences: Preferred sites).

So the next time you give someone directions to a website using an instruction of the form “just google whatever, and it’ll be the first or second result”, bear in mind that it might not be…

(For what it’s worth, I run a cookie free, never logged in to Google browser to compare the results I get from my logged in’n’personalised Google results page and a raw “organic” Google results page.)

Written by Tony Hirst

January 21, 2009 at 12:56 pm

Posted in Infoskills, Search

Tracking UK Parliamentary Act Amendments

A flurry of posts around the interwebs today (e.g. + A CONCEALED ASSAULT ON PRIVACY +) picked up on some proposed amendments to the Data Protection Act that have found their way into the Coroners and Justice Bill that I posted about last week (Data Sharing is Good, Right? Or is HM Gov Evil?).

It struck me that it would be really handy to have a tool that could alert you to proposed amendments in your favourite Act in whatever Bills happen to be live at the moment.

A quick look at one of the .gov.uk websites turns up an advanced search form that lets you search over current bills – UK Parliament Advanced Search:

Running a search on the phrase “the Data Protection Act 1998” and sorting the results by most recent first gave me a URL I could tinker with…

So here’s a pipe that’ll grab the most recent bills mentioning a particular act:

Clicking on a link should take you to the point in a Bill that mentions the Act you’re interested in:

Being a pipe, I get the RSS/JSON feed for free… which I can now subscribe to and use as an alerting service (for as long as the pipe’s screenscraping part works!) Ideally, of course, the parliamentary search would make results available as RSS…
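If you would rather roll your own alerts than rely on a feed reader, the logic is simple enough: remember which links you have already seen, and report anything new. The sketch below assumes the pipe's output is ordinary RSS 2.0 with item, title and link elements; `new_items` is a hypothetical helper, and the sample feed is made up for illustration rather than taken from the real pipe.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed standing in for the pipe's RSS output.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Bills mentioning an Act (illustrative)</title>
  <item><title>Example Bill A</title><link>http://example.org/bill-a</link></item>
  <item><title>Example Bill B</title><link>http://example.org/bill-b</link></item>
</channel></rss>"""


def new_items(rss_text, seen_links):
    """Return (title, link) pairs not seen before, updating seen_links in place."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


seen = set()
for title, link in new_items(SAMPLE_FEED, seen):
    print("New mention:", title, link)
```

Run on a schedule against the real feed (fetching it is left out here), a second pass over an unchanged feed reports nothing, which is exactly what you want from an alerting service.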

As ever, this pipe took almost as much time to blog as it took to create…!

So maybe Charles Arthur should rethink “If I had one piece of advice to a journalist starting out now, it would be: learn to code” and instead focus on “Learning to Think Like A Programmer”?;-)

PS see also: They Work For You: Free Our Bills and They Work For You: Free Our Bills (Techy Stuff).

Written by Tony Hirst

January 23, 2009 at 3:22 pm

Posted in Infoskills, Pipework

writetoreply.org – Some Quick Thoughts

So it’s been a fun couple of days getting the writetoreply.org site up and seeing the first few comments roll in to the commentable version of the Digital Britain Interim report.

We made the Guardian Technology blog tonight – Digital Britain: Comments please! – and I can only reiterate the point Jack Schofield made in it:

So far, however, WriteToReply.org has only had 10 comments, spread over six sections and dozens of paragraphs.

I hope this is because not enough people know about it, rather than because not enough people care.

So have you commented yet? (I will as soon as I finish commenting on the POIT report, which I’m still half-way through!;-)

As we get more comments, we may be able to roll out a few new features, and it will also give us something to work with on a comment dashboard/reporting pattern that we can make available to the report’s authors.

Also, be warned that I’m not going to post too much here about the site – we’ll be starting a blog [UPDATE: available at http://writetoreply.ord/actually] on the writetoreply site itself in a day or two to capture what we’re learning and what we’re thinking – so if you’re interested in keeping close tabs on what we’re up to, I’d suggest following @writetoreply on Twitter. (I will post round up/summary linking reports here, though, so you’ll still get to see glimpses of what we’re doing ;-)

If you want to get involved with brainstorming ideas for the site – or suggesting reports to host there – please send a message to @WriteToReply or contribute to the wiki: WriteToReply wiki

One thing I do want to mention here – almost as a note to self, because I’ll pursue this more on the WriteToReply blog – is that even if we don’t get many comments on the site, there is still value in it being there…

Why?

Because each paragraph is identified by a named anchor, each paragraph is linked to by a unique URI; for example, here’s a link to Action 1 of the Digital Britain Interim Report:

What this means is that if people want to comment about a particular section, action or paragraph within the report on their own blog or other publication, they can link to it.

Like in this post from the Nominet blog:

A Storm in a Teacup or a Perfect Storm?

Which results in a Trackback on the WriteToReply site, that is included in the comment feed, and that looks like this:

(Note that this is where we have to start upping the spam/trackback spam defense tools!;-)

What this means is that the paragraph, action point, section or whatever can become a linked resource, or linked context, and can support remote commenting.

And in turn, the remark made on the third party site can become a linked annotation to the corresponding part of the original report…

How?

Well through the judicious use of trackbacks, link: search limits on the bigger search engines, and link searches in services like BackType (that I discovered via Euan Semple:-), we’ll find ways of pulling those remote comments and discussions into the writetoreply environment (hopefully…?!;-)

So even if you don’t want to comment on the Digital Britain Interim report on the WriteToReply site, but you do care, why not post your thoughts on your own blog, and link your thoughts directly back to the appropriate part of the report on WriteToReply?

(And remember, the final report will have consequences, so if you have something to contribute, make sure you do… :-)

Written by Tony Hirst

February 5, 2009 at 11:29 pm

Posted in Infoskills, Policy

Mapping Realtime Events on Twitter

One of the nice things about blogging in WordPress is the dashboard report that shows which other blog posts are linking in to your blog. One of my posts that’s had a couple of incoming links lately is Simple Embeddable Twitter Map Mashup, firstly from this post on TweetMapping Archaeology and then from Twitter Watermain Mapping – Part Two.

This latter post – plotting out tweets around the occurrence of breaks in watermains – also plots out a map showing people twittering about stormwater.

Which got me thinking, how about a Twittermap to plot tweets about electricity, gas or water being cut off?

By altering the search term, you can search for other events, such as earthquake or bee swarm:

If you want to search around a particular location, then this pipe may be more useful: locale based twittermap (the default search is for the word bid, but it works equally well if you’re wondering where the fire is):

Finally if you’d rather just use the URL for a Twitter search feed as the basis for a map, this pipe should do: map a Twitter search feed URL.

Written by Tony Hirst

March 26, 2009 at 8:04 pm

Posted in Infoskills, Pipework, Search


Searching By Looking Elsewhere

A couple of weeks or so ago, I got an email requesting a link to something I’d spoken about at a department meeting some time ago (the Gartner hype cycle, actually). Now normally I’d check my delicious bookmarks for a good link, or maybe even run a Google web search, but instead I ran a search for ‘gartner hypecycle 2008’ on Google Images

…which is when it struck me that searching Google Images may on occasion lead to better quality, or more relevant, results than doing a normal web search, particularly if you use a level of indirection. In particular, it can often lead to a web document or post that provides some sort of analysis around a topic. (Remember, Google image search links to the web pages that contain the images that are displayed in the image search results, not just the images.)

So for example, a web search for games console sales chart [web search] turns up a different set of results to an image search for games console sales chart [image search]. And here’s where my gut feeling comes in about using the fact that documents contain images as a filter – if people have gone to the trouble of including a relevant image in something they have published, their post may be more considered on a particular topic than one that doesn’t. That is, the inclusion of a relevant image can be used as a valuable ranking term when searching for results. Essentially, you are running an advanced, search limited query around an image document type.

Note that it’s often sensible, when sharing image queries, to make the search a ‘safe’ (i.e. adult content filtered) one: in Google, just add &safe=active to the end of the URL.
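If you share a lot of search links, a small helper can add the parameter for you. This is only a sketch: `make_safe` is a made-up name, and the example URL uses the plain Google web search address as a stand-in for whatever search URL you actually want to share. Rather than blindly appending to the end of the URL, it replaces any existing safe setting so the result stays tidy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def make_safe(url):
    """Return url with safe=active set, replacing any existing safe parameter."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous "safe" setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "safe"
    ]
    params.append(("safe", "active"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(make_safe("http://www.google.com/search?q=games+console+sales+chart"))
```

Running the helper on an already "safe" URL leaves it unchanged, so it is harmless to apply it to every link before sharing.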

(The image search approach also lets me quickly scan the results for one that appears to contain the sort of chart data I want. Supporting visual filtering is one reason why some search engines have experimented with including an image from each linked to page in the search engine results listing.)

Limiting searches by document type can be achieved in a normal web search too, of course. For example, if you are looking for a report on knife crime in UK cities, then it might be reasonable to suspect that the most relevant documents were published as PDFs – so limit on that:

If you’d rather use the normal Google search box as a command line, the search query is: uk+knife+crime+report+filetype:pdf

If you’re looking for actual data, it might make sense to search on spreadsheet documents? uk knife crime statistics filetype:xls

As well as variously using the keyword ‘chart’ or ‘statistics’, the word ‘data’ or ‘table’ can also help tune results, particularly when running an image search. Remember, the point may not necessarily be to find a chart, or set of data, directly. Instead, it may be to use the fact that a document contains a chart or a table to limit the results you get back (assuming that documents or posts containing charts, tables, etc., are likely to be more considered on a particular topic simply because the author has gone to the trouble of including a chart or a table etc.)

Increasingly, I find I’m also using Youtube to search for particular items of BBC content. Note that my motivation here is not necessarily to use the video clip I have found directly, mainly because a lot of BBC related footage on Youtube has not been put there by the BBC – i.e. it is more likely to be copyright infringing content uploaded by an individual.

Instead, I am making use of:

1) the segmenting of video clips that individuals have done (chopping a 3 minute clip out of an hour long documentary, for example);
2) the user provided metadata around the clip – the title they have given it, the description text, the tags used to annotate it;
3) the automatically generated ‘related video’ service provided by Youtube,

to help me deep search into BBC content so that I can quickly find a clip that can then be obtained in a rights approved manner, without having to wade through hours and hours of video searching for a clip I want to use.

That is, it is possible to use Youtube as a great big index of BBC ‘deep clips’, in the sense that they are clipped from deep within a longer programme, to locate a particular clip that can then be obtained in a rights cleared fashion: searching Youtube to find something that I will then go elsewhere for.

So the take home message from this post? The best place to search for a particular resource may not be the obvious one.

Written by Tony Hirst

May 11, 2009 at 10:29 am

Posted in BBC, Data, Infoskills, Search, SEO
