<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; Using Twitter Lists to Define Custom Search Engines</title>
	<atom:link href="http://blog.ouseful.info/2010/07/12/using-twitter-lists-to-define-custom-search-engines/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Thu, 23 May 2013 14:40:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; Using Twitter Lists to Define Custom Search Engines</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>Using Twitter Lists to Define Custom Search Engines</title>
		<link>http://blog.ouseful.info/2010/07/12/using-twitter-lists-to-define-custom-search-engines/</link>
		<comments>http://blog.ouseful.info/2010/07/12/using-twitter-lists-to-define-custom-search-engines/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 12:18:08 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[IWMW10]]></category>
		<category><![CDATA[tw]]></category>
		<category><![CDATA[twitter lists]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=3755</guid>
		<description><![CDATA[A long time ago, I used to play with search engines all the time, particularly in the context of bounded search (that is, search over a particular set of web pages or web domains, e.g. Search Hubs and Custom Search at ILI2007). Although I&#8217;m not at IWMW this year, I can&#8217;t not have an IWMW [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>A long time ago, I used to play with search engines all the time, particularly in the context of <em>bounded search</em> (that is, search over a particular set of web pages or web domains, e.g. <a href="http://ouseful.open.ac.uk/blogarchive/010686.html">Search Hubs and Custom Search at ILI2007</a>). Although I&#8217;m not at IWMW this year, I can&#8217;t not have an IWMW-related tinker, so here&#8217;s a quick play around IWMW-related twittering folk&#8230;</p>
<p>To start with, let&#8217;s have a look at the IWMW Twitter account:</p>
<p><a href="http://twitter.com/iwmw" title="Photo Sharing"><img src="http://farm5.static.flickr.com/4080/4785822465_6455213e0d.jpg" width="500" height="261" alt="IWMW lists" /></a></p>
<p>We see there are several Twitter lists associated with the account, including one for participants&#8230;</p>
<p>Looking around the IWMW10 website, I also spy a community area, with a Google Custom search engine that searches over institutional web management blogs that @briankelly, I presume, knows about:</p>
<p><a href="http://iwmw.ukoln.ac.uk/community/search/" title="Photo Sharing"><img src="http://farm5.static.flickr.com/4115/4785834659_a99224fd0a.jpg" width="500" height="390" alt="Institutional Web Management blogs search engine" /></a></p>
<p>It seems a bit of a pain to manage though&#8230; &#8220;Please contact Brian Kelly if you would like your blog to be included in this list of blogs which are indexed&#8221;</p>
<p>Ever one to take the lazy approach, I wondered whether we could create a useful search engine around the URLs disclosed on the public Twitter profile pages of folk listed on the various IWMW Twitter lists. The answer is &#8220;not necessarily&#8221;, because the URLs folk have posted on their Twitter profiles seem to point all over the place, but it&#8217;s easy enough to demonstrate the raw principle.</p>
<p>So here&#8217;s the recipe:</p>
<p>- find a Twitter list with interesting folk on it;<br />
- use the Twitter API to <a href="http://apiwiki.twitter.com/Twitter-REST-API-Method:-GET-list-members">grab the list of members on a list</a>;<br />
- the results include profile information of everyone on the list &#8211; including the URL they specified as a home page in their profile;<br />
- grab the URLs and generate an annotations file that can be used to import the URLs into a <a href="http://www.google.co.uk/cse/">Google Custom Search Engine</a>;<br />
- note that the annotations file should include a label identifier that specifies which CSE should draw on the annotations:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/4785863995/" title="Photo Sharing"><img src="http://farm5.static.flickr.com/4143/4785863995_256839aba9.jpg" width="500" height="255" alt="Google CSE config" /></a></p>
<p>Once the file is uploaded, you should have a custom search engine built around the URLs that folk on the Twitter list have revealed in their Twitter profiles (here&#8217;s my <a href="http://www.google.co.uk/cse/home?cx=009190243792682903990:bz1mxdkal7m">IWMW Participants CSE (list date: 12:00 12/7/10)</a>).</p>
<p>Note that to create sensibly searchable URLs, I used the heuristics:</p>
<p>- if page URL is <em>example.com</em> or <em>example.com/</em>, search on <em>example.com/*</em><br />
- by default, if page is <em>example.com/page.foo</em>, just search on that page.</p>
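<p>As a rough sketch, those two heuristics might be written as a standalone function along the following lines. The name <em>cse_pattern</em> is made up for illustration, and two things go beyond the rules as stated: stripping <em>https://</em> as well as <em>http://</em>, and treating any URL that ends in a slash as a wildcard.</p>
<pre class="brush: python; title: ; notranslate">def cse_pattern(url):
    """Turn a profile URL into a URL pattern for a CSE annotation (illustrative helper)."""
    # drop the scheme; handling https as well is an assumption
    for scheme in ('http://', 'https://'):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    # bare domain (example.com): search everything on the site
    if '/' not in url:
        return url + '/*'
    # trailing slash (example.com/): search everything beneath it
    if url.endswith('/'):
        return url + '*'
    # default (example.com/page.foo): search just that page
    return url

for example in ('http://example.com', 'http://example.com/', 'http://example.com/page.foo'):
    print(example, 'becomes', cse_pattern(example))</pre>
<p>Keeping the rule in one small function makes it easy to try out on a handful of profile URLs before generating a whole annotations file.</p>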
<p>I used Python (badly!;-) and the tweepy library to generate my test CSE annotations feed:</p>
<pre class="brush: python; title: ; notranslate">import tweepy

#these are the keys you would normally use with oAuth
consumer_key=''
consumer_secret=''

#these are the special keys for single user apps from http://dev.twitter.com/apps
#as described in http://dev.twitter.com/pages/oauth_single_token
#select your app, then My Access Token from the sidebar
key=''
secret=''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)
api = tweepy.API(auth)

#this identifier is the identifier of the Google CSE you want to populate
cseLabelFromGoogle=''

listowner='iwmw'
tag='iwmw10participant'

#the OAuth-authenticated api object created above is reused here
#(a leftover Basic Auth handler that relied on an undefined accountName and password has been dropped)

f=open(tag+'listhomepages.xml','w')

cse=cseLabelFromGoogle

f.write(&quot;&lt;GoogleCustomizations&gt;\n\t&lt;Annotations&gt;\n&quot;)

#use the Cursor object so we can iterate through the whole list
for un in tweepy.Cursor(api.list_members,owner=listowner,slug=tag).items():
    if  type(un) is tweepy.models.User:
      l=un.url
      if l:
        l=l.replace(&quot;http://&quot;,&quot;&quot;)
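        #bare domain (example.com): search the whole site
        if '/' not in l:
          l=l+&quot;/*&quot;
        #trailing slash (example.com/): wildcard everything beneath it
        elif l.endswith('/'):
          l=l+&quot;*&quot;
        #otherwise (example.com/page.foo) the URL is left alone so only that page is searched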
        f.write(&quot;\t\t&lt;Annotation about=\&quot;&quot;+l+&quot;\&quot; score=\&quot;1\&quot;&gt;\n&quot;)
        f.write(&quot;\t\t\t&lt;Label name=\&quot;&quot;+cse+&quot;\&quot;/&gt;\n&quot;)
        f.write(&quot;\t\t&lt;/Annotation&gt;\n&quot;)

f.write(&quot;\t&lt;/Annotations&gt;\n&lt;/GoogleCustomizations&gt;&quot;)

f.close()</pre>
<p>(Here&#8217;s the code as a <a href="http://gist.github.com/567473">gist</a>, with tweaks so it runs with OAuth.)</p>
<p>Running this code generates a file (named with the list tag followed by <em>listhomepages.xml</em>) that contains Google custom search annotations for a particular Google CSE, based around the URLs declared in the public Twitter profiles of people listed in a particular list. This file can then be uploaded to the Google CSE environment and used to help configure a bounded search engine.</p>
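<p>The script above builds the XML by gluing strings together, which is fine for a quick test but will produce a broken file if a profile URL contains characters that are special in XML. A possible alternative, sketched here on the assumption that the same element and attribute names are wanted, is to let Python&#8217;s standard <em>xml.etree.ElementTree</em> module do the escaping. The function name, the label and the URL patterns are all made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">import xml.etree.ElementTree as ET

def build_annotations(patterns, cse_label):
    """Build a GoogleCustomizations tree with one Annotation per URL pattern."""
    root = ET.Element('GoogleCustomizations')
    annotations = ET.SubElement(root, 'Annotations')
    for pattern in patterns:
        # same attributes as the hand-written strings in the script above
        annotation = ET.SubElement(annotations, 'Annotation', {'about': pattern, 'score': '1'})
        ET.SubElement(annotation, 'Label', {'name': cse_label})
    return root

# hypothetical label and patterns, for illustration only
root = build_annotations(['example.com/*', 'example.org/page.foo'], 'my_cse_label')
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)</pre>
<p>The resulting string can be written to a file in place of the separate <em>f.write</em> calls; the output is not pretty-printed, but the structure is the same.</p>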
<p>So what does this <em>mean</em>? It means that if you have identified a set of people sharing a particular set of interests using a Twitter list, it&#8217;s easy enough to generate a custom search engine around the webpages or domains they have declared in their Twitter profile.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2010/07/12/using-twitter-lists-to-define-custom-search-engines/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://farm5.static.flickr.com/4080/4785822465_6455213e0d.jpg" medium="image">
			<media:title type="html">IWMW lists</media:title>
		</media:content>

		<media:content url="http://farm5.static.flickr.com/4115/4785834659_a99224fd0a.jpg" medium="image">
			<media:title type="html">Institutional Web Management blogs search engine</media:title>
		</media:content>

		<media:content url="http://farm5.static.flickr.com/4143/4785863995_256839aba9.jpg" medium="image">
			<media:title type="html">Google CSE config</media:title>
		</media:content>
	</item>
	</channel>
</rss>
