Archive for September 2010
Yahoo Pipes Code Generator (Python): Pipe2Py
Wouldn’t it be nice if you could use Yahoo Pipes as a visual editor for generating your own feed powered applications running on your own server? Now you can…
One of the concerns occasionally raised around Yahoo Pipes (other than the stability and responsiveness issues) relates to the dependence on the Yahoo Pipes platform that results from creating a pipe. Where a pipe is used to construct an information feed that may get published on an “official” web page, users need to feel that content will always be fed through the pipe, not just when Pipes feels like it. (Actually, I think the Pipes backend is reasonably stable, it’s just the front end editor/GUI that has its moments…)
Earlier this year, I started to have a ponder around the idea of a Yahoo Pipes Documentation Project (the code appears to have rotted unfortunately; I think I need to put a proper JSON parser in place :-( ), which would at least display a textual description of a pipe based on the JSON representation of it that you can access via the Pipes environment. Around the same time, I floated an idea for a code generator that would take the JSON description of a pipe and generate Python or PHP code capable of achieving a similar function to the pipe.
Greg Gaughan picked up the challenge and came up with a code generator, itself written in Python, that generates Python code for doing just that. (I didn’t blog it at the time because I wanted to help Greg extend the code to cover more modules, but I never delivered on my part of the bargain :-( )
Anyway – the code is at http://github.com/ggaughan/pipe2py and it works as follows. Install the universal feed parser (sudo easy_install feedparser) and simplejson (sudo easy_install simplejson), then download Greg’s code and declare the path to it, maybe something like:
export PYTHONPATH=$PYTHONPATH:/path/to/pipe2py
Given the ID for a pipe on Yahoo pipes, generate a Python compiled version of it:
python compile.py -p PIPEID
This generates a file pipe_PIPEID.py containing a function pipe_PIPEID() which returns a JSON object equivalent of the output of the corresponding Yahoo pipe, the major difference being that it’s the locally compiled pipe code that’s running, not the Yahoo pipe…
So for example, for the following simple pipe, which just grabs the OUseful.info blog feed and passes it straight through:
we generate a Python version of the pipe as follows:
python compile.py -p 404411a8d22104920f3fc1f428f33642
This generates the following code:
from pipe2py import Context
from pipe2py.modules import *

def pipe_404411a8d22104920f3fc1f428f33642(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}
    forever = pipeforever.pipe_forever(context, None, conf=None)
    sw_502 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://blog.ouseful.info/feed'}})
    _OUTPUT = pipeoutput.pipe_output(context, sw_502, conf={})
    return _OUTPUT
We can then run this code as part of our own program. For example, grab the feed items and print out the feed titles:
context = Context()
p = pipe_404411a8d22104920f3fc1f428f33642(context, None)
for i in p:
    print i['title']
Not all the Yahoo Pipes blocks are implemented (if you want to volunteer code, I’m sure Greg would be happy to accept it!;-), but for simple pipes, it works a dream…
So for example, here’s a couple of feed mergers and then a sort on the title…
And a corresponding compilation, along with a small amount of code to display the titles of each post, and the author:
from pipe2py import Context
from pipe2py.modules import *

def pipe_2e4ef263902607f3eec61ed440002a3f(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}
    forever = pipeforever.pipe_forever(context, None, conf=None)
    sw_550 = pipefetch.pipe_fetch(context, forever, conf={u'URL': [{u'type': u'url', u'value': u'http://blog.ouseful.info/feed'}, {u'type': u'url', u'value': u'http://feeds.feedburner.com/TheEdTechie'}]})
    sw_572 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://www.greenhughes.com/rssfeed'}})
    sw_580 = pipeunion.pipe_union(context, sw_550, conf={}, _OTHER = sw_572)
    sw_565 = pipesort.pipe_sort(context, sw_580, conf={u'KEY': [{u'field': {u'type': u'text', u'value': u'title'}, u'dir': {u'type': u'text', u'value': u'ASC'}}]})
    _OUTPUT = pipeoutput.pipe_output(context, sw_565, conf={})
    return _OUTPUT

context = Context()
p = pipe_2e4ef263902607f3eec61ed440002a3f(context, None)
for i in p:
    print i['title'], ' by ', i['author']
And the result?
MCMT013:pipes ajh59$ python basicTest.py
Build an app to search Delicious using your voice with the Android App Inventor by Liam Green-Hughes
Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting by Tony Hirst
Everyday I write the book by mweller
...
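For anyone wondering what that union-then-sort wiring boils down to, the same idea can be mimicked in a few lines of plain Python. This is only a sketch of the idea, not pipe2py's actual implementation: the `union` and `sort_by` helpers and the sample items are made up for illustration.

```python
from itertools import chain
from operator import itemgetter

def union(*feeds):
    """Chain several iterables of feed items into one stream."""
    return chain(*feeds)

def sort_by(items, field, descending=False):
    """Sort feed items (dicts) on a single field, like the ASC title sort in the pipe."""
    return sorted(items, key=itemgetter(field), reverse=descending)

# Made-up items standing in for the parsed feeds
feed_a = [{'title': 'Everyday I write the book', 'author': 'mweller'}]
feed_b = [{'title': 'Build an app', 'author': 'Liam Green-Hughes'}]

merged = sort_by(union(feed_a, feed_b), 'title')
for item in merged:
    print(item['title'], 'by', item['author'])
```

Each block takes an iterable of items and hands on another iterable, which is why the blocks chain together so easily.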
So there we have it… Thanks to Greg, the first pass at a Yahoo Pipes to Python compiler…
PS Note to self… I noticed that the ‘truncate’ module isn’t supported, so as it’s a relatively trivial function, maybe I should see if I can write a compiler block to implement it…
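As a note towards that, here is a rough sketch of what a truncate block might look like. The function signature mirrors the generated code above, but the name `pipe_truncate` and the layout of the `conf` dict (a `count` entry with a `value`) are my assumptions by analogy with the other modules; I haven't checked how Pipes actually serialises the truncate block.

```python
from itertools import islice

def pipe_truncate(context, _INPUT, conf=None, **kwargs):
    """Yield at most `count` items from _INPUT (hypothetical module sketch)."""
    conf = conf or {}
    # Assumed conf layout, by analogy with the conf dicts of the other modules
    count = int(conf.get('count', {}).get('value', 0))
    for item in islice(_INPUT, count):
        yield item

# Made-up input items
items = ({'title': 'post %d' % n} for n in range(10))
first_three = list(pipe_truncate(None, items, conf={'count': {'type': 'number', 'value': 3}}))
print([i['title'] for i in first_three])
```

Using `islice` means the block stops pulling items from upstream as soon as it has enough, which matters if the input is an endless generator.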
PPS Greg has also started exploring how to export a pipe so that it can be run on Google App Engine: Running Yahoo! Pipes on Google App Engine
My Twitter Community Grabbing Code – newt.py
In a series of recent posts, I’ve shown various network graphs and custom search engines generated from various things on twitter – communities around hashtags, users on lists, friends of an individual and so on.
Last weekend, I tried to pull together the code into some sort of integrated but extensible mess for my own tinkering. The core code can be found here: newt.py [UPDATE: more recent - and even more cluttered code - at https://github.com/psychemedia/newt]. It may or may not work for you, work correctly, or work at all. I may or may not update that file (though if I do I will try not to change function names or types they return, nor add or remove side-effects, of which there may be many in the most unlikely of places.) The functions are all over the place in the file, and almost completely undocumented. They also demonstrate my cut’n'paste from others understanding of Python. Whatever.
To use the code, you will need to install tweepy, and also add various keys into the functions at the top of the newt.py file. There are routines in there to generate Gephi GDF files, and Google custom search engine config files.
Here are a handful of scripts that I’ve been using to generate various output files.
The first (listfromTwapperkeepr.py) has a go at grabbing archived tweets (by hashtag) from Twapperkeeper. Command line usage is:
python listfromTwapperkeepr.py TAG START END LIMIT
where TAG is the hashtag (eg rswebsci), START and END are dates (in format YYYY-MM-DD, err, I think?! Whatever it is that Twapperkeeper expects;-), and LIMIT is the number of tweets that must appear from a particular person in the grabbed archive for that user to be added to the list. The Twitter list that is generated will be added with the name TAG to the authenticated user’s account (i.e. the user associated with the OAuth keys…) If the list already exists, it should just get updated with users not on the list (but no users will be removed…)
import sys,newt

def report(m):
    newt.report(m,True)

#----------------------------------------------------------------
#user settings
tag=sys.argv[1]
start=sys.argv[2]
end=sys.argv[3]
limit=int(sys.argv[4])
#----------------------------------------------------------------

twsn={}
twsn= newt.getTwapperkeeperArchiveTweeters(twsn,tag,start,end)

report("List members:")
tw=[]
tws={}
for i in twsn:
    tws[i]=twsn[i]['count']
    if tws[i]>=limit:
        tw.append(i)

#Print out the twitterers in order of who tweeted most
for i in sorted(tws, key=tws.get, reverse=True):
    msg=i+' '+str(tws[i])
    report(msg)

api=newt.getTwitterAPI()
user=api.auth.get_username()
newt.addManyToListByScreenName(api,user,tag,tw)
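The counting and thresholding step in that script is independent of Twapperkeeper and Twitter, so here it is on its own as a standalone sketch. The `frequent_tweeters` helper and the sample screen names are made up; in the real script the counts come from `newt.getTwapperkeeperArchiveTweeters`.

```python
from collections import Counter

def frequent_tweeters(screen_names, limit):
    """Count tweets per screen name and keep those with at least `limit` tweets."""
    counts = Counter(screen_names)
    # most_common() lists the heaviest tweeters first
    keep = [name for name, n in counts.most_common() if n >= limit]
    return counts, keep

# One entry per archived tweet (made-up data)
archive = ['alice', 'bob', 'alice', 'carol', 'alice', 'bob']
counts, keep = frequent_tweeters(archive, 2)
print(keep)
```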
The second script (basicListNet.py) grabs users from a Twitter list, then plots out a network showing which members of the list friend other members of the list. Usage is:
python basicListNet.py USER LIST
Files will be generated… ;-) In the CSE generation file, the XXX should be replaced by the label for the desired Google CSE (available from the Advanced settings page on the appropriate Google CSE existing site manager).
import sys,newt

def report(m):
    newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user=sys.argv[1]
list=sys.argv[2]
#----------------------------------------------------------------

tw={}
tw=newt.listDetailsByScreenName({},api.list_members,user,list)

report("List members:")
for i in tw:
    report(tw[i].screen_name)

'''
report("List members:")
for i in tw:
    report(tw[i].screen_name)
'''

newt.gephiOutputFile(api,list, tw)
newt.googleCSEDefinitionFile("XXX",list, tw)
newt.googleCSEDefinitionFileWeighted("XXX",list, tw)
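To give a flavour of what the Gephi output step involves, here is a minimal standalone sketch that keeps only the friend links between list members and writes them out in GDF form. It is not the code in newt.py: the helper names and the sample data are made up, and the GDF header lines follow my understanding of the format Gephi reads.

```python
def internal_edges(friends):
    """friends: dict of member -> set of accounts that member follows.
    Returns (follower, followed) pairs where both ends are members."""
    members = set(friends)
    return sorted((a, b) for a in members for b in friends[a] & members if a != b)

def to_gdf(members, edges):
    """Serialise nodes and directed edges as GDF text."""
    lines = ['nodedef>name VARCHAR,label VARCHAR']
    lines += ['%s,%s' % (m, m) for m in sorted(members)]
    lines.append('edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN')
    lines += ['%s,%s,true' % (a, b) for a, b in edges]
    return '\n'.join(lines)

# Made-up list: 'x' is followed by 'a' but is not on the list, so it is dropped
friends = {'a': {'b', 'x'}, 'b': {'a', 'c'}, 'c': set()}
edges = internal_edges(friends)
gdf = to_gdf(friends, edges)
print(gdf)
```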
The third script (basicUserNet.py) grabs the friends list for a specified user and then generates the graph of which of their friends are friends of each other. Usage is:
python basicUserNet.py USER
import sys,newt

def report(m):
    newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user=sys.argv[1]
#----------------------------------------------------------------

tw={}
tw=newt.getTwitterFriendsDetailsByIDs(api,user)

report("List members:")
for i in tw:
    report(tw[i].screen_name)

'''
report("List members:")
for i in tw:
    report(tw[i].screen_name)
'''

newt.gephiOutputFile(api,user, tw)
newt.googleCSEDefinitionFile("XXX",user, tw)
newt.googleCSEDefinitionFileWeighted("XXX",user, tw)
And the fourth script (basicTwapperkeeperNet.py) gets an archive from Twapperkeeper and generates the network of archived users who have friended each other. Usage is:
python basicTwapperkeeperNet.py TAG START END LIMIT
import sys,newt

def report(m):
    newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
tag=sys.argv[1]
start=sys.argv[2] #'2010-09-23'
end=sys.argv[3]
limit=int(sys.argv[4])
#----------------------------------------------------------------

twsn={}
twsn= newt.getTwapperkeeperArchiveTweeters(twsn,tag,start,end)

report("List members:")
tw=[]
tws={}
for i in twsn:
    tws[i]=twsn[i]['count']
    if tws[i]>=limit:
        tw.append(i)

tw=newt.getTwitterUsersDetailsByScreenNames(api,tw)

#Print out the twitterers in order of who tweeted most
for i in sorted(tws, key=tws.get, reverse=True):
    msg=i+' '+str(tws[i])
    report(msg)

newt.gephiOutputFile(api,tag, tw)
newt.googleCSEDefinitionFile("XXX",tag, tw)
newt.googleCSEDefinitionFileWeighted("XXX",tag, tw)
The following script is experimental – and shows how to extend and merge data from several sources. A use case, for example, would be accessing lists of MPs on Twitter, by party and creating a Gephi output file whereby each twitterer node definition is annotated with their party.
import newt

def report(m):
    newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user='ousefulapi'
list='cam23'
#----------------------------------------------------------------

tw={}
tw=newt.listDetailsByID(tw,api.list_members,user,list)
twx=newt.extendUserList(tw,["typ1 labour","typ2 sads"])

tw2=newt.listDetailsByID({},api.list_members,user,'alpsp')
tw2x=newt.extendUserList(tw2,["typ1 sdghg","typ2 sads"])

twx=newt.mergeDicts([twx,tw2x],True)

report("List members:")
twd=newt.deExtendUserList(twx)
for i in twd:
    report(twd[i].screen_name)

'''
#testing
ttx={}
for t in tw:
    ttx[t]={}
    ttx[t]['user']=tw[t]
    ttx[t]['classVals']={}
    ttx[t]['classVals']['typ1']='aser'
    ttx[t]['classVals']['typ2']=34
'''

ttx=twx
newt.gephiOutputFileExtended(api,'test', ttx,['typ1 VARCHAR','typ2 INT'])
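The "extend, then merge" idea is easier to see away from the Twitter calls. The sketch below uses the same record layout as the commented-out testing block above (a `user` entry plus a `classVals` dict), but the helper names, the sample users and the rule that later lists win on conflicting values are my own assumptions, not a description of what newt.py actually does.

```python
def extend_users(users, class_vals):
    """Wrap each user record with a copy of the annotation values for its list."""
    return {uid: {'user': user, 'classVals': dict(class_vals)}
            for uid, user in users.items()}

def merge_extended(dicts):
    """Merge several extended user dicts; later annotations overwrite earlier ones."""
    merged = {}
    for d in dicts:
        for uid, record in d.items():
            if uid in merged:
                merged[uid]['classVals'].update(record['classVals'])
            else:
                merged[uid] = {'user': record['user'],
                               'classVals': dict(record['classVals'])}
    return merged

# Made-up lists keyed by user ID; user 2 appears on both
list_one = extend_users({1: 'alice', 2: 'bob'}, {'typ1': 'labour'})
list_two = extend_users({2: 'bob', 3: 'carol'}, {'typ1': 'other', 'typ2': 34})
merged = merge_extended([list_one, list_two])
print(merged[2]['classVals'])
```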
Although the code sort of works, it’s a bit messy in terms of the representation used. So I’m going to give up on it and start afresh, probably using Networkx. In the new version, as soon as I load node or edge data into what I shall probably call newtx, it will be added to a networkx graph. Then I can do network stats on the graph/network in Python, rather than having to use Gephi to run stats on the graph, as well as generating visual outputs using matplotlib.
XML Data Scraping and Screen Scraping with YQL
Although it seems as if the OU’s repository data is, henceforth, going to be made available as Linked Data, I thought I’d just post this quick demo I’ve had lying around for a week or two demonstrating how to use YQL to run queries over the XML data files published from the OU’s eprints server, as well as how to use it to scrape structured data from an HTML page (in this case, from Lanyrd).
Picking the “submissions by author” listing for the first author in the list (which happens to be James Aczel, not that you’d know it from the listing page as shown below…;-), we see a variety of ways of exporting James’ publication list:
If we select the “EP3 XML” (ePrints v3?) option, this is what we get:
The URL for the XML export view, as for the HTML page, is well structured around a unique identifier for James, although I can’t tell you what because the bit of the OU that “owns” the minting of the identifiers is very protective of them, how they’re used, and the extent to which they’re allowed to be made public… doh!
Anyway, given the URL for the XML version of someone’s publications, we can treat that document as a database using YQL.
So for example, here’s a query that will select the id and name of all the authors, organised by publication, on papers where James Aczel was an author:
select eprint.id, eprint.creators from xml where url='http://oro.open.ac.uk/cgi/exportview/person/jca25/XML/jca25.xml'
(I wrote this query because I wanted to start exploring co-author networks around particular authors…)
Here it is in the YQL Developer console:
And by extension, this query gets the authors and paper titles for papers where James has some creator attribution.
If we parameterise the query (as described here: YQL Web Service URLs), we can generate a “nice” parameterised URL for the query that returns the YQL XML query response.
So for example, here’s response data for the user with ID bkh2:
(I don’t know how to construct the URL within YQL from just the ID? If you know, please post the answer in a comment…;-)[See @hapdaniel's comment below :-)]
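If building the URL inside YQL is a problem, one way of sidestepping it is to build both the export URL and the YQL request URL outside YQL, for example in Python. This is a sketch under a couple of assumptions: that the export URL pattern seen for jca25 holds for other IDs, and that the public YQL endpoint address and its `q`/`format` parameters are as I remember them from the YQL docs. It only builds the URL; it doesn't call the service.

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint
YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql'

def oro_xml_url(person_id):
    """Export URL for a person's publications (pattern inferred from the jca25 example)."""
    return 'http://oro.open.ac.uk/cgi/exportview/person/%s/XML/%s.xml' % (person_id, person_id)

def yql_url(person_id, fmt='xml'):
    """Build the YQL request URL for the id/creators query."""
    query = "select eprint.id, eprint.creators from xml where url='%s'" % oro_xml_url(person_id)
    return YQL_ENDPOINT + '?' + urlencode({'q': query, 'format': fmt})

print(yql_url('bkh2'))
```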
You may have noticed that YQL uses a dot notation in the construction of the SELECT component of the query to identify the data fields you want returning from the query. A rather different approach can be taken when using YQL to screenscrape data from an HTML page, in particular by using an XPATH expression to direct the query to the parent HTML element you want returning. So for example, we can scrape the list of attendees at an event as listed on Lanyrd, such as today’s #RSWebSci event:
Here’s the YQL to scrape that list, using an XPATH expression to target in on the appropriate HTML container element:
select href from html where url='http://lanyrd.com/2010/rswebsci/' and xpath='//div[@class="attendees-placeholder placeholder"]/ul[@class="user-list"]/li/a'
(I think an exact match is being run in this expression on the class attributes?)
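A quick way of checking that hunch away from YQL is to run the same sort of expression over a scrap of markup using Python's ElementTree, which supports a small subset of XPath. The markup and the second profile path below are made up, and ElementTree is not the engine YQL uses, but in XPath a test like `@class="..."` is a plain string comparison, so I'd expect the behaviour to carry over.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the Lanyrd attendee markup
SAMPLE = """
<html><body>
  <div class="attendees-placeholder placeholder">
    <ul class="user-list">
      <li><a href="/people/nigel_shadbolt/">Nigel Shadbolt</a></li>
      <li><a href="/people/example_user/">Example User</a></li>
    </ul>
  </div>
  <div class="placeholder">
    <ul class="user-list">
      <li><a href="/people/not_matched/">Not matched</a></li>
    </ul>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# The whole class attribute string has to match, not just one of the class names
path = ".//div[@class='attendees-placeholder placeholder']/ul[@class='user-list']/li/a"
hrefs = [a.get('href') for a in root.findall(path)]
print(hrefs)

# Only the second div has class exactly equal to 'placeholder'
only_placeholder = root.findall(".//div[@class='placeholder']")
print(len(only_placeholder))
```

So a div that carries both class names is not picked up by a test for just one of them, which is what an exact match would predict.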
Here’s the result:
You’ll notice the scraped data is a list of paths off the Lanyrd domain to personal profile pages. So given these users’ URLs, we can scrape their Lanyrd profile pages to find their Twitter IDs:
select href from html where url='http://lanyrd.com/people/nigel_shadbolt/' and xpath='//a[@class="icon twitter url nickname"]'
If we’re thinking linked data (little ‘l’, little ‘d’;-), we might then use these Twitter URLs to see if we can find other webpages for the same person using Google’s otherme service:
http://socialgraph.apis.google.com/otherme?q=http://twitter.com/timbernerslee
PS just by the by, during the course of the #rswebsci event, I automatically generated a custom search engine around the #rswebsci hashtaggers websites and a quick viz of the structure of the hashtaggers’ network.
With a bit of luck, I’ll get round to posting the code for grabbing this data from the Twitter API in the next day or two…
So How Does the Twitter Backchannel Work When The Chatham House Rule Is in Place?
As I type, there is a spinoff meeting (that I’m not at) from the #RSWebSci event operating from a location near Milton Keynes (@martinjemoore: “At tremendous Kavli Centre, Royal Society’s base in Buckinghamshire, for satellite meeting about future of the web & web science #rswebsci”) and being held under the Chatham House rule:
“When a meeting, or part thereof, is held under the Chatham House Rule, participants are free to use the information received, but neither the identity nor the affiliation of the speaker(s), nor that of any other participant, may be revealed”.
In the RSWebSci event, from the backchannel we can identify the participants and their affiliations from any tweets they make from the event:
So for example, when @timdavies mentions: “Nigel Shadbolt asking “What are Chatham house rules for Twitter” at #rswebsci… Opps, er, I mean Someone asking.” we know that both Tim Davies (“Consultant and action researcher focussing on civic engagement and social technology. Specific focus on youth engagement & open data”) and Nigel Shadbolt are at the event. And from tweets like “#RSwebsci hilarious moment when one unattributable person forgets other unattributable person’s name”, we can assume that the originator of that tweet is also at the event (and maybe that they are not either of the unattributable persons mentioned?)
If we know an event is happening, and we know the sorts of people it is likely to attract (e.g. by looking at the Twitterers from the last couple of days of the #rswebsci event), if a Twitter blackout is in operation we can look to Twitter histories to see who was not tweeting during the event who might normally be expected to be tweeting over that period, and tentatively locate them at the event. We can also rule out people who have declared they aren’t there (@cameronneylon: “I decided not to go to #RSWebsci and satellite meeting because I had too much “proper” work to do. Think I probably picked wrong…”), unless they’re bluffing…?!;-)
From tweets so far, we know via @lescarr that there are several sessions taking place (“”Breakthroughs in Web Science”, “Dark Web”, “Networks in web science”, “Govt open data” and “Collaborative Science” sessions at #RSwebsci”). From clustering the folk who we know to be at, and suspect to be at the event, we might tentatively allocate them to different sessions, with a particular probability. If different hashtags are used for each session, the sort of thing @briankelly (who I don’t think is at the event) often lobbies for, it makes conversation analysis maybe a little easier?
On the topic of conversation analysis, or at least time series analysis (using a tool such as TimeFlow, for example?), we might be able to use some form of it to identify who said what from inspecting the timeline. For example, if @ianmulvany is a truth teller, and says at 9.47 “#RSWebsci time to pitch my idea”, we can monitor tweets over the next few minutes to see if any ideas that are reported are the sort of thing he might have come up with, given we can find out easily enough that he works for Mendeley. So given @timdavies’ mention at 9.53 that “#rswebsci @? “Crowdsourcing & crowdcurating more an art than a science right now” <– Shd it develop as science? Or best in domain of art…", that crowdsourcing thing is something that I could imagine Ian saying (P=0.7?). The idea as to whether it's science or art is presumably Tim Davies'?
Just by the by, Tim's use of @? comes from a suggestion I made about a possible "chatham bot" that would accept DMs, anonymise the sender and replace any @name attributions with @?. Thinking about it a little more, it would be easy enough for folk to see who was friended by the chatham bot, and narrow down at least the sender of the tweet to someone on that list. [If by implication we assume @? is a twitterer, rather than a participant not on twitter, we might further narrow down who said what in this case to someone on Twitter whose Twitter username the person who 'mentioned' them knows.] Chris Gutteridge, who is also not at the event ("@lescarr eh? I didn't know there was a Wednesday bit! #rswebsci any of it streamed?"), suggested "…creat[ing] a rswebscichatham twitter account and tell all people in the room the username/password. #rswebsci" which gets round this problem of preserving anonymity of the sender, which the creation of a Birdherd account (via @jamestoon) might also do?
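The text-mangling part of such a bot is about as simple as it gets. Here's a sketch of just that step, with a hypothetical `anonymise` function; receiving the DM and reposting it from the bot account are left out, and the pattern is a crude one that would also mangle things like email addresses.

```python
import re

# An @ followed by letters, digits or underscores
MENTION = re.compile(r'@\w+')

def anonymise(dm_text):
    """Replace every @name attribution with @?, leaving hashtags alone."""
    return MENTION.sub('@?', dm_text)

print(anonymise('@timdavies says hi to @lescarr at #rswebsci'))
```

Note that this only hides who is mentioned; it does nothing about the traffic analysis problems described above.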
Okay, enough of that… except to wonder: what other sorts of traffic analysis might we apply to a hashtag twitter stream and a “likely candidates” twitter use analysis over the duration of the event. Would it be easier to preserve the sense of the Chatham House rule if a hashtag was not used?
PS doh! I forgot to raise the point that first came to mind: how would it be possible to remotely attend a Chatham House event via a public backchannel? (Which is where the chathambot anonymiser came in…)
PPS just to note, as the clock ticks on, and the day warms up, other folk who were at #rswebsci on Monday and Tuesday, but who are not at today’s event, are now tweeting again using the hashtag, which means that the channel now has added noise on top of the discussions from today’s satellite event… The easiest way I can think of following today’s events is to create a list of folk known or suspected to be there, and follow that list through an additional #rswebsci filter?
PPPS [via @timdavies] Chatham House rule FAQ covers Twitter as follows:
Q. Can I ‘tweet’ whilst at an event under the Chatham House Rule?
A. The Rule can be used effectively on social media sites such as Twitter as long as the person tweeting or messaging reports only what was said at an event and does not identify – directly or indirectly – the speaker or another participant. This consideration should always guide the way in which event information is disseminated – online as well as offline.
It also says:
Q. Can a list of attendees at the meeting be published?
A. No – the list of attendees should not be circulated beyond those participating in the meeting.
which can in part be inferred from various uses of Twitter, and maybe also any public geolocation services used by participants. Which is to say, if you know where an event is, you can maybe look for people near there..?
data.open.ac.uk Arrives, With Linked Data Goodness
How time flies in open data land… and how quickly things seem to get done, at least in the opening up stakes. A couple of weeks ago, @ostephens revealed that data.open.ac.uk was live as a URL, and yesterday, a tweet went out from the OU’s Mathieu d’Aquin to say that some data was there…
data.open.ac.uk is the home of open linked data from The Open University. It is a platform currently developed as part of the LUCERO JISC Project to extract, interlink and expose data available in various institutional repositories of the University and make it available openly for reuse.
It seems that there’s going to be a focus on releasing data as Linked Data™, so it’ll be interesting to see, as time goes by, just how many ways we can find of gluing things together, particularly around our crown jewels: OU course codes.
If you can’t wait to get started, the SPARQL endpoint is here: http://data.open.ac.uk/query. Datasets look like they’re organised within particular contexts (whatever that means?!), apparently accessed as follows:
- SELECT ?blah FROM <http://data.open.ac.uk/context/oro> WHERE {…}
- SELECT ?blah FROM <http://data.open.ac.uk/context/podcast> WHERE {…}
If no context is specified, the search presumably runs over everything it can?
So what’s there at the moment? For starters, there’s information from ORO, the OU’s eprints repository, as well as the OU’s iTunesU podcast directory.
I’m not much of a SPARQLer (though I can recommend Talis’ 2 day Introduction to the Web of Data workshop as a way of getting your head round the Linked Data world), but here are a couple of queries I managed to write to just check that the endpoint was working as advertised. I’m using SparqlProxy to run the queries because it offers lots of nice output formats. It also allows you to run queries from a URL, so if the Lucero team wanted to easily share the text of demo queries, they could just bookmark them, one to a page… [See end of post for Howto];-)
The titles of papers authored by surname “Weller”, as listed in ORO:
select distinct ?title from <http://data.open.ac.uk/context/oro> where {
?y a <http://purl.org/ontology/bibo/AcademicArticle>.
?y <http://purl.org/dc/terms/title> ?title.
?y <http://purl.org/dc/terms/creator> ?z.
?z <http://xmlns.com/foaf/0.1/family_name> "Weller" ^^<http://www.w3.org/2001/XMLSchema#string>.
} LIMIT 10
Here are details of a few podcasts on iTunes relating to the course T209:
SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10
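On the subject of running queries from a URL: the SPARQL protocol lets a query be passed to an endpoint in a `query` URL parameter, so a shareable link can be put together with a few lines of Python. A sketch only: the `sparql_url` helper is made up, it just builds the URL without fetching it, and I haven't checked whether this particular endpoint accepts queries this way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = 'http://data.open.ac.uk/query'

def sparql_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + '?' + urlencode({'query': query.strip()})

podcast_query = """
SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10
"""

url = sparql_url(podcast_query)
print(url)

# Decoding the URL again gives back the original query text
recovered = parse_qs(urlsplit(url).query)['query'][0]
```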
The data.open.ac.uk homepage also hints at a few other datasets that may shortly be making an appearance:
- Course information originating from Study at the OU
- The OU Library Catalogue, especially focusing on course material
- Open educational content available in the OpenLearn system
- Public information about staff, locations on the OU campus, etc
From a “transparency means show us where the money is” point of view, it would also be interesting to see a list of funded research projects added to the list, which might in turn provide some sort of incentive to the research councils to start releasing this data in a bit more structured way?;-)
PS just by the by, for openness folks, there’s also a fair amount of stuff on the OU’s FOI site.
PPS here’s a quick/ad hoc way of sharing queries and then running them via SPARQLProxy:
1) save the query to somewhere like Pastebin:
2) Grab a link to the actual text of the query. In Pastebin, “raw” actually displays the clip in HTML tags. For the really raw raw (rarrghhhh… easy, tiger;-)) you’ll need to grab the download URL, e.g. http://pastebin.com/download.php?i=tZzgFDvY
3) Use this URL in a “SPARQL query by URI” query on SPARQLProxy:
If you create any queries on data.open.ac.uk you’d like to share, why not add them to Pastebin, and share a link back in the comments? :-)
Initial Thoughts on Profiling @dirdigeng’s Friends Network on Twitter
Last week, Andrew Stott, Director of Digital Engagement in the Cabinet Office, announced his retirement date over Twitter:
At the time of writing, @dirdigeng follows slightly over two thousand folk on Twitter, so I thought I’d have a quick look at who the “players” are…
The network described is constructed as follows:
- nodes represent the people followed by @dirdigeng on Twitter;
- a directed edge from A to B means that A is following B.
In the first view (randomly laid out, using Gephi), we plot node size as linearly proportional to the number of dirdigeng’s friends who are following each of the other friends (that is, the in-degree of each node), and colour proportional to their total number of followers (including people not friended by @dirdigeng).
The colour mapping is non-linear – @Number10gov, @guardiantech and @mashable have significantly more followers than the other nodes – and is set via the spline control:
If we run the betweenness centrality statistic, and size nodes accordingly, we can see how the various parts of the network may be connected. (“Betweenness centrality is a measure based on the number of shortest paths between any two nodes that pass through a particular node. Nodes around the edge of the network would typically have a low betweenness centrality. A high betweenness centrality might suggest that the individual is connecting various different parts of the network together.”)
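To make that definition a bit more concrete, here is a small pure-Python sketch of betweenness centrality for a directed, unweighted network, using the approach usually credited to Brandes (a breadth-first search from each node, then accumulating path counts back along the shortest paths). It isn't Gephi's code, the scores are left unnormalised, and the toy network is made up.

```python
from collections import deque

def betweenness(graph):
    """graph: dict of node -> set of nodes it points to (directed, unweighted)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    centrality = {n: 0.0 for n in nodes}
    for source in nodes:
        order = []                          # nodes in order of distance from source
        preds = {n: [] for n in nodes}      # predecessors on shortest paths
        sigma = {n: 0 for n in nodes}       # number of shortest paths from source
        dist = {n: -1 for n in nodes}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):           # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality

# 'hub' sits on every shortest path from {a, b} to {c, d}
toy = {'a': {'hub'}, 'b': {'hub'}, 'hub': {'c', 'd'}}
scores = betweenness(toy)
print(scores['hub'])
```

In the toy network the hub scores 4.0 (one for each of the four a/b to c/d paths that pass through it) and everything else scores zero, which is the "connecting different parts of the network" effect in miniature.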
We can also run the modularity class statistic to try to partition the friends into small networks with a high degree of internal connections. Here’s what we get (click through on the image to see it in more detail):
Modularity groups help us understand the structure of the network in a bit more detail. I’ve started to think they might also be used to automatically generate a seeding set of people who form a highly interconnected community with an interest in a particular topic and from a particular stance.
As well as looking at the structure of the network, we can also create a search engine over the home pages declared in the Twitter bios of @dirdigeng’s friends. My thinking here is that this might provide a useful constrained search engine over sites engaged in social media and with an interest in “Digital Britain”.
The simplest custom search engine simply uses the URLs from the Twitter bios of folk followed by @dirdigeng and adds them to a “Digital Britain” Google Custom search engine. However, one attractive feature of the Google CSEs is that you can also tweak the rankings by weighting results from different domains differently to give a “weighted” custom search engine.
As a quick experiment, I produced one weighted search engine where I set the score for each domain to be the normalised number of followers amongst @dirdigeng’s friends community. (That is, the domain score equalled the indegree of a node in the @dirdigeng friends network, divided by the total number of people in that network).
As you can see from the above, the results differ… Whether there is any improvement in the ranking of results is another thing. (There is also the question of how best to score, or boost, rankings based on network statistics, and the extent to which rankings should be determined by friends network factors…)
It also strikes me that the modularity groups might also be used to inform the setup of a CSE. For example, separate modularity groups/classes may be used to define refinement labels, allowing users to just search pages from members of a particular modularity class, or boost the results from those people.
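The score itself is straightforward to compute once you have the friend links. A sketch with a made-up four person network and a hypothetical `indegree_scores` helper; how the resulting number is then written into the CSE definition is a separate step that I've left out.

```python
def indegree_scores(following):
    """following: dict of member -> set of accounts they follow.
    Score = followers within the network / number of people in the network."""
    members = set(following)
    indegree = {m: 0 for m in members}
    for follower, followed in following.items():
        for target in followed & members:   # ignore links that leave the network
            if target != follower:
                indegree[target] += 1
    total = len(members)
    return {m: indegree[m] / total for m in members}

# Made-up network: 'zzz' is followed by 'd' but is not in the network
following = {'a': {'b', 'c'}, 'b': {'c'}, 'c': set(), 'd': {'c', 'zzz'}}
scores = indegree_scores(following)
print(scores)
```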
And finally, I wonder whether we can mine the tweets of @dirdigeng’s friends, as well as those of @dirdigeng, to provide raw material for additional advice for searchers?
So When Will We Start to See Live Billboard Ads in Streetview?
Via this Google Maps Mania post, I see that a couple of views of the same spot in Google Streetview show slightly different things?
Here are the URLs, just in case you’re wondering, with the differences marked…
http://maps.google.ca/maps?f=q&source=embed&hl=en&geocode=&q=3800+ontario+est,montreal&sll=49.891235,-97.15369&sspn=41.711424,114.169922&ie=UTF8&hq=&hnear=3800+Rue+Ontario+Est,+Montr%C3%A9al,+Communaut%C3%A9-Urbaine-de-Montr%C3%A9al,+Qu%C3%A9bec+H1W+1S4&layer=c&cbll=45.547142,-73.543563&panoid=kcGq_bAKrDDfp-xkyDYuIQ&cbp=12,117.63,,0,8.37&ll=45.547116,-73.559818&spn=0.027831,0.087891&z=14
http://maps.google.ca/maps?f=q&source=embed&hl=en&geocode=&q=3800+ontario+est,montreal&sll=49.891235,-97.15369&sspn=41.711424,114.169922&ie=UTF8&hq=&hnear=3800+Rue+Ontario+Est,+Montr%C3%A9al,+Communaut%C3%A9-Urbaine-de-Montr%C3%A9al,+Qu%C3%A9bec+H1W+1S4&layer=c&cbll=45.547097,-73.543415&panoid=XHVjf9ort_xyg8aMWVUuQQ&cbp=12,97.69,,0,2.18&ll=45.547142,-73.543563&spn=0.027831,0.087891&z=14
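If you want to pick out the differences mechanically, parsing the two query strings and comparing them parameter by parameter does the job. In the sketch below the URLs are trimmed to the parameters from `layer=c` onwards (the earlier ones are identical in both), and `query_diff` is a made-up helper name.

```python
from urllib.parse import urlsplit, parse_qs

def query_diff(url_a, url_b):
    """Return {parameter: (values in a, values in b)} for parameters that differ."""
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    return {k: (qa.get(k), qb.get(k))
            for k in sorted(set(qa) | set(qb)) if qa.get(k) != qb.get(k)}

# Trimmed versions of the two Streetview URLs above
url_a = ('http://maps.google.ca/maps?layer=c&cbll=45.547142,-73.543563'
         '&panoid=kcGq_bAKrDDfp-xkyDYuIQ&cbp=12,117.63,,0,8.37'
         '&ll=45.547116,-73.559818&spn=0.027831,0.087891&z=14')
url_b = ('http://maps.google.ca/maps?layer=c&cbll=45.547097,-73.543415'
         '&panoid=XHVjf9ort_xyg8aMWVUuQQ&cbp=12,97.69,,0,2.18'
         '&ll=45.547142,-73.543563&spn=0.027831,0.087891&z=14')

diff = query_diff(url_a, url_b)
for name, (a, b) in diff.items():
    print(name, a, b)
```

My guess is that `cbll` and `panoid` identify the camera position and the panorama, and `cbp` the direction of view, but that is only a guess from looking at the values.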
This reminded me of the patent announced earlier this year relating to the placement of live ads on billboards within Google Streetview (Google Plans to Upgrade Old Billboards in Street View), something that already happens in many console based computer games (e.g. using the services of companies such as Massive Incorporated).
(By the by, Google already offers an in-game advertising product for web based games…)
A little while ago, Google mailed out window stickers to shops that included a QR code, so that people with smart phones could easily pop up a web page for that store….
So what happens next time the StreetView car passes by? Could the Goog detect the QR code from the Streetview image and mesh things together just a little bit more? After all, if in-game advertising is as effective as it seems to be, “subliminal” advertising in StreetView is probably also worth a punt.
With AR tools ten a penny now, it should also be easy enough to detect things like in-window QR codes, and use these to key adverts on nearby “live billboards”?
PS and finally: who knew that a shopping app could act as a vector for getting supermarket advertising into schools? Survive an earthquake with Tesco Finder
Open Courses: About 10 Weeks Seems To Be It, Then?
Consternation on the twittertubes this morning about Wolverhampton’s i-CD: Intelligent Career Development, which seeks to offer “a completely new approach to higher education”:
Historically, people have either gone to university or, more recently, universities have tried to come to them. That is to say, they have opened themselves to part-time students in the evenings or projected learning materials via distance learning or tailored their programmes to employers’ needs. However, they have never previously attempted to do all these things in a single programme. Via i-CD, the University of Wolverhampton is for the first time providing low-cost, flexibly-delivered, workplace-based, market-driven, fully-accredited, higher education.
(Err… I think the OU does that actually, through work based learning, an increasing number of vendor qualifications from Cisco and Microsoft that also provide academic credit, sector based courses and qualifications, and so on… All part-time, at a distance, with support (and online community), and some of them in the workplace too.)
So for example, I think @dkernohan sees parts of his nightmare vision coming true?
What struck me is that the Wolverhampton offering is being built around 10 week courses, the same length as the OU short courses (which in the OU case result in 10 CAT points of academic credit, corresponding to a nominal 10 hours study a week).
Also coming in at 10 weeks is the currently running PLENK2010 Massive Online Open Course (hmm.. does that URL scale for other courses?), and close behind, at 12 weeks, the forthcoming openEd 2.0 course on “Business and management competencies in a Web 2.0 world”:
a FREE/OPEN course targeting business students and practitioners alike. The course consists of two strands: an academic and a professional practice based strand, though both strands can be taken together. Furthermore, the openEd 2.0 course is MODULAR, thus learners can also “pick” the individual modules they are interested at.
Whilst I’m encouraged to see the rise of open courses (and there’s an increasing number of them: for example, P2PU are currently running a course on Open Journalism on the Open Web), I do think the OU is maybe missing a trick, and not leading the way in terms of innovating around open online courses…
…because the OU has been doing online education for years. Our first fully online course (T171, as authored by Martin Weller and John Naughton, amongst others) was first presented in 1999 (I think), with thousands of students per presentation. The current Royal Photographic Society (RPS) recognised short course on Digital Photography regularly pulls in large numbers of students (in the OU, courses with fewer than 250 students are small…) and the new CompTIA approved Linux course is already a middle sized course… (Notice anything about those courses…? Recognition from outside academia too…)
So why isn’t the OU experimenting with running massive open online courses, with an option to “upsell” accreditation to students who want the formal academic credit? Maybe providing the support typically offered to students taking OU courses wouldn’t be cost-effective in an open course, although the wholly online short courses at least have already foregone personal tutor support. Expecting forum moderators to act as sales reps for accreditation is maybe not the sort of support we’d like to see being offered…?!
I’ve mentioned before that open educational resources might benefit from being created in public, possibly in an open course setting… So maybe the time is now right to start trialling open courses (uncourses?;-), maybe informed by requests from (potential) students about the courses they’d like to see, creating the materials in near real-time (and drawing on other open resources, “educational” and otherwise) for the open presentation, then providing students who want to gain formal credit with some sort of assessment and accreditation?
How might this formal recognition be achieved?
- possibly via a semi-formal OU certificate that can be formally recognised through a credit transfer route?
- maybe using a variant of the Career development and employability course container that lets students “use [their] workplace as a context for learning, and develop [their] ability to apply [their] learning to improve [their] practice at work”?
- or how about the Make your experience count course container, which “gives you the opportunity to gain 30 credit points towards higher education qualifications by drawing on your past learning experiences”?
With a little bit of wit and imagination, I’m sure we could either finesse one of our current “prior experience” courses to support the award of credit to open online courses, or come up with a new 10 point container: Open Education Course Credit
PS Hmmm, as an experiment, I wonder what would happen if someone who had taken an open online course tried to get it accepted “in partial fulfilment” of one of the accreditation of prior experience containers mentioned above? If you try it, let me know how you get on…;-)
ffmpeg – Handy Hints
Since starting to play with the Onyx VJ toy over the weekend, I’ve spent more time than I should trying to find various ways of mapping between various movie file formats… I’ve also started looking for movie clips to use as the basis for animations within Powerpoint, as if my presentations weren’t already confusing enough;-)
Anyway, ffmpeg seems to do the job brilliantly… because it’s command line driven, and I just *know* I’ll forget the commands if I don’t collate them somewhere, I’m going to use this post to aggregate useful conversions.
If you know of any really powerful ones I’m missing, please let me know via a comment…
- Convert a swf flash file to avi: ffmpeg -i file.swf file.avi
- Convert an AVI file to swf (eg for use in Onyx) and strip out the audio (-an): ffmpeg -i file.avi -an file.swf
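If you'd rather script these conversions than retype them, here's a minimal Python sketch that builds the same two commands. The build_ffmpeg_command and convert function names are my own; the only ffmpeg options used are the -i and -an ones shown above, and convert() assumes ffmpeg is installed and on your path.

```python
import subprocess


def build_ffmpeg_command(src, dst, strip_audio=False):
    """Return the ffmpeg argument list for converting src to dst."""
    cmd = ["ffmpeg", "-i", src]
    if strip_audio:
        cmd.append("-an")  # drop the audio stream from the output
    cmd.append(dst)
    return cmd


def convert(src, dst, strip_audio=False):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, strip_audio), check=True)


# Print the commands rather than running them, so nothing is converted here
print(" ".join(build_ffmpeg_command("file.swf", "file.avi")))
print(" ".join(build_ffmpeg_command("file.avi", "file.swf", strip_audio=True)))
```

Passing the arguments as a list, rather than as a single string handed to a shell, means filenames containing spaces don't need quoting.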
PLENK2010 – Twitter Clusters
Playing around with looking at the structure of my own Twitter friends network (see recent previous posts) by using the Gephi modularity statistic to partition (or cluster) my Twitter network depending on the strengths of connections between members of that network, it struck me that I could take a similar approach to exploring the structure of the relations between the members of a Twitter list. So I grabbed the members of the PLENK2010 list (which I had automatically created by mining the Twapperkeeper archive of posts tagged with PLENK2010, and then adding frequent hashtaggers to the list), grabbed all their friends lists, and had a poke around the friends connections between the list members.
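For what it's worth, the "friends connections between the list members" step boils down to something like the following sketch: given each list member's friends list, keep only the links that point at other list members. The list_member_edges function and the sample data are hypothetical, and I'm assuming the friends lists have already been grabbed from Twitter by some other means.

```python
def list_member_edges(friends_of):
    """Return directed (member, friend) pairs where both are list members.

    friends_of maps each list member to the set of accounts they follow.
    """
    members = set(friends_of)
    edges = []
    for member, friends in friends_of.items():
        for friend in friends:
            # Ignore friends who are not themselves on the list
            if friend in members and friend != member:
                edges.append((member, friend))
    return sorted(edges)


# Hypothetical friends lists for four list members ("zed" is not on the list)
friends_of = {
    "alice": {"bob", "carol", "zed"},
    "bob": {"alice"},
    "carol": {"dave"},
    "dave": set(),
}
edges = list_member_edges(friends_of)
print(edges)
```

The resulting edge list is the sort of thing that can then be loaded into a tool like Gephi for the clustering step.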
The Gephi modularity tool identified three medium sized clusters, one large cluster, and several smaller ones. Looking at the three middle sized clusters, let’s see who’s in each cluster, where they’re from (from their Twitter location info) and what their interests are (from their Twitter bio field).
Here’s the first cluster:
Here’s the second cluster:
And here’s the third:
Not surprisingly, it seems as if geography still plays a role in defining networks…
There was also a large cluster identified in the original pass:
Here’s what they’re interested in:
And here’s where they’re from:
Here’s what happens if we partition that large cluster by running the modularity tool over just the members of this cluster again:
Do they make any sort of sense…?
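To make the "run it again over just this cluster" step a bit more concrete, here's a rough sketch in Python. It restricts an edge list to the members of one cluster, then calculates the standard modularity score for a candidate split of that cluster. Note that this only scores a partition you hand it; it doesn't search for the best partition the way Gephi's modularity tool does. It also treats links as undirected, which is a simplification for Twitter friend links. The function names and sample data are my own.

```python
from collections import defaultdict


def induced_edges(edges, keep):
    """Keep only the edges whose two endpoints are both in the set `keep`."""
    return [(u, v) for u, v in edges if u in keep and v in keep]


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical network: two triangles joined by one link, plus an outsider "g"
all_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d"), ("f", "g")]
large_cluster = {"a", "b", "c", "d", "e", "f"}
sub_edges = induced_edges(all_edges, large_cluster)

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
unsplit = {node: 0 for node in large_cluster}
print(modularity(sub_edges, split), modularity(sub_edges, unsplit))
```

A split that scores noticeably higher than leaving the cluster as a single group (which always scores zero) suggests there is some sub-structure worth looking at.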
So is this:
a) interesting?
b) useful?
If it’s useful – why? What can we do with this information?