
Fragmentary Observations from the Outside About How FutureLearn’s Developing

I’m outside the loop on all matters FutureLearn related, so I’m interested to see what I can pick up from fragments that do make it onto the web.

So for example, from a presentation by Hugh Davis to the M25 Libraries conference April 2013 about Southampton’s involvement with FutureLearn, Collaboration, MOOCs and Futurelearn, we can learn a little bit about the FutureLearn pitch to partners:

FutureLearn Overview

More interesting, I think, is this description of what some of the FutureLearn MOOCs might look like:

MOOC Structure

“miniMOOCs” containing 2 to 3 learning units, each 2-6 hours of study time, broken into 2-3 self-contained learning blocks (which suggests 1-2 hours per block).

So I wonder, based on the learning block sequence diagram, and the following learning design elements slide:

learning design

Will the platform be encouraging a learning design approach, with typed sequences of blocks that offer templated guides as to how to structure each sort of design element? Or is that way off the mark? Given the platform is currently being built (using Go Free Range for at least some of the development, I believe), it’s tricky to see how this is being played out: courses and platform both need to be ready at the same time, and it’s hard to write courses using platform primitives if the platform isn’t ready yet.

Looking elsewhere (or at least, via @patlockley), we may be able to get a few more clues about the line partners are taking towards FutureLearn course development:

futurelearn job ad - Leeds

Hmm, I wonder – would it be worth subscribing to jobs feeds from the partner universities over the next few months to see whether any other FutureLearn related posts are being opened up? And does this also provide an opportunity for the currently rather sparse FutureLearn website to start promoting those job ads? And come to that, how come the roles that have already been filled at FutureLearn weren’t advertised on the FutureLearn website…?
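As a very rough sketch of what that sort of monitoring might look like – the feed URLs below are made-up placeholders, since the real vacancy feed locations would need hunting down for each partner – something along these lines (using feedparser) would do for a first pass:

import feedparser

#Hypothetical vacancy feed URLs for a couple of partner universities -
#the real feed locations would need to be tracked down by hand
JOBS_FEEDS = [
    'http://jobs.example-partner-uni.ac.uk/feed/rss',
    'http://www.example-other-uni.ac.uk/vacancies.rss',
]

def futurelearn_jobs(feed_urls, keyword='futurelearn'):
    #Return (title, link) pairs for vacancies that mention the keyword
    matches = []
    for url in feed_urls:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            text = (entry.get('title', '') + ' ' + entry.get('summary', '')).lower()
            if keyword in text:
                matches.append((entry.get('title'), entry.get('link')))
    return matches

for title, link in futurelearn_jobs(JOBS_FEEDS):
    print title, link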

And roles certainly have been filled, as LinkedIn suggests… Here’s who’s declaring an association with the company at the moment:

futurelearn on LinkedIn

We can also do a slightly broader search:

futurelearn search

There’s also a recently closed job ad with a role that doesn’t yet appear on anyone’s byline:

global digital marketing strategist

So what roles have been filled according to this source?

  • CEO
  • Head of Content
  • Head of UK Education & HE Partnerships
  • CTO
  • Senior Project Manager / Scrum Master (Contract)
  • Agile Digital Project Manager
  • Product Manager
  • Marketing and Communications Assistant
  • Interim HR Consultant
  • Learning Technologist
  • Commercial and Operations Director for Launch
  • Global Digital Marketing Strategist

Here’s another one, Academic Lead [src].

By the by, I also notice that the OU VC, Martin Bean, has just been appointed as a director of FutureLearn Ltd.

Exciting times, eh…?!;-)

Related: OU Launches FutureLearn Ltd

PS v loosely related (?!) – (Draft) Coursera data export policy

PPS I also noticed this the other day – OpenupEd (press release), an EADTU co-ordinated portal that looks like a clearing house for OER-powered MOOCs from universities across the EU (particularly open universities, including, I think, The OU…;-)

Moving Machines…

I’ve just taken on a new desktop computer – the first desktop machine I’ll have used as a daily machine in seven or eight years. As with every new toy, there is the danger of immediately filling it with the same crud that I’ve got on my current laptop, but I’m going to try to limit myself to installing things that I actually use…

My initial download list (the computer is a Mac):

  • A lot of the files I work with are on Google Docs, so I don’t actually need to install anything for those at all – I just need a browser to access them
  • an alternative browser: Macs come with Safari preinstalled but I tend to use Chrome; I don’t sign in to Chrome, although I do use it on several machines. Being able to synch bookmarks would be handy, but I’m not sure I want to inflict the scores of open tabs I have onto every browser I open…
  • Dropbox desktop: I need to rethink my Dropbox strategy, and indeed the way I organise files, but Dropbox on the desktop is really handy… having downloaded and configured the client, it started synching my Dropbox files by itself (of course…;-). I’ll probably add the Google Drive desktop client at some point too, but in that case I definitely need a better file management strategy…
  • Gephi: for playing with network visualisations, and one of the main reasons for getting the new machine. As Gephi is a Java app, I also needed to download a Java runtime in order to be able to run it
  • Rstudio: I considered not bothering with this, pondering whether I could move wholesale to the hosted RStudio at crunch.kmi.open.ac.uk, but then went with the desktop version for several reasons: a) I tinker with RStudio all the time, and don’t necessarily want to share everything on Crunch (not because users can see each others’ files even if they aren’t public; rather, because there’s the risk Crunch may disappear/become unavailable/I might be cast out of the OU etc etc); b) the desktop version plays nicely with git/github…
  • Git and Git for Mac: I originally downloaded Git for Mac, a rather handy UI client, thinking it would pull down a version of Git for the commandline that RStudio could play with. It didn’t seem to, so I pulled a git installer down too;
  • Having got Git in place, I cloned one project I’m currently working on from Github using RStudio, and another using Git for Mac; the RStudio project had quite a few package dependencies (ggplot2, twitteR, igraph, googleVis, knitr) so I installed them by hand. I really need to refactor my R code so that it installs any required packages if they haven’t already been installed.
  • One of the things I pulled from Github is a Python project; it has a few dependencies (simplejson (which I need to update away from?), tweepy, networkx, YQL), so I grabbed them too (using easy_install).
  • For my Python scribbles, I needed a text editor. I use TextWrangler on my laptop, and saw no reason to move away from it, so I grabbed that too. (I really need to become a more powerful user of TextWrangler – I don’t really know how to make proper use of it at all…)
  • Another reason for the big screen/bigger machine was to start working with SVG files – so I grabbed a copy of Inkscape and had a quick play with it. It’s been a long time since I used a mouse, and the Mac magic mouse seems to have a mind of its own (I far prefer two-finger click to RSI inducing right-click but haven’t worked out how/if magic mouse supports that?) but I’ve slowly started to find my way round it. Trying to import .eps files, I also found I needed to download and install Ghostscript (which required a little digging around until I found someone who’d built a Mac package/installer…)
  • I am reluctant to install a Twitter client – I think I shall keep the laptop open and running social tools so as not to distract myself by social conversation tools on the other machine…
  • I guess I’ll need to install a VPN client when I need to login to the OU VPN network…
  • I had a brief go at wiring up Mac mail and iCal to the OU’s Outlook client using a Faculty cribsheet, but after a couple of attempts I couldn’t get it to take, so I guess I’ll just stick with the Outlook Web App.

PS One of the reasons for grabbing this current snapshot of my daily tools is that the OU IT powers that be are currently looking at installing OU standard desktops, intended to largely limit the installation of software to items from an approved list (and presumably offer downloads from an approved repository). I can see this has advantages for management (and might also have simplified my migration?), but it is also highly restrictive. One of the problems with instituting too much process is that folk find workarounds (like acquiring admin passwords, rather than being given their own admin/root accounts from the outset) or reset machines to factory defaults to get around pre-installed admin bottlenecks. I appreciate this may go against the Computing Code of Conduct, but I rarely connect my machines directly to the OU network, instead favouring eduroam when on campus (better port access!) and using the VPN if I ever need access to OU network services. Software is the stuff that allows computers to take on the form of an infinite number of tools – the IT stance seems to take the view that the computer is a limited purpose tool and they’re the ones who set the limits. Which makes me wonder: maybe this is just another front in the “Coming Civil War over General-purpose Computing”…?

Enter the Market – Course Data

I’m not at Dev8Ed this week, though I probably should be, but here’s what I’d probably have tinkered with had I gone – a recipe for creating a class of XCRI (course marketing data) powered websites to support course choice on a variety of themes, sites that could be used to ruthlessly and shamelessly exploit any and every opportunity for segmenting audiences and fragmenting different parts of the market for highly targeted marketing campaigns. So for example:

  • let’s start with something easy and obvious: russelgroupunis.com (sic;-), maybe? Search for courses from Russell Group (research intensive) universities on a conservatively branded site, lots of links to research inspired resources, pre-emptively posted reading lists (with Amazon affiliate codes attached); then bring in a little competition, and set this site up as a Waitrose to the Sainsburys of 1994andallthat.com, a course choice site based around the 1994 Group Universities (hmmm: seems like some of the 1994 Group members are deserting and heading off to join the Russell Group?); worthamillionplus.com takes the Tesco ads for the Million+ group, maybe, and unireliance.com (University Alliance) the Morrisons(?) traffic. (I have no idea if these uni group-supermarket mappings work? What would similarly tongue-in-cheek broadsheet/tabloid mappings be I wonder?!). If creative arts are more your thing, there could be artswayforward.com for the UKIAD folk, perhaps?
  • there are other ways of segmenting the market, of course. University groupings organise universities from the inside, looking out, but how about groupings based on consumers looking in? At fiveAgrades.com, you know where the barrier is set, as you do with 9kQuality.com, whereas cheapestunifees.com could be good for bottom of the market SEO. wetakeanyone.com could help at clearing time (courses could be identified by looking at grade mappings in course data feeds), as could the slightly more upmarket universityclearingcourses.com. And so on
  • National Student Survey data could also play a part in automatically partitioning universities into different verticals, maybe in support of FTSE-30 like regimes where only courses from universities in the top 30 according to some ranking scheme or other are included. NSS data could also power rankings of course. (Hmm… did I start to explore this for Course Detective? I don’t remember…Hmmm…)

The intention would be to find a way of aggregating course data from different universities onto a common platform, and then to explore ways of generating a range of sites, with different branding, and targeted at different markets, using different views over the same aggregated data set but similar mechanics to drive the sites.
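To make the “different views over the same aggregated data set” idea slightly more concrete, here’s a minimal sketch – the course records, field names and groupings are all invented for illustration, standing in for whatever an XCRI-CAP aggregator would actually provide:

#Hypothetical aggregated course records, e.g. as harvested from XCRI-CAP feeds
courses = [
    {'title': 'Physics BSc', 'provider': 'University A', 'fee': 9000},
    {'title': 'Fine Art BA', 'provider': 'University B', 'fee': 6000},
]

#Hand-maintained mapping from marketing groupings to member institutions
groupings = {
    'russell_group': ['University A'],
    'creative_arts': ['University B'],
}

def site_view(courses, group=None, max_fee=None):
    #Return the subset of courses to show on one themed/branded site
    members = groupings.get(group)
    view = []
    for c in courses:
        if members is not None and c['provider'] not in members:
            continue
        if max_fee is not None and c['fee'] > max_fee:
            continue
        view.append(c)
    return view

#e.g. a research-intensive branded site versus a cheapest-fees site
print site_view(courses, group='russell_group')
print site_view(courses, max_fee=7000)

Each themed site would then just be a different skin over a different call to the same filtering machinery.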

PS For a little inspiration about building course comparison websites based around XCRI data, NSS data and KIS data, it may be worth looking at how the NHS does it (another UK institution that’s hurtling towards privatisation…): for example, check out NHS Choices hospitals near you service, or alternatively compare GPs.

PPS If anyone did start to build out a rash of different course comparison sites on a commercial basis, you can bet that as well as seeking affiliate fees for things like lead generation (prospectuses downloaded/mailed, open day visits booked (in exchange for some sort of ‘discount’ to the potential student if they actually turn up to the open day), registrations/course applications made etc), advertising would play a major role in generating site revenue. If a single operator was running a suite of course choice sites, it would make sense for them to look at how cross-site exploitation of user data could be used to track users across sites and tune offerings for them. I suspect we’d also see the use of paid placement on some sites (putting results at the top of a search results listing based on payment rather than a more quality driven ranking algorithm), recreating some of the confusion of the early days of web search engines.

I suspect there’d also be the opportunity for points-make-prizes competitions, and other giveaways…

Or like this maybe?

Ahem…

[Disclaimer: the opinions posted herein are, of course, barely even my own, let alone those of my employer.]

Cognitive Waste and the Project Funding Bind

As I tweeted earlier today: “A problem with project funding is that you’re expected to know what you’re going to do in advance – rather than discover what can be done..”

This was prompted by reading a JISC ITT (Invitation to Tender) around coursedata: Making the most of course information – xcri-cap feed use demonstrators. Here’s an excerpt from the final call:

JISC is seeking to fund around 6-10 small, rapid innovation projects to create innovative, engaging examples that demonstrate the use of the #coursedata xcri-cap feeds (either directly, or via the JISC Aggregator API). These innovative examples will be shared openly through the JISC web site and events to promote the good practice that has been adopted.
13. The demonstrators could use additional data sources such as geolocation data to provide a mash-up, or may focus on using a single institutional feed to meet a specific need.
14. The demonstrators should include a clear and compelling use case and usage scenario.
15. The range of demonstrators commissioned will cover a number of different approaches and is likely to include examples of:
• an online prospectus, such as a specialist courses directory;
• a mobile app, such as a course finder for a specific geographical area;
• a VLE block or module, such as a moodle block that identifies additional learning opportunities offered by the host institution;
• an information dashboard, such as a course statistics dashboard for managers providing an analysis of the courses your institution offers mashed up with search trends from the institutional website;
• a lightweight service or interface, such as an online study group that finds peers based on course description;
• a widget for a common platform, such as a Google Gadget that identifies online courses, and pushes updates to the users iGoogle page.
16. All demonstrators should be working code and must be available under an open source licence or reusable with full documentation. Project deliverables can build on proprietary components but wherever possible the final deliverables should be open source. If possible, a community-based approach to working with open source code should be taken rather than just making the final deliverables available under an open source licence.
17. The demonstrators should be rapidly developed and be ready to use within 4 months. It is expected most projects would not require more than 30 – 40 chargeable person days.

In addition:

23. Funding will not be allocated to allow a simple continuation of an existing project or activity. The end deliverable must address a specific need that is accepted by the community for which it is intended and produce deliverables within the duration of the project funding.
24. There should be no expectation that future funding will be available to these projects. The grants allocated under this call are allocated on a finite basis. Ideally, the end deliverables should be sustainable in their own right as a result of providing a useful solution into a community of practice.

The call appears to be open to all comers (for example, sole traders) and represents a way of spending HEFCE money on bootstrapping innovation around course data feeds, in a similar way to how the Technology Strategy Board disburses money (more understandably?) to commercial enterprises, SMEs, and so on. (Although JISC isn’t a legal entity – yet – maybe we’ll start to see JISC trying to find ways in which it can act as a vehicle that generates returns from which it can benefit financially, eg as a venture funder, or as a generator of demonstrable financial growth?)

As with many JISC calls, the intention is that something “sustainable” will result:

22. Without formal service level agreements, dependency on third party systems can limit the shelf life of deliverables. For these types of projects, long term sustainability although always desirable, is not an expected outcome. However making the project deliverables available for at least one year after the end of the project is essential so opportunities are realised and lessons can be learned.

24. There should be no expectation that future funding will be available to these projects. The grants allocated under this call are allocated on a finite basis. Ideally, the end deliverables should be sustainable in their own right as a result of providing a useful solution into a community of practice.

All well and good. Having spent a shedload (technical term ;-) on getting institutions to open up their course data, the funders now need some uptake. That there aren’t more apps around course data to date is partly my fault. The TSO Open Up Competition prize I won secured a certain amount of TSO resource to build something around the course code scaffolding data held by UCAS (my proposal was more to do with seeing this data opened up as enabling data, rather than actually pitching a specific application…). As it turned out, UCAS (a charity operated by the HEIs, I think) were (still are?) too precious over the data to release it as open data for unspecified uses, so the prize went nowhere… Instead, HEFCE spent millions through JISC to get universities to open up course data (albeit data that is probably more comprehensive than the UCAS data)… and now there’s an unspecified amount for startups and businesses to build services around the XCRI data. (Note to self: are UCAS using XCRI as an import format or not? If not, is HEFCE/JISC also paying the HEIs to maintain/develop systems that publish XCRI data as well as systems that publish data in an alternative way to UCAS?)

I think TSO actually did some work aggregating datasets around a, erm, model of the UCAS course data; so if they want a return on that work, they could probably pitch an idea for something they’ve already prepped and try to get HEFCE to pay for it, 9 months on from when I was talking to them at their expense…

Which brings me in part back to my tweet earlier today (“A problem with project funding is that you’re expected to know what you’re going to do in advance – rather than discover what can be done..”), as well as the mantra I was taught way back when I was a research student, that the route to successful research bids was to bid to do work you had already done (in part because then you could propose to deliver what you knew you could already deliver, or could clearly see how to deliver…)

This is fine if you know what you’re pitching to do (essentially, doing something you know how to do), as opposed to setting out to discover what sorts of things might be possible if you set about playing with them. Funders don’t like the play of course, because it smacks of frivolity and undirectedness, even though it may be a deeply focussed and highly goal directed activity, albeit one where the goal emerges during the course of the activity rather than being specified in advance.

As it is, funders tend to fund projects. They tell bidders what they want, and bidders tell funders how they’ll do it: either something they’ve already done (= guaranteed deliverable, paid for post hoc), or something they *think* they intend to do (couched in project management and risk assessment speak to mask the fact that they don’t really know what’ll happen when they try to execute the plan; but that doesn’t really matter, because at the end of the day they have a plan and a set of deliverables against which they can measure (lack of) progress). In the play world, you generally do or deliver something because that’s the point – you are deeply engaged in and highly focussed on whatever it is that you’re doing (you are typically intrinsically motivated, and maybe also extrinsically motivated by whatever constraints or goals you have adopted as defining the play context/play world). During play, you work hard to play well. And then there’s the project world. In the project world, you deliver or you don’t. So what.

Projects also have overheads associated with them: from preparing, issuing, marking, awarding, tracking and reporting on proposals and funded projects on the funders’ side, to preparing, submitting, and managing the project on the other (aside from actually doing the project work – or at least, writing up what has previously been done in an appropriate way;-).

And then there’s the waste.

Clay Shirky popularised the notion of cognitive surplus to characterise creative (and often collaborative creative) acts done in folks’ free time. Things like Wikipedia. I’d characterise this use of cognitive surplus capacity as a form of play – in part because it’s intrinsically motivated, but also because it is typically based around creative acts.

But what about cognitive waste, such as arises from time spent putting together project proposals that are unfunded and then thrown away (why aren’t these bids, along with the successful ones, made open as a matter of course, particularly when the application is for public money from an applicant funded by public money?). (Or the cognitive waste associated with maintaining a regular blog… erm… oops…)

I’ve seen bids containing literature reviews that rival anything in the (for-fee, paywall-protected, subscription-required, author/institution copyright-waivered) academic press, as well as proposals that could be taken up, maybe in partnership, by SMEs for useful purposes (rather than by academic partners for conference papers). And then there’s the time spent pursuing project processes, milestones and deliverables for the sole reason that they are in the plan – a plan defined before the space the project was pitched into had been properly explored through engaging with it – rather than because they continue to make sense (if indeed they ever did). (And yes, I know that the unenlightened project manager who sees more merit in trying to stick to the project plan and original deliverables, rather than pivoting if a far more productive, valuable or useful opportunity reveals itself, is a mythical beast…;-)

Maybe the waste is important. Evolution is by definition a wasteful process, and maybe the route to quality is through a similar sort of process. Maybe the time, thought and effort that goes into unsuccessful bids really is cognitive waste, bad ideas that don’t deserve to be shared (and more than that, shouldn’t be shared because they are dangerously wrong). But then, I’m not sure how that fits with project funding schemes that are over-subscribed, where even highly rated proposals (that would ordinarily receive funding) are rejected, whereas in an undersubscribed call (maybe because it is mis-positioned or even irrelevant), weak bids (that ordinarily wouldn’t be considered) get funding.

Or maybe cognitive waste arises from a broken system and broken processes, and really is something valuable that is being wasted in the sense of squandered?

Right – rant over, (no) (late)lunchtime over… back to the “work” thing, I guess…

PS via @raycorrigan: “Newton, Galileo, Maxwell, Faraday, Einstein, Bohr, to name but a few; evidence of paradigm shifting power of ‘cognitive waste'” – which is another sense of “waste” I hadn’t considered: waste (as in loss, or loss to an organisation) of good ideas through rejecting or not supporting the development of a particular proposal or idea…?

Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch

A quick sketch, prompted by Tesco Graph Hunting on OpenCorporates, of how some of Tesco’s various corporate holdings are related based on director appointments and terminations:

The recipe is as follows:

– grab a list of companies that may be associated with “Tesco” by querying the OpenCorporates reconciliation API for tesco
– grab the filings for each of those companies
– trawl through the filings looking for director appointments or terminations
– store a row for each directorial appointment or termination including the company name and the director.

You can find the scraper here: Tesco Sprawl Grapher

import scraperwiki, simplejson,urllib

import networkx as nx

#Keep the API key private - via http://blog.scraperwiki.com/2011/10/19/tweeting-the-drilling/
import os, cgi
try:
    qsenv = dict(cgi.parse_qsl(os.getenv("QUERY_STRING")))
    ockey=qsenv["OCKEY"]
except:
    ockey=''

rurl='http://opencorporates.com/reconcile/gb?query=tesco'
#note - the opencorporates api also offers a search:  companies/search
entities=simplejson.load(urllib.urlopen(rurl))

def getOCcompanyData(ocid):
    ocurl='http://api.opencorporates.com'+ocid+'/data'+'?api_token='+ockey
    ocdata=simplejson.load(urllib.urlopen(ocurl))
    return ocdata

#need to find a way of playing nice with the api, and not keep retrawling

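#Fetch all the filings for a company, paging through the OpenCorporates API 100 results at a time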
def getOCfilingData(ocid):
    ocurl='http://api.opencorporates.com'+ocid+'/filings'+'?per_page=100&api_token='+ockey
    tmpdata=simplejson.load(urllib.urlopen(ocurl))
    ocdata=tmpdata['filings']
    print 'filings',ocid
    #print 'filings',ocid,ocdata
    #print 'filings 2',tmpdata
    while tmpdata['page']<tmpdata['total_pages']:
        page=str(tmpdata['page']+1)
        print '...another page',page,str(tmpdata["total_pages"]),str(tmpdata['page'])
        ocurl='http://api.opencorporates.com'+ocid+'/filings'+'?page='+page+'&per_page=100&api_token='+ockey
        tmpdata=simplejson.load(urllib.urlopen(ocurl))
        ocdata=ocdata+tmpdata['filings']
    return ocdata

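#Save a single director appointment/termination filing as a row in the scraperwiki 'directors' table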
def recordDirectorChange(ocname,ocid,ffiling,director):
    ddata={}
    ddata['ocname']=ocname
    ddata['ocid']=ocid
    ddata['fdesc']=ffiling["description"]
    ddata['fdirector']=director
    ddata['fdate']=ffiling["date"]
    ddata['fid']=ffiling["id"]
    ddata['ftyp']=ffiling["filing_type"]
    ddata['fcode']=ffiling["filing_code"]
    print 'ddata',ddata
    scraperwiki.sqlite.save(unique_keys=['fid'], table_name='directors', data=ddata)

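#Pick out director appointment (AP01) and termination (TM01) filings and record the director named in each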
def logDirectors(ocname,ocid,filings):
    print 'director filings',filings
    for filing in filings:
        if filing["filing"]["filing_type"]=="Appointment of director" or filing["filing"]["filing_code"]=="AP01":
            desc=filing["filing"]["description"]
            director=desc.replace('DIRECTOR APPOINTED ','')
            recordDirectorChange(ocname,ocid,filing['filing'],director)
        elif filing["filing"]["filing_type"]=="Termination of appointment of director" or filing["filing"]["filing_code"]=="TM01":
            desc=filing["filing"]["description"]
            director=desc.replace('APPOINTMENT TERMINATED, DIRECTOR ','')
            director=director.replace('APPOINTMENT TERMINATED, ','')
            recordDirectorChange(ocname,ocid,filing['filing'],director)

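#For each company returned by the reconciliation query, pull its filings and log any director changes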
for entity in entities['result']:
    ocid=entity['id']
    ocname=entity['name']
    filings=getOCfilingData(ocid)
    logDirectors(ocname,ocid,filings)

The next step is to graph the result. I used a Scraperwiki view (Tesco sprawl demo graph) to generate a bipartite network connecting directors (either appointed or terminated) with companies and then published the result as a GEXF file that can be loaded directly into Gephi.

import scraperwiki
import urllib
import networkx as nx

import networkx.readwrite.gexf as gf

from xml.etree.cElementTree import tostring

scraperwiki.sqlite.attach( 'tesco_sprawl_grapher')
q = '* FROM "directors"'
data = scraperwiki.sqlite.select(q)

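#Build a bipartite graph linking director nodes (keyed by list index) to company nodes (keyed by OpenCorporates id)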
DG=nx.DiGraph()

directors=[]
companies=[]
for row in data:
    if row['fdirector'] not in directors:
        directors.append(row['fdirector'])
        DG.add_node(directors.index(row['fdirector']),label=row['fdirector'],name=row['fdirector'])
    if row['ocname'] not in companies:
        companies.append(row['ocname'])
        DG.add_node(row['ocid'],label=row['ocname'],name=row['ocname'])   
    DG.add_edge(directors.index(row['fdirector']),row['ocid'])

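#Return the graph as GEXF XML so it can be saved to a file and opened directly in Gephi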
scraperwiki.utils.httpresponseheader("Content-Type", "text/xml")


writer=gf.GEXFWriter(encoding='utf-8',prettyprint=True,version='1.1draft')
writer.add_graph(DG)

print tostring(writer.xml)

Saving the output of the view as a gexf file means it can be loaded directly into Gephi. (It would be handy if Gephi could load files in from a URL, methinks?) A version of the graph, laid out using a force directed layout, with nodes coloured according to modularity grouping, suggests some clustering of the companies. Note that parts of the whole graph are disconnected.

In the fragment below, we see Tesco Property Nominees are only loosely linked to each other, and from the previous graphic, we see that Tesco Underwriting doesn’t share any recent director moves with any other companies that I trawled. (That said, the scraper did hit the OpenCorporates API limiter, so there may well be missing edges/data…)

And what is it with accountants naming companies after colours?! (It reminds me of sys admins naming servers after distilleries and Lord of the Rings characters!) Is there any sense in there, or is it arbitrary?

Tesco Graph Hunting on OpenCorporates

A quick lunchtime post on some thoughts around constructing corporate graphs around OpenCorporates data. To ground it, consider a search for “tesco” run on gb registered companies via the OpenCorporates reconciliation API.

{"result":[{"id":"/companies/gb/00445790", "name":"TESCO PLC", "type":[{"id":"/organization/organization","name":"Organization"}], "score":78.0, "match":false, "uri":"http://opencorporates.com/companies/gb/00445790"}, {"id":"/companies/gb/05888959", "name":"TESCO AQUA (FINCO1) LIMITED", "type":[{"id":"/organization/organization", "name":"Organization"}], "score":71.0, "match":false, "uri":"http://opencorporates.com/companies/gb/05888959"}, { ...

Some or all of these companies may or may not be part of the same corporate group. (That is, there may be companies in that list with Tesco in the name that are not part of the group of companies associated with a major UK supermarket.)

If we treat the companies returned in that list as one class of nodes in a graph, we can start to construct a range of graphs that demonstrate linkage between companies based on a variety of factors. For example, a matching registered address – a post office box mediated address in an offshore tax haven, say – suggests there may be at least a weak tie between companies:

(Alternatively, we might construct bipartite graphs containing company nodes and address nodes, for example, then collapse the graph about common addresses.)
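By way of a minimal sketch (with made-up company/address pairs standing in for real OpenCorporates records), networkx will do the bipartite construction and the collapse-about-common-addresses projection directly:

import networkx as nx
from networkx.algorithms import bipartite

#Made-up example data: (company, registered address) pairs
records = [
    ('EXAMPLE (NO 1) LTD', 'PO Box 123, George Town, Cayman Islands'),
    ('EXAMPLE (NO 2) LTD', 'PO Box 123, George Town, Cayman Islands'),
    ('EXAMPLE (NO 3) LTD', '1 Example Street, Cheshunt'),
]

B = nx.Graph()
for company, address in records:
    B.add_node(company, bipartite='company')
    B.add_node(address, bipartite='address')
    B.add_edge(company, address)

companies = [n for n, d in B.nodes(data=True) if d['bipartite'] == 'company']

#Collapse the bipartite graph about common addresses: companies sharing an
#address become directly linked, with the edge weight counting shared addresses
G = bipartite.weighted_projected_graph(B, companies)
for u, v, d in G.edges(data=True):
    print u, '-', v, d['weight']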

Shared directors would be another source of linkage, although at the moment, I don’t think OpenCorporates publishes directors associated with UK companies (I suspect that data is still commercially licensed?). However, there is associated information available in the OpenCorporates database already…. For example, if we look at the various company filings, we can pick up records relating to director appointments and terminations?

By monitoring filings, we can then start to build up a record of directorial involvement with companies? Looking at the filings also suggests that it would make sense to record commencement and cessation dates for directorial appointments…
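A rough sketch of what that record-building might look like, naively pairing the appointment (AP01) and termination (TM01) rows of the kind logged by the scraper above into per-director date ranges (the field names follow the scraper’s ‘directors’ table; a real version would need to cope with messier filing histories):

from collections import defaultdict

def directorships(rows):
    #rows: dicts with 'ocname', 'fdirector', 'fdate' and 'fcode' (AP01/TM01)
    spells = defaultdict(list)
    for row in sorted(rows, key=lambda r: r['fdate']):
        key = (row['ocname'], row['fdirector'])
        if row['fcode'] == 'AP01':
            spells[key].append({'start': row['fdate'], 'end': None})
        elif row['fcode'] == 'TM01':
            if spells[key] and spells[key][-1]['end'] is None:
                spells[key][-1]['end'] = row['fdate']
            else:
                #termination with no recorded appointment (predates the data we hold)
                spells[key].append({'start': None, 'end': row['fdate']})
    return spells

#e.g. spells = directorships(scraperwiki.sqlite.select('* FROM "directors"'))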

There may also be weak secondary evidence linking companies. For example, two companies that file trademarks using the same agent have a weak tie through that agent. (Of course, that agent may be acting for two completely independent companies.)

If we weight edges between nodes according to the perceived strength of a tie and then lay out the graph in a way that is sensitive to the number and weight of edge connections between company nodes, we may be able to start mapping out the corporate structure of these large, distributed corporations, either in network map terms, or maybe by mapping geolocated nodes based on registered addresses; and then we can start asking questions about why these distributed corporate entities are structured the way they are…

PS note to self – OpenCorporates API limit with key: 1000/hr, 10k/day

Autodiscoverable Feeds and UK HEIs (Again…)

It’s that time of year again when Brian’s banging on about IWMW, the Institutional[ised?] Web Managers’ Workshop, and hence that time of year again when he reminds me* about my UK HE Feed Autodiscovery app that trawls through various UK HEI home pages (the ones on .ac.uk, rather than the one you get by searching for a uni name in Google;-)

* that is, tells me the script is broken and, by implication, gently suggests that I should fix it…;-)

As ever, most universities don’t seem to be supporting autodiscoverable feeds (neither are many councils…), so here are a few thoughts about what feeds you might link to, and why…

news feeds: the canonical example. News feeds can be used to pipe news around various university websites, and also syndicate content to any local press or hyperlocal news sites. If every UK HEI published a news feed that was autodiscoverable as such, it would be trivial to set up a UK universities aggregated newswire (there’s a minimal sketch of the discover-and-aggregate pattern at the end of this list).

research announcements: I was told that one reason for putting out press releases was simply to build up an institutional memory/archive of notable events. Many universities run research newsletters that remark on awarded grants. How about a “funded research” feed from each university detailing grant awards and other research funding? Again, at a national level, this could be aggregated to provide a research funding newswire, as well as contributing data to local archives of research funding success.

jobs: if every UK HEI published a jobs/vacancies RSS feed, it would be trivial to build an aggregator and let people roll their own versions of jobs.ac.uk.

events: universities contribute a lot to local culture through public talks and exhibitions. Make it easy for the local press and hyperlocal news sites to syndicate this info, and add events to their own aggregated “what’s on” calendars. (And as well as RSS, give ’em an iCal feed for your events.)

recent submissions to local repository: provide a feed listing recent submissions to the local research output/paper repository (and/or maybe a feed of the most popular downloads); if local feeds are your thing, the library quite possibly makes things like recent acquisition feeds available…

YouTube uploads: you might as well add an autodiscoverable feed to your university’s recent uploads on YouTube. If nothing else, it contributes an informal ownership link to the web for folk who care about things like that.

your university Twitter feed: if you’ve got one. I noticed Glasgow Caledonian linked to their Twitter feed through an autodiscoverable link on their university homepage.

tenders: there’s a whole load of work going on in gov at the moment regarding transparency as it relates to procurement and tendering. So why not get open with your procurement and tendering data, and increase the chances of SMEs finding out what you’re putting out to tender. If the applications have to go through a particular process, no problem: link to the appropriate landing page in each feed item.

energy data: releasing this data may well become a requirement in the not so far off future, so why not get ahead of the game, e.g. as Lincoln are starting to do (Lincoln U energy data)? If everyone was publishing energy data feeds, I’m sure DevCSI hackday folk would quickly roll together something like the aggregating service built by college student @issyl0 out of a Rewired State hack that pulls together UK gov department energy data: GovSpark

XCRI-CAP course marketing data feeds: JISC is giving away shed loads of cash to support this, so pull your finger out and get the thing published.

location data: got a KML feed yet? If not, why not? e.g. Innovations in Campus Mapping
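As trailed in the news feeds item above, the discover-and-aggregate pattern is only a few lines of code. This sketch leans on BeautifulSoup and feedparser, with a placeholder homepage list standing in for a scraped list of HEI home pages (a real crawler would obviously need to be politer and more robust):

import urllib
import urlparse
import feedparser
from bs4 import BeautifulSoup

#Illustrative homepage list - the real thing would come from a scraped list of HEI home pages
HOMEPAGES = ['http://www.example-university.ac.uk/']

def autodiscovered_feeds(url):
    #Return feed URLs declared via <link rel="alternate"> on a page
    soup = BeautifulSoup(urllib.urlopen(url).read())
    feeds = []
    for link in soup.find_all('link'):
        rel = link.get('rel') or []
        if 'alternate' in rel and link.get('type') in ('application/rss+xml', 'application/atom+xml'):
            feeds.append(urlparse.urljoin(url, link.get('href')))
    return feeds

#A crude aggregated newswire: pull every item from every discovered feed
newswire = []
for homepage in HOMEPAGES:
    for feedurl in autodiscovered_feeds(homepage):
        for entry in feedparser.parse(feedurl).entries:
            newswire.append((entry.get('published', ''), entry.get('title'), entry.get('link')))

for published, title, link in sorted(newswire, reverse=True):
    print published, title, link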

PS the backend of my RSS feed autodiscovery app (founded: 2008) is a Yahoo pipe. Just because, I thought I’d take half an hour out to try and build something related on Scraperwiki. The code is here: UK University Autodiscoverable RSS feeds. Please feel free to improve it, fork it, etc. University homepage URLs are identified by scraping a page on the Universities UK website, but I probably should use a feed from the JISC Monitoring Unit (e.g. getting UK University location/contact data).

PPS this could be handy for some folk – the code that runs the talks@cam events site: http://source.caret.cam.ac.uk/svn/projects/talks.cam/. (Thanks Laura:-) – does it do feeds nicely now?! Related: Keeping Up With Events, a quickly hacked app from my Arcadia project that (used to) aggregate Cambridge events feeds.)