Posts Tagged ‘digischol’
Yesterday morning, I wanted to grab hold of a summary of the number of views my uploaded presentations on Slideshare have had. A quick scan of the Slideshare API suggests that a bit of a handshake is required, at least in generating an MD5’d hash of a key with a Unix timestamp. I have a pipe that does something similar somewhere (err, or at least part of it… here maybe).
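For what it’s worth, here’s a minimal Python sketch of the sort of handshake I mean: hashing a key together with a Unix timestamp. The function name sign_request, the concatenation order (key first, then timestamp) and the default choice of MD5 are all assumptions made for illustration rather than details taken from the Slideshare documentation, so check the API docs before relying on any of it.

```python
import hashlib
import time


def sign_request(secret_key, timestamp=None, algorithm="md5"):
    """Return (timestamp, hex digest) for a key hashed with a Unix timestamp.

    Assumption: the digest is taken over key + timestamp, in that order.
    """
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    payload = f"{secret_key}{timestamp}".encode("utf-8")
    digest = hashlib.new(algorithm, payload).hexdigest()
    return timestamp, digest


# Hypothetical key, for illustration only
ts, signature = sign_request("not-a-real-key")
print(ts, signature)
```

The timestamp and the digest would then typically be passed along with the request so the service can recompute the hash at its end; the algorithm argument is there so a different hash can be swapped in if the API turns out to want one.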
I didn’t have the 10 minutes or so such a pipework hack should take (i.e. half an hour, just in case, plus up to half an hour to blog any solution I came up with;-), so I had a quick look at the YQL community tables to see if anyone had developed a wrapper for calling at least part of the Slideshare API, and it seems someone has:
So here’s a pipe that generates a list of a user’s 20 most recent Slideshare uploads, along with how many times they have been downloaded:
And here’s how the output looks:
Note to self: make some time to see what other YQL community tables are available…
A tweet just passed me by from @andypowell at today’s Linked Data: The Future of Knowledge Organization on the Web event:
“need to introduce data literacy into education in order to create data literate citizens” closing remarks by nigel shadbolt at #isko
With the OU’s infoskills short course Beyond Google: working with information online in its last week of registration for its last presentation, it may be that there’ll be a slot open in the short course programme in a year or two for an OU course on data literacy (and visualisation…?!;-), but in the meantime, to justify some of the things I’m getting up to, I suspect I’m going to have to try to persuade folk that there’s some merit in figuring out what sorts of tools make sense in the world of unlimited open data and data scholarship…
Now I have to admit that I’m not at all sure what data scholarship is, or might be (same with data literacy… sigh…) but here are a few possible starters for ten…
I first came across something close to the phrase whilst at the Repository Fringe, searching for papers relating to referencing data, in a preprint from Peter Murray Rust – Open Data in Science: “Recent initiatives such as the JISC/NSF report on cyberscholarship have emphasized the critical importance of data-driven scholarship.” Digging around the phrase turned up one or two references to data citability as being a key requirement for data(-driven) scholarship, a point also touched on by Kevin Ashley in his closing keynote at the RepoFringe. In particular Kevin referenced Peter Buneman’s work on that very topic, which in a roundabout way led me to finding a paper by Bruce Barkstrom on Digital libraries and data scholarship, which again looks at some of the issues involved in referencing data. (I’ll do a post or two on data referencing – something I need to improve in my own practice – at some point…)
So for example, the abstract to Barkstrom’s paper begins: “In addition to preserving and retrieving digital information, digital libraries need to allow data scholars to create post-publication references to objects within files and across collections of files” before going on to discuss referencing matters. So implicitly, data scholarship must be something to do with poring through other people’s old data…
I’m still not sure I know what a data scholar might actually do though, or why, although it seemingly requires ability to reference data, so I took a sideways step to review what a digital scholar might be… Martin Weller has posted about this previously (e.g. Thoughts on digital scholarship), relating the idea to Boyer’s notions of scholarship (discovery, integration, application, teaching).
A short unit on Connexions (What Is Digital Scholarship?) by the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences suggests:
In recent practice, “digital scholarship” has meant several related things:
(a) Building a digital collection of information for further study and analysis*
(b) Creating appropriate tools for collection-building
(c) Creating appropriate tools for the analysis and study of collections
(d) Using digital collections and analytical tools to generate new intellectual products
(e) Creating authoring tools for these new intellectual products, either in traditional forms or in digital form
* like this 500 page bibliography on digital scholarship [via @jfj24]. My response: “is the idea that i read those 500 pages of citations and from titles alone form a coherent view about what’s involved?!” Heh heh ;-)
The piece goes on:
It may seem odd to some that creating collections and the tools to use them should be counted as scholarship, but humanities and social science research has always required collections of appropriate information, and throughout history, scholars have often been the ones to assemble those collections, as part of their scholarship. Moreover, scholars have been building tools since the first index, the first concordance, the first scholarly edition. So, while it is reasonable to regard (d) as the core meaning and ultimate objective of “digital scholarship,” it is also important to recognize that in the early digital era, leadership may well consist of collection-building or tool-building. In addition, tool-building is dependent on the existence of collections, and both collections and tools get better and more general as there is more use of digital information. If we hope to see new intellectual products, we should give high priority to building tools and collections. Finally, it is worth noting that although (a), (b), (c), and (e) require a great deal of cooperation, it is still imaginable that (d) can be the work of a single individual.
Remember, I am in part looking for a definition of data scholarship to justify spending time on OUseful things, so maybe here we have something like it…? Because I think I can argue that OUseful.info identifies/discovers useful tools, integrates them within an information processing context that includes other tools and services, applies them to particular “real world” examples, and then teaches on (sort of!) how to do the same (so that’s Boyer’s boxes ticked). In addition, some of the integrations I come up with could be classed as the development of new tools in their own right, and as far as collections go: I’ve always been keen on trying to make “discovered context” tangible, as with the discovered search engines I’ve blogged about recently.
PS quickly skimming the above, it seems to me that scholarship maybe has a couple of facets: firstly, the development and identification of tools and techniques that allow “scholars” to do what they do; secondly, the use of those tools and techniques to make sense and meaning of things produced by others beyond the sense and meaning that they themselves have extracted. Recalling the idea that the most interesting thing that will be done with your data will be done by someone else, maybe that’s what scholars do?
Lots of deleted stuff I might have regretted posting…
(I also apologise in advance for what some might take to be the self-aggrandising nature of this post…)
Anyway, that’s all as maybe… One of the ideas I started trying to develop in preparing the promotion case was the notion of “influence”, and how online, network based activities might result in payoff for someone else, through being influenced, that could in part trickle back through some sort of recognised acknowledgement, or feed forward into a payoff that makes the academic or host institution more productive.
So here are a handful of examples from the last week or so that provide anecdotal evidence about the influence and reach of posts appearing on OUseful.info:
I flashed up on screen a post from Tony Hirst’s OUseful blog where he confessed to ‘hassling’ Simon Rogers over the formats of some of the information in the Guardian Datastore.
Tony’s contributions are fantastically useful, and the team have now changed some of their workflows to try and include more universal identifiers. On datasets with country lists, for example, they now aim to provide the two letter ISO country code in order to get around confusion when comparing datasets that might feature Burma or Myanmar for example.
[C]hanged some of their workflows… right… so that might make it easier for others, such as academics stooping so low as to use news media published data rather than “original” sources in their own work. Or it might mean that folk who are not academics are putting the data to work because it’s now easier for them to do so, and getting real value out of it.
(Academic bashing? Me? Surely not… Though of course, I have come to realise over the last year that I am absolutely not considered an academic by the academy…)
Here’s the second example, referring to some “work” that resulted from an open exchange over a weekend earlier this year which Cameron Neylon reviewed in A little bit of federated Open Notebook Science. The context is graphing user and compound interactions by extracting the appropriate bipartite graph from a set of open notebooks:
We are very fortunate that Don Pellegrino, an IST student at Drexel, has selected the analysis of networks within Open Notebooks as part of his Ph.D. work. He has started to report his progress on our wiki and is eager to receive any feedback as the work progresses (his FriendFeed account is donpellegrino).
Don’s first report is available here. He is using the Open Source software Gephi for visualization and has provided all of the data and code on the associated wiki page. (also see Tony Hirst’s description of mapping ONS work which provided some very useful insights) Don has provided a detailed report of his findings but I think the most important can be seen in the global plot below.
[S]ome very useful insights – right, a couple of approximately and quickly worked through examples that sketched out some possible ways of looking at this area, as well as crude proof of concept demos; it maybe also identified some dead-ends that might otherwise have been pursued?
Finally, this from Brian Kelly:
Niall Sclater made his point succinctly:
@mweller @psychemedia delicious. i rest my case.
The case Niall was making was, I suspect, that one shouldn’t be promoting use of Cloud services within institutions. This is an argument (although that might be putting it a bit too strongly) which Niall has been having over the past few years with Tony Hirst and Martin Weller, his colleagues at the Open University. As I described in a post on “When Two Tribes Go To War” back in 2007:
Niall Sclater, Director of the OU VLE Programme at the Open University recently pointed out that the Slideshare service was down, using this as an “attempt to inject some reality into the VLEs v Small Pieces debate“. His colleague at the Open University, Tony Hirst responded with a post entitled “An error has occurred whilst accessing this site” in which Tony, with “beautifully sweet irony“, alerted Niall to the fact that the OU’s Intranet was also down.
Back then the specifics related to the reliability of the Slideshare service, with Tony pointing out that the Slideshare service was actually more reliable than the Open University’s Intranet. But that was just a minor detail. The leaked news that Yahoo was, it appeared, intending to close a social bookmarking service which is highly regarded by many of its users, was clearly of much more significance. So is Niall correct to rest his case on this news? Or, as Niall wrote his tweet before we found that the news of Delicious’s death was greatly exaggerated, might we feel that the issue is now simply whether an alternative social bookmarking service should be used?
What this example shows, and maybe the one before it too, is that the very act of working in open and in public means that the process of the work/interaction as well as the “work” itself can become the focus of (authentic) stories in other people’s work. Brian has been telling the above story repeatedly over the last few years, which has the side-effect of raising the OU’s profile as an institution that is *really* engaged with these issues.
None of the above anecdotes has resulted in an academic citation for me, so none of it counts in academic terms. None of the above resulted in the OU being paid for the time I spent engaged in the related activities, so none of it helped the OU bottom line directly (we’re really, really a business now, right?). None of the above ended up in any OU course materials (to my knowledge). It was all, from my perspective looking round my current institutional role, pointless…
PS it’s worth noting that, through trackbacks and email requests, I see these ephemeral “been influenced by” signals on my web radar as a matter of course. But my internal profile is largely below the radar, and these “influence signals” are likely to be even more invisible. This maybe suggests that my reach is only to folk who look outwards (from any institution), using the web, or the people who see me give a presentation (which I do once a month or so)… Hmm…
Last week, I was fortunate enough to receive an invitation to attend the Texts and Literacy in the Digital Age: Assessing the future of scholarly communication at the Dutch National Library in Den Haag (a trip that ended up turning into a weekend break in Amsterdam when my flight was cancelled…)
The presentation can be found here and embedded below, if your feed reader supports it:
One thing I have tried to do is annotate each slide with a short piece of discursive text relating to the slide. I need to find a way of linearising slide shows prepared this way to see if I can find a way of generating blog posts from them, which is a task for next year…
The presentation draws heavily on Martin Belam’s news:rewired presentation from 2009 (The tyranny of chronology), as I try to tease out some of the structural issues that face the presentation of news media in an online networked age, and contrast (or complement) them with issues faced by scholarly publishing.
One of the things I hope to mull over more next year, and maybe communicate in a more principled way rather than via occasional blog posts and tweets, are the ways in which news media and academia can work together to put the news into some sort of deeper context, and maybe even into a learning (resource) context…
I really shouldn’t have got distracted by this today, but I did; via Owen Stephens: seen altmetric – tracking social media & other mentions of academic papers (by @stew)?
Monthly Altmetric datasets of tweets containing mentions of published articles are available for download from Buzzdata, so I grabbed the September dataset and, from the Unix command line, pulled out the names of the folk sending the tweets along with how many paper mentioning tweets each of them had sent:
cut -d ',' -f 3 twitterDataset0911_v1.csv | sort | uniq -c | sort -rn > tmp.txt
Read this list into a script, pulled out the folk who had sent 10 or more paper mentioning updates, grabbed their Twitter friends lists and plotted a graph using Gephi to see how they connected (nodes are coloured according to a loose grouping and sized according to eigenvector centrality):
My handle for this view is that it shows who’s influential in the social media (Twitter) domain of discourse relating to the scientific topic areas covered by the Altmetric tweet collection I downloaded. To be included in the graph, you need to have posted 10 or more tweets referring to one or more scientific papers in the collection period.
We can get a different sort of view over trusted accounts in the scientific domain by graphing the network of all the friends of (that is, people followed by) the people who sent 10 or more paper referencing tweets in September, as collected by altmetric, edges going from altmetric tweeps to all their friends. This is a big graph, so if we limit it to show folk followed by 100 or more of the folk who sent paper mentioning tweets and display those accounts, this is what we get:
My reading of this one is that it shows folk who are widely trusted by folk who post regular updates about scientific papers in particular subject areas.
Hmmm… now, I wonder: what else might I be able to do with the Altmetric data???
PS Okay – after some blank “looks”, here’s the method for the first graph:
1) get the September list of tweets from Buzzdata that contain a link to a scientific paper (as determined by Altmetric filters);
2) extract the Twitter screen names of the people who sent those tweets.
3) count how many different tweets were sent by each screen name.
4) extract the list of screen-names that sent 10 or more of the tweets that Altmetric collected. This list is a list of people who sent 10 or more tweets containing links to academic papers. Let’s call it ‘the September10+ list’.
5) for each person on the September10+ list, grab the list of people they follow on Twitter.
6) plot the graph of social connections between people on the September10+ list.
Okay? Got that?
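For anyone who prefers code to prose, here’s a minimal Python sketch of steps 2 to 6. It isn’t the script I actually used; the function names are made up for illustration, the screen name is assumed to sit in the third CSV column (as in the cut command), and the friends lookup (step 5) is assumed to have already been fetched from Twitter by some other means and stored as a dictionary mapping each screen name to the names they follow. The threshold is dropped to 2 for the toy data.

```python
from collections import Counter


def count_senders(rows, column=2):
    """Steps 2 and 3: count tweets per screen name (third column assumed)."""
    return Counter(row[column] for row in rows if len(row) > column)


def prolific_senders(counts, threshold=10):
    """Step 4: the screen names that sent `threshold` or more tweets."""
    return {name for name, n in counts.items() if n >= threshold}


def internal_edges(friends, members):
    """Step 6: keep only follows where both ends are on the list."""
    return [
        (src, dst)
        for src in sorted(members)
        for dst in friends.get(src, ())
        if dst in members
    ]


# Toy data standing in for the Altmetric CSV rows
rows = [
    ["1", "x", "alice"], ["2", "y", "alice"], ["3", "z", "bob"],
    ["4", "w", "carol"], ["5", "v", "carol"],
]
# Hypothetical friends lists (step 5), keyed by screen name
friends = {"alice": ["carol", "dave"], "carol": ["alice", "bob"]}

members = prolific_senders(count_senders(rows), threshold=2)
print(internal_edges(friends, members))
```

The resulting edge list is the sort of thing that can be written out as a two-column CSV and loaded into Gephi. With a real file, the rows would come from something like csv.reader, which (unlike cut) copes with commas inside quoted fields.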
Here’s how the second graphic was generated.
a) take the September10+ list and for each person on it, get the list of all their friends on Twitter. (This is the same as step 5 above).
b) Build a graph as follows: for each person on the September10+ list, add a link from them to each person they follow on Twitter. This is a big graph. (The graph in 6 above only shows links between people on the September10+ list.)
c) I was a little disingenuous in the description in the body of this post… I now filter the graph to only show nodes with degree of 100 or more. For folk who are on the September10+ list, this means that the number of people on the September10+ list who follow them, plus the total number of people they follow, is equal to or greater than 100. For folk not on the September10+ list, this means that they are being followed by people with a degree of 100 or more who are on the September10+ list (which is to say they are being followed by at least 100 or so people on the September10+ list; I guess there could be folk followed by more than 100 people on the September10+ list who don’t appear in the graph if, for example, they were followed by folk in the original graph who predominantly had a degree of less than 100?).
d) to plot the word cloud graphic above, I visualise the filtered graph and then hide the nodes whose in-degree is 0 (that is, they aren’t followed by anyone else in the graph).
Got that? Simples… ;-)
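Here’s a minimal Python sketch of steps (b) to (d), for what it’s worth. I did the filtering itself in Gephi, so this is only an approximation of what the filters do: the function names are made up, the friends dictionary (screen name to list of names followed) is assumed to have been collected already, and the in-degree in step (d) is assumed to be counted within the filtered graph rather than the original one.

```python
from collections import Counter


def follow_edges(friends, members):
    """Step (b): a directed edge from each list member to everyone they follow."""
    return {(src, dst) for src in members for dst in friends.get(src, ())}


def filter_graph(edges, min_degree=100):
    """Steps (c) and (d): degree filter, then hide nodes nobody follows."""
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    # (c) keep nodes whose total degree (in + out) meets the threshold
    kept = {n for n in nodes if in_deg[n] + out_deg[n] >= min_degree}
    kept_edges = {(s, d) for s, d in edges if s in kept and d in kept}
    # (d) hide nodes with in-degree 0 in the filtered graph
    visible = {d for _, d in kept_edges}
    visible_edges = {(s, d) for s, d in kept_edges if s in visible}
    return visible, visible_edges


# Toy example with a threshold of 2 rather than 100
friends = {"a": ["x", "y", "b"], "b": ["x", "a"], "c": ["x"]}
edges = follow_edges(friends, {"a", "b", "c"})
print(filter_graph(edges, min_degree=2))
```

In the toy example, c and y drop out at step (c) because their degree is only 1, which is the same effect that could hide a widely followed account in the real graph if most of its followers fall below the degree threshold.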
Brian has a post out on Beyond Blogging as an Open Practice, What About Associated Open Usage Data?, and proposes that “when adopting open practices, one should be willing to provide open accesses to usage data associated with the practices” (his emphasis).
What usage stats are relevant though? If you’re on a hosted WordPress blog, it’s easy enough to pull out in a machine readable way the stats that WordPress collects about your blog and makes available to you (albeit at the cost of revealing a blog specific API key in the URL. Which means that if this key provides access to anything other than stats, particularly if it provides write access to any part of your blog, it’s probably not something you’d really want to share in public… [Getting your WordPress.com Stats API Key])
That said, you can still handcraft your own calls to the WordPress Stats API, and extract your own usage data as data.
So for example, a URL of the form:
will pull in a summary of November’s views data; or:
will pull in a list of referrers.
For what it’s worth, I’ve started cobbling together a spreadsheet that can pull in live data, or custom ranged reports, from WordPress: WordPress Stats into Google Spreadsheets (make your own personal copy of the spreadsheet if you want to give it a try). This may or may not become a work in progress… at the moment, it doesn’t even support the full range of URL parameters/report configurations (for the time being at least, that is left “as an exercise for the reader”;-)
The approach I took is very simplistic, simply based around crafting URLs that grab specified sets of CSV formatted data, and pop them into a spreadsheet using the =importData() formula (I’m sure Martin could come up with something far more elegant;-); that said, it does provide an example of how to get started with a bit of programmatic URL hacking… and if you want to get started with handcrafting your own URLs, it provides a few examples there too….:-)
The pattern I used was to define a parameter spreadsheet, and then CONCATENATE parameter values to create the URLs; for example:
=importdata(CONCATENATE("http://stats.wordpress.com/csv.php?", "api_key=", Config!B2, "&blog_uri=", Config!B3, "&end=", TEXT(Config!B6,"YYYY-MM-DD"), "&table=referrers_grouped"))
One trick to note is that I defined the end parameter setting in the configuration sheet as a date type, displayed in a particular format. When we grab this data value out of the configuration sheet we’re actually grabbing a date typed record, so we need to use the TEXT() formula to put it into the format that the WordPress API requires (arguments of the form 2011-11-30).
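The same URL-building pattern can be sketched outside a spreadsheet. Here’s a minimal Python version that uses only the endpoint and the parameters that appear in the formula above (api_key, blog_uri, end, table); the function name stats_url and the placeholder key and blog address are made up for illustration, and I haven’t checked which other parameters the API accepts.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint as used in the spreadsheet formula
STATS_ENDPOINT = "http://stats.wordpress.com/csv.php"


def stats_url(api_key, blog_uri, end, table="referrers_grouped"):
    """Build a stats URL from the same parameters as the CONCATENATE formula."""
    params = {
        "api_key": api_key,
        "blog_uri": blog_uri,
        # Equivalent of TEXT(Config!B6, "YYYY-MM-DD") in the spreadsheet
        "end": end.strftime("%Y-%m-%d"),
        "table": table,
    }
    return f"{STATS_ENDPOINT}?{urlencode(params)}"


# Placeholder values, for illustration only
print(stats_url("YOUR_API_KEY", "example.wordpress.com", date(2011, 11, 30)))
```

One difference from simple string concatenation is that urlencode percent-encodes the values, which matters if the blog address is given as a full URL with a scheme and slashes in it.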
If you want to use the spreadsheet to publish your own data, I guess one way would be to keep the privacy settings private, but publish the sheets you are happy for people to see. Just make sure you don’t reveal your API key;-) [If you know of a good link/resource describing best practice around publishing public sheets from spreadsheets that also contain, and draw on, private data, such as API keys, please post a link in the comments below;-)]
[A note on the stats: the WordPress stats made available via the API seem to relate to page views/visits to the website. Looking at my own stats, views from RSS feeds seem to be reported separately, and (I think) this data is not available via the WordPress stats API? If, as I do, you run your blog RSS feed through a service like Feedburner, to get a fuller picture of how widely the content on a blog is consumed, you’d need to report both the WordPress stats and the Feedburner stats, for example. Which leads to the next question, I guess: how can we (indeed, can we at all?) pull feed stats out of Feedburner?]
At this point, I need to come back to the question related above: what usage stats are relevant, particularly in the case of a JISC project blog? To my mind, a JISC project blog can support a variety of functions:
– it serves as a diary for the project team allowing them to record micro-milestones and solutions to problems; if developers are allowed to post to the blog, this might include posts at the level of granularity of a Stack Overflow Q and A, compared to the 500 word end-of-project post that tries to summarise how a complete system works;
– it can provide a feed that others can subscribe to to keep up to date with the project without having to hassle the project team for updates;
– it can provide context for the work by linking out to related resources, an approach that also might alert other projects who watch for trackbacks and pingbacks to the project;
– it provides an opportunity to go fishing in a couple of ways: firstly, by acting as a resource others can link to (with the triple payoff that it contextualises the project further, it may suggest related work the project team are unaware of by means of trackbacks/pingbacks into the project blog, and it may turn up useful commentary around the project); secondly, by providing a place where other interested parties might engage in discussion, commentary or feedback around elements of the project, via blog comments.
Even if a blog only ever gets three views per post, they may be really valuable views. For me what’s important is how the blog can be used to document interesting things that might have been turned up in the course of doing the project that wouldn’t ordinarily get documented. Problems, gotchas, clever solutions, the sudden discovery of really useful related resources. The blog also provides an ongoing link-basis for the project, something that can bring it to life in a networked context (a context that may have a far longer life, and scope, than just the life or scope of the actual project).
For many projects that don’t go past a pilot, it may well be that the real value of the project is the blogged documentation of things turned up during the process, rather than any of the formal outputs… Maybe..?!;-)
PS in passing, Google Webmaster tools now lets you track search stats around articles Google associates you with as an author: Clicks and impressions for authors. It’s been some time since I looked at Google Webmaster tools, but as Ouseful.info is registered there, I thought I’d check my broken links…and realised just how many pages get logged by Google as containing broken links when a single post erroneously contains a relative link… (i.e. when the <a href=’ doesn’t start with http://)
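If you want to check a post for this sort of slip before Google does, here’s a minimal Python sketch that lists the href values in a chunk of HTML that have neither a scheme nor a host. The class and function names are made up for illustration, and it simply flags candidates for checking: a site-relative link such as /about is flagged too, even though it may be perfectly intentional.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkFinder(HTMLParser):
    """Collect <a href> values that have neither a scheme nor a host."""

    def __init__(self):
        super().__init__()
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value or value.startswith("#"):
                continue  # ignore other attributes, empty hrefs and in-page anchors
            parts = urlsplit(value)
            if not parts.scheme and not parts.netloc:
                self.relative_links.append(value)


def find_relative_links(html):
    """Return the relative href values found in an HTML string."""
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_links


sample = (
    '<p><a href="http://example.com/a">ok</a> '
    '<a href="www.example.com/b">oops</a> '
    '<a href="#top">top</a></p>'
)
print(find_relative_links(sample))
```

The www.example.com/b case is the typical culprit: without the http:// at the front, a browser treats it as a path relative to the current page.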
PPS Related to the above is a nice example of why I think being able to read and write URLs is an important skill, something Jon Udell also picks up on in Forgotten knowledge. In the above case, I needed to unpick the WordPress Stats API documentation a little to work out how to put the URLs together (something that a knowledge of how to read and write URLs helped me with). Jon Udell’s case was an example of how a conference organiser was able to send a customised URL to the conference hotel that embedded the relevant booking dates.
But I wonder, in an age where folk use Google+search term (e.g. typing Facebook into Google) rather than URLs (e.g. typing facebook.com into a browser location bar), a behaviour that can surely only be compounded by the fusion of location and search bars in browsers such as Google Chrome, is “URL literacy” becoming even more of a niche skill, rather than becoming more widespread? Is there some corollary here to the world of phones and addressbooks? I don’t need to remember phone numbers any more (I don’t even necessarily recognise them) because my contacts list masks the number with the name of the person it corresponds to. How many kids are going to lose out on a basic education in map reading because there’s no longer a need to learn route planning or map-based navigation – GPS, SatNav and online journey planners now do that for us… And does this distancing from base skills and low level technologies extend further? Into the kitchen, maybe? Who needs ingredients when you have ready meals (and yes, frozen croissants and gourmet meals from the farm shop do count as ready meals;-), for example? Who needs to actually use a cookery book (or really engage with a lecture) when you can watch a TV chef (or TED Talks)..?