OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for July 2011

Risk Assessment: Corporate Acquisitions Can Kill APIs

with one comment

So it seems that my to-do list just got shorter as Twitter acquire BackType and as a result “will discontinue the BackType product and API services”.

Bah…:-(

On my roadmap (err, such as it is!;-), one thing I was hoping to do was start exploring in more detail the structure of communities around a shared link, with a view to looking at some of the actual dynamics of link sharing across Twitter networks. My early forays in to this have tended to use BackType, as for example in Visualising Ad Hoc Tweeted Link Communities, via BackType.

The simple recipe I’d started out with was based around the following steps:

- given the URL, look up who’s tweeted it via the BackType API;
- for each tweeter of the link, grab the list of people they follow (i.e. their friends);
- plot the “inner” network showing which of the people who tweeted the link follow each other.

This gave an easy way in to identifying a set of folk who had expressed an interest in a link by virtue of sharing it, this set then acting as the starting point for a community analysis.
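As a minimal sketch of the third step in that recipe, here's one way the “inner” network might be built once the tweeter list and friends lists are in hand. The BackType and Twitter lookups are not shown: the `tweeters` list and `friends_of` dictionary are hypothetical stand-ins for whatever those calls returned, and the function name `inner_network` is made up for illustration.

```python
def inner_network(tweeters, friends_of):
    """Return directed edges (a, b) where a and b both shared the link and a follows b."""
    tweeter_set = set(tweeters)
    edges = set()
    for user in tweeter_set:
        # Only keep friends who are themselves in the set of link sharers
        for friend in friends_of.get(user, ()):
            if friend in tweeter_set and friend != user:
                edges.add((user, friend))
    return sorted(edges)


# Hypothetical sample data standing in for the API responses
tweeters = ["alice", "bob", "carol"]
friends_of = {
    "alice": ["bob", "dave"],   # dave did not tweet the link, so he is dropped
    "bob": ["alice", "carol"],
    "carol": ["erin"],          # erin did not tweet the link either
}

edges = inner_network(tweeters, friends_of)
print(edges)
```

The resulting edge list is the sort of thing that can then be handed to a graph plotting tool to draw the community.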

Another approach I started to explore (but never blogged?!) was looking at networks of folk who had shared one of the links recently shortened by a particular bit.ly user. So for example, this graph (captured some time ago) used the BackType API to find who had tweeted one or more of 15 or so links that @charlesarthur had shortened using bit.ly, and then plotted friend connections between them:

follower connections between folk tweeting one or more of 15 links also recently shortened on bitly by charlesarthur

Unfortunately, now that the BackType API has gone (when I try to call it I get a “Limit exceeded” error message), the key ingredient from those two original recipes is no longer available…:-(

Written by Tony Hirst

July 6, 2011 at 9:55 am

Posted in Anything you want

Slides from OU Rise Library Analytics Workshop: Rambling about Visualisation

with 2 comments

For what it’s worth, slides from my presentation yesterday… As ever, they’re largely pointless without commentary…

… and even with the commentary, it was all a bit more garbled than usual (I forgot to breathe, had no real idea in my own mind what I wanted to say, etc etc…)

On reflection, here’s what I took from thinking back about what I should have tried to say:

- my assumption is that folk who are interested in asking data related questions should feel as if they can actually work with the data itself (direct data manipulation); I appreciate this is already way off the mark for some people who want someone else to work the data and then just read reports about it – but then that means you can’t ask or discover your own questions about the data, just read answers (maybe) to questions that someone else has asked, presented in a way they decided;

- you need to feel confident in working with data files – or at least, you need to be prepared to have a go at working with data files! (Bear in mind that many of the blog posts I write are write ups – of a sort – of how to do something I didn’t know how to do a couple of hours before… The web usually has answers to most of the questions that I come up against – and if I can’t find the answers, I can often request them via things like Twitter or Stack Overflow…) This can range from using command line tools, to using applications that let you take data in using one format and get it out as another;

- different tools do different things; if you can get a dataset into a tool in the right way, it may be able to do magical things very very easily indeed…

- three tools that can do a lot without you having to know a lot (though you may have to follow a tutorial or two to pick up the method/recipe….or at least recognise a picture you like and a dataset whose shape you can replicate using your own data, and then the ability to see which bits you need to cut and paste into the command line…):

-=- Gephi: great for plotting networks and graphs. It can also be appropriated to draw line charts (if you can work out how to ‘join the dots’ in the data file by turning the line into a set of points connected by edges) or scatter plots (just load in nodes – no edges connecting them – and lay them out using Gephi’s geolayout tool, which also lets you plot “rectilinear” plots based on x and y axis values). (I haven’t worked out a reliable way of working with CSV in Gephi – yet…) It’s amazing what you can describe as a graph when you put your mind to it…

-=- gnuplot: command line tool for plotting scatter plots and line graphs (eg from time series) using data stored in a simple text file (e.g. TSV or CSV);

-=- R (and ggplot if you’re feeling adventurous and want “pretty”, nicely designed graphs out); another command line tool (I find R-Studio helps) that again loads in data from a CSV file; R can generate statistical graphs very easily from the command line (it does the stats calculations for you given the raw data).

- Visual analytics/graphical data analysis is a process – you tease out questions and answers through directly manipulating the data and engaging with it in a visual way;

- when you see a visualisation you like, look at it closely: what do you see? Spending five mins or so looking at a Gestalt psychology/visual perception tutorial will give you all sorts of tricks and tips for how to construct visualisations so that structure your eye can detect will jump out at you;

- I think I may have confused folk talking about “dimensions”: what I meant was, how many columns could you represent in a given visualisation at the same time, if each data point corresponds to a single row in a data set. So for example, if you have an x-y plot (2 dimensions), with different symbols (1 dimension) available for plotting the points, as well as different colours (1 dimension) and different possible sizes (1 dimension) for each symbol, along with a label (1 dimension) for each point, and maybe control over the size (1 dimension), colour (1 dimension) and even font (1 dimension) applied to the label, you might find you can actually plot quite a few columns/dimensions for each data point on your chart… Whether or not you can actually decipher it is another matter of course! My Gephi charts generally have 2 explicit dimensions (node size and colour), as well as making use of two spatial dimensions (x, y) to lay out points that are in some sense “close” to each other in network space. It’s worth remembering though, that if you’re using a tool to engage in a conversation with a dataset as you try to get it to tell its story to you, it may not matter that the visualisation looks a mess to anyone else (a bit like an involved conversation may not make sense if someone else suddenly tries to join it). (Presentation graphics, on the other hand, are usually designed to communicate something that the data is trying to say to another person in a very explicit way.)

- working with data is a tactile thing… you have to be prepared to get your hands dirty…
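By way of getting hands dirty, here's a rough sketch of the ‘join the dots’ trick mentioned under Gephi above: turning a line (a series of x, y values) into a set of nodes plus edges between consecutive points. The sample values are made up, and the column names (`Id`, `x`, `y`, `Source`, `Target`) are assumptions rather than anything a particular tool is known to require; CSV is used here simply as a convenient plain-text form, so whether your graph tool imports it cleanly is a separate question.

```python
import csv
import io


def line_to_graph(points):
    """Turn an ordered list of (x, y) points into node rows and edge rows."""
    # One node per data point, identified by its position in the series
    nodes = [(i, x, y) for i, (x, y) in enumerate(points)]
    # One edge joining each point to the next one along the line
    edges = [(i, i + 1) for i in range(len(points) - 1)]
    return nodes, edges


def to_csv(header, rows):
    """Serialise a header and rows as CSV text (in memory, no files written)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up time series: (time, value)
points = [(0, 1.0), (1, 3.5), (2, 2.0), (3, 4.0)]

nodes, edges = line_to_graph(points)
nodes_csv = to_csv(["Id", "x", "y"], nodes)      # column names are assumptions
edges_csv = to_csv(["Source", "Target"], edges)  # column names are assumptions

print(nodes_csv)
print(edges_csv)
```

Dropping the edges and keeping just the nodes gives the scatter plot version of the same data.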

Written by Tony Hirst

July 5, 2011 at 1:47 pm

Posted in Data, Presentation

Marussia Virgin Racing F1 Factory Visit

with 3 comments

Yesterday, I had the good fortune to visit the F1 Marussia Virgin Racing factory at Dinnington, near Sheffield, as a result of “winning” a lucky dip competition run via GoMotorSport (part of a series of National Motorsport week promotions being run by the F1 teams based in the UK).

Marussia Virgin F1 Factory
[Thanks to @markhendy for the pic...]

Thanks to Finance Director Mark Hendy and engineer Shakey for the insight into the team’s operations:-)

Over the next few days and weeks, I’ll try to pick up on a few of the things I learned from the tour on the F1DataJunkie blog, tying them in to the corresponding technical regulations and other bits and pieces, but for now, here are some of the noticings I came away with…

- the engines aren’t that big, weighing 90kg or so and looking smaller than the engine in my own car…

- wheels are slotted onto the axles using a 3 pin mount on the front and a six(?) pin mount on the rear. (The engines are held on using a 6(?) point fixing.)

- the drivers aren’t that heavy either, weight wise (not that we met either of the drivers: neither Timo Glock nor Jerome D’Ambrosio are frequent visitors to the Dinnington factory, where the team’s cars are prepared before, and overhauled after, each race…): 70 kg or so. With cars prepared to meet racing weight regulations to a tolerance of 0.5kg or so, a large mixed grill and a couple of pints can make a big difference… (Hmm, I guess it would be easy enough to calculate the “big dinner weight effect” penalty on laptime?!)

I’m not sure if this was a “right-handed vs left-handed spanner” remark, but a comment was also made that the adhesive sponsor stickers can have a noticeable effect on the car’s aerodynamics as the corners become unstuck and start to flap. (Which made me wonder, if that is the case, is the shape of stickers taken into account? Is a leading edge on a label with a point/right angled corner rather than a smooth curve likely to come unstuck more easily, for example?!) Cars also need repainting every few races (stripping back to the carbon, and repainting afresh) because of pitting and chipping and other minor damage that can affect smooth airflow.

- side impact tubes are an integral part of the safety related design of the car:

- to track the usage of tyres during a race weekend, an FIA official scans a barcode on each tyre as it is used on the car:

The data junkie in me in part wonders whether this data could be made available in a timely fashion via the Pirelli website (or a Pirelli gadget on each team’s website) – or would that be giving away too much race intelligence to the other teams? That way, we could get an insight into the tyre usage over the course of the weekend…

- IT plays an increasingly important part in the pit garage setup; local area networks (cabled and wifi?) are set up by each team for the weekend, with the data engineers sitting behind the screen and viewing area in the garage (rather than having a fixed set up in one of the 5(?) trucks that attend each race).

- the cars are rigged up with 60 or so sensors; there is only redundancy on throttle and clutch sensors. Data analysis is in part provided through engineers provided by parts suppliers: McLaren Electronics, who supply the car’s ECU (and telemetry box(?)), provide a dedicated person(?) to support the team. Data analysis is, in part, carried out using the Atlas (9?) Advanced Telemetry Linked Acquisition System from McLaren Electronic Systems. Data collected during a stint is transmitted under encryption back to the pits, as well as being logged on the car itself. A full data dump is available to the team and the FIA scrutineers via an umbilical/wired connection when the car is pitted.

UST Global, one of the team’s partners, also provide 3(?) data analysts to support the team during a race (presumably using UST Global’s “Race Management System”?).

- for design and testing, weekly reporting is required that conforms to a trade-off between the number of hours per week that each team can spend on wind tunnel testing (60 hours per week) and CFD (“can’t find downforce”;-) simulation (40 teraflops per week). My first impression there was that efficient code could effectively mean more simulation testing?! (CFD via CSC? CSC expands relationship with Marussia Virgin Racing, doubling computing power for the team’s 2011 formula 1 season, or are things set to change with the replacement of Nick Wirth by Pat Symonds…?)

- the resource restriction agreement also limits the number of people who can work on the chassis. For a race weekend, teams are limited to 50 (47?) people. We were given a quick run down of at least (8?) engineer roles assigned to each car, but I forget them…
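Going back to the “big dinner weight effect” question above, here's a back-of-the-envelope sketch of how it might be calculated. Note that none of the numbers come from the factory visit: the laptime sensitivity (`SECONDS_PER_KG`) is a placeholder assumption you'd want to swap for a figure appropriate to a given car and circuit, the mixed grill mass and the lap count are purely illustrative, and the sum ignores the fact that a team might be able to offset driver weight by adjusting ballast.

```python
def laptime_penalty(extra_kg, seconds_per_kg, laps=1):
    """Estimated time lost from carrying extra mass, assuming a linear penalty."""
    return extra_kg * seconds_per_kg * laps


# ASSUMPTION: placeholder sensitivity in seconds per lap per extra kg;
# not a figure from the team, replace with a better estimate if you have one
SECONDS_PER_KG = 0.03

# An imperial pint is about 0.568 litres, so roughly 0.568 kg of liquid
PINT_KG = 0.568

# ASSUMPTION: a "large mixed grill" of about half a kilo, plus two pints
meal_kg = 0.5 + 2 * PINT_KG

per_lap = laptime_penalty(meal_kg, SECONDS_PER_KG)
# ASSUMPTION: an illustrative 52 lap race distance
over_race = laptime_penalty(meal_kg, SECONDS_PER_KG, laps=52)

print(f"extra mass: {meal_kg:.3f} kg")
print(f"per lap: {per_lap:.3f} s, over the race: {over_race:.2f} s")
```

With a 0.5kg weight tolerance, the point is simply that a meal of this size is several times the margin the cars are prepared to.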

So – that’s a quick summary of some of the things I can remember off the top of my head…

…but here are a couple of other things to note that may be of interest…

Marussia Virgin are making the most of their Virgin partnership over the Silverstone race weekend with a camping party/Virgin Experience at Stowe School (Silverstone Weekend) and a hook-up with Joe Saward’s “An Audience With Joe“… (If you don’t listen to @sidepodcast’s An Aside With Joe podcast series, you should…;-)

The team has also got an education thing going with race ticket sweeteners for folk signing up to the course: Motorsport Management Online Course.

I can’t help thinking there may be a market for a “hardcore fans” course on F1 that could run over a race season and run as an informal, open online course… I still don’t really know how a car works, for example ;-)

Anyway – that’s by the by: thanks again to GoMotorsport and the Marussia Virgin Racing team (esp. Mark Hendy and Shakey) for a great day out :-)

PS I think the @marussiavirgin team are trying to build up their social media presence too… to see who they’re listening to, here’s how their friends connect:

How friends of @marussiavirgin connect

;-)

Written by Tony Hirst

July 2, 2011 at 1:55 pm
