Over the last few weeks, I’ve started scraping through some of the visual things that statisticians do with data. I have to admit that I’m not that interested in learning arcane tests for significance in peculiarly distributed populations, but I am keen to see what statistical graphs are out there that you can throw data at to see whether or not the data hides an interesting story (trends, clusters and outliers are three story signatures that make folk-sense to me in data terms).
And as R seems to have traction at the moment, as well as being a) cross-platform and free, and b) home to an active plugin “developer” community, it seems to make sense to find a way in through that… (There seems to be a shed load of recent and soon-to-be-published books around R on Amazon at the moment, and if Google uses it… (also: Google’s R style guide)).
To ease my way into R, I’ve started using R-Studio, an in-development IDE. But the other day, I was also tipped off about Red-R, a visual programming environment for R that seems to be built around the same tooling as the Orange data analysis tool I wrote about last year.
It’s still pretty ropey at the moment (on a Mac at least), but works enough to be going on with…
The metaphor is based on pipeline processing of data, chaining together functional blocks with wires in the order you want the functions to be executed. Getting data in is currently from a file (it would be nice to see hooks into online datasources supported too), with a range of options for getting the data into the environment in a structured way:
As with Orange, there are lots of opportunities to comment and add notes to record what you’re doing/seeing within a block.
Going through the widget menus quickly, there are blocks for reshaping data:
(I know from web traffic into this blog that this is something that a lot of people appreciate tool support for; I’ll try to do a summary post of how R can help in this regard in the next somewhen…)
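For what it’s worth, here’s a minimal sketch of the sort of reshaping I have in mind, done in plain R at the command line rather than in Red-R. The data frame and its column names are made up for illustration, and base R’s `reshape()` is only one of several ways of doing this; I’m not claiming it’s what the Red-R blocks do under the hood.

```r
# Made-up "wide" data: one row per id, one column per year.
# (All names here are hypothetical, purely for illustration.)
wide <- data.frame(
  id    = 1:2,
  y2009 = c(10, 20),
  y2010 = c(11, 21)
)

# Reshape to "long" format: one row per id/year combination.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("y2009", "y2010"),  # the columns to stack
  v.names   = "value",              # name for the stacked values column
  timevar   = "year",               # name for the column that records the year
  times     = c(2009, 2010),        # the year each stacked column stands for
  idvar     = "id"
)

print(long)
```

The long form is usually what plotting and summary functions want, which is presumably why so many people go looking for tool support for this step.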
We can also generate subsets of data:
(I found it easy enough to pull out a set of columns, but I didn’t immediately spot how to pull out rows by cell value in a given column, which is pretty fundamental?)
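By way of comparison, pulling out rows by cell value is a one-liner in R itself. The sketch below uses a made-up data frame (the column names are hypothetical) and shows what I’d expect a row filter block to be doing; it’s not a description of how Red-R actually implements anything.

```r
# Made-up example data; the column names are hypothetical.
visits <- data.frame(
  page    = c("home", "about", "home", "contact"),
  browser = c("Firefox", "Chrome", "Safari", "Firefox"),
  hits    = c(120, 45, 80, 12),
  stringsAsFactors = FALSE
)

# Filter in: keep only the rows where the 'page' column equals "home".
home_rows <- visits[visits$page == "home", ]

# The same filter using subset(), combined with picking out a set of columns.
home_hits <- subset(visits, page == "home", select = c(browser, hits))

# Filter out: drop every row with fewer than 50 hits.
busy_rows <- visits[visits$hits >= 50, ]

print(home_rows)
print(home_hits)
print(busy_rows)
```

Note the trailing comma inside the square brackets: the condition selects rows, and leaving the column slot empty keeps all the columns.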
There are also blocks for plotting various charts and graphs:
And of course there’s the whole stats thing too…
I’ll try to have a play over the coming weeks and pop up some simple recipes of how to use this app…
For anyone who really struggles with using a command line, Red-R may provide a handy way in; from a very quick play, though, it’s not obvious how to do certain trivial things (filter in or filter out a set of rows for example).
If you already know R, it may or may not be obvious how to use Red-R immediately and cut down on the typing; but if you’re thinking of using Red-R as a visual environment for plotting graphs and charts with no prior experience of R, there may be “issues”. For example, if Red-R works best for people who have an understanding of R and how it works, Red-R may not actually be a very good tool for teaching the model that underpins R, and as such it may be hard to learn to use through direct manipulation. This might be particularly true if Red-R is a very literal interpretation of R, and simply puts a widget layer on top of R commands to make them dialogue-driven. Which is to say, it may be that an additional abstraction layer is required, one that combines several basic R commands into higher level and more natural dialogues that make Red-R as a visual environment easier to use without instruction?
Because “without instruction” is currently the only way to engage with Red-R: the documentation is currently very sparse indeed:-(
In the meantime, I think I’m going to stick with using R-Studio…
PS …but then, maybe a few quick tutorial posts here might help to start addressing the lack of quick start howtos???;-)