A Further Look at the Orange Data Playground – Filters and File Merging
In Orange Visual Visualisation Tool, I posted some introductory notes on using the Orange “visual visualisation” tool. Whilst crossing the Solent last week, I had another little play and discovered a few more compelling features of this application.
First up is the problem of merging data sets with a common column, for example where you have a common identifier appearing in two different data files (e.g. a school, council or university ID appearing in different sorts of report). I’ve described several different approaches to this in the past, such as using Google Fusion Tables, but none of them are ideal. Here’s how to do it the Orange way:
Even though there appears to be only one input element to the Merge Tables widget, there are actually two…
Here’s how we choose which columns to merge on:
Just to prove we’ve merged the data…
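For comparison, here’s a rough scripted equivalent of the same merge. This is a minimal sketch using pandas rather than anything Orange does internally, and the table contents and column names (`school_id`, `name`, `score`) are made up purely for illustration:

```python
import pandas as pd

# Hypothetical data: two reports that share an identifier column
schools = pd.DataFrame({
    "school_id": [101, 102, 103],
    "name": ["Ashfield", "Brookside", "Cedar Park"],
})
results = pd.DataFrame({
    "school_id": [101, 103, 104],
    "score": [72.5, 64.0, 81.2],
})

# Merge on the shared column; how="inner" keeps only the ids
# that appear in both tables
merged = schools.merge(results, on="school_id", how="inner")
print(merged)
```

Swapping `how="inner"` for `how="left"` or `how="outer"` would keep unmatched rows from one or both tables instead, with missing values filling the gaps.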
Looking at the data, the original files include x and y columns holding scaled versions of other columns, named x and y because those are reserved column names in Gephi.
It’s easy enough to remove these columns within the Orange environment – simply use the Select Attributes widget.
The column selection itself is made within the Select Attributes dialogue:
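If you wanted to do the same column removal in a script, something like the following pandas sketch would do it. The `nodes` table and its `id` and `degree` columns are hypothetical; only the `x` and `y` column names come from the data described above:

```python
import pandas as pd

# Hypothetical table carrying Gephi-style x and y columns
nodes = pd.DataFrame({
    "id": ["a", "b"],
    "degree": [3, 5],
    "x": [0.12, 0.87],
    "y": [0.45, 0.33],
})

# Drop the unwanted columns; errors="ignore" skips any that are absent
cleaned = nodes.drop(columns=["x", "y"], errors="ignore")
print(list(cleaned.columns))
```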
As well as merging by column, it’s easy enough to concatenate data from several input files that share the same columns.
Looking at the dialogue for this widget, we see it’s also possible to use it to merge data tables sharing common columns/attributes (such as a unique identifier), although where the joined tables have columns that are not common to both, unknown (?) values will be entered in cells for each column that the original data table did not contain.
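The same “fill in the unknowns” behaviour is easy to see in a script. Here’s a small pandas sketch with invented tables; pandas marks the missing cells as NaN rather than Orange’s “?”:

```python
import pandas as pd

# Hypothetical tables that share only the "id" column
a = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
b = pd.DataFrame({"id": [3], "score": [9.5]})

# Stack the rows; columns missing from one table are filled with NaN
stacked = pd.concat([a, b], ignore_index=True, sort=False)
print(stacked)
```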
As well as filtering whole columns out of a table using the Select Attributes widget, we can also filter rows based on cell entries matching specified conditions within particular columns.
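By way of illustration, row filtering on cell values might look like this in pandas. The data and the two conditions are assumptions made up for the example, not anything taken from the Orange workflow:

```python
import pandas as pd

# Hypothetical spending data
df = pd.DataFrame({
    "council": ["A", "B", "C", "D"],
    "region": ["South", "North", "South", "East"],
    "spend": [120, 340, 560, 90],
})

# Keep only the rows where both conditions hold
filtered = df[(df["region"] == "South") & (df["spend"] > 200)]
print(filtered)
```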
It’s also worth mentioning that for large data sets, Orange can generate samples of your data for you:
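A scripted take on sampling, again as a pandas sketch rather than a description of how Orange does it. The table is invented, and `random_state` is fixed only so that the sample is repeatable:

```python
import pandas as pd

# Hypothetical "large" table of 1000 rows
big = pd.DataFrame({"row": range(1000)})

# Sample a fixed number of rows...
sample_fixed = big.sample(n=50, random_state=42)

# ...or a fixed proportion of the table
sample_frac = big.sample(frac=0.1, random_state=42)

print(len(sample_fixed), len(sample_frac))
```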
And finally, once you’ve manipulated your data set, you’ll probably want to save it. That’s easy to wire in too:
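For completeness, here’s what saving might look like in a script. This is a minimal pandas sketch; the data and the `merged.csv` file name are hypothetical, and it writes to a temporary directory so it doesn’t clobber anything:

```python
import os
import tempfile

import pandas as pd

# Hypothetical manipulated data set
df = pd.DataFrame({"id": [1, 2, 3], "score": [72.5, 64.0, 81.2]})

# Write to a CSV file in a temporary directory, without the row index
out_path = os.path.join(tempfile.mkdtemp(), "merged.csv")
df.to_csv(out_path, index=False)

# Read it back in to check that it saved as expected
reloaded = pd.read_csv(out_path)
print(reloaded)
```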
So, pretty impressive, huh? And drop dead easy… or should that be: “click and wire” easy, particularly the data merge on a common column…:-)
Completely OUseless, of course…. If you’ve read this far, I apologise for wasting your time…