A Further Look at the Orange Data Playground – Filters and File Merging

In Orange Visual Visualisation Tool, I posted some introductory notes on using the Orange “visual visualisation” tool. Whilst crossing the Solent last week, I had another little play and discovered a few more compelling features of this application.

First up relates to the problem of merging data sets with a common column, for example where you have a common identifier appearing in two different data files (e.g. a school, council or university ID appearing in different sorts of report). I’ve described several different approaches to this in the past, such as using Google Fusion Tables, but none of them are ideal. But here’s how to do it the Orange way:

Merging data about a common column in Orange

Even though there appears to be only one input element to the Merge Tables widget, there are actually two…

Wiring in the examples connection

Here’s how we choose which columns to merge on:

Merging columns in Orange

Just to prove we’ve merged the data…

Merging data about a common column in Orange
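For anyone who prefers to see the idea as code rather than wires, here is a minimal sketch of a merge on a common column in plain Python. It is not how Orange does it internally; the table contents, the `id` column name and the `merge_on` helper are all made up for illustration, and filling unmatched rows with "?" is my assumption rather than documented Merge widget behaviour.

```python
# Hypothetical data: two tables that share an "id" column.
schools = [
    {"id": "S1", "name": "North School"},
    {"id": "S2", "name": "South School"},
]
results = [
    {"id": "S1", "score": "71"},
    {"id": "S3", "score": "64"},
]


def merge_on(left, right, key):
    """Add the columns of `right` to each row of `left` that shares `key`.

    Rows of `left` without a match get "?" in the added columns
    (an assumption made for this sketch).
    """
    lookup = {row[key]: row for row in right}
    extra_cols = [col for col in right[0] if col != key]
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**row, **{col: match.get(col, "?") for col in extra_cols}})
    return merged


merged = merge_on(schools, results, "id")
for row in merged:
    print(row)
```

The lookup dictionary keyed on the shared column plays the role of the column pairing you pick in the widget dialogue.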

Looking at the data, the original files include x and y columns that were used to represent scaled versions of other columns, headed using the Gephi reserved column names x and y.

It’s easy enough to remove these columns within the Orange environment – simply use the Select Attributes widget.

selecting columns in Orange

The column selection itself is made within the Select Attributes dialogue:

Column selection dialogue in Orange
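As a rough code equivalent of that column selection, the sketch below drops the x and y columns from a table held as a list of dictionaries. The sample rows and the `drop_columns` helper are hypothetical; only the x and y column names come from the data described above.

```python
# Hypothetical rows that still carry Gephi-style x and y columns.
rows = [
    {"id": "S1", "score": 71, "x": 0.2, "y": 0.9},
    {"id": "S2", "score": 48, "x": 0.7, "y": 0.1},
]


def drop_columns(rows, unwanted):
    """Return copies of the rows without the columns named in `unwanted`."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


cleaned = drop_columns(rows, {"x", "y"})
print(cleaned)
```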

As well as merging by column, it’s easy enough to concatenate data from several input files that share the same columns.

orange -concatenate widget

Looking at the dialogue for this widget, we see it’s also possible to use it to merge data tables sharing common columns/attributes (such as a unique identifier). Where the tables being joined don’t all have the same columns, unknown (?) values will be entered in the cells of any column that the original data table did not contain.
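The unknown value behaviour is easier to see with a tiny example. This is only a sketch of the idea in plain Python, assuming tables are lists of dictionaries; the `concatenate` helper and the sample data are invented, and the "?" marker simply mirrors what the widget shows.

```python
# Hypothetical tables: the second one lacks "score" and adds "region".
table_a = [{"id": "S1", "score": 71}]
table_b = [{"id": "S2", "region": "South"}]


def concatenate(*tables):
    """Stack tables row-wise over the union of their columns.

    Cells for columns a table did not originally contain are set to "?".
    """
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [
        {col: row.get(col, "?") for col in columns}
        for table in tables
        for row in table
    ]


combined = concatenate(table_a, table_b)
for row in combined:
    print(row)
```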

As well as filtering whole columns out of a table using the Select Attributes widget, we can also filter rows based on cell entries matching specified conditions within particular columns.

Orange - select data rows
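In code terms, row selection of this sort amounts to keeping the rows for which every condition holds. Here is a minimal sketch, with made-up data and a hypothetical `select_rows` helper that takes (column, test) pairs:

```python
# Hypothetical rows to filter.
rows = [
    {"id": "S1", "region": "North", "score": 71},
    {"id": "S2", "region": "South", "score": 48},
    {"id": "S3", "region": "North", "score": 64},
]


def select_rows(rows, conditions):
    """Keep the rows for which every (column, predicate) condition is true."""
    return [
        row for row in rows
        if all(predicate(row[col]) for col, predicate in conditions)
    ]


selected = select_rows(
    rows,
    [
        ("region", lambda value: value == "North"),
        ("score", lambda value: value >= 65),
    ],
)
print(selected)
```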

It’s also worth mentioning that for large data sets, Orange can generate samples of your data for you:

Orange - data sampling
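I haven't checked which sampling schemes the widget offers, so treat the following as a sketch of the simplest case only: a random sample without replacement of a fixed fraction of the rows. The `sample_rows` helper, the data and the seed are all hypothetical.

```python
import random

# Hypothetical data set of 100 rows.
rows = [{"id": f"R{i}"} for i in range(100)]


def sample_rows(rows, fraction, seed=None):
    """Return a random sample (without replacement) of roughly `fraction` of the rows."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


sample = sample_rows(rows, 0.1, seed=42)
print(len(sample))
```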

And finally, once you’ve manipulated your data set, you’ll probably want to save it? That’s easy to wire in too:

Orange - save data
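If you were doing the same thing by hand, saving the table out might look something like the sketch below, which writes a list of dictionaries to a CSV file using Python's standard `csv` module. The rows, the `save_csv` helper and the output file name are invented for the example, and CSV is just my choice of format here.

```python
import csv
import os
import tempfile

# Hypothetical manipulated data set.
rows = [
    {"id": "S1", "name": "North School", "score": "71"},
    {"id": "S2", "name": "South School", "score": "48"},
]


def save_csv(rows, path):
    """Write the rows to `path` as CSV, using the first row's keys as the header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


out_path = os.path.join(tempfile.gettempdir(), "orange_sketch_output.csv")
save_csv(rows, out_path)
print(out_path)
```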

So, pretty impressive, huh? And drop dead easy… or should that be: “click and wire” easy, particularly the data merge on a common column…:-)

Completely OUseless, of course…. If you’ve read this far, I apologise for wasting your time…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

3 thoughts on “A Further Look at the Orange Data Playground – Filters and File Merging”

  1. Thank you so much for these several articles on Orange. I had played a bit with it a while back but the poor documentation discouraged me. But seeing what you are doing rekindles my interest. It is unfortunate that the project does not make itself more accessible via Google Groups and GitHub; I think they could get much more traction… Anyhow, better docs are a huge help. Hopefully the project itself will take on the documentation issue.
