A Further Look at the Orange Data Playground – Filters and File Merging

In Orange Visual Visualisation Tool, I posted some introductory notes on using the Orange “visual visualisation” tool. Whilst crossing the Solent last week, I had another little play and discovered a few more compelling features of this application.

First up is the problem of merging data sets on a common column, for example where a common identifier appears in two different data files (e.g. a school, council or university ID appearing in different sorts of report). I’ve described several different approaches to this in the past, such as using Google Fusion Tables, but none of them is ideal. Here’s how to do it the Orange way:

Even though there appears to be only one input element to the Merge Tables widget, there are actually two…

Here’s how we choose which columns to merge on:

Just to prove we’ve merged the data…
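For comparison, here’s a minimal sketch of the same sort of merge done in plain Python, outside Orange. The two reports, the shared `id` column and the `merge_on()` helper are all made up for illustration, and the sketch assumes each ID appears only once in the second table. It performs an inner join (only rows whose ID appears in both files survive), which may not match how the Merge Tables widget treats unmatched rows.

```python
import csv
import io

# Hypothetical example data: two reports sharing an "id" column
# (e.g. a school or council identifier). Not taken from the original post.
REPORT_A = """id,name
101,Alpha
102,Beta
103,Gamma
"""

REPORT_B = """id,score
101,7
103,9
104,4
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(left, right, key):
    """Inner-join two lists of row dicts on a shared key column.

    Assumes key values are unique in `right`; if not, the last
    duplicate wins. Only rows whose key appears in both tables are kept.
    """
    index = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Add the right-hand columns to the matching left-hand row.
            merged.append({**row, **match})
    return merged


merged = merge_on(read_rows(REPORT_A), read_rows(REPORT_B), "id")
for row in merged:
    print(row)
```

Running it prints the combined rows for IDs 101 and 103 only; 102 and 104 each appear in just one of the files, so they drop out.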

Looking at the data, the original files include two columns, headed with the Gephi reserved column names x and y, that hold scaled versions of other columns.

It’s easy enough to remove these columns within the Orange environment – simply use the Select Attributes widget.

The column selection itself is made within the Select Attributes dialogue:
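If you’d rather do the column drop in code, the following sketch strips the `x` and `y` columns from a CSV table using Python’s standard `csv` module. The table contents and the `drop_columns()` helper are hypothetical, and this isn’t how Orange does it internally.

```python
import csv
import io

# Hypothetical table with Gephi-style "x" and "y" columns we want to drop.
TABLE = """id,name,x,y
101,Alpha,0.1,0.5
102,Beta,0.9,0.2
"""


def drop_columns(text, unwanted):
    """Return CSV text with the named columns removed, keeping column order."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [col for col in reader.fieldnames if col not in unwanted]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


cleaned = drop_columns(TABLE, {"x", "y"})
print(cleaned)
```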

As well as merging by column, it’s easy enough to concatenate data from several input files that share the same columns.

Looking at the dialogue for this widget, we see it can also be used to combine data tables that share common columns/attributes (such as a unique identifier). Where the tables do not share all their columns, unknown (?) values are entered in the cells of any column that a row’s original data table did not contain.
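The concatenate-and-pad behaviour described above can be sketched in plain Python: build the union of all the column names, then fill any column that a row’s source table lacked with `?`. The tables and the `concatenate()` helper here are hypothetical illustrations, not Orange’s own code.

```python
# Hypothetical tables: the second one has an extra "score" column.
TABLE_1 = [{"id": "101", "name": "Alpha"}, {"id": "102", "name": "Beta"}]
TABLE_2 = [{"id": "103", "name": "Gamma", "score": "9"}]


def concatenate(*tables, missing="?"):
    """Stack tables row-wise over the union of their columns.

    Columns keep the order in which they are first seen; cells for
    columns a row's table did not have are filled with `missing`.
    """
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    combined = [
        {col: row.get(col, missing) for col in columns}
        for table in tables
        for row in table
    ]
    return columns, combined


columns, rows = concatenate(TABLE_1, TABLE_2)
for row in rows:
    print(row)
```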

As well as filtering whole columns out of a table using the Select Attributes widget, we can also filter rows, keeping only those whose cell entries in particular columns match specified conditions.
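As a rough code equivalent, the sketch below keeps only the rows where every column condition holds. The rows, column names and thresholds are invented for illustration, and `select_rows()` is a hypothetical helper rather than anything in Orange.

```python
# Hypothetical rows; the conditions below are illustrative only.
ROWS = [
    {"id": "101", "region": "South", "score": "7"},
    {"id": "102", "region": "North", "score": "3"},
    {"id": "103", "region": "South", "score": "9"},
]


def select_rows(rows, conditions):
    """Keep rows for which every (column -> predicate) condition is true."""
    return [
        row for row in rows
        if all(test(row[col]) for col, test in conditions.items())
    ]


# Keep southern rows with a score above 5 (scores are stored as text).
south_high = select_rows(ROWS, {
    "region": lambda v: v == "South",
    "score": lambda v: float(v) > 5,
})
print([row["id"] for row in south_high])
```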

It’s also worth mentioning that for large data sets, Orange can generate samples of your data for you:
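To get a feel for what sampling involves, here’s a minimal sketch that draws a fixed fraction of rows without replacement using Python’s `random` module. The `sample_rows()` helper and the 10% fraction are assumptions for illustration; Orange’s own sampling may offer other schemes, which this doesn’t try to reproduce.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Return round(fraction * len(rows)) rows drawn without replacement.

    Passing a seed makes the sample reproducible between runs.
    """
    rng = random.Random(seed)
    k = round(fraction * len(rows))
    return rng.sample(rows, k)


# Hypothetical data set of 100 rows; take a 10% sample.
rows = [{"id": str(i)} for i in range(100)]
subset = sample_rows(rows, 0.1, seed=42)
print(len(subset))
```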

And finally, once you’ve manipulated your data set, you’ll probably want to save it. That’s wire-it-in easy too:
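And for completeness, saving a table of rows to CSV in plain Python might look like this. The file name `merged.csv`, the temporary directory and the `save_rows()` helper are all hypothetical; the header row is taken from the first row’s keys.

```python
import csv
import os
import tempfile


def save_rows(rows, path):
    """Write a list of row dicts to a CSV file, using the first row's keys as the header."""
    if not rows:
        raise ValueError("nothing to save")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical merged rows, saved to a throwaway directory.
rows = [{"id": "101", "name": "Alpha"}, {"id": "103", "name": "Gamma"}]
path = os.path.join(tempfile.mkdtemp(), "merged.csv")
save_rows(rows, path)
print(path)
```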

So, pretty impressive, huh? And drop dead easy… or should that be: “click and wire” easy, particularly the data merge on a common column…:-)

Completely OUseless, of course…. If you’ve read this far, I apologise for wasting your time…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
