Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine

Listening to Chris Taggart talking about OpenCorporates at netzwerk recherche conf – data, research, stories, I figured I really should start to have a play…

Looking through the example data available from an opencorporates company ID via the API, I spotted that registered trademark data was available. So here’s a quick roundabout way of previewing trademarked images using OpenCorporates and Google Refine.

First step is to grab the data – the opencorporates API reference docs give an example URL for grabbing a company’s (i.e. a legal entity’s) data: http://api.opencorporates.com/companies/gb/00102498/data

Google Refine supports the import of JSON from a URL:

(Hmm, it seems as if we could load in data from several URLs in one go… maybe data from different BP companies?)
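For anyone wanting to try the same fetch outside Refine, here is a minimal Python sketch. It only assumes the URL pattern quoted above; the function names are mine, and the API may now expect a version prefix or an API token that this sketch does not handle.

```python
import json
from urllib.request import urlopen

# Base URL as it appears in the example above (assumed still valid).
API_BASE = "http://api.opencorporates.com"


def company_data_url(jurisdiction, company_number):
    """Build the data-listing URL for one company (legal entity)."""
    return "{}/companies/{}/{}/data".format(API_BASE, jurisdiction, company_number)


def fetch_json(url, timeout=30.0):
    """Fetch a URL and parse the response body as JSON (makes a network call)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Several companies could be handled by building one URL per company number.
example_urls = [company_data_url("gb", number) for number in ["00102498"]]
```

Calling fetch_json(example_urls[0]) would then return the parsed JSON, from which the data items can be picked out.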

Having grabbed the JSON, we can say which blocks we want to import as row items:

We can preview the rows to check we’re bringing in what we expect…

We’ll take this data by clicking on Create Project, and then start to work on it. Because the plan is to grab trademark images, we need to grab data back from OpenCorporates relating to each trademark. We can generate the API call URLs from the datum – id column:

The OpenCorporates data item API calls are of the form http://api.opencorporates.com/data/2601371, which we can generate as follows:
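Outside Refine, the same URL construction might be sketched in Python like this (the function name is illustrative, and the ID is the one from the example URL above):

```python
# URL prefix taken from the example data item URL in the post.
DATA_ITEM_BASE = "http://api.opencorporates.com/data/"


def datum_url(datum_id):
    """Turn a value from the datum id column into a data item API URL."""
    return DATA_ITEM_BASE + str(datum_id).strip()


# One URL per row of the datum id column.
urls = [datum_url(datum_id) for datum_id in [2601371]]
```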

Here’s what we get back:

If we look through the data, there are several fields that may be interesting: representative_name_lines (the person/group that registered the trademark), representative_address_lines, mark_image_type and, most importantly of all, international_registration_number. Note that some of the trademarks are not images – we’ll end up ignoring those (for the purposes of this post, at least!)

We can pull out these data items into separate columns by creating columns directly from the trademark data column:

The elements are pulled in using expressions of the following form:

Here are the expressions I used (each expression is used to create a new column from the trademark data column that was imported from automatically constructed URLs):

  • value.parseJson().datum.attributes.mark_image_type – the first part of the expression parses the cell value as JSON, then we navigate using dot notation to the part of the resulting object we want…
  • value.parseJson().datum.attributes.mark_text
  • value.parseJson().datum.attributes.representative_address_lines
  • value.parseJson().datum.attributes.representative_name_lines
  • value.parseJson().datum.attributes.international_registration_number
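
As a rough Python counterpart to the expressions above, the sketch below pulls the same five attributes out of one data item response. The datum.attributes path is the one used in the expressions; the sample record is made up purely for illustration and is not real OpenCorporates data.

```python
import json

# The five attributes pulled out into columns in the post.
FIELDS = (
    "mark_image_type",
    "mark_text",
    "representative_address_lines",
    "representative_name_lines",
    "international_registration_number",
)


def extract_fields(raw_json):
    """Parse one data item response and return the attributes of interest.

    Missing attributes come back as None, much like the blank cells in Refine.
    """
    attributes = json.loads(raw_json).get("datum", {}).get("attributes", {})
    return {name: attributes.get(name) for name in FIELDS}


# Hypothetical record, for illustration only.
sample = json.dumps({
    "datum": {
        "attributes": {
            "mark_image_type": "JPG",
            "mark_text": "EXAMPLE",
            "international_registration_number": "123456",
        }
    }
})
row = extract_fields(sample)
```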

Finding how to get images from international registration numbers was a bit of a faff. In the end, I looked up several records on the WIPO website that displayed trademarked images, then looked at the pattern of their URLs. The ones I checked seemed to have the form:
http://www.wipo.int/romarin/images/XX/YY/XXYYNN.typ
where typ is gif or jpg and XXYYNN is the international registration number. (This may or may not be a robust convention, but it worked for the examples I tried…)

The following GREL expression generates the appropriate URL from the trademark column:

if(or(value.parseJson().datum.attributes.mark_image_type=='JPG', value.parseJson().datum.attributes.mark_image_type=='GIF'), 'http://www.wipo.int/romarin/images/' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2)[0] + '/' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2, 2)[1] + '/' + value.parseJson().datum.attributes.international_registration_number + '.' + toLowercase(value.parseJson().datum.attributes.mark_image_type), '')

The first part checks that the image type is GIF or JPG. If it is, we construct the URL path from the first two pairs of characters of the registration number, append the full number, and add the file type cast to lower case; otherwise we return an empty string.
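
The same logic can be sketched in Python. This mirrors the GREL expression rather than any documented WIPO scheme, so the URL pattern is only as reliable as the guess above; the function name and the registration number in the comment are hypothetical.

```python
def wipo_image_url(registration_number, image_type):
    """Guess the WIPO image URL for a trademark, or return '' if not an image.

    Follows the observed pattern images/XX/YY/XXYYNN.typ, where XX and YY are
    the first and second pairs of characters of the registration number.
    """
    if image_type not in ("JPG", "GIF") or not registration_number:
        return ""
    number = str(registration_number)
    return (
        "http://www.wipo.int/romarin/images/"
        + number[0:2] + "/"
        + number[2:4] + "/"
        + number + "." + image_type.lower()
    )


# A made-up number "123456" with type "JPG" maps to .../images/12/34/123456.jpg
```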

Now we can filter the data to only show rows that contain a trademark image URL:

Finally, we can create a template to export a simple HTML file that will let us preview the image:

Here’s a crude template I tried:

The file is exported as a .txt file, but it’s easy enough to change the suffix to .html so that we can view the file in a browser, or I can cut and paste the HTML into this page…

null null
null null
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"
"[\"Murgitroyd & Company\"]" "[\"Scotland House,\",\"165-169 Scotland Street\",\"Glasgow G5 8PL\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"
"[\"BP Group Trade Marks\"]" "[\"20 Canada Square, Canary Wharf\",\"London E14 5NJ\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"BP Group Trade Marks\"]" "[\"20 Canada Square, Canary Wharf\",\"London E14 5NJ\"]"
"[\"ROBERT WILLIAM BOAD\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"

Okay – so maybe I need to tidy up the registration related columns, but as a recipe, it sort of works. (Note that it took way longer to create this blog post than it did to come up with the recipe…)
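
As one possible way of tidying things up, here is a small Python sketch that builds a similar HTML preview, joining the name and address lines into plain text rather than leaving them as raw JSON arrays. It is not the Refine template I used; the table layout and function names are assumptions made for the sake of illustration.

```python
from html import escape


def preview_row(image_url, name_lines, address_lines):
    """Render one table row: the image, then the joined name and address lines."""
    name = ", ".join(name_lines or [])
    address = ", ".join(address_lines or [])
    return '<tr><td><img src="{}" alt=""/></td><td>{}</td><td>{}</td></tr>'.format(
        escape(image_url, quote=True), escape(name), escape(address)
    )


def preview_page(rows):
    """Build a minimal HTML page from (image_url, name_lines, address_lines) rows.

    Rows with an empty image URL are skipped, like the filter step in Refine.
    """
    body = "\n".join(preview_row(*row) for row in rows if row[0])
    return "<html><body><table>\n" + body + "\n</table></body></html>"
```

Writing the result of preview_page(...) to a file with an .html suffix avoids the rename step.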

A couple of things that came to mind: having used Google Refine to sketch out this hack, we could now code it up, maybe in something like Scraperwiki. For example, I only found trademarks registered to one legal entity associated with BP, rather than checking for trademarks held by the myriad legal entities associated with BP. I also wonder whether it would be possible to “compile” what Google Refine is doing (import from URL, select row items, run operations against columns, export templated data) as code so that it could be run elsewhere. For example, could all the steps be exported as a single JavaScript or Python script, maybe calling on a GREL/Google Refine library that provides some sort of abstraction layer or virtual machine for the script to make use of?

PS What’s next…? The trademark data also identifies one or more areas in which the trademark applies; I need to find some way of pulling out each of the “en” attribute values from the items listed in value.parseJson().datum.attributes.goods_and_services_classifications.
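
In Python, at least, this looks like a one-liner, though I haven’t checked the exact shape of the data. The sketch below assumes that goods_and_services_classifications is a list of objects, each of which may carry an “en” key; if it turns out to be structured differently, the loop would need adjusting. The sample values are invented.

```python
def english_classifications(attributes):
    """Collect the 'en' value from each classification item that has one.

    Assumes attributes['goods_and_services_classifications'] is a list of
    dicts (an unverified guess about the data's shape).
    """
    items = attributes.get("goods_and_services_classifications") or []
    return [item["en"] for item in items if isinstance(item, dict) and "en" in item]


# Invented example of what the attributes might look like.
sample_attributes = {
    "goods_and_services_classifications": [
        {"en": "Example class one"},
        {"fr": "Exemple"},
        {"en": "Example class two"},
    ]
}
labels = english_classifications(sample_attributes)
```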

8 comments

  1. tehdai

    Wow. So much potential for this – I didn’t even know trademark data was playable-with and you’re already hacking it seemingly with ease. I’m interested to see what you come up with regarding web-UX related trademarks… terrifying to think how much of the language of our daily experience is protected and owned…

    I know I don’t always comment but I do read every post, please keep up the amazing work!

    • Tony Hirst

      Thanks for the comment ;-) Next step is to do a network map of companies and add in trademarked images? There are also trademark topic area codes/descriptions in the data, so I guess there may be some way of seeing how different companies carve up topic areas with different trademarked images?
      I thought the BP thing was interesting in that they trademark Wild Bean Cafe. I guess I’d always thought of that as an agreed concession with another company rather than being a BP brand…but I can see it makes sense….

  2. Pingback: Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine - Just another My blog Sites site - currenttechnologyarticles
    • Tony Hirst

      @tim ah, wonderful, thanks… Having got the steps identified, I was thinking of trying to build a company specific version in Scraperwiki as a proof of concept. But the Easter break has interrupted things (family trek, wifi free zone…:-(

  3. Pingback: Trademark Galleries on Scraperwiki, via OpenCorporates « OUseful.Info, the blog…
  4. Pingback: #CAST12 DataViz Sandbox – Resources « OUseful.Info, the blog…
  5. Pingback: Guest Post: Data Sketching With the OpenCorporates API | OpenCorporates news