Trademark Galleries on Scraperwiki, via OpenCorporates

What trademarks – familiar to us all – are registered with which companies? And how often are trademarked brands in larger outlets actually ‘exclusive offerings’, aka “own range” products in weak disguise? In Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine, I demonstrated a recipe for using OpenCorporates.com as a way in to previewing at least some of the trademarks registered to a specified (UK registered?) company. Here’s a recipe using Scraperwiki to do a similar thing…

But first, do you recognise any of these trademarks…

… as belonging to Tesco?

The recipe goes something like this… For a given company name keyword, look up that company on OpenCorporates and get a list of company identifiers back:

import scraperwiki,simplejson,urllib

target='tesco'

rurl='http://opencorporates.com/reconcile/gb?query='+urllib.quote(target)
#note - the opencorporates api also offers a search:  companies/search
entities=simplejson.load(urllib.urlopen(rurl))
ocids=[]
for entity in entities['result']:
    ocids.append(entity['id'].lstrip('/companies/'))

print ocids

For each of those companies, we’ll need to look up the company details on OpenCorporates. To benefit from an increased API limit, it makes sense to use an OpenCorporates API key to do this. The idea of API keys is typically that they are assigned to specific users, which is to say, you’re supposed to keep them secret. But how do we do this on Scraperwiki, an otherwise open environment? Here’s trick I found on the Scraperwiki blog that allows you to keep things like API keys secret… Make the scraper a protected one, and then hide the keys in scraper description on the scraper’s homepage using the following convention:

__BEGIN_QSENVVARS__
OC_KEY = XXXXXX
__END_QSENVVARS__

where XXXXXX is your API key value (don’t use quotes around the value – use the actual value).

Here’s how we pick up the key:

import os, cgi
try:
    qsenv = dict(cgi.parse_qsl(os.getenv("QUERY_STRING")))
    ockey=qsenv["OC_KEY"]
except:
    ockey=''

To get the trademark data, we need to pull the company data for each company ID of interest, look to see if there are any trademark records associated with it in the OpenCorporates database, and if so, pull those records:

def getOCcompanyData(ocid):
    ocurl='http://api.opencorporates.com/companies/'+ocid+'/data'+'?api_token='+ockey
    ocdata=simplejson.load(urllib.urlopen(ocurl))
    return ocdata

def getOCtmData(ocid,octmid):
    octmurl='http://api.opencorporates.com/data/'+str(octmid)+'?api_token='+ockey
    data=simplejson.load(urllib.urlopen(octmurl))
    octmdata=data['datum']['attributes']
    print 'tm data',octmdata
    categories=[]
    for category in octmdata['goods_and_services_classifications']:
        if 'en' in octmdata['goods_and_services_classifications'][category]:
            categories.append(octmdata['goods_and_services_classifications'][category]['en']+'('+category+')')
    tmdata={}
    tmdata['ocid']=ocid
    tmdata['ocname']=octmdata['holder_name']
    #tmdata['ocname']=ocnames[ocid]
    tmdata['categories']=" :: ".join(categories)
    tmdata['imgtype']=octmdata['mark_image_type']
    tmdata['marktext']=octmdata['mark_text']
    tmdata['repaddr']=octmdata['representative_address_lines']
    tmdata['repname']=octmdata['representative_name_lines']
    tmdata['regnum']=octmdata['international_registration_number']
    #if an image tradmarked, we can work out its URL on the WIPO site...
    if tmdata['imgtype']=='JPG' or tmdata['imgtype']=='GIF':
        tmdata['imgurl']='http://www.wipo.int/romarin/images/' + tmdata['regnum'][0:2] +'/' + tmdata['regnum'][2:4] + '/' + tmdata['regnum'] + '.'+ tmdata['imgtype'].lower()
    else: tmdata['imgurl']='' 
    print tmdata
    scraperwiki.sqlite.save(unique_keys=['regnum'], table_name='trademarks', data=tmdata)

    return octmdata

def grabOCTrademarks(ocid,ocdata):
    for tm in ocdata['data']:
        if tm['datum']['data_type']=='WipoTrademark':
            octmid=tm['datum']['id']
            octmdata=getOCtmData(ocid,octmid)

for ocid in ocids:
    print 'company data',getOCcompanyData(ocid)
    ocdata=getOCcompanyData(ocid)
    octmdata=grabOCTrademarks(ocid,ocdata)

You can find the actual scraper here: Scraperwiki scraper: OpenCorporates Trademark demo

The code for the view (shown above) can be found here: Scraperwiki View: OpenCorporates Trademark demo (code) and here’s the actual view

Note that I think the OpenCorporates databases only has a very fragmentary coverage over trademark records for each company. Even keeping up with daily updates relating to trademarks and trademarked images on WIPO looks like it could be quite a major undertaking, which is maybe why companies such as Thomson Reuters recently saw an opportunity to Create Most Comprehensive Collection of Searchable Trademark Data in the World.

That’s not to say I don’t see value in this micro-attempts at making sense of the world around us such as this one, though. Trademark brands are pervasive, but it’s often not obvious who actually owns them, and may even mask a lack of competition with an appearance of competition (for example, when a company owns two different trademark brands that a consumer thinks of as being competitive offerings by different providers, when in fact they are both owned by the same company; or when you mistake a company offshoot for a third party franchise, as I realised when I saw that BP owned the Wild Bean Cafe trademark…)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

5 thoughts on “Trademark Galleries on Scraperwiki, via OpenCorporates”

  1. Tony
    Great work as ever. We pull in the Trademark data every night from WIPO — it’s a huge XML file, which is a pain to parse, but you’re efforts show that it’s worthwhile.
    Chris

  2. @chris :-) I didn’t realise you were going after all the WIPO trademark data? I thought about trying to do something to preview daily trademark updates, but I couldn’t think how to sensibly display them?

    One thing I’ve started to come round to is how Linked Data approaches may start to come into their own when it comes to searching. For example, it would be handy to be able to write a query that could look for trademarks registered by a given company name or companies who share an address or director with the named company.

    What would be a really neat hack would be something like Google image search that could identify a trademarked image, tell you its registration details and then map out other trademarks in the same area/sector registered by other companies in the same corporate group…

Comments are closed.

%d bloggers like this: