Fragment: Keeping an Eye on What’s Trackable, Where, and When — Tools for Data Protection Officers as well as the Rest of Us?

Way back when, in the early days of FOI and then “open data”, I naively believed that open data and FOI contact points in organisations would act on as advocates for us outside the organisation getting access to information from inside the organisation. The reality seems to be that as appointees and employees of the organisation, those individuals instead become gatekeepers and often seem to act to find ways of defending the organisation against such requests rather than trying to open the organisation up to them.

When it comes to those appointed to oversee data protection and data privacy issues, I would like to think that whoever is appointed such a role sees it as the role of an advocate for those who work for or come into contact with the organisation, as well as providing an opportunity to aggressively defend the rights of those outside the organisation against the unnecessary and disproportionate collection, processing and sharing of data about them by the organisation. That said, I suspect in many cases the role is more about trying to make sure the company doesn’t get sued under GDPR.

Whilst it would also be nice to think that the data protection person is a geek w/ skillz who can hack their way around an organisation’s systems and websites, poking around to find things that shouldn’t be there and demonstrating how other things can be potentially misused, I suspect they aren’t.

So do we need tools for such officers to keep tabs on their organisation, or perhaps tools to help privacy advocates provide oversight of them?

Poking around traffic generated as I visited the OU VLE a week or two ago, I saw a couple of requests I thought were unnecessary and raised an internal query about them. But it also got me thinking…

The requests appear to be made from tags loaded into the web page using the Google Tag Manager. The Google Tag Manager code appears to be delivered via a gtm.js script with the structure:

{
  "resource": {
    "version": "XXX",
    "macros": [ {} ],
    "tags": [ {} ],
    "predicates": [{}],
    "rules": [ {} ]
  },
  "runtime": [ [], [] ]
}

followed byb a chunk of Javascript code.

The gtm.js file includes rules of the form [["if",1,31],["unless",34,35],["add",51]] that appear to index into the predicates list in the conditional part (logically or’d tests?) and then add a particular tag, which may reference a macro, when the condition is met.

Predicates take the form:

      "function":"_re",
      "arg0":["macro",0],
      "arg1":"^http(s)?:\\\/\\\/(www\\.)?open.ac.uk\\\/?(index.html)?($|\\?)",
      "ignore_case":true
    }

Tags can take a variety of forms, including:

      {
      "function":"__html",
      "once_per_event":true,
      "vtp_html":"\n\u003Cscript type=\"text\/gtmscript\"\u003E!function(b,e,f,g,a,c,d){b.fbq||(a=b.fbq=function(){a.callMethod?a.callMethod.apply(a,arguments):a.queue.push(arguments)},b._fbq||(b._fbq=a),a.push=a,a.loaded=!0,a.version=\"2.0\",a.queue=[],c=e.createElement(f),c.async=!0,c.src=g,d=e.getElementsByTagName(f)[0],d.parentNode.insertBefore(c,d))}(window,document,\"script\",\"https:\/\/connect.facebook.net\/en_US\/fbevents.js\");fbq(\"init\",\"870490019710405\");fbq(\"track\",\"PageView\");\u003C\/script\u003E\n\u003Cnoscript\u003E\n\u003Cimg height=\"1\" width=\"1\" src=\"https:\/\/www.facebook.com\/tr?id=870490019710405\u0026amp;ev=PageView\n\u0026amp;noscript=1\"\u003E\n\u003C\/noscript\u003E\n\n\n",
      "vtp_supportDocumentWrite":false,
      "vtp_enableIframeMode":false,
      "vtp_enableEditJsMacroBehavior":false,
      "tag_id":51
    }

And macros take the form:

{
      "function":"__gas",
      "vtp_cookieDomain":"auto",
      "vtp_doubleClick":false,
      "vtp_setTrackerName":false,
      "vtp_useDebugVersion":false,
      "vtp_useHashAutoLink":false,
      "vtp_decorateFormsAutoLink":false,
      "vtp_enableLinkId":false,
      "vtp_enableEcommerce":false,
      "vtp_trackingId":"UA-4391747-17",
      "vtp_enableRecaptchaOption":false,
      "vtp_enableUaRlsa":false,
      "vtp_enableUseInternalVersion":false
    }

So what I’m wondering is: is there an offline, static analyser for gtm.js scripts that would allow someone to point to a website form a which a gtm.js script be downloaded and then lets them generate human readable reports that:

  • identify in general which trackers are loaded by which rules on which events with what arguments; and
  • identify which trackers are loaded by which rules on which events with what arguments for a specific URL.

This would then allow a university data protection officer, for example, or a student, to provide a URL, such as a URL into the VLE, and get a simple, statically generated report back that shows what trackers are loaded when visiting that environment.

Which is simpler than running Ghostery or opening developer tools in a wide open by default browser like Chrome, rather than the rather more privacy defending Firefox, for example, and searching the network logs for incriminating evidence.

Google Tag Manager has been around for some time, and I’m assuming that organisational web folk have read each line of code in the gtm.js they load into user’s browsers to make sure that it’s not doing anything untoward. (That everyone else uses it is no excuse, unless perhaps it meets some sort of international software quality standard that folk can just embed it without looking at it.)

So I’m wondering:

  • is there a line by line annotated version of the code at the bottom of the gtm.js script anywhere?
  • are there line by line examples out there of a simple gtm.js script and how to read it / analyse it (so eg walking through: this rule says this, which adds that tag, which is then parsed this way?)
  • are there static gtm.js analysers out there that generate the static reports suggested above and that allow folk to analyse arbitrary gtm.js scripts that are loaded into their browser in many of the sites they visit?

So for example, here’s a blog post that describes, line by line, how the Google tag manager container snippet that webmasters embed in their webpages runs so as to load in the gtm.js script: The Container Snippet: GTM Secrets Revealed. What I want is something similar for the gtm.js script…

PS it seems that GTM Spy [h/t/ Simo Ahava] helps browse the tags loaded in from a Google Tag Manager container. For example, here’s a look at code associated with a tag loaded via GTM by a particular org:

and here’s the trigger condition associated with it (I notice that a single naive, pass of my image blurrer, I can still read the contains text…):

This is a good start, but the navigation falls short of being usable by a ‘never-reads-the-manual’ such as myself. For example, looking at one of the triggers on another tab in the GTM Spy view:

I see the tag ID identified but no obvious way of finding out what tag that relates to, or what parameters / data might be returned via that tag?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...