Decompressing zipped Javascript

Because it took me soooo long to figure this out…

Zipped Javascript pulled from a URL using Python requests, with the response from requests.get assigned to r.

#via https://stackoverflow.com/a/28642346/454773

import gzip
import base64

jsonTxt = gzip.decompress(base64.b64decode(r.text))

#using zlib.decompress instead gives an error: zlib.error:
#Error -3 while decompressing data: incorrect header check
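For the record, the reason for that error: zlib.decompress expects a zlib header by default, whereas this payload is gzip-wrapped. gzip.decompress understands the gzip header, as does zlib.decompress if you pass wbits=16 + zlib.MAX_WBITS. A self-contained round trip, simulating the payload locally rather than fetching it:

```python
import base64
import gzip
import zlib

# Simulate a gzipped, base64-encoded payload like the one returned in r.text
payload = base64.b64encode(gzip.compress(b'{"hello": "world"}')).decode()

raw = base64.b64decode(payload)

# gzip.decompress understands the gzip header...
print(gzip.decompress(raw))  # b'{"hello": "world"}'

# ...as does zlib.decompress, but only if told to expect one via wbits
print(zlib.decompress(raw, 16 + zlib.MAX_WBITS))  # b'{"hello": "world"}'

# zlib.decompress with default settings expects a zlib (not gzip) header,
# which is what raises "Error -3 ... incorrect header check"
try:
    zlib.decompress(raw)
except zlib.error as e:
    print(e)
```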

Does Transfer Learning with Pretrained Models Lead to a Transferable Attack?

Reading a post just now on Logo detection using Apache MXNet, a handy tutorial on how to train an image classifier to detect brand logos with MXNet, a deep learning package for Python, I noted a reference to the MXNet Model Zoo.

The Model Zoo is an ongoing project to collect complete models [from the literature], with Python scripts, pre-trained weights, and instructions on how to build and fine tune these models. The logo detection tutorial shows how training your own network with a small number of training images is a bit rubbish, but how you can make the most of transfer learning to take a prebuilt model that has been well trained and “top it up” with your own training samples. I guess the main idea is this: the lower layers of the original model will be well trained to recognise primitive image features and can be reused; the final model is then tweaked so that the upper layers reweight those lower level features and the overall model works with your particular dataset.
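As a toy illustration of that idea (nothing to do with the actual MXNet tutorial code; just a numpy sketch with made-up shapes), fine tuning touches only the top layer weights while the pretrained lower layer is left frozen:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" lower layer: in a real setting these weights would come from
# a model trained on a large dataset (e.g. an ImageNet classifier); here
# they are just random numbers with made-up shapes
W_lower = rng.normal(size=(10, 5))   # frozen feature extractor

# New top layer, randomly initialised, to be trained on our own small dataset
W_top = rng.normal(size=(5, 2))

def forward(x):
    features = np.maximum(x @ W_lower, 0)  # reused low-level features (ReLU)
    return features @ W_top                # task-specific output layer

# One (grossly simplified) fine-tuning step: the gradient is applied only
# to the top layer; the pretrained lower-layer weights are never updated
x = rng.normal(size=(4, 10))       # a handful of new training examples
target = rng.normal(size=(4, 2))

features = np.maximum(x @ W_lower, 0)
error = features @ W_top - target
grad_top = features.T @ error / len(x)   # gradient of the mean squared error
W_top -= 0.01 * grad_top                 # update the top layer only
```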

So given the ability to generate adversarial examples that trick a model into seeing something that’s not there, how susceptible will models built using transfer learning on top of pretrained models be to well-honed attacks developed against that pretrained model? To what extent will the attacks work straight out of the can, or how easily will they transfer?

To read:

 

Cognitive Science, 2.0? Google Psychlab

Whilst I was rooting around looking for things to do in the year or two after graduating, I came across the emerging, interdisciplinary field of cognitive science, which used ideas from cognitive psychology, philosophy, AI, linguistics and neuroscience to try to understand “mind and consciousness stuff”. I read Dennett and Searle, and even toyed with going to do a Masters at Indiana, where David Chalmers had been doing all manner of interesting things as a PhD student.

I was reminded of this yesterday whilst reading a post on the Google DeepMind blog – Open-sourcing Psychlab – which opened in a style that began to wind me up immediately:

Consider the simple task of going shopping for your groceries. If you fail to pick-up an item that is on your list, what does it tell us about the functioning of your brain? …

What appears to be a single task actually depends on multiple cognitive abilities. We face a similar problem in AI research …

To address this problem in humans, psychologists have spent the last 150 years designing rigorously controlled experiments aimed at isolating one specific cognitive faculty at a time. For example, they might analyse the supermarket scenario using two separate tests – a “visual search” test that requires the subject to locate a specific shape in a pattern could be used to probe attention, while they might ask a person to recall items from a studied list to test their memory. …

“To address this problem in humans”… “rigorously controlled”, pah! So here we go: are Google folk gonna disrupt cognitive psychology by turning away from the science and just throwing a bunch of numbers they’ve managed to collect from wherever, howsoever, into a couple of mathematical functions that try to clump them together, without any idea about what any clusters or groupings mean, or what they’re really clustering around…?

We believe it is possible to use similar experimental methods to better understand the behaviours of artificial agents. That is why we developed Psychlab [ code ], a platform built on top of DeepMind Lab, which allows us to directly apply methods from fields like cognitive psychology to study behaviours of artificial agents in a controlled environment. …

Psychlab recreates the set-up typically used in human psychology experiments inside the virtual DeepMind Lab environment. This usually consists of a participant sitting in front of a computer monitor using a mouse to respond to the onscreen task. Similarly, our environment allows a virtual subject to perform tasks on a virtual computer monitor, using the direction of its gaze to respond. This allows humans and artificial agents to both take the same tests, minimising experimental differences. It also makes it easier to connect with the existing literature in cognitive psychology and draw insights from it.

So, to speed up the way Google figures out how to manipulate folks’ human attention through a screen, they’re gonna start building cognitive agents that use screens as an interface (at first), develop the models so they resemble human users (I would say: “white, male, 20s-30s, on the spectrum”, but it would be more insidious perhaps to pick demographics relating to “minority” groups that power brokers (and marketers) would more readily like to “influence” or “persuade”. But that would be a category mistake, because I don’t think cognitive psychology works like that.), then start to game the hell out of them to see how you can best manipulate their behaviour.

Along with the open-source release of Psychlab we have built a series of classic experimental tasks to run on the virtual computer monitor, and it has a flexible and easy-to-learn API, enabling others to build their own tasks.

Isn’t that nice of Google. Tools to help cog psych undergrads replicate classic cognitive psychology experiments with their computer models.

Each of these tasks have been validated to show that our human results mirror standard results in the cognitive psychology literature.

Good. So Google has an environment that allows you to replicate experiments from the literature.

Just remember that Google’s business is predicated on developing ad tech, and ad revenue in turn is predicated on finding ways of persuading people to either persist in, or modify, their behaviour.

And once you’ve built the model, then you can start to manipulate the model.

When we did the same test on a state-of-the-art artificial agent, we found that, while it could perform the task, it did not show the human pattern of reaction time results. .. this data has suggested a difference between parallel and serial attention. Agents appear only to have parallel mechanisms. Identifying this difference between humans and our current artificial agents shows a path toward improving future agent designs.

“Agents appear only to have parallel mechanisms”. Erm? The models that Google built appear only to have parallel mechanisms?

This also makes me think even more we need to rebrand AI. Just like “toxic” neural network research required a rebranding as “deep learning” when a new algorithmic trick and bigger computers meant bigger networks with more impressive results than previously, I think we should move to AI as meaning “alt-intelligence” or “alternative intelligence” in the sense of “alternative facts”.

“[A] path toward improving future agent designs.” That doesn’t make sense? Do they mean models that more closely represent human behaviour in terms of the captured metrics?

What would be interesting would be if the DeepMind folk found they hit a brick wall with deep learning models and couldn’t find a way to replicate human behaviour. Because that might help encourage the development of “alternative intelligence” critiques.

Psychlab was designed as a tool for bridging between cognitive psychology, neuroscience, and AI. By open-sourcing it, we hope the wider research community will make use of it in their own research and help us shape it going forward.

Hmmm…. Google isn’t interested in understanding how people work from a point of view of pure inquiry. It wants to know how you work so it can control, or at least influence, your behaviour. (See also Charlie Stross, Dude, you broke the future!.)

“The Follower Factory” – Way Back When vs Now

A news story heavily doing the rounds at the moment in the part of the Twitterverse I see is a post from the New York Times called The Follower Factory which describes how certain high profile figures feel the need to inflate their social media follower counts by purchasing followers.

Such behaviour isn’t new, but the story is a good one – a useful one – to re-tell every so often. And the execution is a nice bit of scrollytelling.

A few years ago, I posted a recipe for generating such charts (Estimated Follower Accession Charts for Twitter) and applied it to the followers of UK MPs at the time, cross-referencing spikes in follower acquisition against news items in the days before, and on the day of, any hike in follower numbers or in the rate at which they were acquired (What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events).

My thinking at the time was that bursts in follower acquisition, or changes in rate of follower acquisition, might correlate with news events around the person being followed. (It turned out that other researchers had followed a similar idea previously: We know who you followed last summer: inferring social link creation times in twitter.)
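The estimation trick itself is simple enough to sketch (a minimal, illustrative version with an invented record format, not the actual Twitter API response): the API returns followers most-recent-first, and since a follower can’t have followed before their own account existed, the running maximum of account creation dates along the follow order gives a lower-bound estimate of each accession time.

```python
from datetime import date

# Followers as returned by the API: most recent follower first, with each
# account's creation date (field names here are illustrative only)
followers = [
    {"screen_name": "e", "created_at": date(2017, 6, 1)},
    {"screen_name": "d", "created_at": date(2014, 3, 1)},
    {"screen_name": "c", "created_at": date(2016, 1, 1)},
    {"screen_name": "b", "created_at": date(2013, 5, 1)},
    {"screen_name": "a", "created_at": date(2012, 2, 1)},
]

def estimate_accession(followers_recent_first):
    """Estimate when each follower followed, oldest follower first.

    A follower can't have followed before their account was created, so
    the running maximum of creation dates along the follow order is a
    lower bound on the actual follow date.
    """
    estimates = []
    bound = date.min
    for f in reversed(followers_recent_first):  # oldest follower first
        bound = max(bound, f["created_at"])
        estimates.append((f["screen_name"], bound))
    return estimates

for name, est in estimate_accession(followers):
    print(name, est)
```

A burst of followers all sharing the same estimated accession date then shows up as a step in the cumulative chart.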

Whilst it is trivial for Twitter to generate such reports – they own the data – it is harder for independent researchers. When I charted the follower accession curves for UK MPs, I had access to a whitelisted Twitter API account that meant I could download large amounts of data quite quickly. The (generic) rate limiting constraints on my current Twitter account mean that grabbing the data to generate such charts in a reasonable amount of time nowadays would be all but impossible.

There are several ways round this: one is to purchase the data using a service such as Export Tweet; one is to abuse the system and create my own army of (“fake”) Twitter accounts in order to make use of their API limits to mine the Twitter API using a distributed set of accounts under my control; a third is to “borrow” the rate limits of other, legitimate users.

For example, many sites now offer “Twitter analysis” services if you sign in with your Twitter account and grant the service provider permission to use your account to access various API calls through your account. I imagine that one of the ways such services cover the costs of their free offerings is to make use of API calls generated from user accounts to harvest data to build a big database that more valuable services can be provided off the back of.

In this case, whilst the service is free, the aim is not specifically to collect data about the user so it can be sold to advertisers as part of a specific audience or market segment, but instead to make use of the user’s access to API services so that the service provider can co-opt the user’s account to harvest data from Twitter in a distributed way. That is, the service provider gains the necessary benefit from each user to cover the costs of servicing that user’s needs by gaining access to Twitter data more generally, using the user’s account as a proxy. The mass of data can then be mined and analysed to create market segments, or otherwise exploited.

This approach to exploiting users by means of exploiting their access to a particular resource is also being demonstrated elsewhere. For example, a Wired post from October last year on “cryptojacking” – Your Browser Could Be Mining Cryptocurrency For A Stranger describes how users’ computers can be exploited via their browsers in the form of Javascript code embedded in web pages or adverts that “steals” (i.e. appropriates, or co-opts) CPU cycles (and the electricity used to power them) to do work on behalf of the exploiter – such as mining bitcoin. A more recent story elsewhere – The Coming Battle for Your Web Browser [via @euroinfosec] – describes how the same sort of exploitation can be used to co-opt your browser to take part in distributed denial of service attacks (the provision of which is another thing that can be bought and sold, and that can hence be used to generate revenue for the provider of that service). Defensive countermeasures are available.

PS I notice @simonw has been grabbing his Twitter follower data into a datasette. Thinks: maybe I should tidy up my ESP code for generating follower networks and use SQLite, rather than MongoDB, for that? I think I had an ESP to-do list to work on too…?
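In case I do get round to it, a minimal sketch of stashing follower records in SQLite using the standard library sqlite3 module (the table name and columns here are my own invention, not the datasette schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent db

conn.execute("""
    CREATE TABLE IF NOT EXISTS followers (
        user_id INTEGER PRIMARY KEY,
        screen_name TEXT,
        created_at TEXT,
        grabbed_at TEXT
    )
""")

# Upsert follower records as they are grabbed from the API
rows = [
    (1, "alice", "2012-02-01", "2018-01-30"),
    (2, "bob", "2013-05-01", "2018-01-30"),
]
conn.executemany("INSERT OR REPLACE INTO followers VALUES (?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM followers").fetchone()[0]
print(count)  # 2
```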

Rally Marshal Taster Event

If you ever attend a motorsport event, from speedway to track racing, rally to hill climb, you’re likely to notice an orange army of volunteers helping keep the event on track.

We’ve been attending the occasional motorsport event – touring cars at Snetterton, the WRC rally in Wales – a few times a year for getting on for a decade. The pace is a bit less hectic than other family treks to the muddy fields of music festivals, but just as entertaining.


There’s often a stall at the larger events run by the British Motorsports Marshals Club (BMMC) or the British Rally Marshals Club (BRMC), one of which I popped in to at the WRC Wales Rally GB RallyFest stage at Cholmondeley Castle last year. As a direct result of that, I ended up at Brands Hatch last week on a rally marshal taster day (the first one?) at the MGJ Engineering Brands Hatch Winter Stages, organised by the Chelmsford Motor Club.

Pretty much the only pre-requisite for attending the taster was to do the rally marshals’ online training, which is required for rally marshal accreditation. This includes several short modules on safety, good practice, and a first-on-scene briefing. Assessment is through multiple choice activities.

I also had to print off the ticket that was sent to me to get me free entry – and car access – into the circuit.

And so, a week or so ago, I parked in the marshal’s parking area and turned up for sign on a little after 6.30am at a soon-to-be-wet Brands Hatch, where I grabbed a breakfast and was met by Chief Marshal for the day, Luis Gutierrez Diaz.

After a bit of orientation, and tour of the service park, I started the day at the initial time control, where cars sign on to the stage at a required time in one minute windows. The cars were starting at 30s intervals, which meant two cars arriving each minute.

One thing I learned immediately was how the rally is managed using paperwork…

Photo credit: @wallacerally

Arriving at the initial time control at the due time, co-drivers handed their time card to one marshal, who recorded the time given by an official clock, and then released the car at the appropriate time.

Another marshal recorded the accession number of the car through the control point, along with the car number and the time. This record was used to give a count of the cars passing the initial time control into the stage.

I had a go at filling in this form under the makeshift shelter of the open back of an estate car, using the back shelf for writing support.

(I’m regretting not having kept a photo diary!)

Next stop was the start. GoMotorsport‘s John Conboy gave me a lift up to the start area, where the multi-generational group of family (mum, dad, son, daughter) and friends that were the Zoo Crew Motorsport Team were handling matters.


Photo credit: MCS Rally Results

Passing through another marshalled staging point prior to the start, the Zoo Crew (with team hats!) provided each car with an actual start time (one marshal taking the time and signing the co-driver’s card, along with the number of the preceding car onto the stage, another recording this on a master sheet), then passed them on to the start (one marshal to stop the car, one marshal to record the car number using the timer and set the electronically recorded start time – which also controlled the countdown clock).

Between stages, the layout of the course was changed, which meant once all the cars were through the stage, it was time to help move some cones…

After walking back to the grandstand (a short walk – Brands Hatch is quite compact) I met up with John again, grabbed a coffee, and went off to the finish, where a group of five marshals were staffing the finish line: one to read out the time recorded by the timer for each car, one to record the time, car number, and preceding car number, and three to grab the cards from the finishing cars. (Another two marshals were opposite the actual finish line, in line of sight with the timing beam, to manually record the times as cars passed the stage finish as a back up.)

By this time, it was raining… but we managed to stay dry. The stage finish marshals were another regular gang, their van stocked with supplies and a gazebo covering the control area to keep us all relatively dry.

At the end of my stint there, the rally reversed the way the cars went round the track, so I grabbed a lift back to the start, which was about to become the finish, as the start/end marshal teams changed ends.

Meeting up with John Conboy again, he took me to mid-stage, and Team Berris, another crew of family and friends, this time spanning three generations, and an expansive gazebo kitted out with everything the gang could need for the day!

As with the other teams, marshals were appointed to a position for the day, unlike my see-it-all taster experience. Compared to the time control points, where the marshals played a role in managing the on-stage administration of the rally, this was more about being there just in case, although at some of the other stations I suspect the marshals had a busier time of it in the form of changing the track layout between stages.

The circuit setting of this rally meant the marshals in the stage had a role similar to circuit racing track marshals. (The rally was actually a round in the MSN Circuit Rally Championship, which looks to be a cunning way to make competitive use of motor racing circuits in the track racing off-season.) From visiting the Wyedean stages as a spectator, as well as from conversation throughout the day, I imagine the experience on a forest stage is rather different, covering off footpaths and access roads, for example, and managing the safety of spectators as much as competitors. (That said, spectator management by marshals was also evident at Brands Hatch, in the form of a marshal preventing public access to the VIP infield area by way of the underpass.) The ability to see pretty much every aspect of the on-the-day activities surrounding the rally, which might be a bit harder to achieve at an event not based around circuit facilities, was really useful.

One useful tip I was given about the on-stage marshaling, which also backed up some of the online training, was to “wait 10 seconds” in the event of an accident or incident. This gives you time to take stock of the situation, and make a decision about whether a car needs assistance at all. The underlying principle is that if a driver and co-driver can safely get out of their vehicle, they should be left to it.

Another tip: bring a whistle. When attending an incident, marshals look out for everyone’s safety, including their own. At the sound of a whistle, something’s coming…

So that was it, my rally marshal taster day, 7 till 7, and a great day it was too. The camaraderie was evident – multigenerational family groups – and welcoming to a newcomer such as myself. Many of the marshals seem to be current, or previous, competitors, with a wealth of stories to share and tales to tell. If there are teens in your life, under-eighteens are allowed, with consent… The range of different activities and opportunities involved in marshalling an event like this was also an eye-opener for me.

So what didn’t I see?

Some of the on-stage marshals I didn’t chat to were the radio marshals, who often double up as flag marshals. These are the only people with a guaranteed link back to the rally control and the clerk of the course, in case of incident. The radio channels are all public though, so taking a scanner along would mean you could listen in on the traffic.

I also didn’t get to see the handling of the paperwork either as it was issued to competitors or once cards were received from them, or collected from time controls. (One of Luis’ jobs was to drive round the stages collecting paperwork from the time controls every couple of stages.) I got the feeling that the electronic timing data records were a backup, and the rally was actually administered through paperwork. (Another note to self: pencil can work better than pen in the damp…)

As mentioned before, I also forgot to grab some photos…

So what next? I put in an application for my MSA marshal registration, although that wasn’t a necessary requirement for the taster day: you don’t actually need to register to volunteer, though if you don’t, you have to be paired up with someone.

I also started looking around the BRMC website at the progression path for marshals:

Although you don’t have to commit to anything, the more experience you have the more opportunities there are…

One thing I’ve started thinking is that I should perhaps volunteer at my local car club; I suspect that any event registered with the MSA and run under MSA Blue Book regulations (where can I get a print copy of that?) is likely to require marshal support somewhere…

And the number of opportunities for rally marshals is likely to increase. For many years, “the holding of races or trials of speed between motor vehicles on public ways in England” has been impossible, but the Deregulation Act 2015 changed that with a series of amendments to the Road Traffic Act 1988.

As a direct result of those amendments, the Chelmsford Motor Club are organising a closed roads event – the Corbeau Seats Rally: Tendring & Clacton – on public roads in April this year. (I did try to look for their application to the Essex County Council Highways Authority, but couldn’t find it anywhere?)

To sum up, the taster event was a great idea. If my first experience had been stood in the middle of a forest for a whole day somewhere, I wouldn’t have learned as much about the wide variety of activities that need to be covered, although from experience of standing in forests as a spectator, I’m sure it still would have been fun:-)

As someone who has only spectated at, and never competed in, any sort of motorsport event (other than a couple of goes at karting), it was fascinating to get a flavour of the event as a whole. Since then, I’ve had a poke around for other informal reviews of “procedure” which nicely complement the day: Co-driving Guide; and Rally Time Control Etiquette. Which reminds me, I’d like to grab a set of the various forms and cards used to see properly how things cross-check, data geek that I am:-)

Interested?

I think GoMotorsport are supporting another marshal taster day at the Snetterton Stage Rally on February 18th, 2018 (marshal sign-up page) and I’d recommend it to anyone who fancies getting started.

Many thanks to Luis Gutierrez Diaz, John Conboy, the Zoo Crew, Team Berris, and all the other marshals who put up with my naive questions. I’ll hopefully get to see them all again, somewhere, sometime…

Reproducible Notebooks Help You Track Down Errors…

A couple of days ago on the Spreadsheet Journalism blog, I came across a post on UK Immigration Raids.

The post described a spreadsheet process for reshaping a couple of wide format sheets in a spreadsheet into a single long format dataset, to make the data easier to work with.

One of my New Year resolutions was to try to look out for opportunities to rework and compare spreadsheet vs. notebook data treatments, so here’s a Python pandas reworking of the example linked to above in a Jupyter notebook: UK Immigration Raids.ipynb.

You can run the notebook “live” using Binder/Binderhub:

Look in the notebooks folder for the UK Immigration Raids.ipynb notebook.

A few notes on the process:

  • there was no link to the original source data in the original post, although there was a link to a local copy of it;
  • the original post had a few useful cribs that I could use to cross check my working with, such as the number of incidents in Bristol;
  • the mention of dirty data made me put in a step to find mismatched data;
  • my original notebook working contained an error – which I left in the notebook to show how it might propagate through, and how we might then try to figure out what the error was, having detected it.

As an example, it could probably do with another iteration to demonstrate a more robust process with additional checks that transformations have been correctly applied to data along the way.
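The core of the pandas treatment is just a wide-to-long reshape, which melt() handles in one call. A minimal sketch with made-up column names (not the actual dataset):

```python
import pandas as pd

# A toy wide-format sheet: one row per area, one column per year
# (column names are illustrative, not the actual dataset's)
wide = pd.DataFrame({
    "Area": ["Bristol", "Leeds"],
    "2015": [10, 7],
    "2016": [12, 9],
})

# melt() reshapes wide to long: one row per (Area, year) observation,
# which is much easier to group, filter and cross-check
long = wide.melt(id_vars="Area", var_name="Year", value_name="Raids")
print(long)
```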

Anyway, that’s another of my New Year resolutions half implemented: *share your working, share your mistakes*.

Fragment – Carillion-ish

A quick sketch of some companies that are linked by common directors based on a list of directors seeded from Carillion PLC.

The data was obtained from the Companies House API using the Python chwrapper package and some old code of my own that I’ll share once I get a chance to strip the extraneous cruft out of the notebook it’s in.

The essence of the approach / recipe is an old one that I used to use with OpenCorporates data, as described here: Mapping corporate networks with OpenCorporates.

Note the sketch doesn’t make claims about anything much. The edges just show that companies are linked by the same directors.
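The recipe itself boils down to grouping companies by director and then linking every pair of companies that share one. A minimal sketch in plain Python, with invented names rather than actual Companies House records:

```python
from collections import defaultdict
from itertools import combinations

# Toy directorship records (names invented; in practice these would come
# from Companies House officer appointment data)
appointments = [
    ("Director A", "Carillion PLC"),
    ("Director A", "Company X Ltd"),
    ("Director B", "Carillion PLC"),
    ("Director B", "Company X Ltd"),
    ("Director B", "Company Y Ltd"),
]

# Group companies by director...
companies_for = defaultdict(set)
for director, company in appointments:
    companies_for[director].add(company)

# ...then link every pair of companies that share a director; the edge
# label records which directors the pair have in common
edges = defaultdict(set)
for director, companies in companies_for.items():
    for c1, c2 in combinations(sorted(companies), 2):
        edges[(c1, c2)].add(director)

for (c1, c2), directors in sorted(edges.items()):
    print(c1, "--", c2, "via", sorted(directors))
```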

A better approach may be to generate a network based on control / ownership registration data but I didn’t have any code to hand to do that (it’s on my to do list for my next company data playtime!).

One way of starting to use this sort of structure is to match companies that appear in the network with payments data to see the actual extent of public body contracting with Carillion group companies. For other articles on Carillion contracts, see eg Here’s the data that reveals the scale of Carillion’s big-money government deals.