Does Transfer Learning with Pretrained Models Lead to a Transferable Attack?

Reading a post just now on Logo detection using Apache MXNet (a handy tutorial on training an image classifier to detect brand logos with MXNet, a deep learning package for Python), I noted a reference to the MXNet Model Zoo.

The Model Zoo is an ongoing project to collect complete models [from the literature], with python scripts, pre-trained weights as well as instructions on how to build and fine tune these models. The logo detection tutorial shows how training your own network from scratch with a small number of training images is a bit rubbish, but you can make the most of transfer learning by taking a prebuilt model that has been well trained and “topping it up” with your own training samples. I guess the main idea is: the lower layers of the original model will already be well trained to recognise primitive image features and can be reused as-is, while the upper layers are tweaked to reweight those lower-level features so the overall model works with your particular dataset.
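For what it's worth, here's a minimal sketch of that recipe using MXNet's Gluon model zoo (not the tutorial's actual code; the choice of ResNet-18 and the num_classes value are my own placeholders):

```python
from mxnet import init
from mxnet.gluon.model_zoo import vision

# Pull a pretrained model down from the model zoo
pretrained = vision.resnet18_v1(pretrained=True)

# Build a same-shaped network with a fresh output layer for our own classes
num_classes = 10  # hypothetical number of logo classes
net = vision.resnet18_v1(classes=num_classes)
net.features = pretrained.features       # reuse the well-trained feature layers
net.output.initialize(init.Xavier())     # only the new head needs initialising

# Optionally freeze the reused layers so training only reweights the head
for param in net.features.collect_params().values():
    param.grad_req = 'null'
```

Freezing keeps the primitive feature detectors exactly as trained; alternatively, you can leave them unfrozen and fine-tune the whole stack with a small learning rate.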

So given the ability to generate adversarial examples that trick a model into seeing something that’s not there, how susceptible will models built using transfer learning on top of pretrained models be to well-honed attacks developed against that pretrained model? To what extent will such attacks work straight out of the can, and how easily will they transfer?
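One way to make the question concrete: craft an adversarial example against the pretrained model, say with the fast gradient sign method (FGSM), and then check whether the fine-tuned model is fooled by the same input. A rough sketch, assuming an image batch x, its true label, and a fine-tuned net as in the snippet above:

```python
from mxnet import autograd
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss
from mxnet.gluon.model_zoo import vision

def fgsm(model, x, label, epsilon=0.02):
    """Perturb x in the direction of the sign of the loss gradient (FGSM)."""
    loss_fn = SoftmaxCrossEntropyLoss()
    x = x.copy()
    x.attach_grad()
    with autograd.record():
        loss = loss_fn(model(x), label)
    loss.backward()
    return x + epsilon * x.grad.sign()

# Craft the attack against the *pretrained* model...
pretrained = vision.resnet18_v1(pretrained=True)
# adv = fgsm(pretrained, x, label)   # x: a (1, 3, 224, 224) image batch
# ...then see whether it also fools the transfer-learned model:
# print(net(adv).argmax(axis=1))
```

To the extent the fine-tuned model reuses the same lower-layer features, you might expect the perturbation to carry over, which is exactly the question.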

To read:

 

