Tinkering With Selenium IDE: Downloading Multiple Files from One Page, Keyed by Another

It being the third marking time of year again, I get to enjoy the delights of having to use various institutional systems to access marks and student scripts. One system stores the marks, allocates me my third marking tasks (a table of student IDs and links to their marks, their marks, and a from to submit my marks). Another looks after the scripts.

To access the scripts, I need to go to another system, enter the course code and student ID, one at a time, (hmm, what happens if I try a list of IDs with various separators; could that give me multiple files?) to open a pop-up window from which I can click to collect a zipped file containing the student’s submitted work. The downloaded file is downloaded as a zip file with a filename of the form ECA-2021-06-28_2213.zip ; which is to say, a filename based on datetime.

To update a set of marks, I need to get a verification code from the pop up raised after entering the student ID on the second system into a form on the page associated with a particular student’s marks on the first system. Presumably, the thinking about workflow went something like: third marker looks at marks on first system, copies ID, gets script and code from second system, marks script, enters code from second system in first system, updates mark. For however many scripts you need to mark. One at a time. Rather than: download every script one at a time, do marking howsoever, then have to juggle both systems trying to figure out the confirmation code for a particular student to update the marks from a list you’ve scribbled onto a piece of paper against their ID (is that a 2 or a 7?). Or whatever.

Needless to say, several years ago I hacked a mechanicalsoup Python script to look up my assigned marking on the first system, along with the first and second marks, download all the scripts and confirmation codes from the second system, unzip the student script downloads and bundle everything into a directory tree. I also hacked some marking support tools that would display how the markers compared on each of the five marking criteria they scored scripts against and allow me to record my marks. I held off from automating the upload of marks back to the system and kept that as a manual step becacause I don’t want to get into the habit of hacking code to write to university systems just in case I mess something up… I did try to present my workflow and tools to exams and various others by sharing a Powerpoint review of it, but as I recall never got any reply.

Anyway… mechanicalsoup. A handy package combing mechanize and beautifulsoup, the first part mocked a browser and allowed you to automate it, and the second part provided the scraping utilities. But mechanize doesn’t do Javascript. Which was fine because the marks and scripts systems are old old HTML and easily scraped, pretty vanilla tables and web forms. And the old OU auth was pretty simple to automate your way through too.

But the new OU authenticator uses Javascript goodness(?) as part of its handshake so my timesaver third marking scraping tools are borked because I can’t get through the auth.

So: time to play with Selenium, which is a complete browser automation tool that automates an off-the-shelf browser (Chrome, or Firefox, or Safari etc) rather than mocking one up (as per mechanicalsoup). Intended as a tool for automated testing of websites, you can also use it as a general purpose automation tool, or to provide browser automation for screenscraping. I’ve tinkered with Selenium before, scripting it from Python to automate repetitive tasks (eg Bulk Jupyter Notebook Uploads to nbgallery Using Selenium) but there’s also a browser extension / Selenium IDE that lets you record steps as you work through a series of actions in a live website, as well as scripting in your own additional steps.

So: how hard can it be, I thought, to record a quick script to automate the lookup of student IDs and then step through each one? Surprisingly faffy, as it turns out. The first issue was simply how to iterate through the rows of the table containing each individual student reference to pick up the student ID.

The method I ended up with was to get a cound of rows in the table, then iterate through each row, picking up the student ID as link text (of the form STUDENT_ID STUDENT NAME), duly cleaned by splitting on the first space and grabbing the first element, and then manually creating a string of delimited IDs STUDENT_ID1::STUDENT_ID2::... . (I couldn’t seem to add IDs to an ID array but I was maybe doing something wrong… And trying to find any sensible docs on getting stuff done using the current IDE seems to be a largely pointless task.)

So, I now have a list of IDs, which means I can (automatically) click through the script download system and grab the scripts one at a time. Remember, this involves adding a course code and a student identifer, clicking a button to get a pop up, clicking a button to zip and download the student files, then closing the pop up.

Here’s the first part – entering the course code and student ID:

In the step that opens the new window, we need to flag that a new window has been opened and and generate a reference to it:

In the pop-up, we can then click the collect button, wait a moment for the download to start, then close the pop-up and return to the window where we enter the course code and student ID:

If I now run the script on a browser where I’m already logged in (so the browser already has auth cookies set), I can just sit back and watch it grab the student IDs from my work allocation table on the first system to generate a list of IDs I need scripts for, and then download each one from the second system.

So I have the scripts, but as a set of uselessly named zip files (some of them duplicates); and I don’t have the first and second marks scrpaed from the first system. Or the confirmation codes from the second system. To perform those steps, I probably do need a Python script automating the Selenium actions. the Selenium IDE is fine (ish) for filling in forms with simple scraped state and then clicking buttons that act on those values, but for scraping it’s not really appropriate.

Whilst the Selenium IDE doesnlt export Python code, it does produce an export JSON file that itemises the steps in scripts created in the IDE. This could be used to help boostrap the production of Python code. The Selenium IDE recorder provides a way of recording simple pointy-clicky sequences of action which could be really useful to help get those scripts going. But ideally, I need a thing that can replay the JSON exported scripts from Python then I could have the best of both worlds.

Finally, in terms of design pattern, this recipe doesn’t include any steps that interact directly with logging in: the automated browser uses cookies that have been set previously elsewhere. Rather, the script automates actions over a previously logged in browser. Which means this sort of script is something that could be easily shared within an organisation? So I wonder, are there orgs in which the core systems don’t play well but skunkworks and informal channels share automation scripts that do integrate them, ish?!

(Hmm… would the Python scripted version load a browser with auth cookies set, or does it load into a private browser in which authentication would be required?)

Bah… I really should be marking, not tinkering…

PS It looks like you can export to a particular language script:

…but when I try it I get an error message regarding an Unknown locator:

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: