Rolling Your Own IT – Automating Multiple File Downloads

Yesterday, I caught up with a video briefing on Transforming IT from the OU’s Director of IT, recorded earlier this year (OU internal link, which, being on Sharepoint, needs Microsoft authentication rather than OU single sign on?).

The video, in part, described the 20 year history of some of the OU’s teaching related software services, which tended to be introduced piecemeal and which are not necessarily as integrated as they could be…

In the post Decision Support for Third Marking Significant Difference Double Marked Assessments, I mentioned part of the OU process for managing third marking.

Guidance provided for collecting scripts for third marking is something like this:

The original markers’ scores and feedback will be visible in OSCAR.

Electronically submitted scripts can be accessed in the eTMA system via this link: …

Please note the scripts can only be accessed via the EAB/admin tab in the eTMA system ensuring you add the relevant module code and student PI.

[My emphasis.]

Hmmm… OSCAR is accessed via a browser, and supports “app internal” links that display the overall work allocation, a table listing the students, along with their PIs, and links to various data views including the first and second marks table referred to in the post mentioned above.

The front end to the eTMA system is a web form that requests a course code and student PI, which then launches another web page listing the student’s submitted files, a confirmation code that needs to be entered in OSCAR to let you add third marks, and a web form that requires you to select a file download type from a drop down list with a single option and a button to download the zipped student files.

So that’s two web things…

To download multiple student files requires a process something like this:

So why not just have something on the OSCAR work allocation page that lets you select – or select all – the students and download all the files, or get all the confirmation codes?

Thinks… I could do that, sort of, over coffee… (I’ve tried to obfuscate details while leaving the general bits of code that could be reused elsewhere in place…)

First up, we need to login and get authenticated:

#Login
!pip3 install MechanicalSoup

import mechanicalsoup
import pandas as pd

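USERNAME=''
PASSWORD=''
LOGIN_URL=''
FORM_ID='#' #CSS selector for the login form
_USERNAME='' #name of the username field in the login form (obfuscated)
_PASSWORD='' #name of the password field in the login form (obfuscated)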

def getSession():
 browser = mechanicalsoup.StatefulBrowser()
 browser.open(LOGIN_URL)
 browser.select_form(FORM_ID) #in form: #loginForm
 browser[_USERNAME] = USERNAME
 browser[_PASSWORD] = PASSWORD
 resp = browser.submit_selected()
 return browser

s=getSession()

Now we need a list of PIs. We could scrape these from OSCAR, but that’s a couple of steps, and it’s easier just to copy and paste the table from the web page for now:

#Get student PIs - copy and paste table from OSCAR for now

txt='''
CODE\tPI NAME\tMARKING_TYPE\tSTATUS
...
CODE\tPI NAME\tMARKING_TYPE\tSTATUS
'''

#Put that data into a pandas dataframe then pull out the PIs
from io import StringIO

df=pd.read_csv(StringIO(txt),sep='\t',header=None)
pids=[i[0] for i in df[1].str.split()]
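
As a quick sanity check on that extraction step, here’s a minimal sketch of the same idea using only the standard library: take the second tab-separated column and keep the first whitespace-separated token as the PI. The rows, PIs and names below are made up for illustration, and `extract_pids` is a hypothetical helper rather than anything from OSCAR:

```python
import csv
from io import StringIO

# Made-up rows in the same tab-separated layout as the pasted OSCAR table
sample = (
    "TM999\tA1234567 Ann Example\tThird\tOpen\n"
    "TM999\tB7654321 Bob Sample\tThird\tOpen\n"
)

def extract_pids(text):
    """Return the first token of the second column for each non-empty row."""
    rows = csv.reader(StringIO(text.strip()), delimiter="\t")
    return [row[1].split()[0] for row in rows if len(row) > 1 and row[1].strip()]

sample_pids = extract_pids(sample)
print(sample_pids)
```

If the pasted table ever arrives with a different column order, the `row[1]` index is the bit to change.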

We now have a list of student PIs, which we can iterate through to download the relevant files:

#Download the zip file for each student
import zipfile, io, os

def downloader(pid, outdir='etmafiles'):
  print('Downloading assessment for {}'.format(pid))
  #Make sure the output directory exists
  os.makedirs(outdir, exist_ok=True)
  payload = {FORM_ELEMENT1:FILETYPE, FORM_ELEMENT2: FILE_DETAILS(pid)}
  url=ETMA_DOWNLOAD_URL_PATTERN(pid)
  #Download the file...
  r=s.post(url,data=payload)

  #...and treat it as a zipfile
  z = zipfile.ZipFile(io.BytesIO(r.content))
  #Save a bit more time for the user by unzipping it too...
  z.extractall(outdir)

#Here's the iterator...
for pid in pids:
  try:
    downloader(pid)
  except Exception as e:
    #Report the reason, but keep going with the other students
    print('Failed for {}: {}'.format(pid, e))
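
One thing that can go wrong here is that the server sends back something other than a zip file (a login page, say, if the session has expired). A minimal sketch of a guard for that, assuming the response body is available as bytes; `extract_if_zip` is a hypothetical helper, and the in-memory zip below just stands in for a real response:

```python
import io
import os
import tempfile
import zipfile

def extract_if_zip(content, outdir):
    """Extract bytes as a zip archive into outdir.

    Returns the list of member names, or None if the bytes are not a zip file.
    """
    buffer = io.BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return None
    os.makedirs(outdir, exist_ok=True)
    with zipfile.ZipFile(buffer) as z:
        z.extractall(outdir)
        return z.namelist()

# Build a small zip in memory to stand in for a server response body
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("script.txt", "hello")

demo_dir = tempfile.mkdtemp()
extracted = extract_if_zip(demo.getvalue(), demo_dir)
not_zip = extract_if_zip(b"<html>Please log in</html>", demo_dir)
print(extracted, not_zip)
```

In the downloader, the same check could be applied to `r.content` before unzipping, so that a failed download is reported as such rather than as a zip error.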

We can also grab the “student page” from the eTMA system and scrape it for the confirmation code. (On the to-do list: try to post the confirmation code back to OSCAR to authorise the upload of third marks, as well as auto-posting a list of marks and comments back.)

#Scraper for confirmation codes
def getConfirmationCode(pid):
  print('Getting confirmation code for {}'.format(pid))
  url=ETMA_STUDENT_PAGE(pid, ASSESSMENT_DETAILS)
  r=s.open(url)
  p=s.get_current_page()

  #scrapy bit
  elements=p.find(WHATEVER)
  confirmation_code, pid=SCRAPE(elements)
  #Return in the same order as the ['PI','Code'] dataframe columns used below
  return [pid, confirmation_code]

codes=pd.DataFrame()

for pid in pids:
  try:
    tmp=getConfirmationCode(pid)
    # Add data to dataframe...
    codes = pd.concat([codes, pd.DataFrame([tmp], columns=['PI','Code'])])
  except Exception as e:
    #Report the reason, but keep going with the other students
    print('Failed for {}: {}'.format(pid, e))

codes
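
The scraping step itself is obfuscated above, so here’s a rough, standalone sketch of the general idea using the standard library’s `html.parser` rather than the BeautifulSoup page object that MechanicalSoup returns. The HTML fragment, the `id="confirmation"` attribute and the code value are all made up for illustration; the real eTMA page will look different:

```python
from html.parser import HTMLParser

class ConfirmationParser(HTMLParser):
    """Collect the text of the first element whose id is 'confirmation' (a made-up id)."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self.code = None

    def handle_starttag(self, tag, attrs):
        # Start capturing when the (hypothetical) confirmation element opens
        if self.code is None and dict(attrs).get("id") == "confirmation":
            self._capturing = True

    def handle_data(self, data):
        # Keep the first non-empty text found inside that element
        if self._capturing and data.strip():
            self.code = data.strip()
            self._capturing = False

# Made-up page fragment standing in for the student page
sample_page = (
    "<html><body><p>Student: A1234567</p>"
    '<span id="confirmation"> XK42-9Q </span></body></html>'
)

parser = ConfirmationParser()
parser.feed(sample_page)
print(parser.code)
```

With the live page, the equivalent would be a `find()` on whatever element actually wraps the code.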

So… yes, the systems don’t join up in the usual workflow, but it’s easy enough to hack together some glue as an end-user developed application: given that the systems are based on quite old-style HTML thinking, they are simple enough to scrape and treat as a de facto roll-your-own API.

Checking the time, it has taken me pretty much as long to write this post and generate the block diagram shown above as it took to put the above code together.

With another hour, I could probably learn enough about the new plotly Dash package (like R/shiny for python?) to create a simple browser-based app UI for it.

Of course, this isn’t enterprise grade for a digital organisation, where everything is form/button/link/click easy, but it’s fine for a scruffy digital org where you appropriate what you need and string’n’glue’n’gaffer tape let you get stuff done (and also prototype, quickly and cheaply, things that may be useful, without spending weeks and months arguing over specs and font styles).

Indeed, it’s the digital equivalent of the workarounds all organisations have, where you know someone or something who can hack a process, or a form, or get you that piece of information you need, using some quirky bit of arcane knowledge, or hidden backchannel, that comes from familiarity with how the system actually works, rather than how people are told it is supposed to work. (I suspect this is not what folk mean when they are talking about a digital first organisation, though?!;-)

And if it’s not useful? Well it didn’t take that much time to try it to see if it would be…

Keep on Tuttling…;-)

PS the block diagram above was generated using an online service, blockdiag. Here’s the code (need to check: could I assign labels to a variable and use those to cut down repetition?):

{
  A [label="Work Allocation"];
  B [label="eTMA System"];
  C [label="Student Record"];
  D [label="Download"];
  DD [label="Confirmation Code"];
  E [label="Student Record"];
  F [label="Download"];
  FF [label="Confirmation Code"];
  G [shape="dots"];
  H [label="Student Record"];
  I [label="Download"];
  II [label="Confirmation Code"];

  OSCAR -> A -> B;

  B -> C -> D;
  C -> DD;

  B -> E -> F;
  E -> FF;
  B -> G;

  B -> H -> I;
  H -> II;
}
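
I haven’t checked whether blockdiag itself supports label variables, but one way round the repetition is to generate the diagram source from a few lines of Python. A minimal sketch, assuming the same node labels as above; the node ids (`R1`, `D1`, `C1`, …) and the `build_blockdiag` function are made up, the dotted placeholder node is left out for simplicity, and the output hasn’t been tested against the blockdiag service:

```python
def build_blockdiag(n_students=3):
    """Return blockdiag source with one record/download/code branch per student."""
    nodes = ['  A [label="Work Allocation"];', '  B [label="eTMA System"];']
    edges = ["  OSCAR -> A -> B;"]
    for i in range(1, n_students + 1):
        # Node ids are arbitrary; only the labels repeat
        nodes.append(f'  R{i} [label="Student Record"];')
        nodes.append(f'  D{i} [label="Download"];')
        nodes.append(f'  C{i} [label="Confirmation Code"];')
        edges.append(f"  B -> R{i} -> D{i};")
        edges.append(f"  R{i} -> C{i};")
    return "\n".join(["{"] + nodes + [""] + edges + ["}"])

diagram_source = build_blockdiag(3)
print(diagram_source)
```

Changing the number of branches is then a one-argument change rather than a copy-and-paste job.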

Is that being digital? Is that being cloud? Is that being agile (e.g. in terms of supporting maintenance of the figure?)?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
