An email:
Erm… a Word document with some images and captions – styled as such:
Some basic IT knowledge – at least – it should be basic in what amounts to a publishing house:
The .docx file is just a zip file… That is, a compressed folder and its contents… So use the .zip…
So here’s the unzipped folder listing – can you spot the images?
The XML content of the doc – viewed in Firefox (drag and drop the file into a Firefox browser window). Does anything jump out at you?
Computers can navigate to the tags that contain the caption text by looking for the Caption style. It can be a faff associating the image captions with the images though (you need to keep tallies…) because the Word XML for the figure doesn’t seem to include the filename of the image… (I think you need to count your way through the images, then relate that image index number with the following caption block?)
So re: the email – if authors tag the captions and put captions immediately below an image – THE MACHINE CAN DO IT, if we give someone an hour or two to knock up the script and then probably months and months and months arguing about the workflow.
PS I’d originally screencaptured and directly pasted the images shown the above into a Powerpoint presentation:
I could have recaptured the screenshots, but it was much easier to save the Powerpoint file, change the .pptx suffix to .zip, unzip the folder, browse the unzipped Powerpoint media folder to see which image files I wanted:
and then just upload them directly to WordPress…
See also: Authoring Multiple Docs from a Single IPython Notebook for another process that could be automated but lack of imagination and understanding just blanks out.