Adding Metadata to Google Docs

A couple of months ago I had started working on an export tool that would export a Google doc in the OU-XML format. The rationale? The first couple of drafts of the teaching material that will be delivered through the VLE in the forthcoming (October, 2015) OU Data management and analysis course (TM351) have been prepared in Google docs, and the production process will soon have to move to the Open University’s XML workflow. This workflow is built around an OU defined schema, often referred to as OU-XML (or OUXML), and is supported by a couple of oXygen XML editor extensions that make it easy to preview rendered versions of the documents in a test VLE site.

The schema itself includes several elements that are more akin to metadata elements than actual content – things like the course code, course title, for example, or the byline (or lead author) of a particular unit.

Support for a small amount of metadata is provided by Google Drive, but the only easily customisable element is a free text description element.

gdocsMetadata

So whilst patching a couple of “issues” today with the Google Docs to OU-XML generator, and adding a menu option that allows users to create a zip file in Google Drive that contains the OU-XML and any associated image files for a particular Google doc, I thought it might also be handy to add some support for additional metadata elements. Google Drive apps support a Properties class that allows metadata properties represented as key-value pairs to be associated with a particular document, user or script. Google Apps Script can be used to set and retrieve these properties. In addition, Google Apps Script can be used to generate templated HTML user interface forms that can be used to extend Google docs or spreadsheets functionality.

In particular, I created a handful of Google Apps Script functions to pop up a templated panel, save metadata descriptions entered into the metadata form as document properties, and retrieve the value of a particular metadata element.

//Pop up the metadata edit/display panel
//The document is created as a templated HTML document
function metadataView() {
  // Generate the HTML
  html= HtmlService
      .createTemplateFromFile('metadata')
      .evaluate()
      .setSandboxMode(HtmlService.SandboxMode.IFRAME);
  //Pop up a panel and render the HTML describing the metadata form inside it
  DocumentApp.getUi().showModalDialog(html, 'Metadata');
}

//This function sets the document properties from the metadata form elements
function processMetadataForm(theForm) {
  var props=PropertiesService.getDocumentProperties()
  //Process each form element (atm, they are just input text elements)
  for (var item in theForm) {
    props.setProperty(item,theForm[item])
    Logger.log(item+':::'+theForm[item]);
  }
}

The templated HTML form is configured using a set of desired metadata elements. Each element is described using a label that is displayed in the form, an attribute name which should be a single word) and an optional default value. The template also demonstrates how we can call a server side Apps Script function from the dialogue using the google.script.run.FUNCTION_NAME construction.

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
  
<? 
//Add metadata fields here in the following format:
//[Label, a unique identifier (unique word, no spaces or punctuation), an optional default value]
var metadataItems =[
    ["Lead Author","leadAuthor"],
    ["Course Code","courseCode"],
    ["Course Title","courseTitle"],
    ["Unit Title","unitTitle"],
    ["Rendering","rendering","VLE2 staff (learn3)"]
]
?>
  
<? var metadata = PropertiesService.getDocumentProperties() ?>
<script>
//When the metadata has been successfully saved as document properties
//  close the metadata form panel
function onSave() {google.script.host.close()}
</script>
  
<form id='metadataForm'>
<!-- Construct a set of form elements, one for each metadata item -->
<? for (var i = 0; i < metadataItems.length; i++) { ?>
  <div><?= metadataItems[i][0] ?>: 
    <input type="text"
      name = "<?= metadataItems[i][1] ?>"
      <? val=''
        if (metadataItems[i].length>2) val= metadataItems[i][2]  ?>
      value= "<?= metadata.getProperty(metadataItems[i][1]) ? metadata.getProperty(metadataItems[i][1])  : val  ?>"
    /> 
  </div>
<? } ?>
    
</form>
  
<div>
  <input
    type="button"
    value="Save & Close"
    onclick="google.script.run.withSuccessHandler(onSave).processMetadataForm(document.getElementById('metadataForm'))"
  />
  
  <input
    type="button"
    value="Cancel"
    onclick="google.script.host.close()"
  />
</div>

When the metadataView() function is called from the Add-Ons menu, it pops a dialogue that looks (in unstyled form) something like this:

googleDocMetadata

Metadata elements are loaded in to the form if they exist or a default value is specified.

When generating the export OU-XML, a helper function grabs the value of the relevant metadata element from the document properties. This value then then be inserted into the OU XML at the appropriate point.

//A helper function to display a particular metadata element
//This function is called from the metadata form
function getProp(key) {
  var props= PropertiesService.getDocumentProperties()
  return props.getProperty(key) ? props.getProperty(key) : '';
}

var COURSECODE= getProp('courseCode');

One issue with this approach is that if we have lots of documents relating to different units for the same course, we may need to enter the same values for several metadata elements across each document (for example, the course code and course title). Unfortunately, Google Drive does not support arbitrary properties for folders. One solution, suggested by Tom Smith/@everythingabili was to use the description element for a folder to store JSON represented metadata. I think we could actually simplify that, using a line based representation or a simple delimited representation that we can easily split on, something like:

courseCode :: TM351;;
courseTitle:: Data Management and Analysis

for example. We could then split on ;; to get each pair, strip whitespace, split on :: and strip whitespace again to get the key:value elements for each metadata item.

gdocsfoldermetadata

I guess one way of getting the folder decription given a particular document as a starting point is to find the parent folder using file#getParents() perhaps?) and then call folder#getDescription()?

Another approach might be to have a dummy, canonically named file in each folder (metadata for example), that we add metadata to, and then whenever we open a new file in the folder we look for the metadata file, get its metadata property values, and use those to seed the metadata values for our content document.

Finally, it’s maybe worth also pondering the issue of generating the OU-XML export for all the documents within a given folder? One way to do this might be to create a function off a each document that will find the parent folder, find all the files (except, perhaps, a metadata file?!) in that folder, and then run the OU-XML generator over all of them, bundling them up into a single zip file, perhaps with a directory structure that puts the OU XML for each document, along with any image files associated with it, into separate folders?

Only it probably isn’t.. I suspect that if the migration to the OU-XML format, if it hasn’t already happened, will involve copying and pasting…

PS for completeness, the menu option can be installed as follows:

function onOpen(e) {
  DocumentApp.getUi().createAddonMenu()
      .addItem('Metadata','metadataView')
      .addToUi();
}

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: