Academic Papers: Written to be Read, Referenced, or Questioned?

If you ever get to do any academic skills training, one of the sessions you’ll probably encounter is one on reading skills. Having a quick skim of the OU Library website (because it’s often the Library that is involved with skills training), I see a section on Postgraduate study skills: reading efficiently.

Now this may or may not be a long page — obviously, I can’t be bothered to scroll it to see — so I just generate a summary instead…
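(For what it’s worth, whatever tool actually produced the summary, the “just summarise it for me” step is easy enough to script. A minimal sketch, assuming an OpenAI-style chat API; the page URL and model name are placeholders, not the real ones:)

```python
# A minimal "just summarise it for me" sketch: fetch a web page, strip it to
# plain text, and ask a chat model for a short summary.
# Assumptions: PAGE_URL and the model name are placeholders, and an
# OPENAI_API_KEY is available in the environment.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

PAGE_URL = "https://example.org/reading-efficiently"  # placeholder; substitute the Library page URL

def page_text(url: str) -> str:
    """Fetch a page and return its visible text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

def summarise(text: str, model: str = "gpt-4o-mini") -> str:
    """Ask a chat model for a short bullet-point summary of the page text."""
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarise the following page in five bullet points."},
            {"role": "user", "content": text[:20000]},  # naive truncation to stay within context
        ],
    )
    return response.choices[0].message.content

print(summarise(page_text(PAGE_URL)))
```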

This advice, and the practice of many academics, suggests that despite the way a paper is presented on a page, it’s not read linearly: you typically top and tail it, skim the headings, and look at the tables, figures and diagrams and read their captions. If you’re reference chasing, you mine the bibliography. And so on.

In passing, I hate full-page, two-column papers because I often want to zoom the page (get old, then you’ll understand…), and zooming a two-column layout usually means scrolling up and down within each column. If a page has a half-page diagram, however, the format works for me, because the form factor fits a landscape screen:

Which makes me think a 2-block, 2-column layout (columns 1–2 side by side, then columns 3–4 below) would work better for me, maybe with a line between the two blocks as a visual break cue that the browser could also use to support tabbed “next block” scroll-to browsing of the document.

[Browser tools: note the use of the Edge browser “split screen” tool to generate the screenshot. (I’m not sure if there is a way to introduce a vertical rather than horizontal split? I used to have a bookmarklet for generating a vertical split on a page to help when taking screenshots. It used to load the same URL into a couple of iframes, one above the other, although some pages had frame busting built in and wouldn’t load into an iframe.)]

A couple of days ago I spotted (I forget where…) a link to a service called Talk2Arxiv, which lets you prefix an arxiv.org URL with talk2 to load a retrieval-augmented generation (RAG) based question-and-answer tool that lets you ask questions over the paper. (At the moment, it doesn’t seem to be able to cite-and-scroll-to or highlight relevant sections in the paper, but that’s an obvious feature to add.)
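To make the RAG part concrete, here’s roughly the generic pattern a “talk to this paper” service follows (I have no idea how Talk2Arxiv is actually built): chunk the paper text, embed the chunks and the question, retrieve the nearest chunks, and answer from them. A sketch assuming an OpenAI-style API, placeholder model names, and that you already have the paper as plain text:

```python
# A rough sketch of the generic RAG pattern behind a "talk to this paper"
# service: embed chunks of the paper, embed the question, retrieve the nearest
# chunks, and answer from them. NOT a description of how Talk2Arxiv actually
# works. Assumes an OpenAI-style API; model names are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

def answer(question: str, chunks: list[str], k: int = 4) -> str:
    """Retrieve the k chunks most similar to the question and answer from them."""
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    # Cosine similarity between the question and each chunk.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[::-1][:k])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer the question using only the excerpts provided. "
                        "Say so if the excerpts don't contain the answer."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Usage: paper_text is the plain text of the paper, however you extracted it.
# chunks = [c for c in paper_text.split("\n\n") if c.strip()]   # naive chunking
# print(answer("What dataset do the authors use?", chunks))
```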

From a quick play with this, I wonder if there are ways of writing papers that make them easier to summarise, or easier to use as the basis of a generative QnA service. (The specific chunking and retrieval strategy used as part of the RAG engine would be a key consideration here; a sketch of one option follows below. Which reminds me: I need to play some more with my useful chunking strategies for retrieving educational content.) Thinks: will question’n’answering engine optimisation (QNAO) become a thing…?
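On the chunking point: the simplest strategies split on a fixed character count, but for academic or educational texts something paragraph-aware with a little overlap tends to retrieve more coherent passages. A hedged sketch, not tied to any particular RAG framework:

```python
# Paragraph-aware chunking with overlap: pack whole paragraphs into chunks of
# roughly max_chars characters, carrying the last paragraph over into the next
# chunk so a retrieved passage rarely starts mid-argument. A generic sketch,
# not tied to any particular RAG framework or service.
def chunk_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, size = [], [], 0
    for para in paragraphs:
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = [current[-1]]          # overlap: repeat the last paragraph
            size = len(current[0])
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Example:
# chunks = chunk_paragraphs(paper_text)
# print(len(chunks), "chunks; first chunk starts:", chunks[0][:200])
```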

At this point, it’s probably also worth running a bit further with the idea of “if a paper isn’t for reading, what is it for?”. One possible framing, which complements the previous approach of treating a paper as “a thing that can be used to answer questions”, is to see it as a thing that “can be used to answer questions of a particular type”. For example, in Prompting for questions…, I noted that O’Reilly’s textbook server is set to start offering a value-add service that can generate questions from a text that should be capable of being answered by the text.

Relating to this, another set of skills that is often developed in academic settings relates to critical reading. Once again, the OU Library has a related advice page, and once again TLDR (maybe, dunno, didn’t even skim the page, just summarise it for me):

But more generally, for learners, being able to generate questions from the text gives you a generally available “test me” partner, one that can take your textbook from you at any point and ask you questions based on what it sees. You can then test your own knowledge and check back against the text to see what the textbook answers are.
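As a toy example of what that kind of “test me” partner might look like under the hood: hand a passage to a model and ask it for questions the passage itself can answer. Again a sketch assuming an OpenAI-style chat API and a placeholder model name, not a description of how O’Reilly’s service works:

```python
# A toy "test me" partner: generate questions that a given passage should be
# able to answer, so a learner can quiz themselves and then check back against
# the text. Assumes an OpenAI-style API; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def generate_questions(passage: str, n: int = 5) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Write {n} short quiz questions that can be answered "
                        "solely from the passage supplied by the user. "
                        "Do not include the answers."},
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content

# print(generate_questions(open("chapter3.txt").read()))  # hypothetical file
```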

For educators, too, the “generate a question from the text” approach might be handy in generating, or at least hinting at, possible questions a text might be good for answering. As a corollary, a similar approach might be useful for “asking” a text what learning objectives it might support the delivery of, for helping check that a particular set of materials does appear to be able to answer a particular question, and for providing an LLM-based opinion as to whether the materials meet the needs of delivering teaching relating to a particular set of learning objectives. (I’m also wondering now how a model might respond to questions that try to relate learning objectives to possible learning outcomes.)
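Sketched the same way, that corollary might look something like the following: ask a model whether some material appears to support a stated learning objective. The objective and model name below are made-up placeholders, and the output is an opinion to sanity-check, not a verdict:

```python
# Asking a text what learning objectives it supports: an LLM-based opinion on
# whether the supplied material could deliver teaching against the stated
# objective. The example objective is illustrative; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def check_objective(material: str, objective: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You review course materials. Say whether the material "
                        "supports the stated learning objective, with a short "
                        "justification and any gaps you notice."},
            {"role": "user",
             "content": f"Learning objective: {objective}\n\nMaterial:\n{material}"},
        ],
    )
    return response.choices[0].message.content

# Example (made-up objective):
# print(check_objective(unit_text,
#                       "Explain the difference between precision and recall."))
```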

PS I note the title mentions references. Sometimes, all you want a paper for is to support a general “see also” to a related area of work, to back up a particular quote you are citing, or to provide evidence, by way of provenance, for a claimed fact.
