Notebooks, knitr and the Language-Markdown View Source Option…

One of the foundational principles of the web, though I suspect ever fewer people know it, is that you can “View Source” on a web page to see what bits of HTML, Javascript and CSS are used to create it.

In the WordPress editor I’m currently writing in, I’m using a Text view that lets me write vanilla HTML; but there is also a WYSIWYG (what you see is what you get) view that shows how the interpreted HTML text will look when it is rendered in the browser as a web page.

[Image: viewtext]

Reflecting on IPython Markdown Opportunities in IPython Notebooks and Rstudio, it struck me that the Rmd (Rmarkdown) view used in RStudio, the HTML preview of “executed” Rmd documents generated by knitr, and the interactive Jupyter (IPython, as was) notebook view can be seen as standing in a similar sort of relation to each other:

[Image: rmd-wysiwyg]

From that, it’s not too hard to imagine RStudio offering the following sort of RStudio/IPython notebook hybrid interface – with an Rmd “text” view, and with a notebook “visual” view (eg via an R notebook kernel):

[Image: viewrmd]

And from both, we can generate the static HTML preview view.

In terms of underlying machinery, I guess we could have something like this:

[Image: rmdviewarch]

I’m looking forward to it:-)

Google Gets Out of Personal Control?

This is a rant… It may or may not be coherent… it’s just me venting and trolling myself…

Earlier today I posted a selection of F1 battlemaps in a post on the F1DataJunkie blog, which is hosted on Blogger: F1 Canada 2015 Battlemaps – How the Race Happened from the Drivers’ Perspective. The charts were uploaded to the blog, which in turn means they’re stored on Google Photos, or whatever the service is called.

Being in a Blogger – and hence Google – context, a Google+ (or Google Accounts or whatever we’re supposed to call it now) profile button was present in the top right hand corner of the screen. It alerted me to some activity, and even though I generally avoid Google Plus, I think Blogger autoposts there, so I clicked through.

[Image: blogger_autoawesome]

It seems that Google had created an animated gif (an “auto-awesome” picture) out of the images that were contained in the blog post and “added” it somewhere (?) for me.

In this case, the animation is pure nonsense.

I don’t recall ever having opted in to this content-creation-on-my-behalf, and I’m not really interested in Google taking my stuff and mucking about with it. (I know it does this when it resizes images, for example, but in that case, it doesn’t change the content. And I know it does who knows what with my data, and any content that goes anywhere near any of its storage services, so it can “better” “personalise” things for me (as well as presumably using that content and context in a whole range of learning and training algorithms).)

Anyway – as to auto-awesome – I think this is how to disable it?

[Image: Settings_-_Google_Photos]

PS I don’t remember offhand how I’ve licensed the content on the F1DataJunkie blog (did I get round to CC-BYing it?), but whatever the copyright status, I assume that by agreeing to my uploaded Blogger images being stored on Google Photos, I grant Google a license to do whatever the f**k it wants with them, if only for my own access and amusement, and then to grab at my attention to tell me about it?

PPS In passing, in response to an iOS update, I tweeted: itunes update on ios. 37 pages of terms and conditions. Thirty Seven. God only knows what terms and conditions I “agreed” to. But presumably, given that I…

PPPS see also Mia Ridge on The rise of interpolated content?.

IPython Markdown Opportunities in IPython Notebooks and Rstudio

One of the reasons I started working on the Wrangling F1 Data With R book was to see what the Rmd (RMarkdown) workflow was like. Rmd allows you to combine markdown and R code in the same document, as well as executing the code blocks and then displaying the results of that code execution inline in the output document.

[Image: rmd_demo]

As well as rendering to HTML, we can generate markdown (md is actually produced as the interim step to HTML creation), PDF output documents, etc etc.
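For example, a minimal sketch of driving the same renderings from the R console (assuming the rmarkdown package that RStudio uses; the filename is made up):

library(rmarkdown)

# Knit and render an Rmd document to HTML (the default output format)
render("wrangling_f1_demo.Rmd")

# Or target other output formats explicitly
render("wrangling_f1_demo.Rmd", output_format = "md_document")
render("wrangling_f1_demo.Rmd", output_format = "pdf_document")  # needs a LaTeX install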

One thing I’d love to be able to do in the RStudio/RMarkdown environment is include – and execute – Python code. Does a web search to see what Python support there is in R… Ah, it seems knitr supports it already… (how did I miss that?!)

[Image: knitr_py]

ADDED: Unfortunately, it seems as if Python state is not persisted between separate python chunks – instead, each chunk is run as a one-off inline python command. However, it seems as if there could be a way round this, which is to use a persistent IPython session; and the knitron package looks like just the thing for supporting that.
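By way of illustration, a minimal Rmd sketch (the chunk contents are made up) of where the lack of persistence bites:

```{python}
x = 10
print(x)
```

```{python}
# With the default knitr python engine, each chunk runs in a fresh
# python process, so x from the previous chunk is no longer defined here
print(x)
```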

So that means in RStudio, I could use knitr and Rmd to write a version of Wrangling F1 Data With RPython

Of course, it would be nicer if I could write such a book in an everyday python environment – such as in an IPython notebook – that could also execute R code (just to be fair;-)

I know that we can already use cell magic to run R in an IPython notebook:

[Image: ipynb_rmagic]

…so that’s that part of the equation.
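(For the record, a minimal sketch of what that looks like, assuming the rpy2 package is installed:)

# In one code cell, load the R cell magic
%load_ext rpy2.ipython

# In a later cell, %%R pushes the whole cell body through R
%%R
x <- c(1, 2, 3)
summary(x)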

And the notebooks do already allow us to mix markdown cells and code blocks/output. The default notebook presentation style is to show the code cells with the In []: and Out []: prompt numbering, but it presumably only takes a small style extension or customisation to suppress that? And another small extension to add the ability to hide a code cell and just display the output?
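Something along the following lines in a notebook custom.css file looks like it should handle the first of those – untested, and the selector is my guess at the classic notebook styling:

/* Hide the In []: / Out []: prompt areas alongside code cells */
div.prompt { display: none; }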

So what is it that (to my mind at least) makes RStudio a nicer writing environment? One reason is the ability to write the Rmarkdown simply as Rmarkdown in a simple text editor environment. Another is the ability to inline R code and display its output in-place.

Taking that second point first – the ability to do better inlining in IPython notebooks – it looks like this is just what the python-markdown extension does:

[Image: python_markdown]
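So, for example, a value set in a code cell can apparently be referenced from a markdown cell using a templated placeholder – a sketch, assuming the {{...}} convention the extension describes, with made-up values. In a code cell:

fastest_lap = 75.435

…and then in a markdown cell:

The fastest lap of the race came in at {{fastest_lap}} seconds.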

But how about the ability to write some sort of pythonMarkdown and then open in a notebook? Something like ipymd, perhaps…?

[Image: rossant_ipymd]

What this seems to do is allow you to open an IPython-markdown document as an IPython notebook (in other words, it replaces the ipynb JSON document with an ipymd markdown document…). To support the document creation aspects better, we just need an exporter that removes the code block numbering and trivially allows code cells to be marked as hidden.
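As I understand it, the markdown side of that round trip then looks something like this (illustrative only – I haven’t checked ipymd’s exact conventions), with fenced python blocks standing in for code cells and everything else treated as markdown cells:

# Battlemaps

Some narrative text, which round-trips as a markdown cell.

```python
# A fenced python block round-trips as a code cell
laps = [75.4, 76.1, 75.9]
print(min(laps))
```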

Now I wonder… what would it take to be able to open an Rmd document as an IPython notebook? Presumably just the ability to detect the code language, and then import the necessary magics to handle its execution? It’d be nice if it could cope with inline code, e.g. using the python-markdown magic too?

Exciting times could be ahead:-)

Capital, Labour and Value… I Really Don’t Understand These Terms at All…

For most of my life, I’ve managed to avoid reading much, if anything, about political theory. I have to admit I struggle reading anything from a Marxist perspective because I don’t understand what any of the words mean (I’m not convinced I even know how to pronounce some of them…), or how the logic works that tries to play them off against each other.

The closest I do get to reading political books tend to be more related to organisational theories – things like Parkinson’s Law, for example…;-)

So at a second attempt, I’ve started reading David Graeber’s “The Utopia of Rules”. Such is my level of political naivety, I can’t tell whether it’s a rant, a critique, a satire, or a nonsense.

But if nothing else it does start to introduce words in a way that gives me a jumping off point to try to make my own sense out of them. So for example, on page 37, we have a quote claimed to be from Abraham Lincoln (whether the Abraham Lincoln, or another, possibly made up one, I have no idea – I didn’t follow the footnote to check!):

Labor is prior to, and independent of, capital. Capital is only the fruit of labor, and could never have existed if labor had not first existed. Labor is the superior of capital, and deserves much the higher consideration.

This followed on from the observation that “[m]ost Americans, for instance, used to subscribe to a rough-and-ready version of the labor theory of value.” Here’s my rough and ready understanding of that, in part generated as a riffed response to the Lincoln quote, as a picture:

[Image: myLabourTheoryOfValue]

The abstract thing created by labour is value. The abstract thing that capital is exchanged for is value. That capital (a fiction) can create more capital through loans of capital in exchange for capital+interest repayments suggests that the value capital creates – value that corresponds to interest on capital loaned – is a fiction created from a fiction. It only becomes real when the actor needing to repay the additional fiction must acquire it somehow through their own labour, though in some situations it will also be satiated through the creation of capital-interest, that is, through the creation of other fictions.

Such is the state of my political education!

PS here are some other lines I’ve particularly liked so far: from p32: “The bureaucratisation of daily life means the imposition of impersonal rules and regulations; impersonal rules and regulations, in turn, can only operate if they are backed up by the threat of force.” Which follows from p.31: “Whenever someone starts talking about the ‘free market’, it’s a good idea to look around for the man with the gun. He’s never far away.”

And on international trade (p30): “(Much of what was being called ‘international trade’ in fact consisted merely of the transfer of materials back and forth between different branches of the same corporation.)”

Rethinking the TM351 Virtual Machine Again, Again…

It’s getting to that time when we need to freeze the virtual machine build we’re going to use for the new (postponed) data course, which should hopefully go live to students in February, 2016, and I’ve been having a rethink about how to put it together.

The story so far has been documented in several blog posts and charts my learning journey from knowing nothing about virtual machines (not sure why I was given the task of putting it together?!) to knowing how little I know about Linux administration, PostgreSQL, MongoDB, Linux networking, virtual machines and virtualisation (which is to say, knowing I don’t know enough to do any of this stuff properly…;-)

The original plan was to put everything into a single VM and wire all the bits together. One of the activities requires firing up several MongoDB instances as part of a mongo replica set, and I opted to use containers to do that.

Over the last few months, I started to wonder whether we should containerise everything separately, then deploy compositions of containers. The rationale behind this approach is that it means we could make use of a single VM to host applications for several users if we get as far as cloud hosting services/applications for our students. It also means students can start, stop or “reinstall” particular applications in isolation from the other VM applications they’re running.

I think I’ve got this working in part now, though it’s still very much tied to the single user – I’m doing things with permissions that would never be allowed (and that would possibly break things..) if we were running multiple users in the same VM.

So what’s the solution? I posted the first hints in Kiteflying Around Containers – A Better Alternative to Course VMs? where I proved to myself I could fire up an IPython notebook server on top of a scientific Python distribution stack, and get the notebooks talking to a DBMS running in another container. (This was point and click easy, once you know what to click and what numbers to put where.)

The next step was to see if I could automate this in some way. As Kitematic is still short of a Windows client, and doesn’t (yet?) support Docker Compose, I thought I’d stick with vagrant (which I was using to build the original VM using a Puppet provisioner and puppet scripts for each app) and see if I could get it to provision a VM to run containerised apps using docker. There are still a few bits to do – most notably trying to get the original dockerised mongodb stuff working, checking the mongo link works, working out where to try to persist the DBMS data files (possibly in a shared folder on the host?) in a way that doesn’t trash them each time a DBMS container is started, and probably a load of other stuff – but the initial baby steps seem promising…

In the original VM, I wanted to expose a terminal through the browser, which meant pfaffing around with tty.js and node.js. The latest Jupyter server includes the ability to launch a browser based shell client, which meant I could get rid of tty.js. However, moving the IPython notebook into a container means that the terminal presumably has scope only within that container, rather than having access to the base VM command line? For various reasons, I intend to run the IPython/Jupyter notebook server container as a privileged container, which means it can reach outside the container (I think? – the reason being, for example, to fire up containers for the mongo replica set activity), but I’m not sure if this applies to the command line/terminal app too? Though offhand, I can’t think why we might want to provide students with access to the base VM command line?
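If the docker socket is also mounted into the notebook container (as it is for the dockerui container in the YAML below), then something like the docker-py library ought to let notebook code fire up the replica set containers itself – an untested sketch, with the image, container names and replica set name all assumptions:

# Assumes the docker-py package is installed and /var/run/docker.sock
# is mounted into the notebook container
from docker import Client

cli = Client(base_url='unix://var/run/docker.sock')

# Fire up three mongod containers to act as a replica set
for name in ['mongo_rs1', 'mongo_rs2', 'mongo_rs3']:
    container = cli.create_container(image='mongo', name=name,
                                     command='mongod --replSet rs0')
    cli.start(container)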

Anyway, the local set-up looks like this…

A simple Vagrantfile, called using vagrant up or vagrant reload. I have extended vagrant using the vagrant-docker-compose plugin that supports Docker Compose (fig, as was) and lets me fire up wired-together container configurations from a single script:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.network(:forwarded_port, guest: 9000, host: 9000)
  config.vm.network(:forwarded_port, guest: 8888, host: 8351, auto_correct: true)

  config.vm.provision :docker
  config.vm.provision :docker_compose, yml: "/vagrant/docker-compose.yml", rebuild: true, run: "always"
end

The YAML file identifies the containers I want to run and the composition rules between them:

ui:
  image: dockerui/dockerui
  ports:
    - "9000:9000"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true

ipynb:
  build: ./tm351_scipystacknserver
  ports:
    - "8888:8888"
  volumes:
    - ./notebooks/:/notebooks/
  links:
    - devpostgres:postgres
  privileged: true
    
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data

devpostgres:
  environment:
    - POSTGRES_PASSWORD=whatever
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata

At the moment, Mongo is still missing and I haven’t properly worked out what to do with the PostgreSQL datastore – the idea is that students will be given a pre-populated, pre-indexed database, in part at least.
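One option for the persistence question might be to drop the busybox data container and map the PostgreSQL data directory onto a shared folder on the host instead, so the database files survive container rebuilds – an untested sketch, with the host path made up:

devpostgres:
  environment:
    - POSTGRES_PASSWORD=whatever
  image: postgres
  ports:
    - "5432:5432"
  volumes:
    # Persist the database files in a host folder rather than a data container
    - ./postgres-data:/var/lib/postgresql/data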

One additional component that sort of replaces the command line/terminal app requirement from the original VM is the dockerui app. This runs in its own container with privileged access to the docker environment, and provides a simple control panel over all the containers:

[Image: DockerUI]

What else? The notebook stuff has a shared notebooks directory with host, and is built locally (from a Dockerfile in the local tm351_scipystacknserver directory) on top of the ipython/scipystack image; extensions include some additional package installations (requiring both apt-get and pip installs) and copying across and running a custom IPython notebook template configuration.

FROM ipython/scipystack

MAINTAINER OU

ADD build_tm351_stack.sh /tmp/build_tm351_stack.sh
RUN bash /tmp/build_tm351_stack.sh


ADD ipynb_style /tmp/ipynb_style
ADD ipynb_custom.sh /tmp/ipynb_custom.sh
RUN bash /tmp/ipynb_custom.sh


## Extremely basic test of install
RUN python2 -c "import psycopg2, sqlalchemy"
RUN python3 -c "import psycopg2, sqlalchemy"

# Clean up from build
RUN rm -f /tmp/build_tm351_stack.sh
RUN rm -f /tmp/ipynb_custom.sh
RUN rm -f -r /tmp/ipynb_style

VOLUME /notebooks
WORKDIR /notebooks

EXPOSE 8888

ADD notebook.sh /
RUN chmod u+x /notebook.sh

CMD ["/notebook.sh"]

[Image: demo-tm351]

If we need to extend the PostgreSQL build, that can presumably be done using a Dockerfile that pulls in the core image and then runs an additional configuration script over it?
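Something like the following is the sort of thing I have in mind – untested, and the seed script name is made up; as I understand it, the official postgres image runs any scripts dropped into /docker-entrypoint-initdb.d/ when the data store is first initialised:

FROM postgres

# Hypothetical SQL dump used to pre-populate and index the student database;
# scripts in this directory are executed on first initialisation
ADD init_tm351_db.sql /docker-entrypoint-initdb.d/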

So where am I at? No f****g idea. I thought that between the data course and the new web apps course we might be able to explore some interesting models of using virtual machines (originally) and containers (more recently) in a distance education setting, that could cope with single user home use, computer training room/lab use, cloud use, but, as ever, I have spectacularly failed to demonstrate any sort of “academic leadership” in developing these ideas within the OU, or even getting much of a conversation going in the first place. Not in my skill set, I guess!;-) Though perhaps not in the institution’s interests either. Recamp. Retrench. Lockdown. As per some of the sentiments in Reflections on the Closure of Yahoo Pipes, perhaps? Don’t Play Here.

Reflections on the Closure of Yahoo Pipes

Last night I popped up a quick post relaying the announcement of impending closure of Yahoo Pipes, recalling my first post on Yahoo Pipes, and rediscovering a manifesto I put together around the rallying cry We Ignore RSS at OUr Peril.

When Yahoo Pipes first came out, the web was full of the spirit of Web2.0 mashup goodness. At the time, the big web companies were opening up all manner of “open” web APIs – Amazon, Google, and perhaps more than any other, Yahoo – with Google and Yahoo particularly seeming to invest in developer evangelism events.

One of the reasons I became so evangelical about Yahoo Pipes, particularly in working with library communities, was that it enabled non-coders to engage in programming the web. And more than that. It allowed non-coders to use web based programming tools to build out additional functionality for the web.

Looking back, it seems to me now that the whole mashup thing arose from the idea of the web as a creative medium, and one which the core developers (the coders) were keen to make accessible to a wider community. Folk wanted to share, and folk wanted other folk to build on their services in interoperation with other services. It was an optimistic time for the tinkerers among us.

The web companies produced APIs that did useful things, used simple, standard representations (RSS, and then Atom, as simple protocols for communicating lists of content items, for example, then, later, JSON as a friendlier, more lightweight alternative to scary XML, which also reduced the need for casual web tinkerers to try to make sense of XMLHttpRequests), and seemed happy enough to support interoperability.

When Yahoo Pipes came online (and for a brief time, Microsoft’s Popfly mashup tool), the graphical drag-and-drop, wire-it-together, flow based programming model allowed non-coders to start developing, publishing, sharing and building on top of each other’s real web applications. You could inspect the internals of other people’s pipes, and clone those pipes so you could extend or modify them yourself, and put pipes inside pipes, fostering reuse and the notion of building stuff on top of, and out of, stuff you’ve learned how to do before.

And it all seemed so hopeful…

And then the web companies started locking things down a bit more. First my Amazon Pipes started to break, and then my Twitter Pipes, as authentication was introduced to access the feeds published by those companies. It started to seem as if those companies didn’t want their content flows rewired, reflowed and repurposed. And so Yahoo Pipes started to become less useful to me. And a little bit of the spirit of a web as a place where the web companies allowed whosoever, coders and non-coders alike, to build a better web using their stuff started to die.

And perhaps with it, the openness and engagement of the core web developers – the coders – started to close off a little too. True, there are repeated initiatives about learning to code, but whilst I’ve fallen into that camp myself over the last few years, and especially over the last two years, having discovered IPython notebooks and the notion of coding, one line at a time, I think we are complicit in closing off opportunities that help people build out the web using bits of the web.

Perhaps the web is too complicated now. Perhaps the vested interests are too vested. Perhaps the barrage-of-content, peck, peck, click, click, Like, addiction-feeding, pigeon-rat, behaviourist-conditioning, screen-based, crack-like business model has blinded us to the idea that we can use the web to build our own useful tools.

(I also posted yesterday about a planning application map I helped my local hyperlocal – OnTheWight – publish. If the Isle of Wight Council published current applications as an RSS feed, it would have been trivial to use Yahoo Pipes to construct the map. It would have been a five minute hack. As it is, the process we used required building a scraper (in code) and hacking up some code to generate the map.)

There still are tools out there that help you build stuff on the web for the web. CartoDB makes map creation relatively straightforward, and things like Mozilla Popcorn allow you to build your own apps around content containers (I think? It’s been a long time since I looked at it).

Taking time out to reflect on this, it seems as if the web cos have become too inward looking. Rather than encouraging wider communities to engage in building out the web, the companies get to a size where their systems become ever more complex, yet have to maintain their own coherence, and a cell wall goes up to contain that activity, and authentication starts to be used to limit access further.

At the same time as the data flows become more controlled, the only way to access them comes through code. Non-coders are disenfranchised, and the lightweight, open protocols that non-coding programming tools can work most effectively with become harder to justify.

When Pipes first appeared, it seemed as if the geeks were interested in building tools that increased opportunities to engage in programming the web, using the web.

And now we have Facebook. Tap, tap, peck, peck, click, click, Like. Ooh shiny… Tap, tap, peck, peck…

Yahoo Pipes Retires…

And so it seems that Yahoo Pipes, a tool I first noted here (February 08, 2007), something I created lots of recipes for (see also on the original, archived OUseful site), ran many a workshop around (and even started exploring a simple recipe book around) is to be retired (end of life announcement)…

[Image: Pipes__Rewire_the_web]

It’s not completely unexpected – I stopped using Pipes much at all several years ago, as sites that started making content available via RSS and Atom feeds then started locking it down behind simple authentication, and then OAuth…

I guess I also started to realise that the world I once imagined, as for example in my feed manifesto, We Ignore RSS at OUr Peril, wasn’t going to play out like that…

However, if you still believe in pipe dreams, all is not lost… Several years ago, Greg Gaughan took up the challenge of producing a Python library that could take a Yahoo Pipe JSON definition file and execute the pipe. Looking at the pipe2py project on github just now, it seems the project is still being maintained, so if you’re wondering what to do with your pipes, that may be worth a look…

By the by, the last time I thought Pipes might not be long for this world, I posted a couple of posts that explored how it might be possible to bulk export a set of pipe definitions as well as compiling and running your exported Yahoo Pipes.

Hmmm… thinks… it shouldn’t be too hard to get pipe2py running in a docker container, should it…?
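A first, completely untested guess at a Dockerfile, with the install route an assumption:

FROM python:2.7

# Install pipe2py straight from Greg Gaughan's github repo
RUN pip install git+https://github.com/ggaughan/pipe2py.git

# Drop exported pipe definition JSON files into /pipes to compile/run them from there
VOLUME /pipes
WORKDIR /pipes

CMD ["python"]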

PS I don’t think pipe2py has a graphical front end, but javascript toolkits like jsPlumb look like they may do much of the job. (It would be nice if the Yahoo Pipes team could release the Pipes UI code, of course…;-)

PPS if you need a simple one step feed re-router, there’s always IFTTT. If realtime feed/stream processing apps are more your thing, here are a couple of alternatives that I keep meaning to explore, but never seem to get round to… Node-RED, a node.js thing (from IBM?) for doing internet-of-things inspired stream processing (I did intend to play with it once, but I couldn’t even figure out how to stream the data I had in…); and Streamtools (about), from The New York Times R&D Lab, which I think does something similar?