Category: Tinkering

Using Google to Look Up Where You Live via the Physical Location of Your Wifi Router

During a course team meeting today, I idly mentioned that we should be able to run a simple browser-based activity involving the geolocation of a student’s computer, based on Google knowing the location of their wifi router. I was challenged on whether this was actually possible, so I did a quick bit of searching to see if there was an easy way of looking up the MAC addresses (BSSIDs) of wifi access points that were in range, but not connected to:

[Screenshot: Google search for “show wifi access point mac address chrome os x”]

which turned up:

The airport command with '-s' or '-I' options is useful: /System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport

[Screenshot: airport command output]

(On Windows, the equivalent is maybe something like netsh wlan show networks mode=bssid ???)

The second part of the jigsaw was to find a way of looking up a location from a wifi access point MAC address – it seems that the Google geolocation API does that out of the box:

[Screenshot: The Google Maps Geolocation API – Google Developers]

An example of how to make a call is also provided, as long as you have an API key… So I got a key and gave it a go:

[Screenshot: the test API call and the response it returned]

:-)

Looking at the structure of the example Google calls, you can enter several wifi MAC addresses, along with signal strength, and the API will presumably triangulate based on that information to give a more precise location.

The geolocation API also finds locations from cell tower IDs.
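
For the record, a minimal sketch of the sort of call involved, using the python requests package – the MAC addresses below are made up purely for illustration and you need to drop in your own API key (a cellTowers list can be passed in the same payload):

import requests

#NB the MAC addresses are made up for illustration - use ones from your own scan
url='https://www.googleapis.com/geolocation/v1/geolocate?key=YOUR_API_KEY'
payload={'wifiAccessPoints': [
    {'macAddress': '00:11:22:33:44:55', 'signalStrength': -43},
    {'macAddress': '66:77:88:99:aa:bb', 'signalStrength': -61}
]}

r=requests.post(url, json=payload)
#The JSON response should include a lat/lng location and an accuracy estimate in metres
r.json()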

So back to the idea of a simple student activity to sniff out the MAC addresses of wifi routers their computer can see from the workplace or home, and then look up the location using the Google geolocation API and pop it on a map.
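
By way of a sketch of what the first step might look like in python on a Mac – this just shells out to the airport command found above and pulls out anything that looks like a MAC address with a crude regex, so treat it as illustrative rather than robust:

import re
import subprocess

AIRPORT='/System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport'

#Scan for wifi access points in range...
scan=subprocess.check_output([AIRPORT, '-s']).decode('utf8')
#...and pull out anything that looks like a MAC address (BSSID)
macs=re.findall(r'(?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2}', scan)
macs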

Which is actually the sort of thing your browser will do when you turn on geolocation services:

[Screenshot: Geolocation in Firefox – Mozilla]

But maybe when you run the commands yourself, it feels a little bit more creepy?

PS sort of very loosely related, eg in terms of trying to map spaces from signals in the surrounding aether, a technique for trying to map the insides of a room based on its audio signature in response to a click of the fingers: http://www.r-bloggers.com/intro-to-sound-analysis-with-r/

IBM DataScientistWorkBench = OpenRefine + RStudio + Jupyter Notebooks in the Cloud, Via Your Browser

One of the many things on my “to do” list is to put together a blogged script that wires together RStudio, Jupyter notebook server, Shiny server, OpenRefine, PostgreSQL and MongoDB containers, and perhaps data extraction services like Apache Tika or Tabula and a few OpenRefine style reconciliation services, along with a common shared data container, so the whole lot can be launched on Digital Ocean at a single click to provide a data wrangling playspace with all sorts of application goodness to hand.

(Actually, I think I had a script that was more or less there for chunks of that when I was looking at a docker solution for the databases courses, but that fell by the wayside, and I suspect the Jupyter container (IPython notebook server, as was) probably needs a fair bit of updating by now. And I’ve no time or mental energy to look at it right now… :-( )

Anyway, the IBM Data Scientist Workbench now sits alongside things like KMi’s longstanding Crunch Learning Analytics Environment (RStudio + MySQL) and the Australian ResBaz Cloud – Containerised Research Apps Service in my list of “why the heck can’t we get our act together to offer this sort of SaaS thing to learners?” examples. And yes, I know there are cost implications… but, erm, sponsorship, cough… get-started tokens then PAYG, cough…

It currently offers access to personal persistent storage and the ability to launch OpenRefine, RStudio and Jupyter notebooks:

[Screenshot: Data Scientist Workbench]

The toolbar also suggests that the ability to “discover” pre-identified data sources and run pre-configured modeling tools is on the cards.

The applications themselves run off a subdomain tied to your account – and of course, they’re all available through the browser…

[Screenshot: RStudio and OpenRefine running on the Data Scientist Workbench]

So what’s next? I’d quite like to see ‘data import packs’ that would allow me to easily pull in data from particular sources, such as the CDRC, and quickly get started working with the data. (And again: yes, I know, I could start doing that anyway… maybe when I get round to actually doing something with isleofdata.com ?!;-)

See also these recipes for running app containers on Digital Ocean via Tutum: RStudio, Shiny server, OpenRefine and OpenRefine reconciliation services, and these Seven Ways of Running IPython / Jupyter Notebooks.

Grabbing Screenshots of folium Produced Choropleth Leaflet Maps from Python Code Using Selenium

I had a quick play with the latest updates to the folium python package today, generating a few choropleth maps around some of today’s Gov.UK data releases.

The problem I had was that folium generates an interactive Leaflet map as an HTML5 document (eg something like an interactive Google map), but I wanted a static image of it – a png file. So here’s a quick recipe showing how I did that, using a python function to automatically capture a screengrab of the map…

First up, a simple bit of code to get a rough centre for the extent of some boundaries in a geoJSON boundary file containing the boundaries for LSOAs in the Isle of Wight:

#GeoJSON from https://github.com/martinjc/UK-GeoJson
import fiona

#Boundary file for LSOAs in the Isle of Wight local authority area
GEO_JSON='../IWgeodata/lsoa_by_lad/E06000046.json'

fi=fiona.open(GEO_JSON)
bounds=fi.bounds

#Rough centre: the midpoint of the (minx, miny, maxx, maxy) extent
centre_lon, centre_lat=((bounds[0]+bounds[2])/2,(bounds[1]+bounds[3])/2)

Now we can get some data – I’m going to use the average travel time to a GP from today’s Journey times to key services by lower super output area data release and limit it to the Isle of Wight data.

import pandas as pd

gp='https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/485260/jts0505.xls'
xl=pd.ExcelFile(gp)
#xl.sheet_names

tname='JTS0505'
dbd=xl.parse(tname,skiprows=6)

iw=dbd[dbd['LA_Code']=='E06000046'] #Limit to the Isle of Wight local authority rows

The next thing to do is generate the map – folium makes this quite easy to do: all I need to do is point to the geoJSON file (geo_path), declare where to find the labels I’m using to identify each shape in that file (key_on), include my pandas dataframe (data), and state which columns include the shape/area identifiers and the values I want to visualise (columns=[ID_COL, VAL_COL]).

import folium
m = folium.Map([centre_lat,centre_lon], zoom_start=11)

m.choropleth(
    geo_path='../IWgeodata/lsoa_by_lad/E06000046.json',
    data=iw,
    columns=['LSOA_code', 'GPPTt'],
    key_on='feature.properties.LSOA11CD',
    fill_color='PuBuGn', fill_opacity=1.0
    )
m

The map object is included in the variable m. If I save the map file, I can then use the selenium testing package to open a browser window that displays the map, generate a screen grab of it and save the image, and then close the browser. Note that I found I had to add in a slight delay because the map tiles occasionally took some time to load.

#!pip install selenium

import os
import time
from selenium import webdriver

delay=5

#Save the map as an HTML file
fn='testmap.html'
tmpurl='file://{path}/{mapfile}'.format(path=os.getcwd(),mapfile=fn)
m.save(fn)

#Open a browser window...
browser = webdriver.Firefox()
#..that displays the map...
browser.get(tmpurl)
#Give the map tiles some time to load
time.sleep(delay)
#Grab the screenshot
browser.save_screenshot('map.png')
#Close the browser
browser.quit()
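
Wrapped up as a single helper function (the function name is just mine – it simply packages the save / render / wait / screenshot / quit steps shown above):

def folium_map_png(m, fn='testmap.html', png='map.png', delay=5):
    #Save the folium map as an HTML file...
    tmpurl='file://{path}/{mapfile}'.format(path=os.getcwd(),mapfile=fn)
    m.save(fn)
    #...open it in a browser, give the tiles time to load, grab a screenshot...
    browser = webdriver.Firefox()
    browser.get(tmpurl)
    time.sleep(delay)
    browser.save_screenshot(png)
    #...and close the browser
    browser.quit()

folium_map_png(m)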

Here’s the image of the map that was captured:

[Screenshot: map.png – the captured choropleth map]

I can now upload the image to WordPress and include it in an automatically produced blog post:-)
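
(For the upload step, one possible route might be the python-wordpress-xmlrpc package – the following is an untested sketch with placeholder credentials, based on that package’s documented usage:)

#!pip install python-wordpress-xmlrpc
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods import media, posts
from wordpress_xmlrpc.compat import xmlrpc_client

wp = Client('https://example.com/xmlrpc.php', 'USERNAME', 'PASSWORD')

#Upload the captured map image...
with open('map.png', 'rb') as img:
    data = {'name': 'map.png', 'type': 'image/png', 'bits': xmlrpc_client.Binary(img.read())}
response = wp.call(media.UploadFile(data))

#...and embed it in a new (draft) post
post = WordPressPost()
post.title = 'Isle of Wight journey times'
post.content = '<img src="{}"/>'.format(response['url'])
wp.call(posts.NewPost(post))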

PS before I hit on the Selenium route, I dabbled with a less useful, but perhaps still handy library for taking screenshots: pyscreenshot.

#!pip install pyscreenshot
import pyscreenshot as ImageGrab

im=ImageGrab.grab(bbox=(157,200,1154,800)) # X1,Y1,X2,Y2
#To grab the whole screen, omit the bbox parameter

#im.show()
im.save('screenAreaGrab.png',format='png')

The downside was I had to find the co-ordinates of the area of the screen I wanted to grab by hand, which I couldn’t find a way of automating… Still, could be handy…

Digital Text Forensics

After having a quick play at building a script for Finding Common Phrases or Sentences Across Different Documents, I thought I should spend a few minutes trying to find how this problem is more formally identified, described and attacked…

…and in doing so came across a rather interesting looking network on digital text forensics: PAN. (By the by, see also: EuSpRiG – European Spreadsheet Risks Interest Group.)

PAN run a series of challenges on originality which relate directly to the task I was looking at, with two in particular jumping out at me:

  • External Plagiarism Detection: given a document and a set of documents, determine whether parts of the former have been reused from the latter (2009-2011)
  • Text Alignment: given a pair of documents, extract all pairs of reused passages of maximal length (2012-2015)

The challenges come with training and test cases, and the performance metrics used are described for each year – here are the 2015 Plagiarism Detection / Text Reuse Detection Task Definitions.

Maybe I should give them a go to see how my code stacks up against the default benchmark… Hmmm… If I do worse, I find a better solution; if I do better, my code wasn’t that bad; if I do the same, at least I got the starting point right… Just no time to do it right now…;-)

PS Thinks: what would be really useful would be a git repository somewhere with code examples for all the challenge entries…;-)

Finding Common Phrases or Sentences Across Different Documents

As mentioned in the previous post, I picked up on a nice little challenge from my colleague Ray Corrigan a couple of days ago to find common sentences across different documents.

My first, rather naive, thought was to segment each of the docs into sentences and then compare sentences using a variety of fuzzy matching techniques, retaining the ones that sort-of matched. That approach was a bit ropey (I’ll describe it in another post), but whilst pondering it over a dog walk a much neater idea suggested itself – compare n-grams of various lengths over the two documents. At its heart, all we need to do is find the intersection of the ngrams that occur in each document.

So here’s a recipe to do that…

First, we need to get documents into a text form. I started off with PDF docs, but it was easy enough to extract the text using textract.

!pip install textract

import textract
txt = textract.process('ukpga_19840012_en.pdf')

The next step is to compare docs for a particular size n-gram – the following bit of code finds the common ngrams of a particular size and returns them as a list:

import nltk
from nltk.util import ngrams as nltk_ngrams

def common_ngram_txt(tokens1,tokens2,size=15):
    print('Checking ngram length {}'.format(size))
    ng1=set(nltk_ngrams(tokens1, size))
    ng2=set(nltk_ngrams(tokens2, size))

    match=set.intersection(ng1,ng2)
    print('..found {}'.format(len(match)))

    return match

I want to be able to find common ngrams of various lengths, so I started to put together the first fumblings of an n-gram sweeper.

The core idea was really simple – starting with the largest common n-gram, detect increasingly smaller n-grams; then do a concordance report on each of the common ngrams to show how that ngram appeared in the context of each document. (See n-gram / Multi-Word / Phrase Based Concordances in NLTK.)

Rather than generate lots of redundant reports – if I detected the common 3gram “quick brown fox”, I would also find the common ngrams “quick brown” and “brown fox” – I started off with the following heuristic: if a common n-gram is part of a longer common n-gram, ignore it. But this immediately turned up a problem. Consider the following case:

Document 1: the quick brown fox
Document 2: the quick brown fox and the quick brown cat and the quick brown dog

Here, there is a common 4-tuple: the quick brown fox. There is also a common 3-tuple: the quick brown, which a concordance plot would reveal as being found in the context of a cat and a dog as well as a fox. What I really need to do is keep the locations of common n-grams that are not contained within a longer common n-gram in the second document, but drop the locations that are subsumed by an already found longer n-gram.

Indexing on token number within the second doc, I need to return something like this:

([('the', 'quick', 'brown', 'fox'),
  ('the', 'quick', 'brown'),
  ('the', 'quick', 'brown')],
 [[0, 3], [10, 12], [5, 7]])

which shows up the shorter common ngrams only in the places where they are not part of the longer common ngram.

In the following, n_concordance_offset() finds the locations of a phrase token list within a document token list. The ngram_sweep_txt() function scans down a range of ngram lengths, starting with the longest, trying to identify locations that are not contained within an already discovered longer ngram.

def n_concordance_offset(text,phraseList):
    c = nltk.ConcordanceIndex(text.tokens, key = lambda s: s.lower())
    
    #Find the offset for each token in the phrase
    offsets=[c.offsets(x) for x in phraseList]
    offsets_norm=[]
    #For each token in the phraselist, find the offsets and rebase them to the start of the phrase
    for i in range(len(phraseList)):
        offsets_norm.append([x-i for x in offsets[i]])
    #We have found the offset of a phrase if the rebased values intersect
    #via http://stackoverflow.com/a/3852792/454773
    intersects=set(offsets_norm[0]).intersection(*offsets_norm[1:])
    
    return intersects
    
def ngram_sweep_txt(txt1,txt2,ngram_min=8,ngram_max=50):    
    tokens1 = nltk.word_tokenize(txt1)
    tokens2 = nltk.word_tokenize(txt2)

    text1 = nltk.Text( tokens1 )
    text2 = nltk.Text( tokens2 )

    ngrams=[]
    strings=[]
    ranges=[]
    for i in range(ngram_max,ngram_min-1,-1):
        #Find long ngrams first
        newsweep=common_ngram_txt(tokens1,tokens2,size=i)
        for m in newsweep:
            localoffsets=n_concordance_offset(text2,m)

            #We need to avoid the problem of masking shorter ngrams by already found longer ones
            #eg if there is a common 3gram in a doc2 4gram, but the 4gram is not in doc1
            #so we need to see if the current ngram is contained within the doc index of longer ones already found
            
            for o in localoffsets:
                fnd=False
                for r in ranges:
                    if o>=r[0] and o<=r[1]:
                        fnd=True
                if not fnd:
                    ranges.append([o,o+i-1])
                    ngrams.append(m)
    return ngrams,ranges,txt1,txt2

def ngram_sweep(fn1,fn2,ngram_min=8,ngram_max=50):
    txt1 = textract.process(fn1).decode('utf8')
    txt2 = textract.process(fn2).decode('utf8')
    ngrams,ranges,txt1,txt2=ngram_sweep_txt(txt1,txt2,ngram_min=ngram_min,ngram_max=ngram_max)
    return ngrams,ranges,txt1,txt2
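
As a quick sanity check of the masking heuristic, here’s the sweep run over the toy example from earlier, with the n-gram range scaled right down to suit such short documents (the order the shorter matches come back in may vary):

txt1='the quick brown fox'
txt2='the quick brown fox and the quick brown cat and the quick brown dog'

ngrams, ranges, _, _ = ngram_sweep_txt(txt1, txt2, ngram_min=3, ngram_max=4)
print(ngrams)
print(ranges)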

What I really need to do is automatically detect the largest n-gram and work back from there, perhaps using a binary search starting with an n-gram the size of the number of tokens in the shortest doc… But that’s for another day…
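
(Something along those lines might look like the following – an untested sketch that reuses common_ngram_txt(), relying on the fact that if a common n-gram of a given length exists, shorter common n-grams must exist too:)

def largest_common_ngram_size(tokens1, tokens2, ngram_min=1):
    #Binary search over n-gram length for the longest common n-gram
    lo, hi = ngram_min, min(len(tokens1), len(tokens2))
    while lo <= hi:
        mid = (lo + hi) // 2
        if common_ngram_txt(tokens1, tokens2, size=mid):
            lo = mid + 1
        else:
            hi = mid - 1
    #hi ends up as the largest size for which a common n-gram was found
    return hi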

Having discovered common phrases, we need to report them. The following n_concordance() function (based on this) does just that; the concordance_reporter() function manages the outputs.

import textract

def n_concordance(txt,phrase,left_margin=5,right_margin=5):
    #via https://simplypython.wordpress.com/2014/03/14/saving-output-of-nltk-text-concordance/
    tokens = nltk.word_tokenize(txt)
    text = nltk.Text(tokens)

    phraseList=nltk.word_tokenize(phrase)

    intersects= n_concordance_offset(text,phraseList)
    
    concordance_txt = ([text.tokens[max(offset-left_margin,0):offset+len(phraseList)+right_margin]
                        for offset in intersects])
                         
    outputs=[''.join([x+' ' for x in con_sub]) for con_sub in concordance_txt]
    return outputs

def concordance_reporter(fn1='Draft_Equipment_Interference_Code_of_Practice.pdf',
                         fn2='ukpga_19940013_en.pdf',fo='test.txt',ngram_min=10,ngram_max=15,
                         left_margin=5,right_margin=5,n=5):
    
    fo=fn2.replace('.pdf','_ngram_rep{}.txt'.format(n))
    
    f=open(fo, 'w+')
    f.close()
     
    print('Handling {}'.format(fo))
    ngrams,strings, txt1,txt2=ngram_sweep(fn1,fn2,ngram_min,ngram_max)
    #Remove any redundancy in the ngrams...
    ngrams=set(ngrams)
    with open(fo, 'a') as outfile:
        outfile.write('REPORT FOR ({} and {}\n\n'.format(fn1,fn2))
        print('found {} ngrams in that range...'.format(len(ngrams)))
        for m in ngrams:
            mt=' '.join(m)
            outfile.write('\n\-------\n{}\n\n'.format(mt.encode('utf8')))
            for c in n_concordance(txt1,mt,left_margin,right_margin):
                outfile.write('<<<<<{}\n\n'.format(c.encode('utf8')))
            for c in n_concordance(txt2,mt,left_margin,right_margin):
                outfile.write('>>>>>{}\n\n'.format(c.encode('utf8')))
    return

Finally, the following loop makes it easier to compare a document of interest with several other documents:

for f in ['Draft_Investigatory_Powers_Bill.pdf','ukpga_19840012_en.pdf',
          'ukpga_19940013_en.pdf','ukpga_19970050_en.pdf','ukpga_20000023_en.pdf']:
    concordance_reporter(fn2=f,ngram_min=10,ngram_max=40,left_margin=15,right_margin=15)

Here’s an example of the sort of report it produces:

REPORT FOR (Draft_Equipment_Interference_Code_of_Practice.pdf and ukpga_19970050_en.pdf


\-------
concerning an individual ( whether living or dead ) who can be identified from it

>>>>>personal information is information held in confidence concerning an individual ( whether living or dead ) who can be identified from it , and the material in question relates 

<<<<<section `` personal information '' means information concerning an individual ( whether living or dead ) who can be identified from it and relating—- ( a ) to his 


\-------
satisfied that the action authorised by it is no longer necessary .

>>>>>must cancel a warrant if he is satisfied that the action authorised by it is no longer necessary . 4.13 The person who made the application 

<<<<<an authorisation given in his absence if satisfied that the action authorised by it is no longer necessary . ( 6 ) If the authorising officer 

<<<<<cancel an authorisation given by him if satisfied that the action authorised by it is no longer necessary . ( 5 ) An authorising officer shall 


\-------
involves the use of violence , results in substantial financial gain or is conduct by a large number of persons in pursuit of a common purpose

>>>>>one or more offences and : It involves the use of violence , results in substantial financial gain or is conduct by a large number of persons in pursuit of a common purpose ; or a person aged twenty-one or 

<<<<<if , — ( a ) it involves the use of violence , results in substantial financial gain or is conduct by a large number of persons in pursuit of a common purpose , or ( b ) the offence 


\-------
to an express or implied undertaking to hold it in confidence

>>>>>in confidence if it is held subject to an express or implied undertaking to hold it in confidence or is subject to a restriction on 

>>>>>in confidence if it is held subject to an express or implied undertaking to hold it in confidence or it is subject to a restriction 

<<<<<he holds it subject— ( a ) to an express or implied undertaking to hold it in confidence , or ( b ) to a 


\-------
no previous convictions could reasonably be expected to be sentenced to

>>>>>a person aged twenty-one or over with no previous convictions could reasonably be expected to be sentenced to three years’ imprisonment or more . 4.5 

<<<<<attained the age of twenty-one and has no previous convictions could reasonably be expected to be sentenced to imprisonment for a term of three years 


\-------
considers it necessary for the authorisation to continue to have effect for the purpose for which it was

>>>>>to have effect the Secretary of State considers it necessary for the authorisation to continue to have effect for the purpose for which it was given , the Secretary of State may 

<<<<<in whose absence it was given , considers it necessary for the authorisation to continue to have effect for the purpose for which it was issued , he may , in writing 

The first line in a block is the common phrase, the >>> elements are how the phrase appears in the first doc, the <<< elements are how it appears in the second doc. The widths of the right and left margins of the contextual / concordance plot are parameterised and can be easily increased.

This seems such a basic problem – finding common phrases in different documents – I’d have expected there to be a standard solution to this? But in the quick search I tried, I couldn’t find one? It was quite a fun puzzle to play with though, and offers lots of scope for improvement (I suspect it’s a bit ropey when it comes to punctuation, for example). But it’s a start…:-)

There’s lots could be done on a UI front, too. For example, it’d be nice to be able to link documents, so you can click through from the first to show where the phrase came from in the second. But to do that requires annotating the original text, which in turn means being able to accurately identify where in a doc a token sequence appears. But building UIs is hard and time consuming… it’d be so much easier if folk could learn to use a code line UI!;-)

If you know of any “standard” solutions or packages for dealing with this sort of problem, please let me know via the comments:-)

PS The code could also do with some optimisation – eg if we know we’re repeatedly comparing against a base doc, it’s foolish to keep opening and tokenising the base doc…
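
(A minimal sketch of the sort of thing I mean – a simple cache keyed on filename that ngram_sweep() and n_concordance() could be reworked to call, so each document only gets extracted and tokenised once:)

#Cache the extracted text and token list for each document
_doc_cache={}

def doc_tokens(fn):
    if fn not in _doc_cache:
        txt = textract.process(fn).decode('utf8')
        _doc_cache[fn] = (txt, nltk.word_tokenize(txt))
    return _doc_cache[fn]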

n-gram / Multi-Word / Phrase Based Concordances in NLTK

A couple of days ago, my colleague Ray Corrigan shared with me a time consuming problem he was working on: looking for the original uses, in previously published documents, drafts and bills, of sentences contained in a draft code of practice currently out for consultation. On the one hand you might think of this as an ‘original use’ detection problem, on the other, a “plagiarism detection” issue. Text analysis is a data problem I’ve not really engaged with, so I thought this provided an interesting problem set – how can I detect common sentences across two documents? It has the advantage of being easily stated, and easily tested…

I’ll describe in another post how I tried to solve the problem, but in this post will separate out one of the useful tricks I stumbled across along the way – how to display a multiple word concordance using the python text analysis package, NLTK.

To set the scene, the NLTK concordance method will find the bit of a document that a search word appears in and display the word along with the bit of text just before and after it:

[Screenshot: NLTK concordance output for a single search word]
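
For a single search word, the recipe is something like the following (the document and search term here are just for illustration):

import nltk
import textract

txt = textract.process('ukpga_19840012_en.pdf').decode('utf8')
tokens = nltk.word_tokenize(txt)
text = nltk.Text(tokens)

#Print each occurrence of the search word with a little context either side
text.concordance('interference')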

Here’s my hacked together code – you’ll see that the trick for detecting the start of the phrase is actually a piece of code I found on Stack Overflow relating to the intersection of multiple python lists, wrapped up in another found recipe describing how to emulate nltk.text.concordance() and return the found segments:

def n_concordance_tokenised(text,phrase,left_margin=5,right_margin=5):
    #concordance replication via https://simplypython.wordpress.com/2014/03/14/saving-output-of-nltk-text-concordance/
    phraseList=phrase.split(' ')

    c = nltk.ConcordanceIndex(text.tokens, key = lambda s: s.lower())
    
    #Find the offset for each token in the phrase
    offsets=[c.offsets(x) for x in phraseList]
    offsets_norm=[]
    #For each token in the phraselist, find the offsets and rebase them to the start of the phrase
    for i in range(len(phraseList)):
        offsets_norm.append([x-i for x in offsets[i]])
    #We have found the offset of a phrase if the rebased values intersect
    #--
    # http://stackoverflow.com/a/3852792/454773
    #the intersection method takes an arbitrary amount of arguments
    #result = set(d[0]).intersection(*d[1:])
    #--
    intersects=set(offsets_norm[0]).intersection(*offsets_norm[1:])
    
    concordance_txt = ([text.tokens[max(offset-left_margin,0):offset+len(phraseList)+right_margin]
                        for offset in intersects])
                         
    outputs=[''.join([x+' ' for x in con_sub]) for con_sub in concordance_txt]
    return outputs

def n_concordance(txt,phrase,left_margin=5,right_margin=5):
    tokens = nltk.word_tokenize(txt)
    text = nltk.Text(tokens)
 
    return n_concordance_tokenised(text,phrase,left_margin=left_margin,right_margin=right_margin)

If there are better ways of doing this, please let me know via a comment:-)

PS thinking about it, I should possibly tokenise rather than split the phrase? Then the tokens are the same as the tokens used in the matcher?
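
(That would just mean swapping the phrase.split(' ') line for something like:)

phraseList = nltk.word_tokenize(phrase)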

How to Run A Shiny App in the Cloud Using Tutum, Digital Ocean and Docker Containers

Via RBloggers, I spotted this post on Deploying Your Very Own Shiny Server. I’ve been toying with the idea of running some of my own Shiny apps, so that post provided a useful prompt, though way too involved for me;-)

So here’s what seems to me to be an easier, rather more pointy-clicky, wiring-stuff-together way of doing it using Docker containers (though it might not seem that much easier to you the first time through!). The recipe includes: github, Dockerhub, Tutum and Digital Ocean.

To begin with, I created a minimal shiny app to allow the user to select a CSV file, upload it to the app and display it. The ui.R and server.R files, along with whatever else you need, should be placed into an application directory, for example shiny_demo, within a project directory, which I’m confusingly also calling shiny_demo (I should have called it something else to make it a bit clearer – for example, shiny_demo_project).

The shiny server comes from a prebuilt docker container on dockerhub – rocker/shiny.

This shiny server can run several Shiny applications, though I only want to run one: shiny_demo.

I’m going to put my application into its own container. This container will use the rocker/shiny container as a base, and simply copy my application folder into the shiny server folder from which applications are served. My Dockerfile is really simple and contains just two lines – it looks like this and goes into a file called Dockerfile in the project directory:

FROM rocker/shiny
ADD shiny_demo /srv/shiny-server/shiny_demo

The ADD command simply copies the contents of the child directory into a similarly named directory in the container’s /srv/shiny-server/ directory. You could add as many applications as you want to the server, as long as each is in its own directory. For example, if I have several applications:

[Screenshot: project directory containing several application folders]

I can add the second application to my container using:

ADD shiny_other_demo /srv/shiny-server/shiny_other_demo

The next thing I need to do is check my shiny_demo project into Github. (I don’t have a how-to on this, unfortunately…) In fact, I’ve checked my project in as part of another repository (docker-containers).

[Screenshot: docker-containers/shiny_demo at master · psychemedia/docker-containers on Github]

The next step is to build a container image on DockerHub. If I create an account and log in to DockerHub, I can link my Github account to it.

I can then create an Automated Build that will build a container image from my Github repository. First, identify the repository on my linked Github account and name the image:

[Screenshot: Docker Hub automated build setup]

Then add the path to the project directory that contains the Dockerfile for the image you’re interested in:

[Screenshot: Docker Hub build settings]

Click on Trigger to build the image the first time. In the future, every time I update that folder in the repository, the container image will be rebuilt to include the updates.

So now I have a Docker container image on Dockerhub that contains the Shiny server from the rocker/shiny image and a copy of my shiny application files.

Now I need to go to Tutum (also part of the Docker empire), which is an application for launching containers on a range of cloud services. If you link your Digital Ocean account to tutum, you can use tutum to launch docker container images hosted on Dockerhub onto a Digital Ocean droplet.

Within tutum, you’ll need to create a new node cluster on Digital Ocean:

(Notwithstanding the below, I generally go for a single 4GB node…)

Now we need to create a service from a container image:

I can find the container image I want to deploy on the cluster that I previously built on Dockerhub:

[Screenshot: Tutum new service wizard – selecting the container image]

Select the image and then configure it – you may want to rename it, for example. One thing you definitely need to do though is tick to publish the port – this will make the shiny server port visible on the web.

[Screenshot: Tutum new service wizard – publishing the port]

Create and deploy the service. When the container is built, and has started running, you’ll be told where you can find it.

[Screenshot: the Shiny server and demo app running from the Tutum-deployed container]

Note that if you click on the link to the running container, the default URL starts with tcp:// which you’ll need to change to http://. The port will be dynamically allocated unless you specified a particular port mapping on the service creation page.

To view your shiny app, simply add the name of the folder the application is in to the URL.

When you’ve finished running the app, you may want to shut the container down – and more importantly perhaps, switch the Digital Ocean droplet off so you don’t continue paying for it!

[Screenshot: Tutum node dashboard]

As I said at the start, the first time round seems quite complicated. After all, you need to:

  • put your Shiny application files into a project directory containing a simple Dockerfile;
  • check the project into Github;
  • link Github to Dockerhub and set up an automated build for the container image;
  • link Tutum to your Digital Ocean account and create a node cluster;
  • create and deploy a service from the container image, remembering to publish the port.

(Actually, you can miss out the dockerhub steps, and instead link your github account to your tutum account and do the automated build from the github files within tutum: Tutum automatic image builds from GitHub repositories. The service can then be launched by finding the container image in your tutum repository)

However, once you do have your project files in github, you can then easily update them and easily launch them on Digital Ocean. In fact, you can make it even easier by adding a deploy to tutum button to a project README.md file in Github.

See also: How to run RStudio on Digital Ocean via Tutum and How to run OpenRefine on Digital Ocean via Tutum.

PS to test the container locally, I launch a docker terminal from Kitematic, cd into the project folder, and run something like:

docker build -t psychemedia/shinydemo .
docker run --name shinydemo -i -t psychemedia/shinydemo

I can then set the port map and find a link to the server from within Kitematic.