Category: Infoskills

Getting Started With Personal App Containers in the Cloud

…aka “how to run OpenRefine in the cloud in just a couple of clicks and for a few pennies an hour”…

I managed to get my first container up and running in the cloud today (yeah!:-), using tutum to launch a container I’d defined on Dockerhub and run it on a linked DigitalOcean server (or as they call them, “droplet”).

This sort of thing is probably a “so what?” to many devs, or even to folk who do the self-hosting thing, where you can already launch your own web applications from something like cPanel – setting up your own WordPress site, perhaps, or an online database.

The difference for me is that the instance of OpenRefine I got up and running in the cloud via a web browser was the result of composing several different, loosely coupled services together:

  • I’d already published a container on dockerhub that launches the latest release version of OpenRefine: psychemedia/docker-openrefine. This lets me run OpenRefine in a boot2docker virtual machine running on my own desktop and access it through a browser on the same computer.
  • Digital Ocean is a cloud hosting service with simple billing (I looked at Amazon AWS but it was just too complicated) that lets you launch cloud hosted virtual machines of a variety of sizes and in a variety of territories (including the UK). Billing is per hour with a monthly cap with different rates for different machine specs. To get started, you need to register an account and make a small ($5 or so) downpayment using Paypal or a credit card. So that’s all I did there – created an account and made a small payment. [Affiliate Link: sign up to Digital Ocean and get $10 credit]
  • tutum is an intermediary service that makes it easy to launch servers and the containers running inside them. By linking a DigitalOcean account to tutum, I can launch containers on DigitalOcean in a relatively straightforward way…

Launching OpenRefine via tutum

I’m going to start by launching a 2GB machine, which comes in at 3 cents an hour, capped at $20 a month.

[Screenshot: tutum_0a]

[Screenshot: tutum_0b]

Now we need to get a container – which I’m thinking of as if it was a personal app, or personal app server:

[Screenshot: tutum1]

I’m going to make use of a public container image – here’s one I prepared earlier…

[Screenshot: tutum2]

We need to do a tiny bit of configuration. Specifically, all I need to do is make the port public so I can connect to it; by default, it will be mapped to a random port in a particular range on the publicly visible host. I can also set the service name, but for now I’ll leave the default.

[Screenshot: tutum3]

If I create and deploy the container, the image will be pulled from dockerhub and a container launched based on it that I should be able to access via a public URL:

[Screenshot: tutum4]
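By way of comparison, here’s a minimal sketch of that create-and-deploy step done in code rather than through the tutum UI, using the Docker SDK for Python against whatever Docker host is configured in the environment. It assumes the psychemedia/docker-openrefine image serves OpenRefine on its default port (3333) and leaves the host port for Docker to pick at random, much as tutum does by default – treat the details as illustrative rather than definitive.

import docker

# Connect to whatever Docker host the environment points at (a boot2docker VM, say)
client = docker.from_env()

# Pull (if necessary) and launch the public OpenRefine image, publishing its
# assumed default port (3333) on a random host port
container = client.containers.run(
    "psychemedia/docker-openrefine",
    detach=True,
    ports={"3333/tcp": None},
)

container.reload()  # refresh attributes so the assigned host port becomes visible
print(container.attrs["NetworkSettings"]["Ports"])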

The first time I deploy a container image to a particular machine it takes a little time to set up, as the image files have to be pulled onto that node. If I create another container using the same image (another OpenRefine instance, for example), it should start really quickly because all the required files have already been loaded onto the node.

[Screenshot: tutum5]

Unfortunately, when I go through to the corresponding URL, there’s nothing there. Looking at the logs, I think maybe there wasn’t enough memory to launch a second OpenRefine container… (I could test this by launching a second droplet/server with more memory, and then deploying a couple of containers to that one.)

[Screenshot: tutum6]

Billing on DigitalOcean is calculated at an hourly rate, based on the number and size of servers running. To stop racking up charges, you can terminate the server/droplet (so you also lose the containers).

[Screenshot: tutum7]

Note that in the case of OpenRefine, we could allow several users to access the same OpenRefine container (the same URL) and just run different projects within it.

So What?

Although this is probably not the way that dev ops folk think of containers, I’m seeing them as a great way of packaging service based applications that I might want to run at a personal level, or perhaps in a teaching/training context, maybe on a self-service basis, maybe on a teacher self-service basis (fire up one application server that everyone in a cohort can log on to, or one container/application server for each of them). I noticed that I could automatically launch as many containers as I wanted – a 64GB, 20 core server costs about $1 per hour on Digital Ocean, so for an all day School of Data training session, for example, with 15-20 participants, that would be about $10, with everyone in such a class having their own OpenRefine container/server, all started with the same single click. Alternatively, we could fire up separate droplet servers, one per participant, each running its own set of containers – though that might be harder to initialise (i.e. more than one or two clicks?!). Or maybe not?

One thing I haven’t explored yet is mounting data containers/volumes to link to application containers. This makes sense in a data teaching context because it cuts down on bandwidth. If folk are going to work on the same 1GB file, it makes sense to just load it in to the virtual machine once, then let all the containers synch from that local copy, rather than each container having to download its own copy of the file.
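To make that concrete, here’s a minimal sketch of the idea using the Docker SDK for Python – though note it uses a host directory bind-mounted read-only into each container rather than a dedicated data container, which achieves much the same effect. The /srv/shared-data path and the container names are hypothetical placeholders.

import docker

# Assume the 1GB file has already been downloaded once to /srv/shared-data on the
# droplet/VM; each app container then mounts that directory read-only rather than
# fetching its own copy over the network.
client = docker.from_env()

for i in range(3):
    client.containers.run(
        "psychemedia/docker-openrefine",
        name="refine-{}".format(i),   # hypothetical container names
        detach=True,
        ports={"3333/tcp": None},
        volumes={"/srv/shared-data": {"bind": "/data", "mode": "ro"}},
    )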

The advantage of the approach described in the walkthrough above over “pre-configured” self-hosting solutions is the extensibility of the range of applications available to me. If I can find – or create – a Dockerfile that will configure a container to run a particular application, I can test it on my local machine (using boot2docker, for example) and then deploy a public version in the cloud, at an affordable rate, in just a couple of steps.

Whilst templated configurations using things like fig or panamax – which would support the 1-click launch of multiple linked container configurations – aren’t supported by tutum yet, I believe they are on the timeline… So I look forward to trying out a one-click cloud version of Using Docker to Build Linked Container Course VMs when that comes onstream:-)

In an institutional setting, I can easily imagine a local docker registry that hosts images for apps that are “approved” within the institution, or perhaps tagged as relevant to particular courses. I don’t know if it’s similarly possible to run your own panamax configuration registry, as opposed to pushing a public panamax template for example, but I could imagine that being useful institutionally too? For example, I could put container images on a dockerhub style OU teaching hub or OU research hub, and container or toolchain configurations that pull from those on a panamax style course template register, or research team/project register? To front this, something like tutum, though with an even easier interface to allow me to fire up machines and tear them down?

Just by the by, I think part of the capital funding the OU got recently from HEFCE was slated for a teaching related institutional “cloud”, so if that’s the case, it would be great to have a play around trying to set up a simple self-service personal app runner thing?;-) That said, I think the pitch in that bid probably had the forthcoming TM352 Web, Mobile and Cloud course in mind (2016? 2017??), though from what I can tell I’m about as persona non grata as possible with respect to even being allowed to talk to anyone about that course!;-)

Whose Browser (or Phone, or Drone?!) Is It Anyway?

I’m not sure how many Chrome users follow any of the Google blogs that occasionally describe forthcoming updates to Google warez, but if you don’t you perhaps don’t realise quite how frequently things change. My browser, for example, is at something like version 40, even though I never consciously update it.

One thing I only noticed recently is that a tab has appeared in the top right hand corner of the browser showing that I’m logged in (to the browser) with a particular Google account. There doesn’t actually appear to be an option to log out – I can switch user or go incognito – and I’m not sure I remember even consciously logging in to it (actually, maybe a hazy memory, when I wanted to install a particular extension), and I have no idea what it actually means for me to be logged in?

Via the Google Apps Update blog, I learned today that being logged in to the browser will soon support seamless synching of my Google docs into my Chrome browser environment (Offline access to Google Docs editors auto-enabled when signing into Chrome browser on the web). Following a pattern popularised by Apple, Google is innovating on our behalf and automatically opting us in to behaviours it thinks make sense for us. So just bear that in mind when you write a ranty resignation letter in Google docs and wonder why it’s synched to your work computer on your office desk:

Note that Google Apps users should not sign into a Chrome browser on public/non-work computers with their Google Apps accounts to avoid unintended file syncing.

If you actually have several Google apps accounts (for example, I have a personal one, and a couple of organisational ones: an OU one, an OKF one), I assume that the only docs that are synched are the ones on an account that matches the account I have signed in to in the browser. That said, synch permissions may be managed centrally for organisational apps accounts:

Google Apps admins can still centrally enable or disable offline access for their domain in the Admin console… Existing settings for domain-level offline access will not be altered by this launch.

I can’t help but admit that even though I won’t have consciously opted in to this feature – just as I don’t really remember logging in to Chrome on my desktop (how do I log out???), and I presumably agreed to something when I installed Chrome to let it keep updating itself without prompting me – I will undoubtedly find it useful one day: on a train, perhaps, when trying to update a document I’d forgotten to synch. It will be so convenient I will find it unremarkable, not noticing I can now do something I couldn’t do as easily before. Or I might notice, with a “darn, I wish I’d…” followed by an “oh, cool [kewel…], I can…”.

“‘Oceania has always been at war with Eastasia.'” [George Orwell, 1984]

Just like when – after being sure I’d disabled, or at least not explicitly opted in to, any sort of geo-locating or geo-tracking behaviour on my Android phone – I found I must have left a door open somewhere (or been automatically opted in to something I hadn’t appreciated when agreeing to a particular update, or, by proxy, agreeing to allow something to update itself automatically and without prompting, with implied or explicit permission to automatically opt me in to new features…) and discovered I could locate my misplaced phone using the Android Device Manager (Where’s My Phone?).

This idea of allowing applications to update themselves in the background and without prompting is something we have become familiar with in many web apps, and in desktop apps such as Google Chrome, though many apps do still require the user to either accept the update or take an even more positive action to install an update when notified that one is available. (It seems that ever fewer apps require you to specifically search for updates…)

In the software world, we have gone from a world where the things we bought were immutable, to one where we could search for and install updates (e.g. to operating systems or software applications), then accept updates when alerted to the fact that one was available, to automatically (and invisibly) accepting updates.

In turn, many physical devices have gone from being purely mechanical affairs, to electro-mechanical ones, to logical-electro-mechanical devices (for example, that include logic elements hardwired into silicon), to ones containing factory programmable hardware devices (PROMs, programmable Read Only Memories), to devices that run programmable, and then reprogrammable, firmware (that is to say, software).

If you have a games console, a Roku or MyTV box, or Smart TV, you’ve probably already been prompted to get a (free) online update. I don’t know, but could imagine, new top end cars having engine management system updates at regular service events.

However, one thing perhaps we don’t fully appreciate is that these updates can also be used to limit functionality that our devices previously had. If the updates are applied seamlessly (without permission, in the background), this may come as something of a surprise. [Cf. the complementary issue of vendors having access to “their” content on “your” machine, as described here by the Guardian: Amazon wipes customer’s Kindle and deletes account with no explanation]

A good example of loss of functionality arising from an (enforced, though self-applied) firmware update was reported recently in the context of hobbyist drones:

On Wednesday, SZ DJI Technology, the Chinese company responsible for the popular DJI Phantom drones that online retailers sell for less than $500, announced that it had prepared a downloadable firmware update for next week that will prevent drones from taking off in restricted zones and prevent flight into those zones.

Michael Perry, a spokesman for DJI, told the Guardian that GPS locating made such an update possible: “We have been restricting flight near airports for almost a year.”

“The compass can tell when it is near a no-fly zone,” Perry said. “If, for some reason, a pilot is able to fly into a restricted zone and then the GPS senses it’s in a no-fly zone, the system will automatically land itself.”

DJI’s new Phantom drones will ship with the update installed, and owners of older devices will have to download it in order to receive future updates.

Makers of White House drone offer fix to bar Phantom menace from no-fly zones

What correlates might be applied to increasingly intelligent cars, I wonder?! Or at the other extreme, phones..?

PS How to log out of Chrome: you need to administer yourself… From the Chrome Preferences Settings (sic), Disconnect your Google account.

Note that you have to take additional action to make sure that you remove all those synched presentations you’d prepared for job interviews at other companies from the actual computer…

Take care out there…!;-)

Split View Web Page Bookmarklet

What feels like forever ago, I described a method – Split Screen Screenshots – that allows you to put a copy of the same web page into two frames, so that you can grab a screenshot that includes the page header in the top frame, for example, as well as something from waaaaaay down the page in the lower frame.

I didn’t realise that the recipe I described required help from an external service until I tried to reuse the method to grab the following screenshot…

[Screenshot: brirminghampayment]

Anyway, cribbing from this Chrome Dual View standalone bookmark, here’s an updated version of my split screen bookmarklet:

javascript:url=location.href;f='<frameset rows=\'30%,*\'>\n<frame src=\''+url+'\'/><frame src=\''+url+'\'/></frameset>';with(document){write(f);void(close())}

To make use of it, drag the following link onto your bookmarks toolbar, or save the link as a bookmark: Hmm… seems the craptastic WordPress.com doesn’t let me post bookmarklet links? So you’ll have to: bookmark this page, edit the bookmark, and paste the above code snippet into the URL field. Then when you want to split the view of a webpage, just click the bookmarklet.

Data Reporting, not Data Journalism?

As a technology optimist, I don’t tend to go in so much for writing up long critical pieces about tech (i.e. I don’t do the “proper” academic thing), instead preferring to spend time trying to work out how to make use of it, either on its own or in tandem with other technologies. (My criticality is often limited to quickly ruling out things I don’t want to waste my time on because they lack interestingness, or to spending time working out how to appropriate one thing to do something it perhaps wasn’t originally intended to do.)

I also fall into the camp of lacking confidence about things other people think I know about, generally assuming there are probably “proper” ways of doing things for someone properly knowledgeable in a tradition, although I don’t know what they are. (Quite where this situates me on a scale of incompetence, I’m not sure… e.g. Why the Unskilled Are Unaware: Further Explorations of (Absent) Self-Insight Among the Incompetent h/t @Downes).

A couple of days ago, whilst doodling with a data-to-text notebook (the latest incarnation of which can be found here) I idly asked @fantasticlife whether it was an example of data journalism (we’ve been swapping links about bot-generated news reports for some time), placing it also in the context of The Changing Task Composition of the US Labor Market: An Update of Autor, Levy, and Murnane (2003), David H. Autor & Brendan Price,June 21, 2013, and in particular this chart about the decline in work related “routine cognitive tasks”:

[Chart: economics_mit_edu_files_9758 – decline in routine cognitive tasks]

His response? “No, that’s data reporting”.

So now I’m wondering: how do data reporting and data journalism differ? And to what extent is writing some code along the lines of:

import decimal
import random

def pc(amount,rounding=''):
    # format a proportion (e.g. 0.019) as a percentage string (e.g. '1.9%')
    if rounding=='down': rounding=decimal.ROUND_DOWN
    elif rounding=='up': rounding=decimal.ROUND_UP
    else: rounding=decimal.ROUND_HALF_UP

    ramount=float(decimal.Decimal(100 * amount).quantize(decimal.Decimal('.1'), rounding=rounding))
    return '{0:.1f}%'.format(ramount)

def otwMoreLess(now,then):
    # say whether one value is 'more' or 'less' than another (assumes now != then)
    delta=now-then
    if delta>0:
        txt=random.choice(['more'])
    elif delta<0:
        txt=random.choice(['less'])
    return txt

def otwPCmoreLess(this,that):
    # describe the difference between two rates, e.g. '0.5% more'
    delta=this-that
    return '{delta} {diff}'.format(delta=pc(abs(delta)),diff=otwMoreLess(this,that))

otw3='''
That means {localrate} of the resident {localarea} population {poptype} are {claim} \
– {regiondiff} than the rest of the {region} ({regionrate}), \
and {ukdiff} than the whole of the UK ({ukrate}).
'''.format(localrate=pc(jsaLocal_rate),
           localarea=jsaLocal['GEOGRAPHY_NAME'].iloc[0],
           poptype=decase(get16_64Population(localcode)['CELL_NAME'].iloc[0].split('(')[1].split(' -')[0]),
           claim=decase(jsaLocal['MEASURES_NAME'].iloc[0]),
           regiondiff=otwPCmoreLess(jsaLocal_rate,jsaRegion_rate),
           region=jsaRegion['GEOGRAPHY_NAME'].iloc[0],
           regionrate=pc(JSA_rate(regionCode)),
           ukdiff=otwPCmoreLess(jsaLocal_rate,jsaUK_rate),
           ukrate=pc(jsaUK_rate))

print(otw3)

to produce text from nomis data of the form:

That means 1.9% of the resident Isle of Wight population aged 16-64 are persons claiming JSA – 0.5% more than the rest of the South East (1.4%), and 0.5% less than the whole of the UK (2.4%).

– an output that is data reporting – an example of journalistic practice? For example, is locating the data source a journalistic act? Is writing a script to parse the data into words through the medium of code a journalistic act?

Does it become “more journalistic” if the text generating procedure comments on the magnitude of the changes? For example, using something like:

import random

def _txt6(filler=','):
    # p is assumed to be an inflect-style engine (p.a() prepends 'a'/'an');
    # txt1, txt2, yeardelta and mostRecent are defined earlier in the notebook

    def _magnitude(term):
        # A heuristic here: qualify the term according to the relative size of the change
        if propDelta<0.05:
            mod=random.choice(['slight'])
        elif propDelta<0.10:
            mod=random.choice(['significant'])
        else:
            mod=random.choice(['considerable','large'])
        term=' '.join([mod,term])
        return p.a(term)

    txt6=txt2[:-1]

    propDelta= abs(yeardelta)/mostRecent['Total']

    if yeardelta==0:
        txt6+=', exactly the same amount.'
    else:
        if yeardelta<0: direction=_magnitude(random.choice(['decrease','fall']))
        else: direction=_magnitude(random.choice(['increase','rise']))

        # _magnitude() has already prepended the article, so use direction directly
        txt6+='{_filler} {0} of {1} since then.'.format(direction,
                                                        abs(yeardelta),
                                                        _filler=filler)
    return txt6

print(txt1)
print(_txt6())

to produce further qualified text (in terms of commenting on the magnitude of the amount) that can take a variety of slightly different forms?

– The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502.
– This compares with a figure of 2720 from a year ago (August 2013), a large fall of 1218 since then.
– This compares with a figure of 2720 from a year ago (August 2013), a considerable fall of 1218 since then.

At what point does the interpretation we can bake into the text generator become “journalism”, if at all? Can the algorithm “do journalism”? Is the crafting of the algorithm “journalism”? Or is it just computer assisted reporting?

In the context of routine cognitive tasks – such as the task of reporting the latest JSA figures in this town or that city – is this sort of automation likely? Is it replacing one routine cognitive task with another, or is it replacing it with a “non-routine analytical task”? Might it be a Good Thing, allowing the ‘reporting’ to be done automatically, at least in draft form, and freeing up journalists to do other tasks? Or a Bad Thing, replacing a routine, semi-skilled task with automation? To what extent might the algorithms also embed more intelligence and criticism, for example, automatically flagging up figures that are “interesting” in some way? (Would that be algorithmic journalism? Or would it be more akin to an algorithmic stringer?)

PS Twitter discussion thread around this post from 29/9/14

Assessing Data Wrangling Skills – Conversations With Data

By chance, a couple of days ago I stumbled across a spreadsheet summarising awarded PFI contracts as of 2013 (private finance initiative projects, 2013 summary data).

The spreadsheet has 800 or so rows, and a multitude of columns, some of which are essentially grouped (although multi-level, hierarchical headings are not explicitly used) – columns relating to (estimated) spend in different tax years, for example, or the equity partners involved in a particular project.

As part of the OU course we’re currently developing on “data”, we’re exploring an assessment framework based on students running small data investigations, documented using IPython notebooks, in which we will expect students to demonstrate a range of technical and analytical skills, as well as some understanding of data structures and shapes, data management technology and policy choices, and so on. In support of this, I’ve been trying to put together one or two notebooks a week over the course of a few hours to see what sorts of thing might be possible, tractable and appropriate for inclusion in such a data investigation report within the time constraints allowed.

To this end, the Quick Look at UK PFI Contracts Data notebook demonstrates some of the ways in which we might learn something about the data contained within the PFI summary data spreadsheet. The first thing to note is that it’s not really a data investigation: there is no specific question I set out to ask, and no specific theme I set out to explore. It’s more exploratory (that is, rambling!) than that. It has more of the form of what I’ve started referring to as a conversation with data.

In a conversation with data, we develop an understanding of what the data may have to say, along with a set of tools (that is, recipes, or even functions) that allow us to talk to it more easily. These tools and recipes are reusable within the context of the dataset or datasets selected for the conversation, and may be transferable to other data conversations. For example, one recipe might be how to filter and summarise a dataset in a particular way, or generate a particular view or reshaping of it that makes it easy to ask a particular sort of question, or generate a particular sort of chart (creating a chart is like asking a question where the surface answer is returned in a graphical form).
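To make the “recipe” idea a little more concrete, here’s a minimal sketch of the sort of reusable filter-and-summarise function I have in mind; the pfi DataFrame, the file name and the column names are hypothetical stand-ins rather than the actual headings in the PFI spreadsheet.

import pandas as pd

def summarise_by(df, group_col, value_col, top=10):
    # Recipe: group a dataset on one column, total another, return the largest groups
    return (df.groupby(group_col)[value_col]
              .sum()
              .sort_values(ascending=False)
              .head(top))

# Hypothetical usage against the PFI data - file and column names are illustrative only
# pfi = pd.read_excel('pfi_projects_2013_summary_data.xlsx')
# summarise_by(pfi, 'Department', 'Capital Value')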

If we are going to use IPython notebook documented conversations with data as part of an assessment process, we need to tighten up a little more how we might expect to see them structured and what we might expect to see contained within them.

Quickly skimming through the PFI conversation, elements of the following skills are demonstrated (a minimal sketch of a few of the early steps follows the list):

– downloading a dataset from a remote location;
– loading it into an appropriate data representation;
– cleaning (and type casting) the data;
– spotting possibly erroneous data;
– subsetting/filtering the data by row and column, including the use of partial string matching;
– generating multiple views over the data;
– reshaping the data;
– using grouping as the basis for the generation of summary data;
– sorting the data;
– joining different data views;
– generating graphical charts from derived data using “native” matplotlib charts and a separate charting library (ggplot);
– generating text based interpretations of the data (“data2text” or “data textualisation”).
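As a rough illustration of what the first few of those steps might look like in code, here’s a minimal pandas sketch; the URL, filename and column names are hypothetical placeholders rather than the real ones used in the notebook.

import pandas as pd
import requests

# Download the dataset from a remote location (placeholder URL)
url = 'https://example.gov.uk/pfi_projects_2013_summary_data.xlsx'
with open('pfi_summary_2013.xlsx', 'wb') as f:
    f.write(requests.get(url).content)

# Load it into an appropriate data representation
pfi = pd.read_excel('pfi_summary_2013.xlsx')

# Clean and type cast: coerce a spend column to numeric, flagging oddities as NaN
pfi['Capital Value'] = pd.to_numeric(pfi['Capital Value'], errors='coerce')

# Subset/filter by row and column, including partial string matching
health = pfi[pfi['Department'].str.contains('Health', na=False)][
    ['Project Name', 'Department', 'Capital Value']]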

The notebook also demonstrates some elements of reflection about how the data might be used. For example, it might be used in association with other datasets to help people keep track of the performance of facilities funded under PFI contracts.

Note: the notebook I link to above does not include any database management system elements. As such, it represents elements that might be appropriate for inclusion in a report from the first third of our data course – which covers basic principles of data wrangling using pandas as the dominant toolkit. Conversations held later in the course should also demonstrate how to get data into and out of an appropriately chosen database management system, for example.

AP Business Wire Service Takes on Algowriters

Via @simonperry, news that AP will use robots to write some business stories (Automated Insights are one of several companies I’ve been tracking over the years who are involved in such activities, eg Notes on Narrative Science and Automated Insights).

The claim is that using algorithms to do the procedural writing opens up time for the journalists to do more of the sensemaking. One way I see this is that we can use data2text techniques to produce human readable press releases of things like statistical releases, which has a couple of advantages at least.

Firstly, the grunt – and error prone – work of running the numbers (calculating month on month or year on year changes, handling seasonal adjustments etc) can be handled by machines using transparent and reproducible algorithms. Secondly, churning numbers into simple words (“x went up month on month from Sept 2013 to Oct 2013 and down year on year from 2012”) makes them searchable using words, rather than having to write our own database or spreadsheet queries with lots of inequalities in them.
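By way of a sketch of that grunt work, here’s a minimal, purely illustrative example of turning a pair of figures into a month on month / year on year phrase; the numbers are made up and the wording is just one of many possible templates.

def change_phrase(now, then):
    # Turn two numbers into a simple 'up/down x.x%' phrase
    if then == 0:
        return 'not comparable'
    delta = 100.0 * (now - then) / then
    if delta == 0:
        return 'unchanged'
    direction = 'up' if delta > 0 else 'down'
    return '{} {:.1f}%'.format(direction, abs(delta))

# Hypothetical figures, purely for illustration
sentence = 'x went {} month on month from Sept 2013 to Oct 2013 and {} year on year from Oct 2012.'.format(
    change_phrase(110, 100), change_phrase(110, 120))
print(sentence)
# -> x went up 10.0% month on month from Sept 2013 to Oct 2013 and down 8.3% year on year from Oct 2012.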

In this respect, something that’s been on my to do list for way too long is to produce some simple “press release” generators based on ONS releases (something I touched on in Data Textualisation – Making Human Readable Sense of Data).

Matt Waite’s upcoming course on “automated story bots” looks like it might produce some handy resources in this regard (code repo). In the meantime, he already shared the code described in How to write 261 leads in a fraction of a second here: ucr-story-bot.

For the longer term, on my “to ponder” list is what might something like “The Grammar of Graphics” be for data textualisation? (For background, see A Simple Introduction to the Graphing Philosophy of ggplot2.)

For example, what might a ggplot2 inspired gtplot library look like for converting data tables not into chart elements, but textual elements? Does it even make sense to try to construct such a grammar? What would the corollaries to aesthetics, geoms and scales be?

I think I perhaps need to mock up some examples to see if anything comes to mind about what the function names, as well as the outputs, might look like, let alone the code to implement them! Or maybe code first is the way, to get a feel for how to build up the grammar from sensible looking implementation elements? Or more likely, perhaps a bit of iteration may be required?!
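In that spirit, here’s a purely hypothetical mock-up – every class and function name below is invented – just to get a feel for what a ggplot2 style “grammar of text” might look like as code: a data row plays the part of the data layer, textual “geoms” are templated sentences, and rendering joins them together.

class gtplot:
    # Collect a data row plus textual 'geoms', then render them as sentences
    def __init__(self, row, **aes):
        self.row, self.aes, self.geoms = row, aes, []

    def __add__(self, geom):
        self.geoms.append(geom)
        return self

    def render(self):
        values = {k: self.row[v] for k, v in self.aes.items()}
        return ' '.join(g.format(**values) for g in self.geoms)

def geom_sentence(template):
    # The textual analogue of a geom: a single templated sentence
    return template

# Hypothetical data row and usage
row = {'area': 'Isle of Wight', 'rate': '1.9%', 'ukrate': '2.4%'}
report = (gtplot(row, area='area', rate='rate', ukrate='ukrate')
          + geom_sentence('{rate} of the resident {area} population are claiming JSA.')
          + geom_sentence('The UK rate is {ukrate}.'))
print(report.render())

Whether scales (for formatting numbers), stats (for deriving comparisons) and facets (for iterating over areas) have sensible textual analogues is exactly the sort of question a mock-up like this raises.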

Writing Diagrams – Boxes and Arrows

If you’ve ever had to draw “blocks and arrows” diagrams, you’ll know how irritating it can be if you spend hours laying out the diagram using a presentation editor or drawing tool, only to find you need to edit the drawing, add another box, and lay the whole thing out again.

Surely there must be a better way?

Let’s just think about what a box and arrow diagram is intended to show: when describing a process, the connections typically represent a flow from one thing to another; furthermore, the layout is often rectilinear, laid out along straight lines, the boxes tidily spaced and their edges lined up with each other. In a diagram such as a mindmap, different ideas or concepts are related to each other by drawing a line between them and the layout may be more fluid, with like or related concepts grouped together in space, or by the additional use of colour themes, for example.

The primary information contained in the diagram is the set of text elements and the connections between them. The positioning on the page often reflects the structure of these connections. When we lay out a diagram, we unconsciously favour layouts that minimise the number of crossed lines (to keep the diagram “clean” looking), and group connected items close together (unless some other information requires us to separate them – for example, we might be using a timeline basis for a horizontal x-axis and placing boxes in areas of the canvas we are working on that are associated with a particular month).

The online Google Drawing document type is typical of drawing tools included in many office applications. As well as being able to draw boxes and connect registration points on each box by lines or arrows, a range of layout tools provides support for aligning and spacing boxes.

[Screenshot: Google Drawing]

Tools such as popplet provide a friendlier environment for generating similar sorts of diagram:

[Screenshot: popplet]

Whilst drawing tools such as these allow you to craft your diagram by hand, building it up as you go along, actually putting previously collected information into blocks on the canvas, let alone connecting the blocks together and laying them out nicely, may be quite an involved and error prone affair.

In these circumstances, it may make more sense to take a raw representation of the block contents and a simple representation of connections between appropriate blocks and just write the relationships down, letting a drawing tool do the hard work of drawing the blocks, connecting them together and laying them out, at least in draft layout form. To provide for a final layer of customisation, it might also be useful to be able to take a vector/SVG representation of the automatically sketched layout into a drawing package where it can be tidied up by hand and the application of a human designer’s eye.
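As a small illustration of “writing” a diagram rather than drawing it, here’s a minimal sketch using the Python graphviz package (one option among many; it assumes the Graphviz binaries are installed). The node labels are just placeholders – the point is that the relations are written down as data and the layout is left to the machine, with the SVG available for hand-tweaking afterwards.

from graphviz import Digraph

# Write the relations down; let Graphviz worry about the layout
dot = Digraph(comment='A written, not drawn, diagram')
dot.node('src', 'Get data')
dot.node('clean', 'Clean data')
dot.node('report', 'Report')
dot.edge('src', 'clean')
dot.edge('clean', 'report', label='summarise')

print(dot.source)                      # the underlying DOT description
dot.render('pipeline', format='svg')   # draft layout as SVG, ready for hand tidying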

There are several online tools available that you can use to sketch box and arrow diagrams from simple text descriptions.

Text2Mindmap

Text2Mindmap allows you to construct tree based mindmaps from a simple outline style description of the mindmap.

[Screenshot: Text2Mindmap]

The layout has a radial basis. Designs can be saved and images downloaded as JPG or PDF files.

Diagrammr

Diagrammr allows you to draw simple graph based network structures in which text labelled block elements can be connected to other blocks by labelled edges.

[Screenshot: Diagrammr]

Designs are given a persistent URL, but anyone with access to the URL can edit the diagram.

JS Sequence Diagrams

JS Sequence Diagrams is a javascript library for generating sequence diagrams from a textual description of them.

[Screenshot: JS Sequence Diagrams]

Diagrams can be saved as SVG files. The source code is available; it depends on Jison for parsing and uses Raphaël as the graphics library.

blockdiag

blockdiag (application) uses a language similar to the DOT language for describing a range of block diagram types (block diagrams, sequence diagrams, activity diagrams, logical network diagrams).

[Screenshot: blockdiag]

Diagrams can be saved as SVG files and associated with a URL that contains all the information used to recreate the diagram; as such, very large diagrams are not supported if they make the URL too long. Source code is available.

Several diagram types are available using blockdiag, including graphviz diagrams constructed using the DOT language.

GraphvizFiddle

GraphvizFiddle is a fiddler style application that lets you enter Graphviz DOT language (http://www.graphviz.org/content/dot-language) descriptions and preview the result.

[Screenshot: GraphvizFiddle]

Files can be generated in SVG format, or as textual definitions (for example, in the DOT layout language).

Summary

Generating boxes and arrows style diagrams can be a pain at times because the semantics of the diagram – how one item is related to another – are represented in a graphical rather than data based form. By writing down the relations and then automatically generating visual representations of them, we retain access to the data representation whilst leaving the hard work of generating at least the initial draft of the layout to a machine.

Several tools are available to support the creation of such literally described box and arrow diagrams using a variety of description languages and generating a range of output image formats (SVG probably being the most useful if you need to edit the sketch diagram for yourself to tweak the layout for its final presentation). Code for some of the tools (JS Sequence diagrams, blockdiag) is available.

Arguably the most powerful tools allow you to “write” diagrams using the Graphviz DOT layout language. Whilst there is a certain overhead associated with learning this language, it does save time in the long run if you regularly need to create network style diagrams. Graphviz also supports a range of layout algorithms – see the Graphviz gallery for examples.

PS If you want to write your own diagramming application, the JointJS library looks like a handy library to have on hand… The Venn.js library also looks quite pretty – if you have to generate Venn diagrams, that is!