At the Mozilla/Knight/Guardian/BBC NewsJam #mojo event on Saturday (review by Martin Belam; see also Martin’s related review of the news:rewired event the day before), I was part of a small team that looked at how we might start validating folk tweeting about a particular news event. Here’s a brief write up of our design…
Try it here: SocioGeo map
When exploring Twitter based activity around an event, Guardian journalist Paul Lewis raised the question “how does a journalist know which sources are to be trusted?” (a source verification problem), identifying this as one area where tool innovation may be able to help journalists assess which Twitter sources around an event may be worth following or contacting directly.
The SocioGeo map begins to address this concern, and represents an initial attempt at mapping the social and geographical spread of tweets around an event in near real time. In its first incarnation, SocioGeoMap is intended to support visual analysis of the social and spatial distribution of people tweeting about an event in order to identify the extent to which people tweeting about an event are co-located with it and/or each other (initially, based on a sampling of geocoded tweets, although this might extend to reconciliation of identities from Twitter into location based checkin services such as Foursquare, or derived location services such as geocoded photos uploaded to Flickr), and the extent to which they know each other (initially, this is limited to whether or not they are following each other on Twitter, but could be extended to other social networks).
In his presentation at the #mojo London event, Guardian interactive designer Alastair Dant suggested a fruitful approach for hacks/hackers communities might be to identify publication “archetypes” such as maps and timelines, as well as “standard content types” such as map+timeline combinations. To these, we might add the “social network” archetype, and geo-social maps (locating nodes in space and drawing social connections between them), socio-temporal maps (showing how social connections ebb and flow over time, or how messages are passed between actors) or geo-socio-temporal maps (where we plot how messages move across spatially and socially distributed nodes over time).
If the simple geo-social map depiction demonstrated above does turn out to be useful, informative or instructive, the next phase might be to start using mathematical analyses of the geographical concentration of people tweeting about an event, as well as social network analysis metrics, to start assigning certainty factors to individuals relating to the degree of confidence we might have that they were eyewitness to an event, embedded within it/central to it, or a secondary/amplifying source only, and so on. A wider social network analysis (eg of the social networks of people associated with an event) might also provide information related to the authority/trustedness/reputation of the source in other contexts. These certainty factors might then be used to rank tweets associated with an event, or to identify sources who might be worth contacting directly, or ignoring altogether. (That is, the analyses might be able to contribute to automatic filter configuration.)
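As a rough illustration of what a "geographical concentration" measure could look like, the sketch below computes the fraction of geocoded tweets that fall within a given radius of an event location, using the haversine great-circle distance. The function names, the radius and the sample coordinates are all assumptions made for the example; this is not part of the SocioGeoMap demo, and a single fraction like this is only one possible input to a certainty factor.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a rough measure


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def fraction_near(event, points, radius_km):
    """Share of (lat, lon) points lying within radius_km of the event."""
    if not points:
        return 0.0
    near = sum(
        1 for lat, lon in points
        if haversine_km(event[0], event[1], lat, lon) <= radius_km
    )
    return near / len(points)


# Hypothetical sample: two tweets in central London, one in Paris.
event = (51.5, -0.12)
tweets = [(51.5, -0.12), (51.51, -0.13), (48.85, 2.35)]
concentration = fraction_near(event, tweets, radius_km=5.0)
```

A value close to 1 would suggest that most of the sampled tweeters are near the event; a value close to 0 would suggest that the conversation is mostly remote amplification.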
SocioGeoMap is based on several observations:
- that events may occur in a physical location, or virtual online space, or a combination of the two;
- that people tweeting about an event may or may not be participating in it or eyewitnesses to it. If not, they may be amplifying it for direct or indirect reasons (an indirect reason might be where the retweeter is not really interested in the event, but was interested in amplifying the content of a tweet that also mentioned the event). We might associate a certainty factor with the extent to which we believe a person was a participant in, or eyewitness to, an event, whether they were rebroadcasting the event as a “news service”, whether they were commenting on the event, or raising a question to event participants, and so on;
- that people tweeting about an event may or may not know each other.
Taking the example of a football match, we might imagine several different co-existing states:
- a significant number of people co-located with the event (and eyewitnesses to it); small clusters of these people may be tightly interconnected and follow each other (for example, social groups who visit matches together), some clusters may be weakly associated with each other via a common node (for example, different follower groups of the same team following the same football players), and large numbers of people/clusters may be independent;
- a very large number of people following the event through a video or audio stream but not co-located with it; it is likely that we will see large numbers of small subnetworks of people who know each other through eg work but who also share an interest in football.
In the case of a bomb going off in a busy public space, we might imagine:
- a small number of people co-located with the event and largely independent of each other (not socially connected to each other);
- a larger number of people who know eyewitnesses and retweet the event;
- people in the vicinity of the event but unaware of it, except that they have been inconvenienced by it in some way;
- people unconnected to the event who saw it via a news source or as a trending news topic and retweet it to feel as if they are doing their bit, to express shock, horror, pity, anger, etc.
SocioGeoMap helps visualise the extent to which twitter activity around an event is either distributed or localised in both social/social network and geographical spaces.
In its current form, SocioGeoMap is built from a couple of pre-existing services:
- a service that searches the recent twitter stream around a topic and identifies social connections between people who are part of that stream;
- a service that searches the recent twitter stream around a particular location (using geo-coded tweets) and renders them on an embeddable map.
In its envisioned next generation form, SocioGeoMap will display people tweeting about a particular topic by location (i.e. on a map) and also draw connections between them to demonstrate the extent to which they are socially connected (on Twitter, at least).
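To make the combined view a little more concrete, here is a minimal sketch of the data it would need: a location for each person in the sample, plus the follow relationships that exist between people in that same sample. The dictionaries and user names are hypothetical stand-ins for whatever the topic search and a friends lookup would return; no real Twitter API call is shown or implied.

```python
def social_edges(users, follows):
    """Directed (follower, followed) pairs where both people are in the sample.

    users:   dict mapping screen name -> (lat, lon)
    follows: dict mapping screen name -> set of screen names they follow
    """
    sample = set(users)
    return sorted(
        (a, b)
        for a in sample
        for b in follows.get(a, ())
        if b in sample and a != b
    )


def mutual_pairs(edges):
    """Unordered pairs of people who follow each other."""
    edge_set = set(edges)
    return sorted({tuple(sorted((a, b))) for a, b in edge_set if (b, a) in edge_set})


# Hypothetical sample of three geolocated tweeters.
users = {
    "alice": (51.50, -0.12),
    "bob": (51.51, -0.10),
    "carol": (53.48, -2.24),
}
follows = {
    "alice": {"bob", "dave"},  # dave is not in the sample, so that link is dropped
    "bob": {"alice"},
    "carol": set(),
}

edges = social_edges(users, follows)
# Each edge can be drawn as a line between the two users' map coordinates.
lines = [(users[a], users[b]) for a, b in edges]
```

Restricting the edges to people inside the sample keeps the picture focused on how the people tweeting about the event relate to each other, rather than on their wider networks.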
SocioGeoMap as currently presented is based on particular, user submitted search queries that may also have a geographical scope. An extension of SocioGeoMap might be to create SocioGeoMap alerting dashboards around particular event types, using techniques similar to those employed in many sentiment analysis tools, specifically the filtering of items through word lists containing terms that are meaningful in terms of sentiment. The twist in news terms is to identify meaningful terms that potentially relate to newsworthy exclamations (“Just heard a loud explosion”, “goal!!!!”, “feel an earthquake?” and so on), and rather than associating positive or negative sentiment with a term or brand, trying to discover tweets associated with sentiments of shock or concern in a particular geographical location.
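The word list idea can be sketched in a few lines. The example below assumes tweets arrive as dictionaries with text, lat and lon keys, and that the geographical scope is a simple bounding box; the term list, the field names and the box are all invented for illustration, and the matching is a naive substring test rather than anything a production alerting tool would rely on.

```python
# Hypothetical word list of terms that may signal a newsworthy exclamation.
ALERT_TERMS = ("explosion", "earthquake", "goal")


def matched_terms(text, terms=ALERT_TERMS):
    """Return the alert terms found in the text (case-insensitive substring match).

    Substring matching is crude: "goal" would also match "goalkeeper".
    """
    lowered = text.lower()
    return [term for term in terms if term in lowered]


def in_box(lat, lon, box):
    """True if the point lies inside a (south, west, north, east) bounding box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def alerts(tweets, box, terms=ALERT_TERMS):
    """Tweets that mention an alert term and are geocoded inside the box."""
    hits = []
    for tweet in tweets:
        found = matched_terms(tweet["text"], terms)
        if found and in_box(tweet["lat"], tweet["lon"], box):
            hits.append((tweet["text"], found))
    return hits


# Hypothetical sample data and a rough bounding box around London.
london = (51.3, -0.5, 51.7, 0.3)
sample = [
    {"text": "Just heard a loud explosion", "lat": 51.5, "lon": -0.12},
    {"text": "goal!!!!", "lat": 53.48, "lon": -2.24},
    {"text": "Lovely day in the park", "lat": 51.5, "lon": -0.15},
]
london_alerts = alerts(sample, london)
```

Only the first tweet is flagged here: the second matches a term but falls outside the box, and the third is inside the box but matches nothing.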
SocioGeoMap may also be used in association with other services that support the pre-qualification or pre-verification of individuals, or certainty measure estimates of their expertise or likelihood of being in a particular place at a particular time. So for example, in the first case we might imagine doing some prequalification work around people likely to attend a planned event, such as a demonstration, based on their public declarations (“Off to #bigDemo tomorrow”), or identifying their remote support for or interest in it (“gutted not to be going to #bigDemo tomorrow”). Another example might include looking for geolocated evidence that an individual is a frequenter of a particular space, for example through a geo-coded analysis of their personal twitter stream and potentially also at one remove, such as through a geocoded analysis of their friends’ profiles and tweetstream, and as a result deriving a certainty measure about the association of an individual with a particular location; that is, we could start to assign a certainty measure to the likelihood of their being an eyewitness to an event in a particular locale based on previous geo-excursions.
By: Tony Hirst (@psychemedia), Alex Gamela (@alexgamela), Misae Richwood (@minxymoggy)
Mozilla/Knight/Guardian/BBC News Jam, Kings Tower, London, May 28th, 2011 #mojo
Implementation notes:
The demo was built out of a couple of pre-existing tools/components: a geo-based Twitter search constructed using Yahoo Pipes (Discovering Co-location Communities – Twitter Maps of Tweets Near Wherever…); and a map of social network connections between folk recently using a particular search term or hashtag (Using Protovis to Visualise Connections Between People Tweeting a Particular Term). It is possible to grab a KML URL from the geotwitter pipe and feed it into a Google map that can be embedded in a page using an iframe. The social connections graph can also be embedded in an iframe. The SocioGeoMap page contains two iframes, one that loads the map, and a second that loads the social network graph. The same data pulled from the Yahoo geo-search pipe feeds both visualisations.
In many cases, several tweets may have the exact same geo-coordinates, which means they are overlaid on the map and difficult to see. To get around this, a certain amount of jitter is added to each latitude and longitude; because Yahoo Pipes doesn’t have a native random number generator, I use a tweet ID to generate a jitter offset using the following pipe:
This is called just before the output of the geotwitter search pipe:
Whilst this does mean that no points are plotted with their exact original co-ordinates, it does mean that we can separate out most of the markers corresponding to tweets with the same latitude and longitude and thus see them independently on the map at their approximate location.
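For readers without access to the pipe itself, the general idea of deriving a repeatable jitter from a tweet ID can be sketched in Python. This is not a transcription of the Yahoo Pipe: the choice of digit groups, the maximum offset of 0.001 degrees and the function names are all assumptions made for illustration.

```python
def jitter_from_id(tweet_id, max_offset=0.001):
    """Derive a small, repeatable (lat, lon) offset from a tweet ID.

    Different digit groups of the ID drive each axis, so the two offsets
    are not simply copies of each other.
    """
    lat_seed = tweet_id % 1000            # last three digits
    lon_seed = (tweet_id // 1000) % 1000  # the three digits before those
    # Map each seed from the range 0..999 onto -max_offset..+max_offset.
    lat_off = (lat_seed / 999 * 2 - 1) * max_offset
    lon_off = (lon_seed / 999 * 2 - 1) * max_offset
    return lat_off, lon_off


def jitter_point(lat, lon, tweet_id, max_offset=0.001):
    """Return the coordinates nudged by the ID-derived offset."""
    d_lat, d_lon = jitter_from_id(tweet_id, max_offset)
    return lat + d_lat, lon + d_lon


# Two hypothetical tweets sharing exactly the same coordinates.
first = jitter_point(51.5, -0.12, 74123456789)
second = jitter_point(51.5, -0.12, 74123456790)
```

Because the offset depends only on the ID, a given tweet is always plotted in the same place on each refresh, which a true random number generator would not guarantee.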
A next step in development might be to move away from using Yahoo Pipes (which incur a caching overhead) and use a server side service. A quickstart solution to this might be to generate a Python equivalent of the current pipework using Greg Gaughan’s pipe2py compiler, which generates a Python code equivalent of a Yahoo pipe.