Tags Associated With Other Tags on Delicious Bookmarked Resources

If you’re using a particular tag to aggregate content around a particular course or event, what do the other tags used to bookmark those resources tell you about that course or event?

In a series of recent posts, I’ve started exploring once again some of the structure inherent in socially bookmarked and tagged resource collections (Visualising Delicious Tag Communities Using Gephi, Social Networks on Delicious, Dominant Tags in My Delicious Network). In this post, I’m going to look at the tags that co-occur with a particular tag, such as a tag used to bookmark resources relating to an event or a course.

Here are a few examples, starting with cck11, using the most recent bookmarks tagged with ‘cck11’:

The nodes are sized according to degree; an edge between two tags means that both tags were applied by an individual user to the same resource. So if N = 3 tags (A, B, C) were applied to a resource, there are N!/(K!(N-K)!) pairwise (K = 2) combinations, namely AB, AC and BC: three combinations in this case.
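As a quick check on that arithmetic, Python’s itertools.combinations enumerates the pairs and math.comb gives the count N!/(K!(N-K)!). The tag names and the tag_pairs helper below are placeholders for illustration only. One consequence worth bearing in mind: the number of edges grows with roughly the square of the number of tags, so a single heavily tagged bookmark contributes many edges.

```python
from itertools import combinations
from math import comb  # math.comb is available from Python 3.8


def tag_pairs(tags):
    """All pairwise (K = 2) combinations of the N tags applied in one bookmark."""
    return list(combinations(tags, 2))


pairs = tag_pairs(["A", "B", "C"])
print(pairs)        # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(comb(3, 2))   # 3, i.e. 3!/(2!(3-2)!)
print(comb(10, 2))  # 45 edges from a single bookmark carrying 10 tags
```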

Here are the tags for lak11 – can you tell what this online course is about from them?

Finally, here are tags for the OU course T151; again, can you tell what the course is most likely to be about?

Here’s the Python code I used to generate the GDF network definition files that I loaded into Gephi to draw the diagrams shown above:

import json
import time
from itertools import combinations
from urllib.parse import quote
from urllib.request import urlopen

def getDeliciousTagURL(tag, typ='json', num=100):
    # need to add a pager to get data when more than 1 page
    return ('http://feeds.delicious.com/v2/' + typ + '/tag/' + quote(tag)
            + '?count=' + str(num))

def openTimestampedFile(prefix, fname):
    # Stand-in for a helper the post doesn't show (assumed behaviour): opens
    # a file such as delicious-tagCombos-20110126-0930-cck11.gdf for writing
    stamp = time.strftime('%Y%m%d-%H%M')
    return open(prefix + '-' + stamp + '-' + fname, 'w')

def getDeliciousTaggedURLTagCombos(tag):
    durl = getDeliciousTagURL(tag)
    data = json.load(urlopen(durl))  # list of bookmarks; 't' holds each one's tags
    uniqTags = []
    tagCombos = []
    for bookmark in data:
        tags = bookmark['t']
        for t in tags:
            if t not in uniqTags:
                uniqTags.append(t)
        # one edge for each pair of tags applied together to the same resource;
        # combinations() yields nothing when there are fewer than two tags
        for t1, t2 in combinations(tags, 2):
            print(t1, t2)
            tagCombos.append((t1, t2))
    # write a GDF file: node definitions first, then the edge list
    with openTimestampedFile('delicious-tagCombos', tag + '.gdf') as f:
        f.write('nodedef> name VARCHAR,label VARCHAR,type VARCHAR\n')
        for t in uniqTags:
            f.write(t + ',' + t + ',tag\n')
        f.write('edgedef> tag1 VARCHAR,tag2 VARCHAR\n')
        for t1, t2 in tagCombos:
            f.write(t1 + ',' + t2 + '\n')
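The script above writes one edge line per co-occurrence, so a pair of tags used together on many bookmarks appears as many duplicate edges, and the same pair can appear as (A, B) or (B, A) depending on the order in which the tags were applied. A possible alternative, sketched below, is to aggregate each unordered pair into a single weighted edge. It assumes each bookmark is simply a list of tag strings (such as the 't' field above); the function names are hypothetical, and it is also an assumption that Gephi will read a GDF edge column named weight as the edge weight.

```python
from collections import Counter
from itertools import combinations


def weighted_tag_edges(bookmarks):
    """Return (tags, counts): the sorted tag list, and a Counter mapping each
    unordered tag pair to the number of bookmarks carrying both tags."""
    tags = set()
    counts = Counter()
    for bookmark_tags in bookmarks:
        uniq = sorted(set(bookmark_tags))  # sorting makes (b, a) and (a, b) one edge
        tags.update(uniq)
        for pair in combinations(uniq, 2):
            counts[pair] += 1
    return sorted(tags), counts


def weighted_gdf_lines(tags, counts):
    """GDF text lines: one node per tag, one weighted edge per tag pair."""
    lines = ["nodedef> name VARCHAR,label VARCHAR,type VARCHAR"]
    lines += [f"{t},{t},tag" for t in tags]
    # Assumption: Gephi treats an edge column called "weight" as the edge weight
    lines.append("edgedef> tag1 VARCHAR,tag2 VARCHAR,weight DOUBLE")
    lines += [f"{t1},{t2},{w}" for (t1, t2), w in sorted(counts.items())]
    return lines


def write_weighted_gdf(path, tags, counts):
    with open(path, "w") as f:
        f.write("\n".join(weighted_gdf_lines(tags, counts)) + "\n")
```

For example, weighted_tag_edges([["lak11", "analytics"], ["analytics", "lak11", "data"]]) counts the analytics/lak11 pair twice, giving a single edge of weight 2 rather than two parallel edges.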

Next up? I’m wondering whether it might be interesting to visualise the explicit fan/network (i.e. follower/friend) Delicious network of the users of a given tag, to see how it compares with the ad hoc, informal networks that grow up around that tag.

5 comments

  1. Julià Minguillón

    dear Tony,

    great job! I have been doing research on how people use Delicious to manage and describe URLs, and I repeated some of my experiments using the same search term “cck11+CCK11”.

    This is what I previously did in similar experiments:

    http://prezi.com/ho4yhnbbivh-/analyzing-hidden-semantics-in-social-bookmarking-of-open-educational-resources/

    With this search (cck11+CCK11) I obtained a data set with 119 different URLs tagged by 2467 different users using 2020 different tags in total. If we keep only those tags that appear at least 5 times, 247 different tags remain. This is the tag cloud generated for that tag set using Many Eyes:

    http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/tag-cloud-for-cck11-resources-on-d

    Regarding the results, using a simple Principal Component Analysis (see the prezi presentation for more details) we obtain the following clusters (the first 10, showing the most important tags for each cluster):

    1) tools, mindmap, online, software, collaboration, socialmedia, resources
    2) audiobooks, free, culture, podcasts, audio, ebooks, courses, movies, books
    3) graph, data, datasets, visualization, gephi, network
    4) etc647, fall2009, lesson1
    5) del.icio.us, bookmarlet, bookmark, bookmarks, delicious
    6) mindmaps, collaborative, apps, 2010, mindmeister
    7) environment, ecology, sustainability, complexity, innovation, systems
    8) tutorials, writing, videos, literacy, english, multimedia
    9) education, learning, technology, theory, web2.0, e-learning, elearning, connectivism
    10) resilience, innovation, ecology, panarchy

    I’m not following CCK11, but I can recognize most of the clusters. Maybe somebody will be able to explain cluster 4. Let me ask those of you attending CCK11 a question: do these clusters somehow represent the topics/syllabus of the course?

    Well, just playing around with data, I’m following LAK11, BTW :-)

    best regards,

    Julià Minguillón
    Universitat Oberta de Catalunya
    Barcelona, Spain

  2. Pingback: #CCK11Discourse and the networks | Suifaijohnmak's Weblog
  3. Pingback: lak11 experiment – going into second week | escapingeggshells
  4. Pingback: Visualising tag data (from Zotero?) | Edu-browse
  5. Pingback: Visualising tag data (from Zotero?) | Finding Knowledge
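Julià’s comment above describes keeping only the tags that appear at least 5 times and then running a simple Principal Component Analysis to find clusters of tags. The details of that analysis are in the linked Prezi; what follows is only a minimal sketch under assumptions that are not taken from the comment: each bookmark is a list of tag strings, “appear at least 5 times” means at least five tag applications across all bookmarks, the PCA is run on a binary resource-by-tag matrix, and a “cluster” is read off as the tags with the largest absolute loadings on a component. The function names are hypothetical, and the plain-Python power iteration with deflation is only suitable for toy-sized data; a numerical library would be the sensible choice for the real 2020-tag set.

```python
import math
import random
from collections import Counter


def frequent_tags(bookmarks, min_count=5):
    """Sorted tags applied at least min_count times across all bookmarks."""
    counts = Counter(t for tags in bookmarks for t in tags)
    return sorted(t for t, n in counts.items() if n >= min_count)


def pca_tag_clusters(bookmarks, tags, n_components=2, top_n=3, iters=200, seed=0):
    """PCA on a binary resource-by-tag matrix, via power iteration with deflation.

    Returns, for each component, the top_n tags with the largest absolute loadings.
    """
    rows = [[1.0 if t in s else 0.0 for t in tags] for s in map(set, bookmarks)]
    n, m = len(rows), len(tags)
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between tags (m x m)
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)  # fixed seed so the starting vectors are repeatable
    clusters = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(m)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then deflate this component away
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
        ranked = sorted(range(m), key=lambda j: -abs(v[j]))
        clusters.append([tags[j] for j in ranked[:top_n]])
    return clusters


bookmarks = ([["gephi", "network", "data"]] * 4
             + [["ecology", "resilience"]] * 2
             + [["podcasts"]] * 2)
tags = frequent_tags(bookmarks, min_count=2)
print(pca_tag_clusters(bookmarks, tags, n_components=1))  # [['data', 'gephi', 'network']]
```

On this toy data the three tags that always appear together carry the largest loadings on the first component. Because principal components are contrasts, a tag with a large negative loading also ranks highly, so reading components as clusters is an interpretation layered on top of PCA rather than a clustering algorithm in its own right.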