Converting discussions in Debian BTS to RDF

This article presents a process for collecting data about discussions in the Debian bug tracking system (BTS). The discussions are converted to RDF, then stored and queried in an RDF store. It also shows how SPARQL can be used to construct a communication network from these discussions.

Data source: maintainer mbox in Debian bugs

Each bug in Debian has an mbox file that contains the maintainers' discussion of the bug, for example the maintainer mbox of bug 404019.
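Before any RDF conversion, a bug's mbox can be inspected with Python's standard `mailbox` module. The sketch below is not part of SWAML; it only lists, for each message, the headers that later determine threading (Message-ID, In-Reply-To) and authorship (From). The function name `summarize_mbox` is hypothetical.

```python
import mailbox
from email.utils import parseaddr


def summarize_mbox(path):
    """Return one (message_id, in_reply_to, sender_name, sender_email, subject)
    tuple per message in the mbox file at `path`, in file order."""
    rows = []
    box = mailbox.mbox(path, create=False)  # raise if the file is missing
    try:
        for msg in box:
            # Split "Name <address>" into its two parts.
            name, addr = parseaddr(msg.get("From", ""))
            rows.append((
                msg.get("Message-ID", "").strip(),
                msg.get("In-Reply-To", "").strip(),  # empty for thread starters
                name,
                addr.lower(),  # normalise case so addresses compare equal
                msg.get("Subject", ""),
            ))
    finally:
        box.close()
    return rows
```

For example, `for row in summarize_mbox("404019.mbox"): print(row)`, where `404019.mbox` is a hypothetical local copy of the bug's maintainer mbox.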

Semantic web standards: SIOC, FOAF

  • SIOC (Semantically Interlinked Online Communities) provides the main concepts and properties required to describe information from online communities (blogs, forums, mailing lists, wikis, etc.) on the Semantic Web.
  • FOAF (Friend of a Friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects.

Tools

  • SWAML is a Python script that parses a mailbox, uses Web services (such as SWSE or Sindice) to locate FOAF profiles to enrich the dataset, and outputs the information in RDF.
  • Virtuoso is a universal server platform providing data integration and management for SQL, RDF, XML, Web services, and business processes. Here Virtuoso is used as an RDF triple store, in which RDF files are stored and then queried with SPARQL.
  • Buxon is a sioc:Forum browser written in Python.

Collecting data

  • For each Debian package, concatenate the mbox files of all its bugs
  • Convert the resulting mbox file into RDF files using SWAML
  • Load the RDF files into Virtuoso
  • Use Buxon to browse the messages in the RDF files
  • Use SPARQL to query the data in Virtuoso
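The first step can be sketched with the standard `mailbox` module. This is a hypothetical helper, not the script used for this article: it takes the per-bug mbox files of one package (how they are downloaded is left out) and appends all their messages to one package-level mbox. Skipping messages whose Message-ID was already copied is an added assumption, meant for messages that show up in more than one bug's mbox. The remaining steps rely on SWAML, Virtuoso and Buxon themselves and are not sketched here.

```python
import mailbox


def concatenate_mboxes(paths, out_path):
    """Copy every message from the per-bug mbox files in `paths` into one
    package-level mbox at `out_path` (appending if it already exists).
    Messages whose Message-ID was already copied in this call are skipped.
    Returns the number of messages written."""
    seen = set()
    written = 0
    out = mailbox.mbox(out_path, create=True)
    try:
        for path in paths:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    mid = msg.get("Message-ID", "").strip()
                    if mid in seen:
                        continue  # already copied from another bug's mbox
                    if mid:
                        seen.add(mid)
                    out.add(msg)  # keeps the original "From " envelope line
                    written += 1
            finally:
                src.close()
    finally:
        out.close()  # flushes pending writes to disk
    return written
```

For example, `concatenate_mboxes(["bug-1001.mbox", "bug-1002.mbox"], "sympa.mbox")`; all three file names are hypothetical.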

Buxon interface

Constructing a communication network: who replied to whom?

  • SPARQL example:
 prefix foaf: <http://xmlns.com/foaf/0.1/>
 prefix sioc: <http://rdfs.org/sioc/ns#>
 prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 prefix dc: <http://purl.org/dc/elements/1.1/>
 prefix dct: <http://purl.org/dc/terms/>

 # one row per pair: ?name2 replied to a post written by ?name1
 SELECT DISTINCT ?name1 ?name2 WHERE {
    ?post sioc:has_reply ?reply .
    ?post sioc:has_creator ?p1 .
    ?reply sioc:has_creator ?p2 .
    ?p1 sioc:name ?name1 .
    ?p2 sioc:name ?name2 .
 }
  • Network visualization (Sympa package)

Updated: the graph below was generated from the discussions of the Sympa package using social network analysis tools in R.
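To make the semantics of the SPARQL query explicit, here is a small Python sketch that evaluates the same graph pattern over an in-memory list of (subject, predicate, object) triples. It is an illustration, not how Virtuoso executes the query, and it assumes each post has a single sioc:has_creator and each user a single sioc:name. Instead of DISTINCT it counts the matching (post, reply) pairs, which gives a weight for each edge of the communication network; the set of keys is what the DISTINCT query returns. The name `reply_edges` is hypothetical.

```python
from collections import Counter

SIOC = "http://rdfs.org/sioc/ns#"


def reply_edges(triples):
    """Evaluate the 'who replied to whom' pattern:
        ?post sioc:has_reply ?reply . ?post sioc:has_creator ?p1 .
        ?reply sioc:has_creator ?p2 . ?p1 sioc:name ?name1 .
        ?p2 sioc:name ?name2 .
    Returns a Counter mapping (name1, name2) to the number of replies
    that name2 made to posts written by name1."""
    creator, name, replies = {}, {}, []
    # Index the three predicates the pattern uses (one value per subject assumed).
    for s, p, o in triples:
        if p == SIOC + "has_creator":
            creator[s] = o
        elif p == SIOC + "name":
            name[s] = o
        elif p == SIOC + "has_reply":
            replies.append((s, o))
    edges = Counter()
    # Join: keep only replies where both authors have a known name.
    for post, reply in replies:
        p1, p2 = creator.get(post), creator.get(reply)
        if p1 in name and p2 in name:
            edges[(name[p1], name[p2])] += 1
    return edges
```

The weighted edge list can then be written out, for instance one tab-separated `name1 name2 count` line per edge, and loaded into R or another network analysis tool.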

Common issues

  • Same person, multiple email addresses
Proposal: integrate with another RDF dataset about Debian, for example an RDF version of the Ultimate Debian Database (UDD), in which each sioc:User could be mapped to the FOAF profile of the corresponding maintainer in Carnivore.
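Whatever the source of the mapping, once it is known which addresses belong to the same person, merging them is a connected-components problem. The sketch below uses a small union-find for that; the input pairs are hypothetical (for example, exported from an identity mapping such as the one proposed above), and choosing the lexicographically smallest address as the canonical one is an arbitrary assumption made so the result is deterministic. The function name `merge_identities` is hypothetical.

```python
def merge_identities(emails, same_person_pairs):
    """Map each email address to a canonical address for its person.

    `same_person_pairs` holds (address_a, address_b) pairs known to belong
    to the same person (hypothetical input). Addresses are compared
    case-insensitively; the canonical address of a group is its
    lexicographically smallest member."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so each root
            # stays the smallest address in its group.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for e in emails:
        find(e.lower())
    for a, b in same_person_pairs:
        union(a.lower(), b.lower())
    return {e: find(e) for e in list(parent)}
```

Rewriting sender addresses through this mapping before counting edges would merge the nodes that belong to one person.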

Updated: example of a message in RDF

From message http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404019#10 we obtain the RDF file post-352.rdf, together with an RDF description of the message's author.
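The contents of post-352.rdf are not reproduced here. As a rough, hypothetical illustration of what an RDF description of a single message can look like, the sketch below builds a sioc:Post in RDF/XML with Python's `xml.etree.ElementTree`. The choice of properties (dc:title, dct:created, sioc:has_creator, sioc:reply_of, sioc:content) is an assumption and may differ from what SWAML actually emits; the function name and all URIs in the test data are made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SIOC = "http://rdfs.org/sioc/ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"

# Use the same prefixes as the SPARQL example when serializing.
for prefix, uri in (("rdf", RDF), ("sioc", SIOC), ("dc", DC), ("dct", DCT)):
    ET.register_namespace(prefix, uri)


def post_to_rdfxml(post_uri, title, created, creator_uri, content, reply_of=None):
    """Serialize one message as a sioc:Post in RDF/XML (illustrative only)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    post = ET.SubElement(root, f"{{{SIOC}}}Post", {f"{{{RDF}}}about": post_uri})
    ET.SubElement(post, f"{{{DC}}}title").text = title
    ET.SubElement(post, f"{{{DCT}}}created").text = created
    # Link to the author resource (described in a separate file).
    ET.SubElement(post, f"{{{SIOC}}}has_creator", {f"{{{RDF}}}resource": creator_uri})
    if reply_of:
        # Threading: the post this message answers.
        ET.SubElement(post, f"{{{SIOC}}}reply_of", {f"{{{RDF}}}resource": reply_of})
    ET.SubElement(post, f"{{{SIOC}}}content").text = content
    return ET.tostring(root, encoding="unicode")
```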



Comments:

Cool!

It would be great to have an example of the output of SWAML on the mbox you mentioned.

Also, what tool was used to create the graph?

Comment by Olivier Berger on August 05, 2009, at 10:31 AM EST

Another graph, for the Linux-2.6 package: Link

Comment by Quang Vu Dang on August 06, 2009, at 08:12 AM EST
