Automatic Summarisation of Discussion Fora

Almer S. Tigelaar 24 / 03 / 2010, 23:59

Automatic Summarisation of Discussion Fora
Tigelaar, A. S. & Akker, R. op den & Hiemstra, D.
Natural Language Engineering Volume 16, Issue 2, 2010, ISSN 1351-3249, (pp. 161-192).

Abstract
Web-based discussion fora proliferate on the Internet. These fora consist of threads about specific matters. Existing forum search facilities provide an easy way to find threads of interest. However, understanding the content of threads is not always trivial. This problem becomes more pressing as threads become longer. It frustrates users who are looking for specific information and also makes it more difficult to make valuable contributions to a discussion. We postulate that having a concise summary of a thread would greatly help forum users. But how would we best create such summaries? In this paper, we present an automated method of summarising threads in discussion fora. Compared with summarisation of unstructured texts and spoken dialogues, the structural characteristics of threads give important advantages. We studied how to best exploit these characteristics. Messages in threads contain both explicit and implicit references to each other and are structured. Therefore, we term the threads hierarchical dialogues. Our proposed summarisation algorithm produces one summary of a hierarchical dialogue by ‘cherry-picking’ sentences out of the original messages that make up a thread. We try to select sentences usable for obtaining an overview of the discussion. Our method is built around a set of heuristics based on observations of real forum discussions. The data used for this research was in Dutch, but the developed method equally applies to other languages. We evaluated our approach using a prototype. Users judged our summariser as very useful, half of them indicating they would use it regularly or always when visiting fora.
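
The abstract does not list the heuristics themselves, so the following is only a rough sketch of what heuristic, extractive 'cherry-picking' over a reply tree might look like. Everything in it is an assumption made for illustration: the `Message` type, the `summarise` function, and the three scoring heuristics (topical word frequency, opening-sentence bonus, closeness to the thread root) are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical message record; parent_id encodes the reply hierarchy.
    msg_id: int
    parent_id: Optional[int]  # None for the thread's opening post
    text: str


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    # Lower-cased words longer than three characters (crude stop-word filter).
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) > 3]


def depth(msg, by_id):
    # Number of reply hops from this message back to the opening post.
    d = 0
    while msg.parent_id is not None:
        msg = by_id[msg.parent_id]
        d += 1
    return d


def summarise(thread, max_sentences=3):
    by_id = {m.msg_id: m for m in thread}
    # Word frequencies over the whole thread.
    freq = Counter(w for m in thread for w in content_words(m.text))
    candidates = []
    for order, m in enumerate(thread):
        for pos, sent in enumerate(split_sentences(m.text)):
            tokens = content_words(sent)
            if not tokens:
                continue
            # Illustrative heuristics only (assumptions, not the paper's):
            topical = sum(freq[w] for w in tokens) / len(tokens)
            lead = 1.0 if pos == 0 else 0.0        # first sentence of a message
            shallow = 1.0 / (1 + depth(m, by_id))  # close to the thread root
            candidates.append((topical + lead + shallow, order, pos, sent))
    top = sorted(candidates, key=lambda c: -c[0])[:max_sentences]
    # Restore thread order so the selected sentences read chronologically.
    return [c[3] for c in sorted(top, key=lambda c: (c[1], c[2]))]


thread = [
    Message(1, None, "Which router should I buy for gaming? My budget is small."),
    Message(2, 1, "The Acme router works well for gaming. It is cheap too."),
    Message(3, 2, "Thanks!"),
]
summary = summarise(thread, max_sentences=2)
print(summary)
```

Sentences are copied verbatim from the messages rather than rewritten, which is what makes the approach extractive. The weights here are arbitrary; a real system would presumably tune or replace them based on observations of actual forum threads.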