Automatic Discussion Summarization: A Study of Internet Fora

Tigelaar, A. S. [Master’s Thesis], Supervised by Akker, R. op den.


The purpose of this research was to find automated methods for summarizing discussions held on Internet fora. A second goal was to build a functional prototype that implements these methods. This exploratory study investigates which technologies and methods can be usefully combined into an automatic discussion summarizer. The focus is on two types of threads: Problem-Solution and Statement-Discussion. Although Dutch is the main language used, much of the presented work is also applicable to other languages. Compared to the summarization of unstructured texts (and spoken dialogues), the structural characteristics of threads offer important advantages, and we studied how these characteristics can be exploited. Messages in threads contain explicit and implicit references to each other. They also have a relatively structured internal make-up. We therefore call the threads hierarchical dialogues. The algorithm produces a single summary of a hierarchical dialogue by cherry-picking sentences from the original messages that make up the thread. For sentence selection, we try to find the main focus of the discussion, which can then be used to obtain an overview of the discussion. The system is built around a set of heuristics based on observations of real discussions. We developed a functioning prototype. Its performance was evaluated for Dutch only, although the system also supports English. Various aspects of parts of the system and of the methods developed were evaluated. Much can still be done to improve the current approach, but building a summarization system in the way presented in this thesis is feasible.
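The extractive idea described above, selecting sentences from the messages of a thread and presenting them in thread order, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the `Message` structure, the naive sentence splitter, the crude filter on short words, the scoring rule (average number of messages sharing each word of a sentence) and the bonus for the opening post. The heuristics of the actual system are not specified in this abstract.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical thread message; parent_id encodes the reply hierarchy."""
    msg_id: int
    parent_id: Optional[int]  # None for the opening post
    text: str


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    # Lowercase words; dropping words of three letters or fewer is a crude,
    # assumed stand-in for a real stop-word list.
    return [w for w in re.findall(r"\w+", sentence.lower()) if len(w) > 3]


def summarize(thread: List[Message], max_sentences: int = 3) -> List[str]:
    # Count in how many messages each word occurs (assumed focus signal).
    message_freq = Counter()
    for message in thread:
        message_freq.update(set(tokenize(message.text)))

    candidates = []
    for position, message in enumerate(thread):
        for index, sentence in enumerate(split_sentences(message.text)):
            words = set(tokenize(sentence))
            if not words:
                continue
            # Average number of messages that share each word of the sentence.
            score = sum(message_freq[w] for w in words) / len(words)
            if message.parent_id is None:
                score *= 1.5  # assumed bonus: the opening post states the topic
            candidates.append((score, position, index, sentence))

    # Keep the highest-scoring sentences, then restore thread order.
    top = sorted(candidates, key=lambda c: -c[0])[:max_sentences]
    top.sort(key=lambda c: (c[1], c[2]))
    return [c[3] for c in top]


if __name__ == "__main__":
    thread = [
        Message(1, None, "My printer does not print. The weather is nice."),
        Message(2, 1, "Did you check the printer cable?"),
        Message(3, 2, "The printer cable was loose. Thanks a lot!"),
    ]
    for sentence in summarize(thread, max_sentences=2):
        print(sentence)
```

The sketch ignores the reply structure beyond identifying the opening post. A system along the lines described in the abstract would additionally exploit the explicit and implicit references between messages.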

