recent articles

  • How long would it take to read Wikipedia?

    Almer S. Tigelaar 21 / 02 / 2012

    Wikipedia has become the de facto encyclopedia on the Internet. A traditional encyclopedia spans many volumes that would take an ordinary person ages to read, and few people would ever attempt such an endeavor. Since Wikipedia is so readily accessible, however, should you take up the challenge?

  • Life in a Day

    Almer S. Tigelaar 09 / 02 / 2012

    The premise behind the YouTube documentary “Life in a Day” is interesting: invite everyone around the world to shoot video on one specific day, July 24th 2010, have them upload their raw footage, and edit it into a short, ninety-minute documentary that chronicles a single day on our planet. Does this extreme form of crowdsourcing actually work?

  • Top 8 Prejudices about Americans

    Almer S. Tigelaar 07 / 02 / 2012

    When travelling abroad, it is difficult to go with an open mind. Despite our best efforts, we bring with us an excess of prejudice shaped by our own culture and our view of the destination country. So too it was for me when I visited the United States. Once you are back home, people are very insistent that you play into their prejudices about where you have been as well, perhaps as a means of reinforcing their own identity.


Category: Other Works

Query-Based Sampling: Can we do Better than Random?

Almer S. Tigelaar 23 / 02 / 2010, 17:00

Query-Based Sampling: Can we do Better than Random?
Tigelaar, A. S. & Hiemstra, D.
Technical Report TR-CTIT-10-04 (2010), Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, ISSN 1381-3625.

Abstract
Many servers on the web offer content that is only accessible via a search interface. These are part of the deep web. Indexing the content of these remote servers with conventional crawling is impossible without some form of cooperation. Query-based sampling provides an alternative to crawling that requires no cooperation beyond a basic search interface. In the conventional approach, random queries are sent to a server to obtain a sample of documents from the underlying collection. This sample represents the entire server content; the representation is called a resource description. In this research we explore whether better resource descriptions can be obtained by using alternative query construction strategies. The results indicate that randomly choosing queries from the vocabulary of sampled documents is indeed a good strategy. However, we show that, when sampling a large collection, using the least frequent terms in the sample yields a better resource description than using randomly chosen terms.
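To make the two query construction strategies concrete, here is a minimal sketch in Python, assuming a toy in-memory collection. The `search` function is a hypothetical stand-in for a remote search interface, and all names, the tie-breaking rule, and the stopping condition are illustrative assumptions rather than the report's actual implementation. Each query's results are added to the sample, the resource description is kept as term frequencies over the sampled documents, and the next query is drawn from the sample's vocabulary either at random or by picking the least frequent term not yet used.

```python
import random
from collections import Counter


def search(collection, term, max_results=10):
    """Hypothetical stand-in for a remote search interface: returns up to
    max_results documents whose whitespace-separated words include term."""
    return [doc for doc in collection if term in doc.split()][:max_results]


def query_based_sample(collection, seed_term, num_queries, strategy="random", seed=0):
    """Sample documents via search queries and build a resource description
    (term -> frequency in the sampled documents)."""
    rng = random.Random(seed)
    sample = []                # documents obtained so far
    seen = set()               # documents already in the sample
    used_terms = set()         # terms already sent as queries
    description = Counter()    # the resource description
    term = seed_term
    for _ in range(num_queries):
        used_terms.add(term)
        for doc in search(collection, term):
            if doc not in seen:
                seen.add(doc)
                sample.append(doc)
                description.update(doc.split())
        # Candidate next queries: vocabulary of the sample not yet queried.
        candidates = sorted(t for t in description if t not in used_terms)
        if not candidates:
            break
        if strategy == "random":
            term = rng.choice(candidates)
        elif strategy == "least_frequent":
            # Lowest frequency in the sample; ties broken alphabetically (assumption).
            term = min(candidates, key=lambda t: (description[t], t))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return sample, description


if __name__ == "__main__":
    toy = ["apple banana cherry", "banana date", "cherry elderberry fig",
           "fig grape", "grape apple"]
    for strategy in ("random", "least_frequent"):
        docs, desc = query_based_sample(toy, "banana", num_queries=10, strategy=strategy)
        print(strategy, len(docs), desc.most_common(3))
```

On a collection this small both strategies end up retrieving everything; the difference the abstract reports only shows up when sampling a large collection under a fixed query budget.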

Query-Based Sampling using Only Snippets

Almer S. Tigelaar 02 / 12 / 2009, 17:00

Query-Based Sampling using Only Snippets
Tigelaar, A. S. & Hiemstra, D.
Technical Report TR-CTIT-09-42 (2009), Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, ISSN 1381-3625.

Abstract
Query-based sampling is a popular approach to model the content of an uncooperative server. It works by sending queries to the server and downloading, in full, the documents returned in the search results. This sample of documents then represents the server’s content. We present an approach that uses the document snippets in the search results as samples instead of downloading entire documents. For the same amount of bandwidth, this yields more stable results than the full-document approach. Additionally, we show that using snippets does not necessarily incur more latency, but can actually save time.
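The difference between the two approaches can be sketched as follows, assuming a hypothetical search interface that returns `(doc_id, snippet)` pairs and a hypothetical snippet generator that takes a few words around the query term. Bandwidth is approximated as the number of characters transferred, and only the snippets are counted as the size of a result page; both are simplifying assumptions. The sketch only illustrates the bandwidth accounting and what ends up in the resource description. It does not reproduce the report's stability or latency measurements.

```python
from collections import Counter


def make_snippet(text, term, window=2):
    """Hypothetical snippet generator: the query term plus `window` words on each side."""
    words = text.split()
    i = words.index(term)
    return " ".join(words[max(0, i - window): i + window + 1])


def search(index, term, max_results=10):
    """Hypothetical search interface returning (doc_id, snippet) pairs."""
    hits = sorted(doc_id for doc_id, text in index.items() if term in text.split())
    return [(doc_id, make_snippet(index[doc_id], term)) for doc_id in hits[:max_results]]


def sample_description(index, queries, use_snippets):
    """Build a resource description from snippets only, or by downloading each
    returned document in full. Bandwidth is approximated in characters."""
    description = Counter()
    bandwidth = 0
    downloaded = set()
    for term in queries:
        results = search(index, term)
        bandwidth += sum(len(snippet) for _, snippet in results)  # result page
        for doc_id, snippet in results:
            if use_snippets:
                description.update(snippet.split())
            elif doc_id not in downloaded:
                downloaded.add(doc_id)
                full_text = index[doc_id]  # stands in for a separate download
                bandwidth += len(full_text)
                description.update(full_text.split())
    return description, bandwidth
```

Because the result page has to be transferred either way, the snippet-only approach spends no extra bandwidth per query. The full-document approach pays for every new document on top of that, so at a fixed bandwidth budget the snippet approach can afford more queries.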

Automatic Discussion Summarization: A Study of Internet Fora

Almer S. Tigelaar 11 / 07 / 2008, 14:00

Automatic Discussion Summarization: A Study of Internet Fora
Tigelaar, A. S. [Master's Thesis], Supervised by Akker, R. op den.

Abstract
The purpose of this research was to find automated methods for summarizing discussions held on Internet fora. A second goal was to build a functional prototype implementing these methods. This exploratory study investigates which technologies and methods can be usefully combined into an automatic discussion summarizer. The research focuses on two types of threads: Problem-Solution and Statement-Discussion. Although Dutch is the main language used, much of the presented work is also applicable to other languages. Compared to the summarization of unstructured texts (and spoken dialogues), the structural characteristics of threads offer important advantages. We studied how these characteristics of discussion threads can be exploited. Messages in threads contain explicit and implicit references to each other, and they have a relatively structured internal make-up. We therefore call such threads hierarchical dialogues. The algorithm produces one summary of a hierarchical dialogue by cherry-picking sentences from the original messages that make up the thread. For sentence selection, we try to find the main focus of the discussion, which can be used to obtain an overview of it. The system is built around a set of heuristics based on observations of real discussions. We developed a functioning prototype; its performance was evaluated for Dutch only, although the system also supports English. Various parts of the system and the methods developed were evaluated. Much can still be done to improve the current approach; nevertheless, building a summarization system in the way presented in this thesis is feasible.
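The idea of cherry-picking sentences according to the main focus of a discussion can be illustrated with a simple extractive sketch. This is not the thesis's heuristic set. It swaps in a classic term-frequency scoring scheme (in the spirit of Luhn), in which the "focus" is approximated by how often each content word occurs across the whole thread. It also ignores the hierarchical reply structure and the Problem-Solution / Statement-Discussion distinction. The stopword list, the naive sentence splitter, and all function names are illustrative assumptions, and the example uses English even though the thesis evaluated Dutch.

```python
import re
from collections import Counter

# Tiny illustrative English stopword list (assumption; the thesis evaluated Dutch).
STOPWORDS = {"a", "an", "the", "my", "i", "do", "what", "should", "is", "of", "to", "and"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize_thread(messages, k=2):
    """Pick the k sentences whose content words are most frequent across the
    whole thread (a proxy for its main focus), returned in thread order.
    messages[0] is the opening post; replies follow in posting order."""
    sentences = [(m, s, sent)
                 for m, msg in enumerate(messages)
                 for s, sent in enumerate(split_sentences(msg))]
    focus = Counter(w for _, _, sent in sentences for w in content_words(sent))

    def score(item):
        # Average thread-wide frequency of the sentence's content words.
        words = content_words(item[2])
        return sum(focus[w] for w in words) / len(words) if words else 0.0

    best = sorted(sentences, key=score, reverse=True)[:k]
    # Restore original order: by message index, then sentence index.
    return [sent for _, _, sent in sorted(best)]


if __name__ == "__main__":
    thread = [
        "My printer prints blank pages. What should I do?",
        "Check the printer cartridge. Blank pages usually mean an empty cartridge.",
        "Nice weather today.",
    ]
    print(summarize_thread(thread, k=2))
```

On this toy Problem-Solution thread, the sketch keeps the problem statement and the suggested fix and drops the off-topic remark. A real system would also exploit quoting and reply links between messages, which this sketch leaves out.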
