recent articles

  • How long would it take to read Wikipedia?

    Almer S. Tigelaar 21 / 02 / 2012

    Wikipedia has become the de facto encyclopedia on the Internet. A traditional encyclopedia spans many printed volumes, which would take any normal person ages to read, and few people would engage in such an endeavor. But since Wikipedia is readily accessible, should you take up the challenge?

  • Life in a Day

    Almer S. Tigelaar 09 / 02 / 2012

    The premise behind the YouTube documentary “Life in a Day” is interesting: invite everyone around the world to shoot video on one specific day, July 24th, 2010. Have people upload their raw footage, then edit it into a short, ninety-minute documentary that chronicles a single day on our planet. Does this extreme form of crowdsourcing actually work?

  • Top 8 Prejudices about Americans

    Almer S. Tigelaar 07 / 02 / 2012

    When travelling abroad it is difficult to go with an open mind. Despite our best efforts we bring with us an excess of prejudice shaped by our own culture and our view of the destination country. So too it was for me when I visited the United States. When you come back, people at home are just as insistent that you play into their prejudices about where you have been, perhaps as a means of reinforcing their own identity.


Monthly Archives: December 2009

Query-Based Sampling using Only Snippets

Almer S. Tigelaar 02 / 12 / 2009, 17:00

Query-Based Sampling using Only Snippets
Tigelaar, A. S. & Hiemstra, D.
Technical Report TR-CTIT-09-42 (2009), Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, ISSN 1381-3625.


Abstract
Query-based sampling is a popular approach to model the content of an uncooperative server. It works by sending queries to the server and downloading the returned documents in the search results in full. This sample of documents then represents the server’s content. We present an approach that uses the document snippets as samples instead of downloading entire documents. This yields more stable results at the same amount of bandwidth usage as the full document approach. Additionally, we show that using snippets does not necessarily incur more latency, but can actually save time.
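
The sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the procedure from the report: the toy corpus, the `search` and `fetch` functions, the snippet window and the rule for picking the next query term (a random term from the text sampled so far) are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical toy corpus standing in for the index of an uncooperative server.
CORPUS = {
    1: "query based sampling models the content of a remote search server",
    2: "a search server returns ranked documents with short snippets",
    3: "snippets are short summaries of documents shown in search results",
    4: "bandwidth and latency matter when sampling a remote server",
}


def make_snippet(text, term, width=2):
    """Return a window of words around the first occurrence of term."""
    words = text.split()
    i = words.index(term)
    return " ".join(words[max(0, i - width): i + width + 1])


def search(term, top_n=10):
    """Toy search interface: (doc_id, snippet) pairs for documents containing term."""
    hits = [
        (doc_id, make_snippet(text, term))
        for doc_id, text in sorted(CORPUS.items())
        if term in text.split()
    ]
    return hits[:top_n]


def fetch(doc_id):
    """Toy download of a full document (the costly step that snippets avoid)."""
    return CORPUS[doc_id]


def query_based_sample(seed, iterations, use_snippets, rng):
    """Build a term-frequency model of the server from sampled text."""
    model = Counter()
    seen = set()   # items already added to the sample
    used = set()   # query terms already sent
    term = seed
    for _ in range(iterations):
        used.add(term)
        for doc_id, snippet in search(term):
            key = (doc_id, snippet) if use_snippets else doc_id
            if key in seen:
                continue
            seen.add(key)
            # Either keep only the snippet, or download the whole document.
            text = snippet if use_snippets else fetch(doc_id)
            model.update(text.split())
        # Assumed strategy: next query is a random unused term seen so far.
        candidates = sorted(set(model) - used)
        if not candidates:
            break
        term = rng.choice(candidates)
    return model


full = query_based_sample("server", 5, use_snippets=False, rng=random.Random(0))
snip = query_based_sample("server", 5, use_snippets=True, rng=random.Random(0))
print(full.most_common(3))
print(snip.most_common(3))
```

In the snippet variant the only traffic is the result page itself, so each query costs less bandwidth than a query plus the downloads it triggers. How the two variants compare in stability and latency is what the report evaluates; the toy code says nothing about that.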
