
Query-Based Sampling using Snippets

Almer S. Tigelaar 23 / 07 / 2010, 11:11

Query-Based Sampling using Snippets
Tigelaar, A. S. & Hiemstra, D.

In Proceedings of LSDS-IR 2010, Geneva, Switzerland (pp. 9-14).


Abstract
Query-based sampling is a commonly used approach to model the content of servers. Conventionally, queries are sent to a server and the documents in the returned search results are downloaded in full as a representation of the server’s content. We present an approach that uses the document snippets in the search results as samples instead of downloading the entire documents. We show this yields equal or better modeling performance for the same bandwidth consumption, depending on collection characteristics such as document length distribution and homogeneity. Query-based sampling using snippets is a useful approach for real-world systems, since it requires no extra operations beyond exchanging queries and search results.
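To make the idea in the abstract concrete, here is a minimal sketch of a snippet-based sampling loop. It is not the implementation from the paper. The `ToyServer` class, its `search` method, the snippet window size, and the rule of choosing the next query term uniformly at random from the vocabulary learned so far are all assumptions made for illustration.

```python
import random
from collections import Counter


class ToyServer:
    """Hypothetical stand-in for a remote search server (an assumption)."""

    def __init__(self, documents):
        self.documents = documents

    def search(self, term, max_results=10, snippet_words=5):
        """Return (doc_id, snippet) pairs for documents containing `term`."""
        results = []
        for doc_id, text in enumerate(self.documents):
            words = text.lower().split()
            if term in words:
                i = words.index(term)
                # Snippet: a small window of words around the first match.
                start = max(0, i - snippet_words // 2)
                snippet = " ".join(words[start:start + snippet_words])
                results.append((doc_id, snippet))
        return results[:max_results]


def sample_with_snippets(server, seed_term, iterations, rng):
    """Build a term-frequency model of the server from snippets only.

    No full documents are downloaded: the only traffic is the query
    and the search results that come back.
    """
    model = Counter()
    term = seed_term
    for _ in range(iterations):
        for _doc_id, snippet in server.search(term):
            model.update(snippet.split())
        if model:
            # Assumed selection rule: pick the next query term at random
            # from the vocabulary observed so far.
            term = rng.choice(sorted(model))
    return model


docs = [
    "query based sampling models the content of a server",
    "snippets in search results can replace full documents",
    "sampling with snippets saves bandwidth for the same model quality",
]
model = sample_with_snippets(ToyServer(docs), "sampling", 5, random.Random(0))
```

The resulting `model` is a `Counter` of term frequencies that serves as a rough resource description of the server. The conventional variant would download each result document in full and update the model from its complete text instead of the snippet.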

Presented at the Large-Scale Distributed Systems for Information Retrieval Workshop on July 23rd, 2010, in Geneva, Switzerland.
