Shard Ranking and Cutoff Estimation for Topically Partitioned Collections

Large document collections can be partitioned into topical shards to facilitate distributed search. In a low-resource search environment only a few of the shards can be searched in parallel. Such a search environment faces two intertwined challenges. The first is shard ranking: determining which shards to consult for a given query. The second is cutoff estimation: deciding how many shards from the ranking to consult. In this paper we present a family of three algorithms that address both of these problems.

Peer-to-Peer Information Retrieval

The Internet has become an integral part of our daily lives. However, the essential task of finding information is dominated by a handful of large centralised search engines. In this thesis we study an alternative to this approach. Instead of using large data centres, we propose building a peer-to-peer web search engine from the machines that we all use every day: our desktop, laptop and tablet computers.

Designing a Thesis

A while ago I spent quite some time researching the best options for designing my thesis. I used ideas from various sources, and in this brief article I will explain some of the choices I made, which will hopefully be useful to those who still need to complete their own thesis.