These proceedings contain the refereed papers and posters presented at the 28th Annual European Conference on Information Retrieval (ECIR 2006), which was held at Imperial College London in South Kensington between April 10 and 12, 2006. ECIR is the annual conference of the British Computer Society's Information Retrieval Specialist Group. The event began its life as a colloquium in 1978 and was held in the UK each year until 1998, when it took place in Grenoble, France. Since then the venue has alternated between the United Kingdom and Continental Europe. In the last decade ECIR has grown to become the major European forum for the discussion of research in the field of information retrieval. ECIR 2006 received 177 paper and 73 poster submissions, largely from the UK (18%) and Continental Europe (50%), but we had many submissions from further afield, including America (7%), Asia (21%), the Middle East and Africa (2%), and Australasia (2%). In total 37 papers and 28 posters were accepted, and some papers were converted to posters. All contributions were reviewed by at least three reviewers in a double-anonymous process and then ranked during a Programme Committee meeting with respect to scientific quality and originality. It is a good and healthy sign for information retrieval in general, and ECIR in particular, that the submission rate has more than doubled over the last three years. The downside, of course, is that many high-quality submissions had to be rejected because of the limited capacity of the conference.

Note that the number of documents within each topic is different, and some topics contain fewer than 5 documents, so their corresponding precisions may be low. These circumstances, however, do not affect the comparison of performance across the different measures.

1 Similarity Measure Comparison

The results of MAP, P@10 and P@20 for the different similarity measures are shown and compared in Figure 1. For the PTD-based measure and the OM-based measure, performance depends on the document decomposition algorithm, so we plot the highest precisions they achieve based on the TextTiling algorithm.
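As an illustration of the measures named above, the following minimal Python sketch computes P@k and average precision for a single ranked list, and MAP over several queries. The function names and the toy document identifiers are hypothetical and are not taken from the paper; the definitions are the standard ones. The sketch also shows why small topics give low precisions: a topic with only 4 relevant documents cannot exceed P@10 = 0.4.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical topic with only 4 relevant documents, all ranked first.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d3", "d4"}
small_topic_p10 = precision_at_k(ranked, relevant, 10)  # bounded by 4/10
small_topic_ap = average_precision(ranked, relevant)
```

Even with a perfect ranking, P@10 for this topic is capped by the topic size, while average precision is not. Since every similarity measure is evaluated on the same topics, the cap applies to all of them equally.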

The document is a finite binary sequence of Bernoulli trials whose outcome can be either a success, that is, an occurrence of the term, or a failure, that is, an occurrence of a different term. To be more precise, we also assume that the finite binary sequence is random, that is, any trial is statistically independent of its preceding trials. In a Bernoulli process the probability of a given sequence is $P(tf \mid d, p) = p^{tf} \cdot (1-p)^{l(d)-tf}$, where $p$ is the probability of occurrence of the term. There are $\binom{l(d)}{tf}$ exchangeable sequences (in IR they are also called a bag of words), therefore the probability is given by the binomial
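The counting argument above can be checked numerically. The following minimal Python sketch assumes the standard binomial form, that is, the probability of one particular sequence multiplied by the number of exchangeable sequences. The function names and the example values (a document of length 5, a term with p = 0.1) are hypothetical.

```python
from math import comb


def sequence_probability(tf, doc_length, p):
    """Probability of one specific sequence with tf successes in doc_length trials."""
    return p ** tf * (1.0 - p) ** (doc_length - tf)


def binomial_probability(tf, doc_length, p):
    """Probability of observing tf occurrences in any order (bag of words)."""
    # comb(doc_length, tf) counts the exchangeable sequences with tf successes.
    return comb(doc_length, tf) * sequence_probability(tf, doc_length, p)


# Hypothetical example: term frequency 2 in a document of 5 tokens, p = 0.1.
one_sequence = sequence_probability(2, 5, 0.1)
any_order = binomial_probability(2, 5, 0.1)

# The binomial probabilities over all possible term frequencies sum to 1.
total = sum(binomial_probability(k, 5, 0.1) for k in range(6))
```

Here `any_order` is exactly `comb(5, 2) = 10` times `one_sequence`, which is the step from a single ordered sequence to the bag-of-words view.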

3 The PTD-Based Measure

Giannopoulos and Veltkamp [6] propose the Proportional Transportation Distance (PTD) in order to obtain a similarity measure based on weight transportation such that the surplus of weight between two point sets is taken into account and the triangle inequality still holds. The PTD evaluates dissimilarity between two weighted point sets where a distance measure between single points, which we call the ground distance, is given. The PTD "lifts" this distance from individual points to full sets.
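To give a feel for how a ground distance is lifted to weighted point sets, here is a minimal Python sketch restricted to points on a line with ground distance |x - y|. It assumes that the PTD amounts to rescaling both sets to the same total weight and then taking the minimum transport cost per unit of weight; in one dimension that cost equals the area between the two cumulative weight curves. The function name `ptd_1d` and the example point sets are hypothetical, and the general case (arbitrary ground distance) would require solving a transportation linear program instead.

```python
def ptd_1d(points_a, points_b):
    """Sketch of a proportional transportation distance for 1-D weighted points.

    Each argument is a list of (position, weight) pairs with positive weights.
    """
    total_a = sum(w for _, w in points_a)
    total_b = sum(w for _, w in points_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each point set needs positive total weight")

    # Normalise both sets to total weight 1; B's mass is entered as negative
    # so that a running sum gives the gap between the two cumulative curves.
    events = [(x, w / total_a) for x, w in points_a]
    events += [(x, -w / total_b) for x, w in points_b]
    events.sort(key=lambda event: event[0])

    distance = 0.0
    cdf_gap = 0.0
    prev_x = events[0][0]
    for x, mass in events:
        # Mass equal to |cdf_gap| must be carried across the interval [prev_x, x].
        distance += abs(cdf_gap) * (x - prev_x)
        cdf_gap += mass
        prev_x = x
    return distance


# Hypothetical examples.
shifted = ptd_1d([(0.0, 1.0)], [(3.0, 2.0)])                  # all weight moves 3 units
rescaled = ptd_1d([(0.0, 1.0), (2.0, 3.0)], [(0.0, 2.0), (2.0, 6.0)])
split = ptd_1d([(0.0, 1.0), (2.0, 1.0)], [(1.0, 4.0)])
```

Under this assumption, two sets whose weights differ only by a constant factor (the `rescaled` example) are at distance zero, which reflects the "proportional" treatment of weight.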

