SIMS 202 Information Organization and Retrieval  

Assignment 5: IR Evaluation 


Assigned 10/05. Due 10/12.
Reading: Modern IR, Chapter 3

The goal of this assignment is to give you experience with computing and assessing IR evaluation measures. Note that there are two questions.

(1) Say you are an analyst asked to recommend one of three systems to your client. Along with cost factors and an assessment of the user interface, you want to assess the relative strengths of the systems' ranking algorithms (called Bear, Cardinal, and Wolf).

Assume you know, for a document collection and a set of queries, which documents in the collection are relevant to each query. You have only binary (yes/no) relevance judgements for each (query, document) pair. You also know the order in which each system ranks the documents. This information appears in the two tables below (and can also be found online in an Excel spreadsheet).

(a) For each system, compute the average precision (over all queries) at recall intervals of 20%; that is, precision at 20% recall, 40% recall, ..., 100% recall. Show the results in a table, and also graph the results for all three systems on one plot, connecting the points for each system with a smooth line. (Don't worry about how to interpolate between the points.) You can either draw the graph by hand or use a program to help you.
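
If you use a program for part (a), the sketch below shows one way to compute the per-query precision values in Python. It is only an illustration under assumptions of my own: the data layout (a ranking as a list of document IDs, the relevant documents as a set) and all function and variable names are mine, not part of the assignment. The example data are Bear's ranking and the relevance judgements for Query 0, taken from the tables that follow.

    def precision_at_recall_levels(ranking, relevant,
                                   levels=(0.2, 0.4, 0.6, 0.8, 1.0)):
        """Precision at each recall level, measured at the rank where that
        recall is first reached (no interpolation between the points)."""
        prec_at_recall = {}
        hits = 0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                prec_at_recall[hits / len(relevant)] = hits / rank
        # Each query here has exactly 5 relevant documents, so recall rises
        # in exact 20% steps and every requested level is hit exactly.
        return [prec_at_recall[min(r for r in prec_at_recall if r >= lv)]
                for lv in levels]

    # Bear's ranking for Query 0 (second table below); documents 1-5 are
    # the relevant documents for Query 0 (first table below).
    bear_q0 = [5, 2, 1, 15, 9, 19, 3, 6, 18, 4, 12, 8, 11, 17, 7,
               20, 10, 21, 16, 23, 13, 22, 24, 25, 14]
    print(precision_at_recall_levels(bear_q0, {1, 2, 3, 4, 5}))
    # prints [1.0, 1.0, 1.0, 0.5714..., 0.5]

Averaging these five-element vectors elementwise over the three queries gives one row of the table (and one curve on the plot) per system.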

(b) Now compute the average precision (over all queries) at four different document cutoff levels, showing the results in a table. Choose cutoff levels that facilitate comparison of the three systems.
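
For part (b), a similar minimal sketch under the same assumed layout: rankings maps each query to one system's ranked list, and relevant maps each query to its relevant-document set (names mine). The cutoffs shown are placeholders only; choosing informative ones is part of the exercise.

    def avg_precision_at_cutoffs(rankings, relevant, cutoffs=(5, 10, 15, 20)):
        """For each cutoff k, average precision-at-k over all queries:
        the fraction of the top k documents that are relevant."""
        return {k: sum(len(set(rankings[q][:k]) & relevant[q]) / k
                       for q in rankings) / len(rankings)
                for k in cutoffs}

    # Example shape of the inputs for one system (ranked lists elided):
    # avg_precision_at_cutoffs(
    #     rankings={0: bear_q0, 1: ..., 2: ...},
    #     relevant={0: {1, 2, 3, 4, 5}, 1: {8, 9, 10, 11, 12},
    #               2: {14, 15, 16, 17, 18}})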

(c) Based on these results, which ranking algorithm do you recommend?

The table below shows relevance judgements for 25 documents against each of three queries. 1 indicates relevant; 0 indicates not relevant.

DocId   Query0   Query1   Query2
  1       1        0        0
  2       1        0        0
  3       1        0        0
  4       1        0        0
  5       1        0        0
  6       0        0        0
  7       0        0        0
  8       0        1        0
  9       0        1        0
 10       0        1        0
 11       0        1        0
 12       0        1        0
 13       0        0        0
 14       0        0        1
 15       0        0        1
 16       0        0        1
 17       0        0        1
 18       0        0        1
 19       0        0        0
 20       0        0        0
 21       0        0        0
 22       0        0        0
 23       0        0        0
 24       0        0        0
 25       0        0        0

The table below shows the ordering of the 25 documents for each of the three queries by each of the three ranking algorithms. Documents are identified by document ID. The document in row 1 is ranked highest, the document in row 2 is ranked second-highest, and so on. Assume there are no ties.

     |    Query 0     |    Query 1     |    Query 2
Rank | Bear Card Wolf | Bear Card Wolf | Bear Card Wolf
   1 |    5    7   11 |   12   16    7 |    1    5   23
   2 |    2   16    2 |   14   23   23 |   18   24    9
   3 |    1    9   10 |    8    3    3 |   11   13   25
   4 |   15   18   21 |   23   13   13 |   16   19   18
   5 |    9    4    1 |    9    4    4 |   19   17    1
   6 |   19   17    5 |    5   14    8 |   10   12    2
   7 |    3    8   22 |   20   17   17 |   23   22   11
   8 |    6    5    9 |   21   11   25 |   21    4   10
   9 |   18   11   20 |    4    7   16 |    3    1   19
  10 |    4   15    3 |   19   21   21 |    6    8   16
  11 |   12   12   23 |    3   18   18 |   22   11   12
  12 |    8   10    4 |   10   10   22 |   12    3   22
  13 |   11    6   19 |    7   19   19 |   17   16   15
  14 |   17    2    6 |   18   22   10 |    2   23    8
  15 |    7   19   15 |   11   20   20 |   13   25   24
  16 |   20   20   25 |   17    1    5 |   24   18   14
  17 |   10   21   18 |   24    2    2 |   25    2    7
  18 |   21   25   16 |   13   12   12 |   14   20   13
  19 |   16   22    7 |    2   15   15 |    5   14   17
  20 |   23    1    8 |   16    5    1 |    4   10   20
  21 |   13   23   17 |    6    8   14 |    7    6   21
  22 |   22   13   12 |   22    6   24 |    9   21    4
  23 |   24   24   13 |   15    9    9 |   15   15    5
  24 |   25    3   14 |   25   25   11 |   20    7    6
  25 |   14   14   24 |    1   24    6 |    8    9    3

(2) (This question should be at most 3 paragraphs total.)
Web search is different from standard (traditional) IR. Name one way it differs, and describe a simple evaluation metric for comparing web search engines' results.