Optional Extra Credit Assignment 2: IR Evaluation
(Up to 30 extra points may be awarded for successful
completion of this assignment)
(1) Say you are an analyst asked to decide which of three systems to choose for your client. Along with cost factors and assessment of the user interface, you want to assess the relative strengths of the systems' ranking algorithms (called Bear, Cardinal, and Wolf).
Assume you know, for a document collection and a set of queries, what all the relevant documents in the collection are. You only have binary (yes/no) relevance judgements for each (query, document) pair. You also know the order in which the systems rank the documents. This information appears on the next two pages (and can be found online in an excel spreadsheet)
(a) For each system, compute the average precision (over all queries) at recall intervals of 20%. (That is, precision at 20% recall, 40% recall,..., 100% recall.) Show the results in a table, and also graph the results for all three systems on one plot (connect the points for each system with a smooth line).\footnote{Don't worry about how to interpolate between the points.} You can either draw the graph by hand or use a program to help you.
(b) Now compute the average precision (over all queries) at four different document cutoff levels, showing the results in a table. Choose cutoff levels that help facilitate comparison of the three systems.
(c) Based on these results, which ranking algorithm do you recommend?
The table below shows relevance judgements for 25 documents for three queries. 1 indicates relevant, 0 indicates not relevant.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Below is shown the ordering of 25 documents for each of the three queries by each of the three ranking algorithms. Documents are identified by document ID. The document shown on row 1 is ranked highest, that on row 2 is the second-highest ranked, etc. Assume there are no ties.
|
|
|
|
||||||
|
|||||||||
|
Bear
|
Card
|
Wolf
|
Bear
|
Card
|
Wolf
|
Bear
|
Card
|
Wolf
|
|
|||||||||
1
|
5
|
7
|
11
|
12
|
16
|
7
|
1
|
5
|
23
|
2
|
2
|
16
|
2
|
14
|
23
|
23
|
18
|
24
|
9
|
3
|
1
|
9
|
10
|
8
|
3
|
3
|
11
|
13
|
25
|
4
|
15
|
18
|
21
|
23
|
13
|
13
|
16
|
19
|
18
|
5
|
9
|
4
|
1
|
9
|
4
|
4
|
19
|
17
|
1
|
6
|
19
|
17
|
5
|
5
|
14
|
8
|
10
|
12
|
2
|
7
|
3
|
8
|
22
|
20
|
17
|
17
|
23
|
22
|
11
|
8
|
6
|
5
|
9
|
21
|
11
|
25
|
21
|
4
|
10
|
9
|
18
|
11
|
20
|
4
|
7
|
16
|
3
|
1
|
19
|
10
|
4
|
15
|
3
|
19
|
21
|
21
|
6
|
8
|
16
|
11
|
12
|
12
|
23
|
3
|
18
|
18
|
22
|
11
|
12
|
12
|
8
|
10
|
4
|
10
|
10
|
22
|
12
|
3
|
22
|
13
|
11
|
6
|
19
|
7
|
19
|
19
|
17
|
16
|
15
|
14
|
17
|
2
|
6
|
18
|
22
|
10
|
2
|
23
|
8
|
15
|
7
|
19
|
15
|
11
|
20
|
20
|
13
|
25
|
24
|
16
|
20
|
20
|
25
|
17
|
1
|
5
|
24
|
18
|
14
|
17
|
10
|
21
|
18
|
24
|
2
|
2
|
25
|
2
|
7
|
18
|
21
|
25
|
16
|
13
|
12
|
12
|
14
|
20
|
13
|
19
|
16
|
22
|
7
|
2
|
15
|
15
|
5
|
14
|
17
|
20
|
23
|
1
|
8
|
16
|
5
|
1
|
4
|
10
|
20
|
21
|
13
|
23
|
17
|
6
|
8
|
14
|
7
|
6
|
21
|
22
|
22
|
13
|
12
|
22
|
6
|
24
|
9
|
21
|
4
|
23
|
24
|
24
|
13
|
15
|
9
|
9
|
15
|
15
|
5
|
24
|
25
|
3
|
14
|
25
|
25
|
11
|
20
|
7
|
6
|
25
|
14
|
14
|
24
|
1
|
24
|
6
|
8
|
9
|
3
|
(2) (This question should be at most 3 paragraphs total.)
Web search is different than standard (traditional) IR. Name one way in which
it differs and describe a simple evaluation metric for comparing web search
engines' results.