L24. STRUCTURE-BASED MODELS [2] (11/23)

23 November 2009

Documents aren't just bags of words; they can have a great deal of internal structure and content encoding. Yet most IR models use nothing beyond document-level statistics about term occurrence. The use of XML to encode document models and instances shows how structure can add value in IR beyond plain text retrieval. We can express queries about document structure (for example, to find all articles written after June 1, 2008 with the words "presidential election" in the title field) and use internal structure to return only the parts of large documents that satisfy the query.
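The example query above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration under stated assumptions: the tiny collection, the element names (article, title, section) and the date and id attributes are all hypothetical and are not taken from the lecture. It shows a field-restricted match (the phrase must occur in the title, not just anywhere in the text), a date constraint, and retrieval of only the matching sections instead of the whole document.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical collection: element and attribute names are assumptions
# made for this sketch, not a schema from the lecture.
COLLECTION = """
<collection>
  <article date="2008-11-05">
    <title>Presidential election results</title>
    <section id="s1">Turnout rose in most states.</section>
    <section id="s2">Markets reacted calmly.</section>
  </article>
  <article date="2008-03-14">
    <title>Presidential election primaries</title>
    <section id="s1">Turnout in the primaries was high.</section>
  </article>
  <article date="2009-01-20">
    <title>Inauguration day</title>
    <section id="s1">The presidential election is over.</section>
  </article>
</collection>
"""


def find_articles(root, phrase, after):
    """Return articles dated after `after` whose title contains `phrase`."""
    hits = []
    for article in root.iter("article"):
        written = date.fromisoformat(article.get("date"))
        title = article.findtext("title", default="")
        # Structural constraints: a date attribute and a specific field.
        if written > after and phrase.lower() in title.lower():
            hits.append(article)
    return hits


def matching_sections(article, term):
    """Return the ids of the sections of `article` that mention `term`."""
    return [
        section.get("id")
        for section in article.iter("section")
        if term.lower() in (section.text or "").lower()
    ]


root = ET.fromstring(COLLECTION)
hits = find_articles(root, "presidential election", date(2008, 6, 1))
titles = [article.findtext("title") for article in hits]
sections = matching_sections(hits[0], "turnout")

print(titles)    # ['Presidential election results']
print(sections)  # ['s1']
```

Note that the third article mentions the phrase in its body but not in its title, so it is not returned. A bag-of-words model that only sees document-level term statistics could not make that distinction.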

 

Download recorded lecture from http://courses.ischool.berkeley.edu/i202/f09/files/202-20091123.mp3