SIMS 202 Information Organization and Retrieval
Assignment authors: Marti Hearst and Ray Larson
Optional Extra Credit Assignment 1: Metadata
(Up to 40 extra points may be awarded for successful completion of this assignment.)
Must be turned in on or before December 6, 2002.
Readings: Holland et al., WordNet chapters, Bates '88, Svenonius,
Pitts-Moultis and Kirk, OI chapters 1-8.
The goal of this assignment is to give you hands-on
experience with three aspects of metadata:
- Metadata coding (using XML and Dublin Core)
- Organizing information via metadata, distinguishing hierarchical
  from faceted organizations
- Lexical relations and their relationship to conceptual relations,
  via WordNet
(1) XML Data DTD Construction and Dublin Core Description
This assignment is intended to introduce you to XML DTD design and to
help you think about and carry out descriptions of networked and other
information resources using the Dublin Core metadata framework.
Information on the Dublin Core descriptive elements and their use can be
found at http://purl.oclc.org/metadata/dublin_core/. Discussion of XML
DTDs and their elements can be found in the reader and on Robin Cover's
SGML/XML web site. Note that not all items will require all descriptive
elements.
The assignment has three parts:
- (A) Create an XML DTD that has all 15 elements of the Dublin Core as
  XML elements.
- (B) Create Dublin Core descriptions for each of the items listed below.
- (C) Format your answers as well-formed XML documents using your DTD.
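One possible approach to parts (A) and (C) is sketched below in Python, using only the standard library. It generates a DTD that declares the 15 elements of the Dublin Core Metadata Element Set, embeds that DTD in a record, and parses the result with `xml.etree.ElementTree`. The root element name `record`, the content model (every element optional and repeatable, in any order), and the sample values are assumptions of this sketch, not requirements of the assignment. Note that `ElementTree` checks well-formedness only; validating a record against the DTD would need a validating parser, such as the one in the third-party lxml package.

```python
import xml.etree.ElementTree as ET

# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

def make_dtd(root="record"):
    """Build a DTD in which every Dublin Core element is optional and repeatable.

    The root element name 'record' is a hypothetical choice for this sketch.
    """
    choices = " | ".join(DC_ELEMENTS)
    lines = [f"<!ELEMENT {root} ({choices})*>"]
    lines += [f"<!ELEMENT {name} (#PCDATA)>" for name in DC_ELEMENTS]
    return "\n".join(lines)

def make_record(fields, root="record"):
    """Serialize {element: [values]} as an XML document with an internal DTD subset."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    elem = ET.Element(root)
    for name in DC_ELEMENTS:  # emit in the standard order, skipping absent elements
        for value in fields.get(name, []):
            ET.SubElement(elem, name).text = value
    body = ET.tostring(elem, encoding="unicode")
    return f'<?xml version="1.0"?>\n<!DOCTYPE {root} [\n{make_dtd(root)}\n]>\n{body}'

def is_well_formed(xml_text):
    """Return True if the text parses as XML (this does NOT validate against the DTD)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    doc = make_record({"title": ["Example Page"], "language": ["en"]})
    print(doc)
    print(is_well_formed(doc))  # True
```

Declaring the root's content model as a repeated choice lets a description omit any element it does not need, which matches the note that not all items require all descriptive elements.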
The items that you are to describe are:
(2) Hierarchical Versus Faceted Classification
Say you are making a classification system to describe
recipes. Assume your classification has to include the following dishes:
- Broccoli and Potato Soup
- Chicken Curry
- Grilled Salmon
- Pad Thai (spicy noodles, shrimp, and vegetables)
- Paella (Spanish rice, sausage, and chicken)
- Chocolate Cheesecake
- Buttered Baby Carrots
- Strawberry Milkshake
Assume also that the following types of metadata have been
assigned to each recipe:
- Main Ingredient (e.g., Pasta, Poultry, Cheese, Chocolate, Seafood ...)
- Cuisine (e.g., African, American, Asian ...)
- Preparation Method (e.g., Bake, Broil, Quick, Steam ...)
- Season/Occasion (e.g., Autumn, Picnic, Thanksgiving ...)
- Course/Dish (e.g., Appetizers, Dessert, Soup, Snack ...)
Also, some of the categories in the Main Ingredient facet
have hierarchical structure. For example, under Seafood are the subcategories
Fish and Shellfish, and under Poultry are the subcategories Chicken, Turkey,
and Duck. (Feel free to add more hierarchical structure to other categories
as necessary.)
(A) Create a faceted classification system that can
be used to describe these dishes. Show how each dish can be classified
within the classification system. Your design must take into account
the fact that at least one facet is hierarchical (Main Ingredient). You
don't have to show the entire classification system, only those parts
needed to classify the dishes shown. Briefly explain and justify
your classification system.
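A faceted scheme like the one asked for in (A) can be modeled as independent facets, with the Main Ingredient facet carrying its own small hierarchy. The Python sketch below is one possible representation, not a model answer: the facet values given to each dish (for example `Grill` or `European`) are hypothetical choices, and only three dishes are shown. It illustrates the property that distinguishes a faceted scheme: a query can combine any facets in any order, and a query on `Poultry` also finds dishes tagged with the narrower term `Chicken`.

```python
# Hypothetical fragment of the Main Ingredient hierarchy, stored as child -> parent.
PARENT = {
    "Fish": "Seafood", "Shellfish": "Seafood",
    "Chicken": "Poultry", "Turkey": "Poultry", "Duck": "Poultry",
}

def broader_terms(category):
    """Return the category followed by its ancestors, most specific first."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

# Each dish gets a set of values per facet; facets are independent of one another.
# These assignments are illustrative guesses, not an answer key.
DISHES = {
    "Grilled Salmon": {"Main Ingredient": {"Fish"}, "Preparation Method": {"Grill"},
                       "Course/Dish": {"Main Dish"}},
    "Chicken Curry": {"Main Ingredient": {"Chicken"}, "Cuisine": {"Asian"},
                      "Course/Dish": {"Main Dish"}},
    "Paella": {"Main Ingredient": {"Rice", "Chicken"}, "Cuisine": {"European"},
               "Course/Dish": {"Main Dish"}},
}

def has_value(facets, facet, value):
    """True if the dish has `value` in `facet`; Main Ingredient also matches broader terms."""
    values = facets.get(facet, set())
    if facet == "Main Ingredient":
        return any(value in broader_terms(v) for v in values)
    return value in values

def select(criteria):
    """Sorted names of dishes matching every (facet, value) pair in `criteria`."""
    return sorted(
        name for name, facets in DISHES.items()
        if all(has_value(facets, f, v) for f, v in criteria.items())
    )

if __name__ == "__main__":
    print(select({"Main Ingredient": "Poultry"}))  # ['Chicken Curry', 'Paella']
    print(select({"Main Ingredient": "Seafood", "Course/Dish": "Main Dish"}))  # ['Grilled Salmon']
```

Because each facet is stored separately, adding a value to one facet (say, a new Season/Occasion) never forces changes to the others.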
(B) Create a hierarchical classification system that
can be used to describe these dishes, and show how the dishes can be classified
within the system, using all of the types of metadata shown. You don't
have to show the entire hierarchy, only those parts needed to
classify the dishes shown. Briefly explain and justify your classification
system.
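For contrast, a hierarchical scheme like the one in (B) fixes one nesting order for the metadata types, so each dish sits at exactly one path. The Python sketch below assumes a hypothetical order (Course/Dish, then Cuisine, then Main Ingredient) and, to stay short, uses only three of the five metadata types; the paths assigned to the dishes are illustrative. A query on the top level reads a single branch, while a query on a lower level, such as every Chicken dish, has to visit every branch above it.

```python
# Hypothetical nesting order for a single hierarchy (only three of the five metadata types).
LEVELS = ("Course/Dish", "Cuisine", "Main Ingredient")

# Illustrative placement of some dishes: one path per dish, following LEVELS.
DISH_PATHS = {
    "Broccoli and Potato Soup": ("Soup", "American", "Vegetables"),
    "Chicken Curry": ("Main Dish", "Asian", "Chicken"),
    "Pad Thai": ("Main Dish", "Asian", "Shellfish"),
    "Paella": ("Main Dish", "European", "Chicken"),
}

def build_tree(paths):
    """Nest dishes into dicts, one level per metadata type; leaves are lists of names."""
    tree = {}
    for name, path in paths.items():
        node = tree
        for label in path[:-1]:
            node = node.setdefault(label, {})
        node.setdefault(path[-1], []).append(name)
    return tree

def collect(node):
    """All dish names at or below a node."""
    if isinstance(node, list):
        return list(node)
    return [name for child in node.values() for name in collect(child)]

def find_at_level(tree, level, label):
    """Sorted dishes whose path has `label` at index `level` of LEVELS.

    Every branch above `level` must be visited, since the hierarchy is
    organized by the higher levels first.
    """
    if level == 0:
        return sorted(collect(tree[label])) if label in tree else []
    found = []
    for child in tree.values():
        if isinstance(child, dict):
            found.extend(find_at_level(child, level - 1, label))
    return sorted(found)

if __name__ == "__main__":
    tree = build_tree(DISH_PATHS)
    print(find_at_level(tree, LEVELS.index("Course/Dish"), "Main Dish"))
    # ['Chicken Curry', 'Pad Thai', 'Paella']
    print(find_at_level(tree, LEVELS.index("Main Ingredient"), "Chicken"))
    # ['Chicken Curry', 'Paella']
```

Adding the remaining two metadata types would mean choosing where they go in this order; whatever sits lower is reachable only by walking the branches above it.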
(3) Lexical Relations and Conceptual Relations using WordNet
Use the WordNet HTML forms interface (or any of the other available
interfaces) to explore a bit of WordNet.
(A) Select a noun that has at least two synsets assigned
to it. Draw a sketch of the links to and from these synsets to other
synsets. For at least two senses of the selected word, show at least one
hypernym, one hyponym, and one other type of lexical relation.
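The sketch asked for in (A) is essentially a small graph of synsets joined by typed links. The Python sketch below records such links, adds the inverse link for a few relation pairs (hypernym/hyponym and part meronym/part holonym) so each link can be read in both directions, and checks the requirement that at least two senses each have a hypernym, a hyponym, and one other relation. The class name `SynsetSketch` and the synset labels (`word#1`, `broader#1`, and so on) are hypothetical placeholders; the actual synsets and relations should come from the WordNet interface.

```python
# Inverse pairs used to record each link in both directions (a subset of WordNet's relation types).
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "part meronym": "part holonym",
    "part holonym": "part meronym",
}

class SynsetSketch:
    """Hypothetical helper: a typed-edge graph of synsets for sketching WordNet links."""

    def __init__(self):
        self.links = {}  # synset label -> list of (relation, target label)

    def link(self, source, relation, target):
        """Record source --relation--> target, plus the inverse edge when one is known."""
        self.links.setdefault(source, []).append((relation, target))
        if relation in INVERSE:
            self.links.setdefault(target, []).append((INVERSE[relation], source))

    def relations(self, synset):
        """Set of relation types leaving `synset`."""
        return {rel for rel, _ in self.links.get(synset, [])}

    def meets_requirement(self, senses, minimum=2):
        """True if at least `minimum` senses have a hypernym, a hyponym, and another relation."""
        core = {"hypernym", "hyponym"}
        qualifying = [
            s for s in senses
            if core <= self.relations(s) and self.relations(s) - core
        ]
        return len(qualifying) >= minimum

if __name__ == "__main__":
    g = SynsetSketch()
    g.link("word#1", "hypernym", "broader#1")
    g.link("word#1", "hyponym", "narrower#1")
    g.link("word#1", "part meronym", "part#1")
    g.link("word#2", "hypernym", "broader#2")
    g.link("word#2", "hyponym", "narrower#2")
    print(g.meets_requirement(["word#1", "word#2"]))  # False: word#2 has no third relation type
    g.link("word#2", "member holonym", "group#2")
    print(g.meets_requirement(["word#1", "word#2"]))  # True
```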
(B) Name a conceptual relation that can associate
a concept with the word you chose but which is not a valid WordNet lexical
relation. For example, for the word tennis, a conceptual relation
that might hold between it and racquet is instrument. (Do not use
instrument as your example!)