Controlled Vocabularies:
Name Authority Control
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Review
- Mapping to the relational model
- Database Design & Normalization
- ER Diagrams and Assignment
Normalization
Unnormalized Relations
- First step in normalization is to convert the data into a two-dimensional table
- In unnormalized relations data can repeat within a column
Unnormalized Relation
First Normal Form
- To move to First Normal Form a relation must contain only atomic values at each row and column.
- No repeating groups
- A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.
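As a minimal sketch of the step described above, the following flattens a repeating group into atomic rows and then tests which column sets could serve as a candidate key. The patient and surgery values, the column names, and both helper functions are hypothetical and are not taken from the slides. Uniqueness in sample data is only a necessary condition, since a key is really a property of the design and not of one data set.

```python
# Hypothetical unnormalized records: each patient row carries a repeating
# group of (surgery, drug) pairs.
unnormalized = [
    {"patient_no": 1111, "patient_name": "White",
     "surgeries": [("gallstones removal", "penicillin"),
                   ("kidney stones removal", "none")]},
    {"patient_no": 1234, "patient_name": "Jones",
     "surgeries": [("cataract removal", "tetracycline")]},
]


def to_first_normal_form(records):
    """Emit one row per repeating-group member so every cell is atomic."""
    rows = []
    for rec in records:
        for surgery, drug in rec["surgeries"]:
            rows.append({
                "patient_no": rec["patient_no"],
                "patient_name": rec["patient_name"],
                "surgery": surgery,
                "drug": drug,
            })
    return rows


def is_candidate_key(rows, columns):
    """True when the combined values of `columns` are unique across the rows."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)


first_nf = to_first_normal_form(unnormalized)
print(len(first_nf))
print(is_candidate_key(first_nf, ["patient_no"]))
print(is_candidate_key(first_nf, ["patient_no", "surgery"]))
```

In this sample, patient_no alone repeats once the groups are flattened, so it takes the pair (patient_no, surgery) to identify a row.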
First Normal Form
Second Normal Form
- A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key.
- That is, every nonkey attribute needs the full primary key for unique identification
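One way to make "fully functionally dependent" concrete is to check, on sample rows, whether part of a composite key is already enough to determine a nonkey attribute. The sketch below assumes a hypothetical primary key of (patient_no, surgery). The rows and the `determines` helper are illustrative only, and a dependency that holds in sample data is evidence for a design rule, not proof of one.

```python
def determines(rows, lhs, rhs):
    """True if equal values of `lhs` always go with equal values of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows with the assumed primary key (patient_no, surgery).
rows = [
    {"patient_no": 1111, "patient_name": "White", "surgery": "gallstones", "drug": "penicillin"},
    {"patient_no": 1111, "patient_name": "White", "surgery": "kidney stones", "drug": "none"},
    {"patient_no": 1234, "patient_name": "Jones", "surgery": "cataract", "drug": "tetracycline"},
    {"patient_no": 1234, "patient_name": "Jones", "surgery": "gallstones", "drug": "none"},
]

# patient_name needs only part of the key: a partial dependency (violates 2NF).
partial = determines(rows, ["patient_no"], ["patient_name"])

# drug needs the whole key: neither part determines it alone.
full = (determines(rows, ["patient_no", "surgery"], ["drug"])
        and not determines(rows, ["patient_no"], ["drug"])
        and not determines(rows, ["surgery"], ["drug"]))

# Decompose: move the partially dependent attribute into its own relation.
patients = {row["patient_no"]: row["patient_name"] for row in rows}
surgeries = [{k: v for k, v in row.items() if k != "patient_name"} for row in rows]
```

After the split, each patient name is stored once, keyed by patient_no, and the surgeries relation keeps only attributes that need the full key.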
Second Normal Form
Third Normal Form
- A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes
- When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency.
- The side effect column in the Surgery table is determined by the drug administered
- Side effect is transitively functionally dependent on drug so Surgery is not 3NF
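The Surgery example above can be sketched as a decomposition: side effect moves into a separate relation keyed by drug, and a join on drug recovers the original rows. The column names and all values below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical Surgery rows that are not in 3NF: side_effect depends on drug,
# which is itself a nonkey attribute.
surgery = [
    {"patient_no": 1111, "surgery": "gallstones removal", "drug": "penicillin", "side_effect": "rash"},
    {"patient_no": 1234, "surgery": "cataract removal", "drug": "tetracycline", "side_effect": "fever"},
    {"patient_no": 2345, "surgery": "open heart surgery", "drug": "penicillin", "side_effect": "rash"},
]

# 3NF decomposition: keep drug in Surgery, move side_effect to a Drug relation.
surgery_3nf = [{k: v for k, v in row.items() if k != "side_effect"} for row in surgery]
drug_3nf = {row["drug"]: row["side_effect"] for row in surgery}


def rejoin(surgeries, drugs):
    """Join the two relations on drug to rebuild the original rows."""
    return [dict(row, side_effect=drugs[row["drug"]]) for row in surgeries]


restored = rejoin(surgery_3nf, drug_3nf)
```

Each drug's side effect is now recorded once, so a correction to it is a single update and not one per surgery row.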
Third Normal Form
Joins
More on Assignment and ER
- Just what is this Cookie database?
- What sort of ways might it be used?
- What are those ER symbols again?
Original Assignment
- Examine the Cookie database using Access and look at the ER Diagram for it posted on the assignments page.
- Consider the possibilities of Book publications
- What are the problems with the database?
- What new fields would you add to the database, and where?
- Draw a new ER diagram showing your design.
Cookie ER diagram
Cookie Database
- Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries
- There are currently 5 main types of entities in the database (and one linking relation)
- Books (bibfile)
- Local Call numbers (callfile)
- Libraries (libfile)
- Publishers (pubfile)
- Subject headings (subfile)
- Links between subject and books (indxfile)
BIBFILE
- Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:
- accno -- an "accession" or serial number
- author -- The author's name
- title -- The title of the book
- loc -- Location of publication (where published)
- date -- Date of publication
- price -- Price of the book
- pagination -- Number of pages
- ill -- What type of illustrations (maps, etc) if any
- height -- Height of the book in centimeters
CALLFILE
- CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:
- accno -- the book accession number
- libid -- the id of the holding library
- callno -- the call number of the book in the particular library
- copies -- the number of copies held by the particular library
LIBFILE
- LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:
- libid -- Library id number
- library -- Name of the library
- laddress -- Street address for the library
- lcity -- City name
- lstate -- State code (postal abbreviation)
- lzip -- zip code
- lphone -- Phone number
- mop through suncl -- Library opening and closing times for each day of the week.
PUBFILE
- PUBFILE contains information about the publishers of books. Its attributes include:
- pubid -- The publisher's id number
- publisher -- Publisher name
- paddress -- Publisher street address
- pcity -- Publisher city
- pstate -- Publisher state
- pzip -- Publisher zip code
- pphone -- Publisher phone number
- ship -- standard shipping time in days
SUBFILE
- SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are
- subcode -- Subject identification number
- subject -- the subject heading/description
INDXFILE
- INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables
- subcode -- link to subject id
- accno -- link to book accession number
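A minimal sketch of how the linking relation supports a many-to-many mapping, using Python's built-in sqlite3 module in place of Access. The table and column names follow the slides, but the column types, the cut-down set of BIBFILE columns, and every data row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, author TEXT, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
-- Linking relation: one row per (subject, book) pair.
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
-- Hypothetical sample rows.
INSERT INTO bibfile VALUES (1, 'Author A', 'Title One'), (2, 'Author B', 'Title Two');
INSERT INTO subfile VALUES (10, 'Higher education'), (20, 'Universities');
INSERT INTO indxfile VALUES (10, 1), (10, 2), (20, 1);
""")


def titles_for_subject(subject):
    """Join bibfile to subfile through indxfile and list matching titles."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM bibfile AS b
        JOIN indxfile AS i ON i.accno = b.accno
        JOIN subfile AS s ON s.subcode = i.subcode
        WHERE s.subject = ?
        ORDER BY b.title
        """,
        (subject,),
    )
    return [title for (title,) in rows]


print(titles_for_subject("Higher education"))
```

Book 1 carries two subject headings and the heading "Higher education" is assigned to two books, which neither BIBFILE nor SUBFILE could record on its own without repeating groups.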
Some examples of Cookie Searches
- Who wrote Microcosmographia Academica?
- How many pages long is Alfred Whitehead's The Aims of Education and Other Essays?
- Which branches in Berkeley's public library system are open on Sunday?
- What is the call number of Moffitt Library's copy of Abraham Flexner's book Universities: American, English, German?
- What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
- Print a list of the Mechanics Library holdings, in descending order by height.
- What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
- Which library closes earliest on Friday night?
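Two of the searches above can be sketched as SQL queries run through Python's built-in sqlite3 module. The table and column names come from the slides. The column types, the sample rows, and the convention that a book without illustrations has a NULL ill value are all assumptions, and the real Cookie database may record these differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT, price REAL, ill TEXT, height INTEGER);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT);
CREATE TABLE callfile (accno INTEGER, libid INTEGER, callno TEXT, copies INTEGER,
                       PRIMARY KEY (accno, libid));
-- Hypothetical sample rows.
INSERT INTO bibfile VALUES (1, 'Title One', 12.50, 'maps', 24),
                           (2, 'Title Two', 30.00, NULL, 30),
                           (3, 'Title Three', 8.00, 'ports.', 18);
INSERT INTO libfile VALUES (1, 'Mechanics Library'), (2, 'Moffitt Library');
INSERT INTO callfile VALUES (1, 1, 'LB 100', 2), (2, 1, 'LB 200', 1),
                            (3, 2, 'LA 300', 3), (1, 2, 'LB 100', 1);
""")


def holdings_by_height(library):
    """List one library's holdings in descending order by height."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM bibfile AS b
        JOIN callfile AS c ON c.accno = b.accno
        JOIN libfile AS l ON l.libid = c.libid
        WHERE l.library = ?
        ORDER BY b.height DESC
        """,
        (library,),
    )
    return [title for (title,) in rows]


def replacement_cost_illustrated():
    """Cost of replacing every copy of each book that has illustrations."""
    (total,) = conn.execute(
        """
        SELECT SUM(b.price * c.copies)
        FROM bibfile AS b
        JOIN callfile AS c ON c.accno = b.accno
        WHERE b.ill IS NOT NULL
        """
    ).fetchone()
    return total


print(holdings_by_height("Mechanics Library"))
print(replacement_cost_illustrated())
```

The replacement cost has to go through CALLFILE because the number of copies is a fact about a book at a particular library, not about the book itself.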
Cookie ER diagram
Assignment Goal
- The main intent is to have you start thinking about how databases are structured, and what types of information can or should be included when designing a database
- The main task is to look for MISSING elements in the current design, or badly designed elements given the particular data
- What attributes and/or new relations need to be added to the database?
And now for something completely different...
Today
- Controlled vocabularies
- Choice of names
- Form of names
- Name Authority files
Controlled Vocabularies
- Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
Controlled Vocabularies
- Names and name authorities (Today)
- Cognitive basis of categorization and subject classification (Thursday)
- Design of controlled vocabularies for subject access -- Thesaurus design (next week)
Names
- Cutter's objectives of bibliographic description:
- To enable a person to find a document of which the author is known.
- To show what the library has by a given author.
- First serves access.
- Second serves collocation.
Problems with Names
- How many names should be associated with a document?
- Which of these should be the "main entry"?
- What form should each of the names take?
- What references should be made from other possible forms of names that haven't been used?
The problem
- Proliferation of the forms of names
- Different names for the same person
- Different people with the same names
- Examples
- from Books in Print (semi-controlled but not consistent)
- ERIC author index (not controlled)
Rules for description
- AACR II and other sets of descriptive cataloging rules provide guidelines for:
- Determining the number of name entries
- Choosing a main entry
- Deciding on the form of name to be used
- Deciding when to make references
Authority control
- Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established headings) based on some set of rules.
- If you have rules, why do you need to keep track of all of the headings?
Conditions of Authorship?
- Single person or single corporate entity
- Unknown or anonymous authors
- Shared responsibility
- Collections or editorially assembled works
- Works of mixed responsibility (e.g. translations)
- Related Works
Added Entries
- Personal names
- Collaborators
- Editors, compilers, writers
- Translators (in some cases)
- Illustrators (in some cases)
- Other persons associated with the work (such as the honoree in a Festschrift).
- Corporate Names
- Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.
Choice of Name
- AACR II says that the predominant form of the name used in a particular author's writings should be chosen as the form of name.
- References should be made from the other forms of the name.
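The two rules above can be sketched as a small routine that picks the most frequent form as the heading and turns every other observed form into a "see" reference. The record layout, the function names, and the sample forms are hypothetical. Actual cataloging practice applies many more rules than a frequency count.

```python
from collections import Counter


def build_authority_record(forms_seen):
    """Choose the predominant form as heading; other forms become references."""
    counts = Counter(forms_seen)
    heading, _ = counts.most_common(1)[0]
    see_from = sorted(form for form in counts if form != heading)
    return {"heading": heading, "see_from": see_from}


def resolve(name, records):
    """Return the established heading for a name, or None if it is unknown."""
    for record in records:
        if name == record["heading"] or name in record["see_from"]:
            return record["heading"]
    return None


# Hypothetical forms of one author's name as they appear on five works.
forms = ["Smith, John", "Smith, J.", "Smith, John", "Smith, John Q.", "Smith, John"]
record = build_authority_record(forms)
print(record)
print(resolve("Smith, J.", [record]))
```

A searcher who enters a variant form is led to the one heading under which all of the author's works are collocated.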
Form of the Name
- When names appear in multiple forms, one form needs to be chosen. Criteria for choice are
- Fullness (e.g. Full names vs. initials only)
- Language of the name.
- Spelling (choose predominant form)
- Entry element:
- John Smith or Smith, John?
- Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
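The entry element question can be illustrated with a deliberately naive rule that takes the last word of a name as the element to file under. The function and its `family_name_first` flag are hypothetical. They show only why a mechanical rule is not enough when names are written family name first.

```python
def entry_element(name, family_name_first=False):
    """Return the word a name would be filed under.

    By default the last word is treated as the family name. When
    `family_name_first` is set, the first word is used instead.
    """
    parts = name.split()
    if not parts:
        return ""
    return parts[0] if family_name_first else parts[-1]


print(entry_element("John Smith"))
print(entry_element("Mao Zedong"))
print(entry_element("Mao Zedong", family_name_first=True))
```

The default rule files Mao Zedong under the given name. The string alone does not say which convention applies, so the choice has to be recorded by a cataloger, and that is one reason to keep an authority file.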
Name Authority Files