Controlled Vocabularies:
Name Authority Control
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Review
Mapping to the relational model
Database Design & Normalization
ER Diagrams and Assignment
Normalization
Unnormalized Relations
The first step in normalization is to convert the data into a two-dimensional table
In unnormalized relations, data can repeat within a column (repeating groups)
Unnormalized Relation
First Normal Form
To move to First Normal Form, a relation must contain only atomic (single) values at every row and column intersection.
No repeating groups
A column or minimal set of columns is called a Candidate Key when its values uniquely identify each row in the relation.
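A minimal Python sketch of the two ideas above, using made-up patient and drug data (all field names and values are hypothetical, not from the slides). `to_first_normal_form` replaces a repeating group with one row per value, and `is_candidate_key` tests uniqueness plus minimality. A check against sample data can only rule a key out; whether columns really form a key is a design decision about what the data means.

```python
from itertools import combinations

# Hypothetical unnormalized records: "drugs" holds a repeating group.
unnormalized = [
    {"patient_id": 1, "name": "A. Jones", "drugs": ["drug A", "drug B"]},
    {"patient_id": 2, "name": "B. Lee", "drugs": ["drug B"]},
]

def to_first_normal_form(records, group_field, atomic_field):
    """Emit one row per value of the repeating group so every cell is atomic."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != group_field}
        for value in rec[group_field]:
            rows.append({**base, atomic_field: value})
    return rows

def is_unique(rows, cols):
    """True if no two rows share the same values for cols."""
    return len({tuple(r[c] for c in cols) for r in rows}) == len(rows)

def is_candidate_key(rows, cols):
    """Unique in this data, and no proper subset of cols is also unique."""
    if not is_unique(rows, cols):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )

rows = to_first_normal_form(unnormalized, "drugs", "drug")
print(len(rows))                                       # 3
print(is_candidate_key(rows, ("patient_id",)))         # False
print(is_candidate_key(rows, ("patient_id", "drug")))  # True
```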
First Normal Form
Second Normal Form
A relation is said to be in Second Normal Form when it is in First Normal Form and every nonkey attribute is fully functionally dependent on the primary key.
That is, every nonkey attribute needs the full primary key for unique identification
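One way to make "full functional dependency" concrete is to test whether a nonkey attribute is already determined by part of a composite key. The sketch below assumes hypothetical rows keyed on (patient_id, drug); `fd_holds` checks a dependency against the sample rows only, so it can reveal a violation but cannot prove that a dependency holds in general.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs column but differ on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        if key in seen and seen[key] != r[rhs]:
            return False
        seen.setdefault(key, r[rhs])
    return True

def partial_dependencies(rows, primary_key, nonkey_attrs):
    """Return (key subset, attribute) pairs that violate Second Normal Form."""
    found = []
    for attr in nonkey_attrs:
        for size in range(1, len(primary_key)):
            for subset in combinations(primary_key, size):
                if fd_holds(rows, subset, attr):
                    found.append((subset, attr))
    return found

# Hypothetical relation with composite key (patient_id, drug).
prescriptions = [
    {"patient_id": 1, "drug": "drug A", "patient_name": "A. Jones", "dose_mg": 500},
    {"patient_id": 1, "drug": "drug B", "patient_name": "A. Jones", "dose_mg": 100},
    {"patient_id": 2, "drug": "drug B", "patient_name": "B. Lee", "dose_mg": 300},
]

print(partial_dependencies(prescriptions, ("patient_id", "drug"),
                           ["patient_name", "dose_mg"]))
# [(('patient_id',), 'patient_name')]
```

Here patient_name needs only patient_id, so the relation is not in 2NF; moving patient_name into a separate patient table keyed on patient_id would remove the partial dependency.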
Second Normal Form
Third Normal Form
A relation is said to be in Third Normal Form if it is in Second Normal Form and there is no transitive functional dependency between nonkey attributes
When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency.
The side effect column in the Surgery table is determined by the drug administered
Side effect depends on drug, a nonkey attribute, so it is transitively dependent on the primary key and Surgery is not in 3NF
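A sketch of the 3NF fix, with hypothetical rows and column names modeled on the Surgery example: since drug determines side effect, the side effect moves to its own table keyed on drug, and Surgery keeps only the drug. Each side effect is then recorded once per drug rather than once per surgery. `split_out` is an illustrative helper, not a standard function.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs column but differ on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        if key in seen and seen[key] != r[rhs]:
            return False
        seen.setdefault(key, r[rhs])
    return True

def split_out(rows, determinant, dependent):
    """Move determinant -> dependent into its own table (3NF decomposition)."""
    if not fd_holds(rows, (determinant,), dependent):
        raise ValueError(f"{determinant} does not determine {dependent}")
    lookup = {}
    for r in rows:
        lookup[r[determinant]] = r[dependent]
    new_table = [{determinant: k, dependent: v} for k, v in lookup.items()]
    remaining = [{k: v for k, v in r.items() if k != dependent} for r in rows]
    return remaining, new_table

# Hypothetical Surgery rows: surgery_id -> drug -> side_effect.
surgery = [
    {"surgery_id": 101, "patient_id": 1, "drug": "drug A", "side_effect": "side effect X"},
    {"surgery_id": 102, "patient_id": 2, "drug": "drug B", "side_effect": "side effect Y"},
    {"surgery_id": 103, "patient_id": 3, "drug": "drug A", "side_effect": "side effect X"},
]

surgery_3nf, drug_table = split_out(surgery, "drug", "side_effect")
print(surgery_3nf[0])  # {'surgery_id': 101, 'patient_id': 1, 'drug': 'drug A'}
print(drug_table)      # one row per drug, each with its side effect
```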
Third Normal Form
Joins
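After normalization splits a table, a join reassembles the pieces when a query needs them. A minimal sketch using Python's built-in sqlite3 module, with hypothetical surgery and drug tables modeled on the Surgery example; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surgery (surgery_id INTEGER PRIMARY KEY, patient_id INTEGER, drug TEXT);
CREATE TABLE drug (drug TEXT PRIMARY KEY, side_effect TEXT);
INSERT INTO surgery VALUES (101, 1, 'drug A'), (102, 2, 'drug B'), (103, 3, 'drug A');
INSERT INTO drug VALUES ('drug A', 'side effect X'), ('drug B', 'side effect Y');
""")

# Join on the shared drug column to recover the combined view.
rows = conn.execute("""
SELECT s.surgery_id, s.patient_id, s.drug, d.side_effect
FROM surgery AS s
JOIN drug AS d ON d.drug = s.drug
ORDER BY s.surgery_id
""").fetchall()

for row in rows:
    print(row)
# (101, 1, 'drug A', 'side effect X')
# (102, 2, 'drug B', 'side effect Y')
# (103, 3, 'drug A', 'side effect X')
```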
More on Assignment and ER
Just what is this Cookie database?
What sort of ways might it be used?
What are those ER symbols again?
Original Assignment
Examine the Cookie database using Access and look at the ER Diagram for it posted on the assignments page.
Consider the different kinds of book publications the database might need to describe
What are the problems with the database?
What new fields would you add to the database, and where?
Draw a new ER diagram showing your design.
Cookie ER diagram
Cookie Database
Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries
There are currently 5 main types of entities in the database (and one linking relation)
Books (bibfile)
Local Call numbers (callfile)
Libraries (libfile)
Publishers (pubfile)
Subject headings (subfile)
Links between subject and books (indxfile)
BIBFILE
Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:
accno -- an "accession" or serial number
author -- The author’s name
title -- The title of the book
loc -- Location of publication (where published)
date -- Date of publication
price -- Price of the book
pagination -- Number of pages
ill -- What type of illustrations (maps, etc) if any
height -- Height of the book in centimeters
CALLFILE
CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:
accno -- the book accession number
libid -- the id of the holding library
callno -- the call number of the book in the particular library
copies -- the number of copies held by the particular library
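To make the linking role of CALLFILE concrete, here is a minimal sketch in Python's sqlite3 module. The column types, the composite primary key (accno, libid), and every row are assumptions for illustration; the original Access database may define them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, author TEXT, title TEXT);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT);
-- One row per (book, library) pair; the pair is assumed to be the key.
CREATE TABLE callfile (
    accno  INTEGER REFERENCES bibfile(accno),
    libid  INTEGER REFERENCES libfile(libid),
    callno TEXT,
    copies INTEGER,
    PRIMARY KEY (accno, libid)
);
INSERT INTO bibfile VALUES (1, 'Author One', 'Title One');
INSERT INTO libfile VALUES (10, 'Library A'), (20, 'Library B');
INSERT INTO callfile VALUES (1, 10, 'CALL-A-1', 2), (1, 20, 'CALL-B-1', 1);
""")

# Which libraries hold book 1, under what call number, and how many copies?
holdings = conn.execute("""
SELECT l.library, c.callno, c.copies
FROM callfile AS c
JOIN libfile AS l ON l.libid = c.libid
WHERE c.accno = ?
ORDER BY l.library
""", (1,)).fetchall()
print(holdings)  # [('Library A', 'CALL-A-1', 2), ('Library B', 'CALL-B-1', 1)]
```

In this design callno and copies sit in CALLFILE rather than BIBFILE because they depend on the particular book-library pair, not on the book alone.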
LIBFILE
LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:
libid -- Library id number
library -- Name of the library
laddress -- Street address for the library
lcity -- City name
lstate -- State code (postal abbreviation)
lzip -- zip code
lphone -- Phone number
mop through suncl -- Library opening and closing times for each day of the week.
PUBFILE
PUBFILE contains information about the publishers of books. Its attributes include:
pubid -- The publisher’s id number
publisher -- Publisher name
paddress -- Publisher street address
pcity -- Publisher city
pstate -- Publisher state
pzip -- Publisher zip code
pphone -- Publisher phone number
ship -- standard shipping time in days
SUBFILE
SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:
subcode -- Subject identification number
subject -- the subject heading/description
INDXFILE
INDXFILE provides a many-to-many mapping between subject headings and books. Its attributes consist entirely of links to other tables:
subcode -- link to subject id
accno -- link to book accession number
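A sketch of how the INDXFILE linking relation supports a subject search, again in sqlite3. Column types, the composite key, and the placeholder titles and subjects are hypothetical; book 2 carries two subjects and each subject covers several books, which is the many-to-many case the linking table exists for. `titles_for_subject` is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
-- Linking relation: one row per (subject, book) assignment.
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
INSERT INTO bibfile VALUES (1, 'Title One'), (2, 'Title Two'), (3, 'Title Three');
INSERT INTO subfile VALUES (100, 'Subject P'), (200, 'Subject Q');
INSERT INTO indxfile VALUES (100, 1), (100, 2), (200, 2), (200, 3);
""")

def titles_for_subject(conn, subject):
    """All titles assigned the given subject heading, in title order."""
    cur = conn.execute("""
        SELECT b.title
        FROM subfile AS s
        JOIN indxfile AS i ON i.subcode = s.subcode
        JOIN bibfile AS b ON b.accno = i.accno
        WHERE s.subject = ?
        ORDER BY b.title
    """, (subject,))
    return [title for (title,) in cur]

print(titles_for_subject(conn, "Subject P"))  # ['Title One', 'Title Two']
print(titles_for_subject(conn, "Subject Q"))  # ['Title Three', 'Title Two']
```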
Some examples of Cookie Searches
Who wrote Microcosmographia Academica?
How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?
Which branches in Berkeley’s public library system are open on Sunday?
What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?
What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
Print a list of the Mechanics Library holdings, in descending order by height.
What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
Which library closes earliest on Friday night?
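Two of the searches above can be sketched in SQL through sqlite3. This assumes price is stored as a number, height as whole centimeters, and ill as NULL or empty when a book has no illustrations; the rows are invented, so the results say nothing about the real Cookie data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT, price REAL,
                      ill TEXT, height INTEGER);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT);
CREATE TABLE callfile (accno INTEGER, libid INTEGER, callno TEXT,
                       copies INTEGER, PRIMARY KEY (accno, libid));
INSERT INTO bibfile VALUES
  (1, 'Title One', 20.0, 'maps', 24),
  (2, 'Title Two', 15.0, NULL, 28),
  (3, 'Title Three', 10.0, 'portraits', 21);
INSERT INTO libfile VALUES (1, 'Mechanics Library'), (2, 'Library B');
INSERT INTO callfile VALUES
  (1, 1, 'C1', 1), (2, 1, 'C2', 1), (3, 1, 'C3', 2), (1, 2, 'C4', 3);
""")

# One library's holdings, tallest books first.
by_height = conn.execute("""
SELECT b.title, b.height
FROM bibfile AS b
JOIN callfile AS c ON c.accno = b.accno
JOIN libfile AS l ON l.libid = c.libid
WHERE l.library = 'Mechanics Library'
ORDER BY b.height DESC
""").fetchall()

# Cost to replace every copy, in every library, of each illustrated book.
(replacement_cost,) = conn.execute("""
SELECT SUM(b.price * c.copies)
FROM bibfile AS b
JOIN callfile AS c ON c.accno = b.accno
WHERE b.ill IS NOT NULL AND b.ill <> ''
""").fetchone()

print(by_height)         # [('Title Two', 28), ('Title One', 24), ('Title Three', 21)]
print(replacement_cost)  # 100.0
```

The replacement-cost query multiplies by copies from CALLFILE, so a book held in several libraries is counted once per copy rather than once per title.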
Cookie ER diagram
Assignment Goal
The main intent is to have you start thinking about how databases are structured, and what types of information can or should be included when designing a database
The main task is to look for MISSING elements in the current design, or badly designed elements given the particular data
What attributes and/or new relations need to be added to the database?
And now for something completely different...
Today
Controlled vocabularies
Choice of names
Form of names
Name Authority files
Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
Controlled Vocabularies
Names and name authorities (Today)
Cognitive basis of categorization and subject classification (Thursday)
Design of controlled vocabularies for subject access -- Thesaurus design (next week)
Names
Cutter’s objectives of bibliographic description:
To enable a person to find a document of which the author is known.
To show what the library has by a given author.
First serves access.
Second serves collocation.
Problems with Names
How many names should be associated with a document?
Which of these should be the "main entry"?
What form should each of the names take?
What references should be made from other possible forms of names that haven’t been used?
The problem
Proliferation of the forms of names
Different names for the same person
Different people with the same names
Examples
from Books in Print (semi-controlled but not consistent)
ERIC author index (not controlled)
Rules for description
AACR II and other sets of descriptive cataloging rules provide guidelines for:
Determining the number of name entries
Choosing a main entry
Deciding on the form of name to be used
Deciding when to make references
Authority control
Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as "established" headings) based on some set of rules.
If you have rules, why do you need to keep track of all of the headings?
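A minimal sketch of a name authority file as a lookup table, assuming each record holds one established heading plus the variant forms that should refer to it. Real authority records (for example, MARC 21 authority records) keep these in separate fields; this Python class flattens that idea. The Mao Zedong forms follow the romanizations mentioned on these slides; the two John Smith records are invented to show different people with the same name.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    established: str                                   # the authorized heading
    variants: list[str] = field(default_factory=list)  # forms to refer from

def normalize(name):
    """Crude match key: ignore case, periods, and extra spaces."""
    return " ".join(name.replace(".", "").split()).casefold()

def build_index(records):
    """Map every known form (established or variant) to established headings."""
    index = {}
    for rec in records:
        for form in [rec.established, *rec.variants]:
            headings = index.setdefault(normalize(form), [])
            if rec.established not in headings:
                headings.append(rec.established)
    return index

def lookup(index, name):
    """Established heading(s) for a name as found; more than one means ambiguous."""
    return index.get(normalize(name), [])

records = [
    AuthorityRecord("Mao, Zedong, 1893-1976", ["Mao, Tse-tung, 1893-1976", "Mao Tse-tung"]),
    # Two hypothetical people who share a name, told apart by dates.
    AuthorityRecord("Smith, John, 1900-1980", ["John Smith"]),
    AuthorityRecord("Smith, John, 1950-", ["John Smith", "J. Smith"]),
]
index = build_index(records)

print(lookup(index, "Mao Tse-tung"))  # ['Mao, Zedong, 1893-1976']
print(lookup(index, "John Smith"))    # ['Smith, John, 1900-1980', 'Smith, John, 1950-']
```

The second lookup hints at one answer to the question above: rules say how to form a heading, but only a record of past decisions tells a cataloger which John Smith a heading already belongs to.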
Conditions of Authorship?
Single person or single corporate entity
Unknown or anonymous authors
Shared responsibility
Collections or editorially assembled works
Works of mixed responsibility (e.g. translations)
Related Works
Added Entries
Personal names
Collaborators
Editors, compilers, writers
Translators (in some cases)
Illustrators (in some cases)
Other persons associated with the work (such as the honoree in a Festschrift).
Corporate Names
Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.
Choice of Name
AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name.
References should be made from the other forms of the name.
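The "predominant form" rule can be sketched as a frequency count over the forms found in an author's works, with every other form becoming a reference. The input list is hypothetical, and real cataloging involves judgment this sketch omits; ties here simply go to the form seen first.

```python
from collections import Counter

def choose_heading(observed_forms):
    """Most frequent form becomes the heading; the rest become references.

    Ties go to the form seen first (Counter.most_common keeps first-seen
    order among equal counts); a cataloger would resolve them by judgment.
    """
    counts = Counter(observed_forms)
    heading, _ = counts.most_common(1)[0]
    references = sorted(form for form in counts if form != heading)
    return heading, references

# Hypothetical forms transcribed from one author's title pages.
forms = ["Smith, J. R.", "Smith, John R.", "Smith, John R.", "Smith, John Robert"]
print(choose_heading(forms))
# ('Smith, John R.', ['Smith, J. R.', 'Smith, John Robert'])
```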
Form of the Name
When names appear in multiple forms, one form needs to be chosen. Criteria for the choice are:
Fullness (e.g. Full names vs. initials only)
Language of the name.
Spelling (choose predominant form)
Entry element:
John Smith or Smith, John?
Mao Zedong or Zedong, Mao? (Mao Tse-tung?)
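One way to see the entry-element question: the entry element is the family name in both examples, and what differs is the everyday order in which the name is written. This hypothetical helper assumes the name has already been split into given and family parts; actual cataloging rules add many special cases (compound surnames, prefixes, names without surnames).

```python
def name_forms(given, family, family_name_first=False):
    """Return (everyday display order, inverted heading) for a two-part name."""
    display = f"{family} {given}" if family_name_first else f"{given} {family}"
    heading = f"{family}, {given}"  # entry element (family name) comes first
    return display, heading

print(name_forms("John", "Smith"))                          # ('John Smith', 'Smith, John')
print(name_forms("Zedong", "Mao", family_name_first=True))  # ('Mao Zedong', 'Mao, Zedong')
```

Writing "Zedong, Mao" would wrongly treat the given name as the entry element just because it comes last in the everyday form.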
Name Authority Files