"Known-item search" and "Subject search": a traditional over-simplification.
A Definition: "Information Storage and Retrieval. A generic term for activities, usually using computers, in which data
of some sort are stored in an organized way so that they may be recovered
in response to enquiries. The expression is used for two quite distinct activities. In one (sometimes
known as data retrieval) the complexity arises from the detailed structure of the data and from their bulk,
all enquiries being unambiguous as are the encodings of the data. In the other (sometimes known as
document retrieval or reference retrieval) the complexity arises from the impossibility of describing the
content of a document, or the intent of a request, precisely or unambiguously. In the first case the
difficult question is "What is the thing I am looking for?" and in the second "Is this thing the one I am
looking for?" R. M. Needham. Encyclopedia of Modern Thought, 1977.
Note: The word "information" does not appear. The definition draws a clear distinction between what
the Information Systems people worry about (data retrieval) and what the Information Science people
worry about (document or reference retrieval).
TAXONOMY OF SEARCHES. How to formulate the query? How to re-formulate the query? And
when to stop searching?
Types of search I: Instances.
"Every instance": The search is complete when every different document with the specified characteristics
has been found. Usually this means one instance of each different document (type) rather than duplicate
copies (tokens) of the same document, because duplicates are redundant.
"Census search": Every copy (token) of each different document (type) is to be found.
"Any single instance": The search is complete when any one document with the specified characteristics has been found.
"Any N instances": The search is complete when any N documents with the specified characteristics have been found.
"Extreme instance": The document with an extreme value for one or more attributes, e.g. the most recent.
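The search types above differ mainly in when the search may stop. The following is a minimal sketch of that idea, not part of the original notes: the `Copy` record, the function names, and the sample catalog are all hypothetical, and "any N instances" is assumed here to count copies rather than distinct documents.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional

class Copy(NamedTuple):
    work: str      # identifies the document (type)
    copy_no: int   # identifies one copy (token) of that document
    year: int      # an attribute used for the "extreme instance" search

Match = Callable[[Copy], bool]

def every_instance(items: Iterable[Copy], matches: Match) -> List[Copy]:
    """One representative per matching document (type); duplicate copies are skipped."""
    first_seen = {}
    for item in items:
        if matches(item) and item.work not in first_seen:
            first_seen[item.work] = item
    return list(first_seen.values())

def census(items: Iterable[Copy], matches: Match) -> List[Copy]:
    """Every matching copy (token) of every document (type)."""
    return [item for item in items if matches(item)]

def any_n(items: Iterable[Copy], matches: Match, n: int = 1) -> List[Copy]:
    """Stop as soon as n matching items have been found (n=1: any single instance)."""
    found = []
    for item in items:
        if matches(item):
            found.append(item)
            if len(found) == n:
                break
    return found

def extreme(items: Iterable[Copy], matches: Match,
            key: Callable[[Copy], int]) -> Optional[Copy]:
    """The matching item with the largest key value, e.g. the most recent."""
    hits = [item for item in items if matches(item)]
    return max(hits, key=key) if hits else None

# Hypothetical sample data: document A is held in two copies.
catalog = [Copy("A", 1, 1990), Copy("A", 2, 1990), Copy("B", 1, 2001), Copy("C", 1, 1985)]
published_after_1988 = lambda c: c.year > 1988

distinct = every_instance(catalog, published_after_1988)
all_copies = census(catalog, published_after_1988)
one = any_n(catalog, published_after_1988, n=1)
latest = extreme(catalog, published_after_1988, key=lambda c: c.year)
```

Note that only `any_n` can stop early; the other three must examine the whole collection before the search can be declared complete.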
Types of search II: "Good" (or "preferred") documents. Conventional Boolean search systems
operate on primitive, unambiguous binary distinctions. In real life searchers would like one or a few
"good" documents, rather than just any.
Complex specifications are inconvenient to formulate and searchers have a low tolerance for complexity
in search specification. Empirical studies of online library catalog usage have consistently shown that
functionality for specifying complex searches is little used. Searches for "some good" documents are
characterized by preferences, e.g. a three-fold approach to attribute values:
- Required (i.e. mandatory);
- Conditional ("Given a choice,..."); and
- Indifference.
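One way to picture the three-fold approach is as a filter followed by a ranking. The sketch below is illustrative only: the function name `rank_good`, the dictionary representation of documents, and the sample attributes are assumptions, not anything specified in these notes. Required values filter, conditional values raise the rank, and attributes mentioned in neither are ignored.

```python
from typing import Dict, List

def rank_good(candidates: List[Dict[str, str]],
              required: Dict[str, str],
              conditional: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep candidates meeting every required value, then order them by how
    many conditional ("given a choice") preferences they satisfy."""
    eligible = [c for c in candidates
                if all(c.get(attr) == value for attr, value in required.items())]

    def preference_score(c: Dict[str, str]) -> int:
        return sum(1 for attr, value in conditional.items() if c.get(attr) == value)

    # sorted() is stable, so equally preferred candidates keep their input order.
    return sorted(eligible, key=preference_score, reverse=True)

# Hypothetical sample data. "binding" is an attribute of indifference:
# it appears in neither the required nor the conditional specification.
books = [
    {"title": "T1", "language": "en", "format": "print", "binding": "cloth"},
    {"title": "T2", "language": "en", "format": "ebook", "binding": "none"},
    {"title": "T3", "language": "fr", "format": "ebook", "binding": "none"},
]
ranked = rank_good(books, required={"language": "en"}, conditional={"format": "ebook"})
titles = [b["title"] for b in ranked]
```

Here T3 is excluded because it fails the required language, and T2 ranks ahead of T1 because it satisfies the conditional preference for format.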
Searches for "n good" instances are adaptive: preferences take effect situationally. The more difficult it is to
predict the outcome of searches, the more desirable it is to develop
systems that support and encourage adaptive search strategies.
Sameness and substitutability: There is no such thing as "the same", only substitutability: two or more
objects are equally acceptable for some purpose.
SEARCH THEORY.
Likelihood of finding: Look first at the source most likely to contain what is sought.
Cost of searching: Start with the source that is the least expensive to search.
Cost-effective searching: Search sources in decreasing order of the ratio of expected search success to search cost.
Stopping the search: Compare the marginal cost-benefit of continuing with some other, alternative use of resources.
Search diseconomy: Satisficing; Mooers' Law; elasticity of demand.
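The ordering and stopping rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a model taken from these notes: the `Source` record, the sample probabilities and costs, the single `benefit` value for a successful search, and the independence of sources are all hypothetical. Searching stops once the expected return per unit of cost from the next source no longer exceeds `alternative_rate`, the return available from some other use of the same resources.

```python
from typing import List, NamedTuple

class Source(NamedTuple):
    name: str
    p_success: float  # estimated probability that this source holds what is sought
    cost: float       # cost of searching this source (assumed > 0)

def search_order(sources: List[Source]) -> List[Source]:
    """Decreasing order of the expected-success / search-cost ratio."""
    return sorted(sources, key=lambda s: s.p_success / s.cost, reverse=True)

def plan_search(sources: List[Source], benefit: float,
                alternative_rate: float) -> List[str]:
    """Names of the sources worth searching, in order, before stopping."""
    plan = []
    p_not_found_yet = 1.0  # chance that all earlier sources have failed
    for s in search_order(sources):
        marginal_benefit = p_not_found_yet * s.p_success * benefit
        if marginal_benefit / s.cost <= alternative_rate:
            break  # resources are better spent elsewhere
        plan.append(s.name)
        p_not_found_yet *= 1.0 - s.p_success  # assumes independent sources
    return plan

# Hypothetical sample data.
sources = [
    Source("interlibrary loan", 0.9, 30.0),
    Source("shelf", 0.5, 1.0),
    Source("catalog", 0.6, 4.0),
]
order = [s.name for s in search_order(sources)]
plan = plan_search(sources, benefit=10.0, alternative_rate=0.5)
```

In this example the most likely source (interlibrary loan) comes last in the ordering because it is also by far the most expensive, and the plan stops before reaching it.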