Arti Kirch
SIMS 290 E-Publishing
10/5/98 Assignment: Project Research
 
 

Note: The group I am part of, Suffragists Speak, is researching technologies that we feel our product should have but in which we don't necessarily have a deep background. I am investigating search engines. Because our product is educational, it should support inquiry rather than just present closed-ended packets of information. The articles below are therefore a sample of the issues that seem most interesting or relevant to what we are trying to do.
 
 

Deborah Lynne Wiley, "Beyond Information Retrieval: Ways to Provide Content in Context", for Database, August 1998.
http://www.onlineinc.com/database/DB1998/wiley8.html

The article discusses how the Web and its technologies have raised the expectations of information searchers. It is no longer sufficient to answer a query with a list of thousands of items and leave the requestor to work out which ones meet their needs. In other words, the growing population of searchers wants answers, not numbers.

A brief discussion follows of the quarter century before the Web. Information services in that period tended to emphasize raw Boolean searching, large data stores, and high prices, and all of these kept searching within a professional milieu. That changed in the 1990s, as computing and data storage costs plummeted and the Web made networking the norm.
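The following sketch shows why raw Boolean searching leaves the sorting to the searcher. It is a minimal Python example that assumes a toy inverted index over a few made-up document snippets. The function names are our own and hypothetical. Every query returns an unranked set of matching documents, with no notion of which document best answers the question.

```python
# Minimal sketch of raw Boolean searching over an inverted index.
# The documents and function names are hypothetical examples.
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents containing every term (unranked)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Documents containing at least one term (unranked)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

docs = {
    1: "suffrage parade in washington",
    2: "suffrage speech in boston",
    3: "temperance meeting in washington",
}
index = build_index(docs)
print(sorted(boolean_and(index, "suffrage", "washington")))  # [1]
print(sorted(boolean_or(index, "parade", "temperance")))     # [1, 3]
print(sorted(boolean_not(index, docs, "suffrage")))          # [3]
```

On a collection the size of the Web, the same AND query could match thousands of documents. The searcher would still have to read through all of them, which is the gap the article describes.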

Possibly the most successful search model to have emerged on the Web is the directory service popularized by Yahoo! Its success has helped "search engines...[recognize] the limits of the massive quantity and lack of quality of information on the Web. Hence, they are preparing a number of strategies for adding 'editorial context' to the data."
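In a directory, people decide where each link belongs. In a crawler-built index, a program makes that decision. The sketch below is our own hypothetical illustration of that structure and does not show how Yahoo! actually stores its directory. It models the directory as a tree of editor-curated categories that users browse by walking down a path. The `Category` class, the category names, and the example.org URLs are all made up.

```python
# Hypothetical sketch of a directory service: a human-edited category tree
# whose nodes hold curated links. Names and URLs are illustrative only.

class Category:
    def __init__(self, name):
        self.name = name
        self.children = {}   # subcategory name -> Category
        self.links = []      # editor-chosen (title, url) pairs

    def child(self, name):
        """Return the named subcategory, creating it if needed."""
        if name not in self.children:
            self.children[name] = Category(name)
        return self.children[name]

    def add_link(self, path, title, url):
        """File a link under a category path such as ["History", "Suffrage"]."""
        node = self
        for part in path:
            node = node.child(part)
        node.links.append((title, url))

    def browse(self, path):
        """Walk down the tree; raises KeyError for an unknown category."""
        node = self
        for part in path:
            node = node.children[part]
        return sorted(node.children), node.links

root = Category("Top")
root.add_link(["History", "Suffrage"], "Suffrage timeline", "http://example.org/timeline")
root.add_link(["History", "Suffrage", "Speeches"], "Sample speech", "http://example.org/speech")
subcats, links = root.browse(["History", "Suffrage"])
print(subcats)  # ['Speeches']
print(links)    # [('Suffrage timeline', 'http://example.org/timeline')]
```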

The article then enumerates some methods for creating value-added information. The activities that seem to have the most meaning for our product are:

Collaborative Filtering - this method provides "recommendations to a user based on what other users have done". Many of our users may be unfamiliar with women's suffrage or with history in general. Pointing them toward other sources could make the product more inviting to use and to return to.
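The article does not say how the recommendations are computed. One simple approach is item co-occurrence, and we sketch it here as an assumption. For each item the user has not yet seen, the score is the overlap between the user's history and the histories of the other users who viewed that item. The viewing histories, item names, and the `recommend` function are hypothetical.

```python
# Minimal sketch of collaborative filtering by co-occurrence: suggest items
# that similar users viewed. Histories and item names are hypothetical.
from collections import Counter

def recommend(histories, user, top_n=3):
    seen = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(seen & items)   # shared items = how similar the users are
        if overlap == 0:
            continue
        for item in items - seen:     # only suggest things the user has not seen
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

histories = {
    "alice": {"seneca_falls", "anthony_speech"},
    "bob":   {"seneca_falls", "anthony_speech", "nineteenth_amendment"},
    "carol": {"seneca_falls", "parade_1913"},
    "dave":  {"temperance_map"},
}
print(recommend(histories, "alice"))  # ['nineteenth_amendment', 'parade_1913']
```

Weighting by overlap means a user whose history closely matches Alice's (Bob) counts more than one who shares a single item (Carol). This is one of many possible similarity measures.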

Pattern Recognition - in this advanced feature, "the software uses small pieces of less accurate information that, combined together, give increasing precision. It operates by calculating the probability of seeing x if we see y, and then what is the probability of seeing z if both x and y are present, and so on." This method seems useful for all types of users. It lets anyone refine a search as far as they wish, and it surfaces information along the way that can spark further inquiry.
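The chained probabilities in the quote can be estimated from simple counts. P(x | y) is the share of records containing y that also contain x. P(z | x, y) narrows the pool further, to records containing both x and y. The sketch below is a minimal counting example under that assumption, with made-up term sets standing in for documents. The software the article describes may well use a more elaborate model.

```python
# Minimal sketch of chained conditional probabilities estimated from counts:
# P(x | y), then P(z | x and y). The records (terms per document) are hypothetical.
from fractions import Fraction

def cond_prob(records, target, given):
    """Estimate P(target | all terms in `given`) as a ratio of record counts."""
    matching = [r for r in records if given <= r]   # records containing every given term
    if not matching:
        return None  # no evidence for this combination
    hits = sum(1 for r in matching if target in r)
    return Fraction(hits, len(matching))

records = [
    {"suffrage", "parade", "1913"},
    {"suffrage", "parade", "washington"},
    {"suffrage", "speech"},
    {"suffrage", "parade", "1913", "washington"},
]
print(cond_prob(records, "parade", {"suffrage"}))          # 3/4
print(cond_prob(records, "1913", {"suffrage", "parade"}))  # 2/3
```

Each added term shrinks the pool of matching records. This is how small, individually imprecise clues can combine into a more precise result.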

Classifying and Clustering - "The important feature is to identify the key concepts within a document, then pull all the information on those topics together, displaying it in a way that the user understands." In our original proposal we planned to create metadata and release it as open source for use by any similar site. If we carry that plan into our prototype, these algorithms will make our own site more searchable.
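The sketch below gives a crude version of "identify the key concepts, then pull all the information on those topics together". It assumes that a document's most frequent non-stopword terms can stand in for its key concepts, and it groups documents that share a concept. The stopword list, documents, and function names are hypothetical. Real classification and clustering software is considerably more sophisticated.

```python
# Minimal sketch of concept-based grouping: each document's most frequent
# non-stopword terms are treated as its key concepts, and documents sharing
# a concept are pulled together. All data here is hypothetical.
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "in", "for", "and", "to", "on"}

def key_concepts(text, k=2):
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    counts = Counter(words)
    # Most frequent first; ties broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

def cluster_by_concept(docs, k=2):
    clusters = defaultdict(list)
    for doc_id, text in docs.items():
        for concept in key_concepts(text, k):
            clusters[concept].append(doc_id)
    return dict(clusters)

docs = {
    "d1": "the parade for suffrage and the suffrage parade route",
    "d2": "a suffrage speech and a speech on suffrage rights",
    "d3": "a speech on temperance",
}
clusters = cluster_by_concept(docs)
print(clusters["suffrage"])  # ['d1', 'd2']
print(clusters["speech"])    # ['d2', 'd3']
```

If our site carries its own metadata, the subject terms in that metadata could replace the word counts in `key_concepts`. This is one way the metadata from our proposal could feed directly into searching.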

Clifford Lynch, "The Internet: Bringing Order from Chaos", Scientific American, March 1997
http://www.sciam.com/0397issue/0397intro.html

This article intrigued me for the technology it suggested to address the author's point, which is that the Web is not yet a digital library. It "was not designed to support the organized publication and retrieval of information, as libraries are...The ephemeral mixes everywhere with works of lasting importance."

This point applies directly to our product. Unless our search feature returns organized, relevant information, Suffragists Speak may become just another curiosity of little educational value.

The problem could be solved by building or buying our own crawler. However, we would then have to maintain the crawler and the resulting index, and those tasks are not in our business model. Beyond that, "the Web...still lacks standards that would facilitate automated indexing." Our site is also multimedia-intensive, and users might want to search for other multimedia. Here Lynch points out that "[a]nother drawback of automated indexing is that most search engines recognize text only... no program can deduce the underlying meaning and cultural significance of an image (for example, that a group of men dining represents the 'The Last Supper')."

Mr. Lynch then suggests the Harvest "gatherer". According to the project's Web site, Harvest is open source. Harvest "lets a Web site compile indexing data for the pages it holds and to ship the information on request to the Web sites for the various search engines." An obvious strength is that it would let us build collections on "specific topics for specific uses and tie them loosely together so people can search and locate what they want." I am currently looking into this UNIX-only application.
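The following hypothetical Python sketch makes the gatherer idea concrete. The site compiles index records for its own pages and exports them in one batch when a search service asks for them. The record fields, the JSON export, and all function names are our assumptions. They are not Harvest's actual summary format or protocol. Image pages take their keywords from an editor-written description, because automated indexing cannot read an image.

```python
# Hypothetical sketch of a site-side "gatherer" in the spirit of Harvest:
# the site builds index records for the pages it holds and ships them on
# request. Field names and the JSON export are assumptions, not Harvest's format.
import json
from collections import Counter

def summarize(page):
    """Build one index record; non-text pages rely on an editor's description."""
    text = page.get("text") or page.get("description", "")
    words = [w.strip(".,;").lower() for w in text.split()]
    # Up to three most frequent words longer than three letters
    # (equal counts keep first-seen order).
    counts = Counter(w for w in words if len(w) > 3)
    return {
        "url": page["url"],
        "type": page["type"],
        "title": page["title"],
        "keywords": [w for w, _ in counts.most_common(3)],
    }

def gather(pages):
    """Compile records for every page the site holds."""
    return [summarize(p) for p in pages]

def export(records):
    """What the site would hand to a search service on request."""
    return json.dumps(records, indent=2)

pages = [
    {"url": "/speeches/sample.html", "type": "text", "title": "Speech transcript",
     "text": "Suffrage speech on voting rights and citizenship. Voting is a right."},
    {"url": "/images/parade.jpg", "type": "image", "title": "Parade photograph",
     "description": "Suffrage parade photograph showing suffrage marchers"},
]
records = gather(pages)
print(records[0]["keywords"])  # ['voting', 'suffrage', 'speech']
print(records[1]["keywords"])  # ['suffrage', 'parade', 'photograph']
```

Because the site writes its own records, it can describe images and other media that a text-only crawler would miss.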
 
 

Gus Venditto, "Search Engine Showdown", Internet World, May 1996
http://www.internetworld.com/print/monthly/1996/05/showdown.html

I was introduced to search basics last year in IS202. I wanted an article that would refresh my memory and show whether any of the commercial solutions had features our product should consider. What follows are excerpts from the article's review of seven search engines.

Alta Vista

Excite

Infoseek

Lycos

OpenText

WebCrawler

The article ends by offering suggestions for the ideal search engine. This discussion highlighted for me that we need to decide which features we want to offer: depth, breadth, or perhaps both.