Search engines
The "perfect" search engine
"The ideal search engine, in our view, should be able to take a natural language phrase and find the most relevant information without expecting users to master Boolean or other structured logic.
InfoSeek Guide was the best in our tests at understanding phrases and finding relevant information, largely because it combines a full-text index, a word stemmer, proximity searches, and case-sensitivity. Unfortunately, its database is too small to make it the only search tool you'll need.
The other hallmark of the perfect search engine would be to find every bit of information described by the query. Because of the extensiveness of its database (roughly 20 million pages), Alta Vista excels at finding obscure bits of information virtually anywhere on the Web… Unfortunately, when searching for anything more complex than a simple phrase, you'll need to construct your own Boolean statement and the quality of your results will be dependent on your skill at constructing queries."
Gus Venditto, "Search Engine Showdown", Internet World, May 1996
http://www.internetworld.com/print/monthly/1996/05/showdown.html
Options
Search engines (SEs), in the broadest sense, require access to a frequently updated, indexed database. In practice this means you have to build something yourself, which is what the commercial SEs do for you: they build a database and its index by constantly crawling the Web.
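To make "a database with an index" concrete, here is a minimal sketch of an inverted index, the structure that maps each word to the pages containing it. It is only an illustration: the page names and texts are made up, the function names are hypothetical, and none of the engines discussed here is claimed to work exactly this way.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages containing every query word (an implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical page contents standing in for crawled documents.
pages = {
    "a.html": "Votes for women: the suffrage movement",
    "b.html": "Oral histories of the suffrage campaign",
    "c.html": "Women in oral history",
}
index = build_index(pages)
print(sorted(search(index, "suffrage women")))
```

Keeping the index current is the real cost: whenever pages change, the index has to be rebuilt or updated, which is why crawling has to be constant.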
External to Suffragists Speak Web searchers
Suffragists Speak site-only searchers
- Swish-E (http://sunsite.berkeley.edu/SWISH-E/), i.e., SWISH-Enhanced, is a fast, powerful, flexible, and easy to use system for indexing collections of Web pages or other text files. Key features include the ability to limit searches to certain HTML tags (META, TITLE, comments, etc.). The SWISH-E software is free, and includes a package of Perl programs that enable anyone who is authorized to create and maintain their own indexes (AutoSwish). The current release of SWISH-E for UNIX is 1.2. The current release of AutoSwish for UNIX is 1.1. See the release notes for information on changes made to version 1.1. There is information on SWISH-E for other platforms. A different version of SWISH is also available from another developer as SWISH++.
- Harvest (http://harvest.transarc.com/afs/transarc.com/public/trg/Harvest/) is a UNIX-based, free engine available in release 1.5 with unique technology that maximizes performance. Also, although it is not clear to me, I think that registered users may have access to other Harvest indexes.
- WebGlimpse (http://glimpse.cs.arizona.edu/webglimpse/), from the University of Arizona, is also free. It operates by attaching a small search box to the bottom of every HTML page, and allows the search to cover the neighborhood of that page or the whole site. "With WebGlimpse there is no need to construct separate search pages, and no need to interrupt the users from their browsings. All pages remain unchanged except for the extra search capabilities. It is even possible for the search to efficiently cover remote pages linked from your pages. (WebGlimpse will collect such remote pages to your disk and index them.) Installation, customization (e.g., deciding which pages to collect and which ones to index), and maintenance are easy." Keyword here is still "maintenance".
- ht://Dig (http://htdig.sdsu.edu/) is a "complete world wide web indexing and searching system for a small domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Webcrawler and AltaVista. Instead it is meant to cover the search needs for a single company, campus, or even a particular sub section of a web site. As opposed to some WAIS-based or web-server based search engines, ht://Dig can span several web servers at a site. The type of these different web servers doesn't matter as long as they understand the HTTP 1.0 protocol. ht://Dig was developed at San Diego State University as a way to search the various web servers on the campus network."
- Verity (http://www.verity.com/) is not free, but is an example of power. Their website says that they are "a leading provider of enterprise knowledge retrieval solutions for corporate intranets, online publishers and OEMs and ISVs. Verity's product suite enables organizations to turn corporate intranets into a powerful knowledge base, making business information accessible and reusable across the enterprise. Verity's comprehensive and integrated product family enables enterprise-wide document indexing, classification, search and retrieval, personalized information dissemination, and hybrid online and CD publishing all from the same underlying Verity collection. Verity's KeyView products also enable viewing of source documents stored in more than 200 formats." Some of Verity’s partners (what does that mean: investors, db providers?) include Adobe Systems, AT&T, CNET, Cisco, Compaq, Dow Jones, Ernst & Young, Financial Times, NewsEDGE Corporation, Informix, Lotus, NEC, Netscape Communications, SAP, Siemens Nixdorf, Sybase, Tandem and Time Warner's Pathfinder.
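The Swish-E entry above mentions limiting searches to certain HTML tags (META, TITLE, and so on). As a rough sketch of the idea, and not Swish-E's actual implementation or configuration, the following uses Python's standard html.parser module to split a page into fields and then match a word against only one of them. The class name, the field names, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collect page text into separate 'title', 'meta', and 'body' fields."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "meta": "", "body": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.fields["meta"] += " " + (attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        key = "title" if self._in_title else "body"
        self.fields[key] += " " + data

def matches(html, word, field):
    """Return True if the word occurs in the named field of the page."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return word.lower() in parser.fields[field].lower().split()

# Hypothetical sample page.
page = (
    "<html><head><title>Suffrage Interviews</title>"
    '<meta name="keywords" content="oral history"></head>'
    "<body><p>Women voted.</p></body></html>"
)
print(matches(page, "suffrage", "title"))
print(matches(page, "suffrage", "body"))
```

A site-only searcher that indexes fields separately like this lets a visitor ask for pages whose title mentions a term, which tends to be more precise than matching the term anywhere in the text.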
Multisite searchers can reach, for example, the seven major search engines, two super-directories, and the wide assortment of on-line encyclopedic databases.
- Freefind (http://www.freefind.com/) is a free searcher that can be added to a site. "Visitors to the site click the search link. They see a search entry form that looks like part of your site because it can be customized with your own background and logo. The user enters a query and presses the search button. The Freefind search engine will search your web site and present your visitor with a results page. This page contains a ranked list of pages on your site that match the user's query. The results page can be customized with your own logo and background."
Local (install on PC; possible to install on server?)
- Quarterdeck Corp.'s WebCompass Personal Edition is a search agent that broadcasts your query to Lycos, Yahoo, WebCrawler, and Open Text in one operation. You can download evaluation copies from http://www.quarterdeck.com; the purchase price is $40. In the CD-ROM version, you see only search results from each site, while the Personal Edition also returns ads that appear at sites.
Online
- Squrl (an acronym for Search and Query Uniform Resource Locator) from Blue Squirrel aims at doing pretty much the same job as WebCompass, but offers the software on a mix-and-match basis. You can buy modules for Alta Vista, Excite, Lycos, Open Text, and a number of Web directories at $5 each, or collect the entire set for $49. You can store Squrl results in an HTML page, which makes it possible to develop a self-updating Web site that tracks a topic.
- NlightN allows you to perform any search using its meta-engine for free. However, if the results are in one of its many proprietary databases, you'll have to charge the cost to an NlightN account before you can see the full text. If your results are in the service's free databases, you can see them at no charge. Charges for individual articles begin at 10 cents. Knight-Ridder News Service, the National Library of Medicine, and Business Wire are among the databases offered by NlightN. While NlightN offers the Lycos search engine as part of its free service, you don't actually see the Lycos database when you use NlightN; instead, you see a text summary of the Lycos results.
- IBM is in the process of building InfoMarket as an information mecca. Right now, you can register for free access to a variety of Web search tools and database searches, but IBM says it will begin to charge for the services at some unspecified time in the future. A search on InfoMarket encompasses Open Text, Yahoo, Hoover's Business Resources, Newsbytes (a computer-industry news service), Usenet News, the CIA World Factbook, and more. You can choose to have results displayed either by relevance score or grouped by source. InfoMarket searches are fast, equivalent to a search on just one of the databases on its own.
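The multi-engine tools above (WebCompass, Squrl, NlightN, InfoMarket) share one pattern: send a single query to several sources, then merge what comes back. The sketch below illustrates that pattern with two stub "engines" that return fixed, made-up results instead of contacting any real service. It also shows the two display choices mentioned for InfoMarket, by relevance score or grouped by source. All names and scores are hypothetical, and it assumes scores from different engines are comparable, which real engines do not guarantee.

```python
from collections import defaultdict

# Hypothetical stand-ins for real engines: each takes a query and returns
# (url, score) pairs. A real meta-searcher would issue network requests here.
def engine_a(query):
    return [("http://example.com/1", 0.9), ("http://example.com/2", 0.4)]

def engine_b(query):
    return [("http://example.com/2", 0.7), ("http://example.com/3", 0.5)]

def metasearch(query, engines):
    """Send the query to every engine and collect (source, url, score) hits."""
    hits = []
    for name, engine in engines.items():
        for url, score in engine(query):
            hits.append((name, url, score))
    return hits

def by_relevance(hits):
    """Keep the best score per URL, then sort from highest to lowest."""
    best = {}
    for _, url, score in hits:
        best[url] = max(score, best.get(url, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

def by_source(hits):
    """Group the result URLs under the engine that returned them."""
    grouped = defaultdict(list)
    for name, url, _ in hits:
        grouped[name].append(url)
    return dict(grouped)

engines = {"A": engine_a, "B": engine_b}
hits = metasearch("suffrage oral history", engines)
print(by_relevance(hits))
print(by_source(hits))
```

Merging by best score removes duplicates when two engines return the same page, while grouping by source keeps each engine's own ordering visible.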