
Searching the Internet

This lesson will discuss various strategies and tools for searching the Internet. This is a crowded field, with exotic names such as AltaVista, Lycos, Dogpile, infoSeek, and Excite, to name a few. Hopefully this lesson will give you a flavor of the myriad of tools (usually called "search engines") and strategies available for searching the Internet. We will start with basics of searching.

Searching the World Wide Web, Usenet, and More: Some General Tips for Searching

First, search engines typically use Boolean logical operators to do their searching. Basically, this means that you can use specific words to set up a fairly sophisticated search that can be precisely defined. Search engines recognize the operators "and," "not," "or," and parentheses. I'll explain each of these briefly below:

And - this allows you to search for items or directories that include two or more terms of interest. Both terms will have to be in the document or directory title to be included in your search results. An example of an "and" search is the following:

Words to search for: utah and weather

(Note: Most search engines will assume an "and" if you do not place an operator between the two words - "utah weather" will return the same search results).

Not - this operator allows you to exclude a term from the items you are searching for. You might wish to locate documents on foreign language programs except for ones involving Spanish. This search would look like the following:

Words to search for: language programs not spanish

Or - the "or" operator should only be used rarely because it will return items that include _either_ of the terms you enter. For instance, the following search would give you a huge number of results:

Words to search for: ibm or mac

Parentheses - Parentheses allow you to do rather complex searches by breaking your search into separate elements. For instance, say you want to find items on travel in either Spain or Portugal. Your search would look like this:

Words to search for: travel and (spain or portugal)

There are several combinations of this type that can be made. My advice is to avoid making your search too complex. Start with a fairly simple search, and then refine it until you get the results you need.
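To make the four operators concrete, here is a minimal sketch of how a Boolean search could be evaluated. It is only an illustration, not how any particular search engine works: the tiny document collection and the `search` function are invented for this lesson, and the sketch matches words anywhere in a document's text.

```python
import re

# Hypothetical mini-collection (title -> text), invented for this sketch.
DOCS = {
    "Utah forecast": "utah weather this week",
    "Utah skiing": "utah ski resorts",
    "Spain by rail": "travel in spain",
    "Lisbon weekend": "travel in portugal",
    "Paris weekend": "travel in france",
    "Spanish courses": "spanish language programs",
    "German courses": "german language programs",
}


def search(query, docs=DOCS):
    """Return the sorted titles of the documents that satisfy a Boolean query."""
    # Split the query into words and parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", query.lower())
    # Each document becomes the set of words it contains.
    words_in = {title: set(text.lower().split()) for title, text in docs.items()}
    everything = set(words_in)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        # "or" keeps documents that match either side.
        found = parse_and()
        while peek() == "or":
            take()
            found = found | parse_and()
        return found

    def parse_and():
        # "and" (written out, or implied by a space) keeps documents that match both sides.
        found = parse_item()
        while peek() not in (None, ")", "or"):
            if peek() == "and":
                take()
            found = found & parse_item()
        return found

    def parse_item():
        token = take()
        if token in (None, ")", "and", "or"):
            raise ValueError("search term expected")
        if token == "not":
            # "not" keeps every document except those matching what follows.
            return everything - parse_item()
        if token == "(":
            found = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return found
        return {title for title, words in words_in.items() if token in words}

    found = parse_or()
    if peek() is not None:
        raise ValueError("unexpected " + peek())
    return sorted(found)
```

With this collection, `search("utah and weather")` and `search("utah weather")` both return only the "Utah forecast" entry, and `search("travel and (spain or portugal)")` returns the Spain and Portugal entries but not the Paris one.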

Second, search engines allow for word truncation. This means that you can enter a word root as a search term and then add an asterisk at the end (sometimes the asterisk is not required). The search tool will search for any word that begins with that root. For example, the search term "librar*" would return items with the following words:

library

libraries

librarian

librarians

Truncation can be quite useful, particularly with the issue of plurals and singulars. Say you were interested in items on "librarians" or "librarian." You could enter "librarian*" as your search term and get items with either term in them.
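Truncation boils down to a "starts with" test. The sketch below shows the idea; the function name `truncation_match` and the word list are made up for illustration, and real search engines may treat the asterisk differently.

```python
def truncation_match(term, words):
    """Return the words matched by a search term; a trailing * means "starts with"."""
    term = term.lower()
    if term.endswith("*"):
        root = term[:-1]  # drop the asterisk to get the word root
        return [w for w in words if w.lower().startswith(root)]
    # Without an asterisk, only an exact match counts.
    return [w for w in words if w.lower() == term]


# Hypothetical word list for trying the function out.
WORDS = ["library", "libraries", "librarian", "librarians", "liberty", "book"]

print(truncation_match("librar*", WORDS))
print(truncation_match("librarian*", WORDS))
```

The first call prints the four "librar" words and skips "liberty" and "book"; the second prints only "librarian" and "librarians."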

Third, typically you can control how many results or "hits" a search engine will return. A standard search will default to no more than 50 or 100 items. You can usually modify this parameter via a pop-up menu or dialog box at the site.

How They Work

In the past few years, many very sophisticated web-based search tools have been developed which can search web pages, Usenet news postings, electronic phonebooks and more. These search engines are essentially databases in which computer users may search by asking questions, called "queries." A partial list of some popular search engines and indices can be found below. As useful and powerful as many of these search sites are, they are not as comprehensive or as useful as they might appear.

A recent Associated Press report cited a study of 11 of the largest search sites. The study found that even the most powerful sites only cover about one-sixth of the web pages available on the Internet at any particular time and that the time it takes them to list new sites (i.e., find and index) is growing. The study found that on average it takes a new web page up to six months to make it into a search engine's listings. Also, the various search engines vary widely in their quality, speed, and ease of use.

In general, search engines use two methods to gather new web sites for their listings. First, they accept self-nominations from web page developers. You will see a link on most search engine home pages indicating where someone can go to add pages to that search site. Second, most search sites employ sophisticated software robots or "spiders" whose job is to continuously surf the web "harvesting" new web pages. These spiders use sets of rules (called algorithms) to place sites in more-or-less relevant categories. Not all of the spiders do a great job of categorizing the web sites they find (as you surely have found if you have used them).
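The "harvesting" step can be pictured as following links from page to page. Here is a rough sketch that runs on a made-up, in-memory "web" rather than the real Internet; the page addresses and the `crawl` function are assumptions for illustration, and a real spider would also fetch, parse, and categorize each page.

```python
from collections import deque

# Hypothetical miniature web: page address -> addresses it links to.
TINY_WEB = {
    "a.example/home": ["a.example/news", "b.example/home"],
    "a.example/news": ["a.example/home"],
    "b.example/home": ["c.example/home"],
    "c.example/home": [],
    "d.example/home": ["a.example/home"],  # no other page links here
}


def crawl(start, web):
    """Follow links outward from a starting page; return pages in the order found."""
    seen = {start}
    queue = deque([start])
    harvested = []
    while queue:
        page = queue.popleft()
        harvested.append(page)
        for link in web.get(page, []):
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return harvested


print(crawl("a.example/home", TINY_WEB))
```

Notice that "d.example/home" is never harvested, because no page links to it. A page like that would only be listed if its developer nominated it, which is one way a search site can miss pages.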

The best search sites don't rely on computers to do all of their categorizing. They do it the old-fashioned way--with librarians! That's one of the reasons index sites such as Yahoo! are so popular. Despite its popularity, Yahoo! has one of the smaller databases, but it makes up for its relative smallness with quality. It is very high quality because a real live person actually looks at the sites before they are placed in the Yahoo! database.

The best way to test search sites is to try several and then bookmark your favorites for future visits. Typically these sites require a Web browsing tool such as Netscape or Internet Explorer, although some may use stand-alone applications, some of which are Java-based.

List of Popular Search Engines and Indices

Google uses sophisticated next-generation technology to produce highly relevant search results. Google returns relevant results because it responds to queries using an automated method that ranks relevant websites based on the link structure of the Internet itself.

Google is designed to impose order on information chaos with its PageRank™ technology. PageRank leverages the structural nature of the web, which is defined by the way in which any web page can link to any other web page, instantly, directly, and without an intermediary. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. Google assesses a page's importance by the votes it receives. But Google looks at more than sheer volume of votes, or links; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Google's complex, automated search methods preclude human interference. Unlike other search engines, Google is structured so no one can purchase a higher PageRank or commercially alter results. A Google search is an honest and objective way to find high-quality websites, easily. Search Google at:

http://www.google.com/
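The link-as-vote idea described for Google above can be sketched in a few lines. This is not Google's actual code, only a simplified illustration of vote-based ranking: the four-page link table is made up, and the `damping` value of 0.85 and the number of rounds are assumptions on my part rather than anything stated in this lesson.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a table of page -> pages it links to.

    Every linked-to page must also appear as a key in the table.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with everyone equal
    for _ in range(iterations):
        # Every page gets a small baseline score each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link table: A, C, and D all "vote" for B.
LINKS = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this table B collects the most votes and ends up ranked first, while D, which no page links to, ends up last. C does well even though only B links to it, because B's vote carries a lot of weight.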

Billing itself "the world's largest search engine directory," Searchpower.com provides links to over 1,700 different specialized search engines. The indexed search engines are organized in over 60 categories, and individual engine entries indicate date added and number of hits from the Searchpower site. Some also include brief annotations. While a few notable specialized search engines are not listed, the site does seem to be updated on an almost-daily basis. A What's New listing and internal keyword search engine round out Searchpower's services.

http://www.searchpower.com/

A comprehensive index of over 250 different search tools can be found at search.com (pronounced: search-dot-com). Search.com is not a search engine but an index or list of search engines. Its address is:

http://www.search.com

Northern Light <http://www.northernlight.com/> Search over 150 million Web pages and the Special Collection™ of more than 5,400 full-text sources at Northern Light. The Northern Light Special Collection™ is an online business library comprising 5,400 trusted, full-text journals, books, magazines, newswires, and reference sources. The breadth of information available in the Special Collection is unique to Northern Light, and includes a wide range of diverse sources such as American Banker, ENR: Engineering News Record, The Lancet, PR Newswire, and ABC, NPR, and Fox News Transcripts. This content is fully searchable, either integrated with the Web or on its own through Power Search.

FAST Search <http://www.alltheweb.com/> Claiming to be the world's biggest search engine, this new Web tool is the product of a joint venture between Dell Computer and FAST, a Norwegian company that has announced its intention to catalog the entire Internet. At 80 million searchable documents, FAST Search is not actually the world's most comprehensive engine (with over 150 million documents indexed, Northern Light holds that title), but it may be if it reaches its goal of 200 million by the end of summer 1999. Return pages also include a link to the same search at Lycos FTPsearch, which indexes over 100 million FTP files.

Some of the most popular general-use search tools are the following:

infoSeek Guide, Excite, Open Text Index, Point, HotBot, IBM Infomarket, Lycos, Yahoo!, AltaVista, Magellan, The Electric Library, and Accufind.

All of these popular search engines can be found at search.com and are also linked to Netscape's Net Search Home Page, which is found at:

http://home.netscape.com/escapes/search/ntsrchrnd-2.html

A very nice web-based search engine that catalogs web sites as well as other Internet sites is called the Tradewinds Galaxy, and is located at:

http://galaxy.einet.net/galaxy.html

This address points to the Really Useful Web Sites Page. This page claims to be an "online reference for when you need to know something fast." It is not a complete listing of all the resources on the Net, just the ones the author feels are the most helpful and accessible. It truly is a really useful page!

http://www.tac.nyc.ny.us/~rudolph/useful/

How about a multi-language search tool? It is becoming increasingly important to be able to search the web in a language other than English. This is offered by AltaVista at:

http://www.altavista.com/

Search Engine Colossus offers links to a large number of country- or region-specific search engines. The search engines are organized by country, with a link to the service, the language(s) it uses, its point of origin, and a short description when available. The number of search engines available under each heading can vary considerably, as would be expected. Search engines in eleven general categories, including Academic, Business, Medical, and Sports, are also listed. Users wishing to narrow their searches and/or utilize some lesser-known search engines will find this site a helpful starting point. This site is located at:

http://www.searchenginecolossus.com/

Need to find someone's personal web page? Ahoy! The Homepage Finder can help with that. It is located at:

http://ahoy.cs.washington.edu:6060/

Lycos RichMedia--a multimedia (pictures, movies, streams, and sounds) search engine: RichMedia from Lycos is a multimedia search engine that distinguishes itself in two ways. First is its size (over 17 million files indexed) and speed (based on the technology that drives FAST Search). Second, users can click on search returns to directly access the content (pictures, movies, streams, and sounds) instead of going through the source site first. In most cases, however, links are also provided to the source. Like other multimedia search engines, RichMedia offers a filter (Search Scrub) to block adult content, although the site warns that it can never be 100 percent effective.

http://richmedia.lycos.com/

A child-oriented search engine and index called Yahooligans! (run by the same folks who run Yahoo!) is located at:

http://www.yahooligans.com/

Direct Hit (http://www.directhit.com/) is a "Popularity Engine" which ranks the top ten sites based upon previous user activity related to a particular search subject. By tracking the sites that users actually select from the search results list, Direct Hit theoretically offers the most popular and relevant sites for a given request. Direct Hit is presently available in conjunction with the HotBot search engine (http://www.hotbot.com). Users can access it from the Direct Hit site or by running a search at HotBot and selecting the "Get the Top 10 Most Visited Sites for this query" link.
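The popularity idea can be illustrated by counting which sites people picked for a given query. The sketch below is a guess at the general approach, not the service's real method; the click log, the site names, and the `top_sites` function are all invented for illustration.

```python
from collections import Counter


def top_sites(click_log, query, limit=10):
    """Rank sites by how often users selected them for one query."""
    counts = Counter(site for logged_query, site in click_log if logged_query == query)
    return [site for site, _ in counts.most_common(limit)]


# Hypothetical log of (query, site the user selected from the results).
CLICK_LOG = [
    ("utah weather", "weather.example"),
    ("utah weather", "news.example"),
    ("utah weather", "weather.example"),
    ("library", "books.example"),
    ("utah weather", "weather.example"),
    ("utah weather", "news.example"),
    ("utah weather", "ski.example"),
]

print(top_sites(CLICK_LOG, "utah weather"))
```

Here "weather.example" was chosen three times for "utah weather," so it is listed first, followed by "news.example" and then "ski.example."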

Cool Shortcuts

A little-known shortcut for searching is available in both Netscape Navigator and Microsoft Internet Explorer. Typing a question mark, followed by a space, followed by a search term, in the "Location" line of your web browser initiates an automated search query. So, for example, if you typed:

? library

you would be automatically presented with a results page of web sites which satisfied this query.
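One way to picture what happens behind the scenes: the browser notices the question mark and turns the rest of the line into a search address. The sketch below is only my illustration of that idea. The address `http://search.example/find?q=` is hypothetical, and the actual browsers may handle the shortcut quite differently.

```python
from urllib.parse import quote_plus

# Hypothetical search address; real browsers use their own search service.
SEARCH_URL = "http://search.example/find?q="


def resolve_location(text):
    """Turn "? term" into a search address; leave ordinary addresses alone."""
    if text.startswith("? "):
        term = text[2:].strip()
        # quote_plus makes the term safe to place inside a web address.
        return SEARCH_URL + quote_plus(term)
    return text


print(resolve_location("? library"))
print(resolve_location("http://www.google.com/"))
```

The first line typed becomes a search address ending in "q=library," while the second, an ordinary address, passes through unchanged.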


The newer versions of Navigator (4.06 and greater) also have a useful "What's Related" pop-up menu located to the right of the "Location" box. Clicking on it initiates a lateral search (a search to identify other sites of a similar type).

Multiple-Search-Engines

A new category of search engine is emerging--the multiple-search-engine. These sites specialize in conducting simultaneous searches of several search engines from one query. This is different from a site like search.com, which is an index of links to other individual search sites. Here are a few of the most promising multiple-search-engines for you to try.

Dogpile searches most of the popular search engines including AltaVista (http://altavista.digital.com/) and Reference.com (http://www.reference.com/). It also searches FTP archives, Usenet newsgroups, and other sources about which you might not have thought.

http://www.dogpile.com/

The All-in-One Search Page is a compilation of various forms-based search tools found on the Internet. They have been combined to provide a consistent interface and convenient ALL-IN-ONE search point.

http://www.allonesearch.com/

Metasearch allows you to enter your search terms and choose advanced features like Boolean operators just once -- then search multiple engines without retyping.

http://metasearch.com/

The BigHub.com: This site joins an already crowded field of portal/multiple-search-engine sites. Like some of its peers, BigHub allows simultaneous searches of numerous engines, in this case nine, and includes several value-added features, such as stock quotes, news, weather reports, and free email. Unlike some of its competitors, BigHub also features a nice collection of specialty search engines in categories such as Arts & Humanities, Education, Government, News, and Science, among others. Search engines in each category are further organized by discipline. Although any such collection is inevitably subjective, users may find the site a useful tool to reduce their search time and gather more relevant results.

http://www.thebighub.com/
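The basic trick behind the multiple-search-engines listed above is to send one query to several engines and merge what comes back. Here is a minimal sketch, assuming two stand-in "engines" that return canned results; the engine functions, their results, and the merging rule (take each engine's top result in turn and skip duplicates) are my own assumptions, not a description of how any of these sites work.

```python
def engine_one(query):
    # Stand-in for a real search engine: canned, ranked results.
    canned = {"library": ["a.example", "b.example", "c.example"]}
    return canned.get(query, [])


def engine_two(query):
    # A second stand-in engine with partly overlapping results.
    canned = {"library": ["b.example", "d.example"]}
    return canned.get(query, [])


def metasearch(query, engines):
    """Send one query to every engine and merge the results without duplicates."""
    results_per_engine = [engine(query) for engine in engines]
    longest = max((len(results) for results in results_per_engine), default=0)
    merged, seen = [], set()
    # Take each engine's first result, then each engine's second, and so on.
    for position in range(longest):
        for results in results_per_engine:
            if position < len(results) and results[position] not in seen:
                seen.add(results[position])
                merged.append(results[position])
    return merged


print(metasearch("library", [engine_one, engine_two]))
```

Both engines return "b.example," but it appears only once in the merged list, so you get four distinct sites from a single query.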

 

Where To Go for More Information About Search Engines

Search Engine Showdown is a web site which offers comprehensive comparisons of the major search engines including lists of features of the major search sites, reviews, hints on effective search strategies, statistics on usage of major sites, and more. The site even provides links to a Usenet newsgroup and a listserv on the topic of search engine technology and performance.

http://www.notess.com/search/

Search IQ: A large collection of sites on search engines, notable for two features. First, its collection of search engine reviews is rather extensive, covering far more than the usual dozen or so listed at most search engine review sites. Although rankings and full reviews are offered for only 17 engines, the individual and meta-search engine listings offer at least a sentence or two on many more. The other key section of the site is a fairly large directory of specialized search engines, organized by subject. Additional resources at the site include daily tips, tutorials and guides, and a listing of new search engines.

http://www.searchiq.com/
 

Happy searching!


Internet Lessons version 1.5. Copyright of lessons (C) 1999 by Joseph A. Erickson, All Rights Reserved. Permission Granted for Individual Usage.

If you plan to distribute multiple copies of this work, please contact the author.

