

Main | Course Syllabus | Internet Lessons | Supplementary Readings | Other Course Documents
FAQ | WWW Starting Points | HTML Resources | ISTE Standards

Searching the Internet

This lesson discusses various strategies and tools for searching the Internet. This is a crowded field, with exotic names such as AltaVista, Lycos, Dogpile, Infoseek, Google, and Excite, to name a few. This lesson should give you a flavor of the myriad tools (usually called "search engines") and strategies available for searching the Internet. We will start with the basics of searching.

Searching the World Wide Web, Usenet, and More: Some General Tips for Searching

First, search engines typically use Boolean logical operators to do their searching. Basically, this means that you can use specific words to set up a fairly sophisticated search that can be precisely defined. Search engines recognize the operators "and," "not," "or," and parentheses. I'll explain each of these briefly below:

And - this allows you to search for items or directories that include two or more terms of interest. Both terms will have to be in the document or directory title to be included in your search results. An example of an "and" search is the following:

Words to search for: utah and weather

(Note: Most search engines will assume an "and" if you do not place an operator between the two words, so "utah weather" will return the same search results.)

Not - this operator allows you to exclude a term from the items you are searching for. You might wish to locate documents on foreign language programs except for ones involving Spanish. This search would look like the following:

Words to search for: language programs not spanish

Or - the "or" operator should only be used rarely because it will return items that include _either_ of the terms you enter. For instance, the following search would give you a huge number of results:

Words to search for: ibm or mac

Parentheses - Parentheses allow you to do rather complex searches by breaking your search into separate elements. For instance, say you want to find items on travel in either Spain or Portugal. Your search would look like this:

Words to search for: travel and (spain or portugal)

There are several combinations of this type that can be made. My advice is to avoid making your search too complex. Start with a fairly complex search, and then break it down until you get the results you need.
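To make the four operators concrete, here is a minimal Python sketch of how a Boolean query could be checked against a handful of documents. It is only an illustration, not how any particular search engine is built: the function names (matches, search) and the sample documents are made up for this example, it looks at the whole text rather than just titles, and it assumes the query is well formed.

```python
import re

def matches(query, text):
    """Return True if the words in `text` satisfy the Boolean `query`."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    tokens = re.findall(r"\(|\)|[^\s()]+", query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        # "or" binds most loosely: either side may match.
        result = parse_and()
        while peek() == "or":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        # Adjacent terms are joined by an explicit or an assumed "and".
        result = parse_atom()
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                take()
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        token = take()
        if token == "not":
            return not parse_atom()
        if token == "(":
            value = parse_or()
            if peek() == ")":
                take()
            return value
        return token in words

    return parse_or()

def search(query, docs):
    """Return the names of the documents that satisfy `query`."""
    return [name for name, text in docs.items() if matches(query, text)]

# Hypothetical sample documents, invented for this sketch.
DOCS = {
    "utah-forecast": "Utah weather forecast for the weekend",
    "french-german": "Language programs in French and German",
    "spanish-abroad": "Spanish language programs abroad",
    "portugal-guide": "Travel guide to Portugal",
    "italy-tips": "Travel tips for Italy",
}

print(search("utah and weather", DOCS))
print(search("language programs not spanish", DOCS))
print(search("travel and (spain or portugal)", DOCS))
```

Notice that "utah weather" and "utah and weather" find the same document, because the sketch assumes an "and" between neighboring words, just as most search engines do.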

Second, search engines allow for word truncation. This means that you can enter a word root as a search term and then add an asterisk at the end (sometimes the asterisk is not required). The search tool will search for any word that begins with that root. For example, the search term "librar*" would return items with the following words:

library

libraries

librarian

librarians

Truncation can be quite useful, particularly with the issue of plurals and singulars. Say you were interested in items on "librarians" or "librarian." You could enter "librarian*" as your search term and get items with either term in them.
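Truncation is easy to picture in code. The short Python sketch below treats a trailing asterisk as "any word that starts with this root." The function name truncation_matches and the sample sentence are invented for illustration; real search engines may handle truncation differently.

```python
import re

def truncation_matches(term, text):
    """Return the words in `text` matched by `term`.

    A trailing asterisk means "any word beginning with this root";
    without it, only the exact word matches.
    """
    words = re.findall(r"[a-z]+", text.lower())
    term = term.lower()
    if term.endswith("*"):
        root = term[:-1]
        return [word for word in words if word.startswith(root)]
    return [word for word in words if word == term]

sample = "The library hired two librarians; other libraries hired one librarian."
print(truncation_matches("librar*", sample))
# → ['library', 'librarians', 'libraries', 'librarian']
print(truncation_matches("librarian*", sample))
# → ['librarians', 'librarian']
```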

Third, you can typically control how many results or "hits" a search engine will return. A standard search will default to no more than 50 or 100 items. You can usually modify this parameter via a pop-up menu or dialog box at the site.

How they Work

In the past few years, many very sophisticated web-based search tools have been developed which can search web pages, Usenet news postings, electronic phonebooks and more. These search engines are essentially databases in which computer users may search by asking questions, called "queries." A partial list of some popular search engines and indices can be found below. As useful and powerful as many of these search sites are, they are not as comprehensive or as useful as they might appear.

A recent Associated Press report cited a study of 11 of the largest search sites. The study found that even the most powerful sites only cover about one-sixth of the web pages available on the Internet at any particular time and that the time it takes them to list new sites (i.e., find and index) is growing. The study found that on average it takes a new web page up to six months to make it into a search engine's listings. Also, the various search engines vary widely in their quality, speed, and ease of use.

There are other problems with our current web search approaches. A recent article in the New York Times reviewed the so-called "deep web" problem. Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together (see more on this below). While that approach works well for the pages that make up the surface Web, these programs have a harder time penetrating databases that are set up to respond to typed queries. Click here to read more about the strategies researchers are developing to mine the "deep web."

In general, search engines use two methods to gather new web sites for their listings. First, they accept self-"nominations" from web page developers. You will see a link on most search engine home pages indicating where someone can go to add pages to that search site. Second, most search sites employ sophisticated software robots or "spiders" whose job is to continuously surf the web "harvesting" new web pages. These spiders use sets of rules (called algorithms) to place sites in more-or-less relevant categories. Not all of the spiders do a great job of categorizing the web sites they find (as you surely have found if you have used them).
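Here is a rough Python sketch of what a spider does when it follows hyperlinks. To keep it self-contained, the "web" is just a made-up dictionary that maps each page to the pages it links to; a real spider would fetch pages over the network and obey many more rules. The names crawl and TOY_WEB are hypothetical.

```python
from collections import deque

def crawl(seed, links):
    """Follow hyperlinks outward from `seed`, returning pages in the order found."""
    discovered = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                discovered.append(target)
                queue.append(target)
    return discovered

# A made-up miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["news", "about"],
    "news": ["story", "home"],
    "about": [],
    "story": ["home"],
    # Nothing links to this page; it stands in for a database record
    # that only appears in response to a typed query.
    "database-record": ["home"],
}

print(crawl("home", TOY_WEB))
# → ['home', 'news', 'about', 'story']
```

The page called "database-record" never shows up in the results because no hyperlink leads to it. That is the "deep web" problem in miniature.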

The best search sites don't rely on computers to do all of their categorizing. They do it the old-fashioned way--with librarians! That's one of the reasons index sites such as Yahoo! are so popular. Despite its popularity, Yahoo! has one of the smaller databases, but it makes up for its relative smallness with quality: a real live person actually looks at each site before it is placed in the Yahoo! database.

The best way to test search sites is to try several and then bookmark your favorites for future visits. Typically these sites require a Web browsing tool such as Netscape or Internet Explorer, although some may use stand-alone applications, some of which are Java-based.

Start With The Right Site

When you're looking for something specific, like movie reviews, zip codes, legislation, etc., the key to finding useful data on the Internet may be starting with the right search resource. Many sites are specific to one or just a few topics or databases, making them a much better resource for that specific domain than the big generic search engines. The search site search.com is also a good tool for more information about specific search sites--click on "more..." to view links to these other information sources. Here's a quick review of some of the sites best suited for finding specific kinds of information.

To find information about...   ...check here.
Government                     www.firstgov.gov
Health                         www.medlineplus.gov
Law                            Legal Information from LexisNexis
Movies                         www.imdb.com
People                         www.accurint.com (this site is not free)
News                           www.topix.net
Reference                      www.refdesk.com
Words                          www.onelook.com

List of Popular Search Engines and Indices

Google uses sophisticated next-generation technology to produce highly relevant search results. Google returns relevant results because it responds to queries using an automated method that ranks relevant websites based on the link structure of the Internet itself.

Google is designed to impose order on information chaos with its PageRank™ technology. PageRank leverages the structural nature of the web, which is defined by the way in which any web page can link to any other web page, instantly, directly, and without an intermediary. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. Google assesses a page's importance by the votes it receives. But Google looks at more than sheer volume of votes, or links; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
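The "links as votes" idea can be sketched in a few lines of Python. This is a simplified textbook-style version and certainly not Google's actual implementation: the function name rank_pages, the damping value of 0.85, the number of iterations, and the tiny link graph are all assumptions made for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Estimate page importance from link "votes".

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Made-up link graph: B collects three votes and then votes for E;
# F collects no votes and votes for G.
votes = {"A": ["B"], "C": ["B"], "D": ["B"], "B": ["E"], "F": ["G"]}
ranks = rank_pages(votes)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Pages E and G each receive exactly one vote, but E ends up ranked higher than G because its vote comes from B, which is itself an "important" page.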

Google's complex, automated search methods preclude human interference. Unlike other search engines, Google is structured so no one can purchase a higher page rank or commercially alter results. A Google search is an honest and objective way to find high-quality websites, easily. Search Google at:

http://www.google.com/

A9 is a competitor of Google's, owned and operated by Amazon.com. It claims to provide better results because it goes beyond traditional page ranking methods to determine authority, in addition to relevancy. To determine the authority or quality of a site's content, A9 uses a process they call "Subject-Specific Popularity"--a procedure similar to Google's method that ranks a site based on the number of same-subject pages that reference it, not just general popularity, to determine a site's level of authority. Find A9 at:

http://www.A9.com/

A comprehensive index of over 250 different search tools can be found at search.com (pronounced: search-dot-com). Search.com is not only a search engine, but also an index or list of search engines. Its address is:

http://www.search.com

AllTheWeb: Claiming to be the world's biggest search engine, this new Web tool is the product of a joint venture between Dell Computer and FAST, a Norwegian company that has announced its intention to catalog the entire Internet. At 80 million searchable documents, AllTheWeb is not actually the world's most comprehensive engine (with over 150 million documents indexed, Northern Light holds that title). When users perform a search using any of the five major search options (Web pages, Pictures, Videos, MP3 files, or FTP files), a sidebar shows helpful results in other categories. Users can also limit their searches in a variety of ways directly from the search box, which offers a handy pull-down menu to limit by language. The help page explains the various search options, including searching for words in the title or domain name, searching pages on a specific site, searching for pages with a link to a specific site, and more. AllTheWeb is at:

http://www.alltheweb.com/

Other Contenders in the Great Search Engine War:

How about a multi-language search tool? It is becoming increasingly important to be able to search the web in a language other than English. This is offered by AltaVista at:

http://www.altavista.com/

Search Engine Colossus offers links to a large number of country- or region-specific search engines. The search engines are organized by country, with a link to the service, the language(s) it uses, its point of origin, and a short description when available. The number of search engines available under each heading can vary considerably, as would be expected. Search engines in eleven general categories, including Academic, Business, Medical, and Sports, are also listed. Users wishing to narrow their searches and/or utilize some lesser-known search engines will find this site a helpful starting point. This site is located at:

http://www.searchenginecolossus.com/

A child-oriented search engine and index called Yahoo Kids (run by the same folks who run Yahoo!) is located at:

http://kids.yahoo.com/

Do More With Google

Google has some hidden features that may be extremely useful. Here's a table describing some of these features. Go to http://www.google.com/help/features.html for a complete list.

Feature:                  What to type:                                                         Result you get:
Dictionary                define:word                                                           Links to definitions
Calculator                10*35+4 (or any other equation)                                       The answer
Phone Book                first name, last name, zip code, or last name, zip code               Phone book matches
Special codes             package tracking numbers, area codes, vehicle ID numbers              Relevant results
Stock Quotes              stocks:ticker symbol                                                  Recent stock quote from Yahoo
Maps                      street address, city, state, or zip code                              Links to maps
Who Links to...           link:site URL                                                         Websites that link to that URL
Search only one website   search term site:site URL, e.g., graduation site:www.augsburg.edu     Search results limited to that site
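If you want to build searches like these from a program, the short Python sketch below shows one way to assemble the text and turn it into a web address. The helper names build_query and search_url are invented here, and the "/search" path with a "q" parameter is an assumption about how Google accepts queries rather than something this lesson documents. Whether a given operator is honored is up to Google.

```python
from urllib.parse import urlencode

def build_query(terms="", site=None, link=None):
    """Combine plain search terms with the site: and link: operators."""
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append("site:" + site)   # no space after the colon
    if link:
        parts.append("link:" + link)
    return " ".join(parts)

def search_url(query):
    """Return a search address for `query` (the path and "q" parameter are assumed)."""
    return "http://www.google.com/search?" + urlencode({"q": query})

query = build_query("graduation", site="www.augsburg.edu")
print(query)
# → graduation site:www.augsburg.edu
print(search_url(query))
# → http://www.google.com/search?q=graduation+site%3Awww.augsburg.edu
```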


A website has been developed that automates access to many of these special Google features. It's called Soople, and is available at: <http://www.soople.com/>.

Multiple-Search-Engines

A new category of search engine is emerging--the multiple-search-engine. These sites specialize in searching several search engines simultaneously from one query. This is different from a site like search.com, which is an index of links to other individual search sites. Here are a few of the most promising multiple-search-engines for you to try.

Dogpile searches most of the popular search engines. It also searches FTP archives, Usenet newsgroups, and other sources about which you might not have thought.

http://www.dogpile.com/

The All-in-One Search Page is a compilation of various forms-based search tools found on the Internet. They have been combined to provide a consistent interface and convenient ALL-IN-ONE search point.

http://www.allonesearch.com/

Metasearch allows you to enter your search terms and choose advanced features like Boolean operators just once -- then search multiple engines without retyping.

http://metasearch.com/
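The idea behind a multiple-search-engine can be sketched in Python as follows: send one query to several engines, then merge the answers and drop duplicates. This is only a sketch of the general idea, not how Dogpile or any other site actually works. The two "engines" are stand-in functions that return made-up results, and the merging rule (take each engine's first result, then each engine's second, and so on) is just one reasonable choice.

```python
from itertools import zip_longest

def metasearch(query, engines):
    """Send one query to every engine and merge the results without duplicates."""
    result_lists = [engine(query) for engine in engines]
    merged = []
    seen = set()
    # Take every engine's first result, then every engine's second, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Stand-in engines with made-up results; a real one would query a web site.
def engine_one(query):
    return ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

def engine_two(query):
    return ["http://example.com/b", "http://example.com/d"]

print(metasearch("utah weather", [engine_one, engine_two]))
# → ['http://example.com/a', 'http://example.com/b', 'http://example.com/d', 'http://example.com/c']
```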

Where To Go for More Information About Search Engines

Search Engine Showdown is a web site which offers comprehensive comparisons of the major search engines including lists of features of the major search sites, reviews, hints on effective search strategies, statistics on usage of major sites, and more. The site even provides links to a Usenet newsgroup and a listserv on the topic of search engine technology and performance.

http://www.notess.com/search/

Happy searching!


Internet Lessons version 1.8. Copyright of lessons (C) 2007 by Joseph A. Erickson, All Rights Reserved. Permission Granted for Individual Usage.

If you plan to distribute multiple copies of this work, please contact the author.

Click here to connect to the assignment for this lesson

