Searching the Internet
This page discusses various
strategies and tools for searching the Internet. This is a crowded
field, with exotic names such as Bing, Dogpile,
Google, and Yahoo!, to name a few. Hopefully this lesson will give
you a flavor of the many tools (usually called "search engines")
and strategies available for searching the Internet. While most folks
believe they're fairly proficient at web searching, in fact most folks
don't know the most basic elements of how searching works or how
to be a more savvy searcher. So let's make sure you're not one of
those folks! We will start
with the basics of searching.
Searching the World Wide Web and More: Some General Tips for Searching
First, search engines typically use
Boolean logical operators to do their searching. Basically, this
means that you can use specific words to set up a fairly
sophisticated search that can be precisely defined. Search engines
recognize the operators "and," "not," "or," and parentheses. I'll
explain each of these briefly below:
And - this allows you
to search for items or directories that include two or more terms
of interest. Both terms will have to be in the document or
directory title to be included in your search results. An example
of an "and" search is the following:
Words to search for: utah and
weather
(Note: Most search engines will
assume an "and" if you do not place an operator between the two
words - "Utah weather" will return the same search
results).
Not - this operator allows
you to exclude a term from the items you are searching for. You
might wish to locate documents on foreign language programs except
for ones involving Spanish. This search would look like the
following:
Words to search for: language
programs not Spanish
Or - the "or" operator
should only be used rarely because it will return items that
include _either_ of the terms you enter. For instance, the
following search would give you a huge number of
results:
Words to search for: ibm or
mac
Parentheses - Parentheses
allow you to do rather complex searches by breaking your search
into separate elements. For instance, say you want to find items
on travel in either Spain or Portugal. Your search would look like
this:
Words to search for: travel and
(Spain or Portugal)
There are several combinations of
this type that can be made. My advice is to avoid making any single
search too complex. If you start with a fairly complex search,
break it down into simpler parts until you get the results you need.
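To see what the operators above actually do, here is a minimal sketch in Python. It is not how any real search engine is implemented; the five document titles and the helper name docs_with are made up for illustration. It treats each operator as a set operation: "and" is an intersection, "or" is a union, and "not" is a difference.

```python
# Toy document collection (hypothetical titles, for illustration only).
DOCS = {
    1: "travel guide to Spain",
    2: "travel tips for Portugal",
    3: "Spain weather report",
    4: "language programs in Spanish",
    5: "language programs in French",
}

def docs_with(term):
    """Return the ids of documents that contain the term as a whole word."""
    term = term.lower()
    return {doc_id for doc_id, text in DOCS.items()
            if term in text.lower().split()}

# travel and (Spain or Portugal)
travel_iberia = docs_with("travel") & (docs_with("spain") | docs_with("portugal"))

# language programs not Spanish (implicit "and" between the first two words)
non_spanish = (docs_with("language") & docs_with("programs")) - docs_with("spanish")

# Spain or Portugal: a union, so adding terms can only grow the result set
spain_or_portugal = docs_with("spain") | docs_with("portugal")

print(sorted(travel_iberia))      # [1, 2]
print(sorted(non_spanish))        # [5]
print(sorted(spain_or_portugal))  # [1, 2, 3]
```

Notice that the "or" query returns the most documents, which is why "or" on its own tends to produce a huge number of results.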
Second, search engines allow for word
truncation. This means that you can enter a word root as a
search term and then add an asterisk at the end (sometimes the
asterisk is not required). The search tool will search for any word
that begins with that root. For example, the search term "librar*"
would return items with the following words:
library
libraries
librarian
librarians
Truncation can be quite useful,
particularly with the issue of plurals and singulars. Say you were
interested in items on "librarians" or "librarian." You could enter
"librarian*" as your search term and get items with either term in
them.
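Truncation is easy to imitate with a regular expression. The sketch below is only an illustration of the idea, not the way any particular search tool does it; the function name truncation_to_regex and the sample sentence are invented for this example.

```python
import re

def truncation_to_regex(term):
    """Turn a search term with an optional trailing '*' into a whole-word regex."""
    if term.endswith("*"):
        root = re.escape(term[:-1])
        pattern = rf"\b{root}\w*\b"          # root followed by any word characters
    else:
        pattern = rf"\b{re.escape(term)}\b"  # exact whole word only
    return re.compile(pattern, re.IGNORECASE)

text = "The library hired two librarians; other libraries have one librarian."

print(truncation_to_regex("librar*").findall(text))
# ['library', 'librarians', 'libraries', 'librarian']
print(truncation_to_regex("librarian*").findall(text))
# ['librarians', 'librarian']
```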
Third, typically you can control how
many results or "hits" a search engine will display per page. A standard search
will default to no more than 50 or 100 items per page. You can usually
modify this parameter via a pop-up menu or dialog box at the
site.
How They Work
In the past few years, many very
sophisticated web-based search tools have been developed which can
search web pages, Usenet news postings, electronic phonebooks and
more. These search engines are essentially databases in which
computer users may search by asking questions, called "queries." A
partial list of some popular search engines and indices can be found
below. As useful and powerful as many of these search sites are, they
are not as comprehensive or as useful as they might
appear.
A recent Associated Press report
cited a study of 11 of the largest search sites. The study found that
even the most powerful sites only cover about one-sixth of the web
pages available on the Internet at any particular time and that the
time it takes them to list new sites (i.e., find and index) is
growing. The study found that on average it takes a new web page six
months or more to make it into a search engine's listings. Also, the
various search engines vary widely in their quality, speed, and ease
of use.
There are other problems with our current web search approaches. A recent article in the New York Times reviewed the so-called "deep web" problem. Search engines rely on programs known as crawlers (or spiders) that
gather information by following the trails of hyperlinks that tie the
Web together (see more on this below). While that approach works well for the pages that make up
the surface Web, these programs have a harder time penetrating
databases that are set up to respond to typed queries.
In general, search engines use two
methods to gather new web sites for their listings. First they accept
self "nominations" from web page developers. You will see a link on
most search engine home pages indicating where someone can go to add
pages to that search site. Second, most search sites employ
sophisticated software robots or "spiders" whose job is to
continuously surf the web "harvesting" new web pages. These spiders
use sets of rules (called algorithms) to place sites in more-or-less
relevant categories. Not all of the spiders do a great job of
categorizing the web sites they find (as you surely have found if you
have used them).
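Here is a rough sketch of how a spider follows hyperlinks, and why it can miss "deep web" content. It does not fetch real pages: the link graph and page names below are hypothetical, and a real crawler would also need to download pages, respect site rules, and much more.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["news", "catalog-search-form"],
    "news": ["home", "story-1"],
    "story-1": [],
    "catalog-search-form": [],  # results appear only after a typed query
    "catalog-record-42": [],    # no page links here; reachable only via the form
}

def crawl(start):
    """Breadth-first traversal that follows hyperlinks from a start page."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

found = crawl("home")
print(found)
# ['home', 'news', 'catalog-search-form', 'story-1']
print(sorted(set(LINKS) - set(found)))
# ['catalog-record-42']
```

The record that sits behind the search form is never found, because no hyperlink leads to it. That is the deep web problem in miniature.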
The best search sites don't rely on
computers to do all of their categorizing. They do it the old
fashioned way--with librarians! That's one of the reasons index sites
such as Yahoo! may be more helpful than a traditional search engine.
Despite its popularity, Yahoo's index (as contrasted with its search
engine tool) has one
of the smaller databases, but it makes up for its relative smallness
with quality. It is very high quality because a real live
person actually looks at the sites before they are placed in the
Yahoo index.
The best way to test search sites is
to try several and then bookmark your favorites for future visits.
Typically these sites require a Web browser, although some may use stand-alone applications,
some of which are Java-based.
Start With The Right Site
When you're looking for something
specific, like movie reviews, zip codes, legislation, etc., the key to
finding useful data on the Internet may be starting with the right
search resource. Many sites are specific to one or just a few topics or
databases, making them a much better resource for that specific domain
than the big generic search engines. Here's a quick review of some of the sites best suited for finding specific kinds of information.
List of Popular Search Engines and
Indices
Google Google is by far
the most popular search engine on the Internet, but why? It's not
because of the funny name and colorful logo. The folks
behind Google invented a
sophisticated new approach to page ranking to
produce highly relevant search results. Google returns relevant results
because it responds to queries using an automated method that ranks
relevant websites based on the link structure of the Internet itself.
Google was founded by college students Sergey Brin and Larry Page as a
result of research in which they engaged to solve the problems
associated with unreliability of searches on the Internet.
Google is designed to impose order on
information chaos with its page ranking technology. Their approach
focuses on the structural nature of the web, which is defined by the
way in which any web page can link to any other web page, instantly,
directly, and without an intermediary. In essence, Google interprets a
link from page A to page B as a vote, by page A, for page B. Google
assesses a page's importance by the votes it receives. But Google looks
at more than just the number of votes, or links; it also analyzes the
page that casts the vote. Votes cast by pages that are themselves
"important" weigh more heavily and help to make other pages "important."
Google's complex, automated search
methods attempt to preclude human interference, but as we'll discuss
in class, that fight may be a losing battle. Search Google at:
http://www.google.com/
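The "link as a vote" idea described for Google above can be sketched in a few lines of Python. This is a simplified textbook-style version of the published page ranking idea, not Google's actual production method. The four-page link graph, the function name rank_pages, the damping value of 0.85, and the iteration count are all assumptions made for this example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count more.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "B"],
    "D": ["B"],
}
scores = rank_pages(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page B collects the most votes and ends up on top, while page D, which nobody links to, ends up at the bottom. Page C ranks above page A even though each has only one incoming link, because C's single vote comes from the highly ranked page B.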
Yahoo! Yahoo! is a web
portal (not just a search engine) that provides services via the
Internet worldwide. The company is perhaps best known for its web
portal, search engine (Yahoo! Search), Yahoo! Directory, Yahoo! Mail,
Yahoo! News, advertising, online mapping (Yahoo! Maps), and social
media websites and services. Yahoo! was founded by college students
Jerry Yang and David Filo in January 1994 and was incorporated on March
1, 1995. It is one of the oldest and most popular websites in
continuous use. Originally called "David and Jerry's Guide to the World
Wide Web," the site was a
directory of other web sites, organized in a hierarchy, as opposed to a
searchable database of web pages. In April 1994, "David and Jerry's
Guide to the World Wide Web" was renamed "Yahoo!" A search engine tool
was added later. Search Yahoo! at:
http://www.yahoo.com/
Bing (formerly Live Search,
Windows Live Search, and MSN Search) is the current web search engine
(advertised as a "decision engine") from Microsoft. Bing's distinguishing features include search suggestions listed
as queries are entered (since copied by several other search engines) and a list of related searches (the "Explorer
pane") based on semantic technology that Microsoft acquired. As of January 2010, Bing was the third largest search
engine on the web by query volume, at 3.16%, after its competitors
Google at 85.35% and Yahoo! at 6.15%. Under a search agreement announced in 2009, Bing also came to power Yahoo! Search. Search Bing at:
http://www.bing.com/
A9 is a
competitor of Google and Bing, owned and operated by Amazon.com. It claims to provide better results
because it goes beyond traditional page ranking methods to
determine authority, in addition to relevancy. To determine the
authority or quality of a site's content, A9 uses a process
they call "Subject-Specific Popularity"--a procedure similar to
Google's method that ranks a site based on the number of
same-subject pages that reference it, not just general popularity,
to determine a site's level of authority. Find A9 at:
http://www.A9.com/
Other Contenders in the
Search Engine War:
AltaVista: How
about a multi-language search tool? It is becoming increasingly
important to be able to search the web in a language other than
English. While other search sites now offer this feature, the original
provider of this service was AltaVista. Search
AltaVista at:
http://www.altavista.com/
Search Engine Colossus
offers links to a large number of country- or region-specific
search engines. The search engines are organized by country, with
a link to the service, the language(s) it uses, its point of
origin, and a short description when available. The number of
search engines available under each heading can vary considerably,
as would be expected. Search engines in eleven general categories,
including Academic, Business, Medical, and Sports, are also
listed. Users wishing to narrow their searches and/or utilize some
lesser-known search engines will find this site a helpful starting
point. This site is located at:
http://www.searchenginecolossus.com/
Blekko: New search engines are
released all the time, and occasionally a new one is worth checking
out. Blekko is a new customizable search engine that helps users stay
away from spam, content farms, and malware. Blekko draws on the power
of the slashtag
in order to organize websites and search results around specific
topics. The slashtag is a tool designed to filter search results, and
it helps visitors search only high-quality sites that are vetted by a
team of experts.
http://blekko.com/
Do More With Google
Google has some hidden features that
may be extremely useful. Here's a table describing some of these
features. Go to http://www.google.com/help/features.html
for a complete list.
| Feature | What to type | Result you get |
|---|---|---|
| Dictionary | define:word | Links to definitions |
| Calculator | 10*35+4 (or any other equation) | The answer |
| Phone Book | first name, last name, zip code, or last name, zip code | Phone book matches |
| Special codes | package tracking numbers, area codes, vehicle ID numbers | Relevant results |
| Stock Quotes | stocks:ticker symbol | Recent stock quotes |
| Maps | street address, city, state, or zip code | Links to maps |
| Who Links to... | link:site URL | Websites that link to that URL |
| Search only one website | search term site:site URL, e.g., graduation site:www.augsburg.edu | Search results limited to that site |
A website has been developed that automates access to many of these
special Google features. It's called Soople, and is available at:
<http://www.soople.com/>.
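If you want to automate a couple of the features in the table above yourself, a small helper can assemble the search address for you. This is only a sketch: the function name google_query_url is made up, and the address pattern https://www.google.com/search?q=... is an assumption based on how Google search links commonly look, not something documented on this page. The sketch also does not check whether a given operator is still supported.

```python
from urllib.parse import urlencode

def google_query_url(terms, site=None, define=False):
    """Build a search URL using the define: and site: operators."""
    query = f"define:{terms}" if define else terms
    if site:
        query = f"{query} site:{site}"
    # urlencode escapes spaces and colons so the query is safe inside a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(google_query_url("graduation", site="www.augsburg.edu"))
# https://www.google.com/search?q=graduation+site%3Awww.augsburg.edu
print(google_query_url("truncation", define=True))
# https://www.google.com/search?q=define%3Atruncation
```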
Multiple-Search-Engines
A new category of search engine is
emerging--the multiple-search-engine. These sites specialize in
conducting simultaneous searches from one query. They differ
from a site like Google because they search multiple search databases at the same time. Here are a few of the most promising
multiple-search-engines for you to try.
Dogpile searches most of the
popular search engines.
It also searches FTP archives and other sources
you might not have thought of.
http://www.dogpile.com/
Find How Find How's goal is to assist users by simplifying
and speeding access to trusted, reliable "How-To" content on the Internet.
http://www.findhow.com/
Metasearch allows you to enter
your search terms and choose advanced features like Boolean operators
just once -- then search multiple engines without retyping. The
additional searches are listed to the right of the default search
results window.
http://metasearch.com/
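The basic idea behind a multiple-search-engine can be sketched as follows. None of this reflects how Dogpile or any of the other sites listed above actually work. The two "engines" are stand-in functions that return fixed, made-up result lists, and the scoring rule (a result earns 1 point for first place, 1/2 for second, and so on, summed across engines) is just one simple way to merge rankings.

```python
from collections import defaultdict

def engine_a(query):
    """Stand-in for a real search engine; returns a fixed ranked list."""
    return ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

def engine_b(query):
    """A second stand-in engine with a different ranking."""
    return ["http://example.com/b", "http://example.com/d"]

def metasearch(query, engines):
    """Send one query to several engines and merge their ranked results."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query)):
            scores[url] += 1.0 / (position + 1)  # higher rank earns more points
    # Best score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = metasearch("utah weather", [engine_a, engine_b])
print(merged)
# ['http://example.com/b', 'http://example.com/a',
#  'http://example.com/d', 'http://example.com/c']
```

The page found by both engines rises to the top, and each page appears only once in the merged list.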
Where To Go for More Information
About Search Engines
Search Engine Showdown is a
web site which offers comprehensive comparisons of the major search
engines including lists of features of the major search sites,
reviews, hints on effective search strategies, statistics on usage of
major sites, and more. The site even provides links to a Usenet
newsgroup and a listserv on the topic of search engine technology and
performance.
http://www.searchengineshowdown.com/
Happy searching!
