I use mostly altavista and google for my examples, but you'll find an Infoseek form,
for proximity searches, [at the bottom] of this page. See elsewhere on my site
WHICH search engines or bots you should use for your specific queries. The number of
retrieved pages, given in parentheses, will of course vary each time you re-perform
each example search. Note also how some of the examples given here represent
quite useful "link factories" per se.
1)USE VARIOUS RESOURCES!
Should I give only ONE piece of advice, it would be this one. It is even more important than "keeping on
track" (see below). Never, never, never overestimate your search tool of choice. EACH search
engine, [main], [regional] or [local]
has its own quirks and its own blindness patterns ("shadows").
Every time I 're-play' a given
search on some specific "free-pages depository" local search engine (à la geocities) I get
amazing results... Thus you should
NEVER 'stick' to a 'given' search engine 'of choice'. Learn how much they differ and - even more
important - understand how much the result set of each given s.e. changes over time! The web is a quicksand,
and search engines' databases AND POLICIES are continuously changing as well. Altavista and
ftpsearch, for instance, are now actively 'censoring'
results. Try there a good search for MP3, DVD reversing, Napster, Gnutella or
Infraseek and you'll quickly see what I mean.
2)KEEP ON TRACK!
Nothing is easier than losing your thread when you are working on
the web. The examples used on my site are also links to
interesting (I hope) searches / places / startpoints.
As you'll soon realize, the examples and links
offer you continuous opportunities to leave this site in
order to browse to other very promising ones.
This is done on purpose: the hyper bastard approach to web page building is - on most sites -
to restrict click away opportunities to a bare minimum. Even when a
reference demands a link, new methods hide or reduce the
visibility of that same link. Everything in order to keep a visitor 'caged' or 'trapped'
in a given site. I'll do the exact opposite, since you must
learn some discipline if you're going to be a good seeker. You leave my site for good while
searching for a target?
Good riddance. My links will offer you a lot of added
knowledge AND will at the same time test your
capability to keep on track :-)
3)LOWERCASE
Always enter your search terms in lower case (unless you
want to limit your search).
Most search engines will then find both upper and lower case
occurrences of your searchstring. "How to Search" (18132) is
NOT the same as "how to search" (32607)
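The case rule above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not the code of any real search engine; the function name term_matches is made up for the example.

```python
def term_matches(term: str, word: str) -> bool:
    """Illustrative case rule: an all-lowercase term matches any
    capitalisation, a term containing capitals matches only exactly."""
    if term == term.lower():
        return word.lower() == term
    return word == term


# Compare a lowercase and a capitalised query against three spellings
for query in ("search", "Search"):
    hits = [w for w in ("search", "Search", "SEARCH") if term_matches(query, w)]
    print(query, "->", hits)
```

The lowercase query keeps all three spellings, the capitalised one keeps only its exact twin, which is why the capitalised example search returns fewer pages.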
4)EXACT SEQUENCE [""]
Enclose terms in double quotation marks if you want to retrieve
those exact terms in that exact sequence. This may be very
useful in order to find a specific page. Thus "searchengines" will give
you (22351) pages with the two terms 'glued' together. Similarly
"saerch engine" will
retrieve some (11) pages WITH THIS SAME MISSPELLING ERROR.
5)NARROW DOWN [ AND | & | + ] and ELIMINATE MERCILESSLY [ AND NOT | ! | - ]
Narrow your searches by linking your search terms with AND or &,
or simply use the plus sign [+].
The search engine will find only those pages that contain all of
your search terms.
Similarly, exclude pages that are not relevant to your search by
preceding the search term with AND NOT or ! or simply use the
minus sign [-].
+"search engines" +hints +tips +techniques -tits -sex -"make money" (933) is better than
the simpler +"search engines" +hints +tips +techniques (1233)
6)DOWNSIDE OF THE + & - SIGN
With the + sign you may miss related documents that don't have the
words you specify as required. For example, the search
"searching tips" +searchlores would not include documents
that have the words "searching tips", but not searchlores.
With the - sign it's easy to exclude too much. For example, if you were
looking for information on "bots scripts" but not in
javascript, the search +"bots scripts" -javascript would exclude
a document that was all about bots scripts, but that had
the sentence "this kind of bot would be impossible in javascript".
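The + and - behaviour of tips 5 and 6 can be sketched as a simple filter. This is a minimal sketch under loud assumptions: plain case-insensitive substring tests stand in for a real index, and the function name matches and the four sample documents are invented for the illustration.

```python
def matches(text: str, required=(), excluded=()) -> bool:
    """True if text contains every required string and no excluded one.
    Substring tests are an assumption standing in for real indexing."""
    lowered = text.lower()
    return (all(r.lower() in lowered for r in required)
            and not any(e.lower() in lowered for e in excluded))


# Invented sample documents
docs = {
    "a": "All about bots scripts. This kind of bot would be impossible in javascript.",
    "b": "Bots scripts written in javascript for your browser.",
    "c": "Searching tips for patient seekers.",
    "d": "Bots scripts in perl.",
}

# +"bots scripts" -javascript: the useful page "a" is thrown away with "b"
kept_bots = [name for name, text in docs.items()
             if matches(text, required=["bots scripts"], excluded=["javascript"])]
print(kept_bots)

# +searchlores: page "c" is dropped although it is about searching tips
kept_lores = [name for name, text in docs.items()
              if matches(text, required=["searchlores"])]
print(kept_lores)
```

Only "d" survives the first query, and nothing survives the second: both operators are blunt, so check what you are losing before trusting a narrowed result set.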
7)DOWNSIDE OF THE BOOLEAN operators
It's often difficult to specify exactly what you want to include
or exclude.
You can also get unexpected results if you are not careful about
your use of operators and parentheses. For example, the search
seeking OR searching AND finding
is the same as the search
seeking OR (searching AND finding).
Both queries will find documents that contain both searching and
finding, together with documents that contain the word seeking. However,
the query
(seeking OR searching) AND finding
is not the same. It will find documents
containing the word finding and, in the same document, either
seeking or searching.
Be careful with the boolean operators!
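The difference between the two groupings is easy to see with sets. A minimal sketch: the four numbered "documents" and the helper has are invented for the illustration, and set union and intersection play the role of OR and AND.

```python
# Invented mini-collection: document number -> words it contains
docs = {
    1: {"seeking"},
    2: {"searching", "finding"},
    3: {"seeking", "finding"},
    4: {"searching"},
}


def has(word):
    """Set of document numbers containing the word."""
    return {doc_id for doc_id, words in docs.items() if word in words}


# seeking OR (searching AND finding): how the unparenthesised query is read
default = has("seeking") | (has("searching") & has("finding"))

# (seeking OR searching) AND finding
grouped = (has("seeking") | has("searching")) & has("finding")

print(sorted(default))  # [1, 2, 3]
print(sorted(grouped))  # [2, 3]
```

Document 1 mentions seeking but never finding, so it appears only under the first reading.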
8)"PECULIAR" strings
You should always strive to use differentiating keywords when
searching the web. Words that are commonly used will not help you much.
Extremely common words like articles and prepositions are so
worthless that they are completely ignored. Try to use words which underline
the peculiarity of your target. Common words can still be very effective,
but only when combined with boolean qualifiers. You must identify the
main concepts in your topic and determine any synonyms,
alternate spellings, or variant word forms for the concepts.
Remember that the more "peculiar" a word, the more useful
it will be in order to sharpen your search.
+title:"search strateg*" +hints +tips
In this case we included the "search strateg*" string (which
already has an elevated PEC) in the title: keyword.
9)SPECIAL KEYWORDS
Note the use of a keyword in the previous example.
Here is a short list of the main keywords (for altavista):
anchor:text
applet:class
domain:domainname (to avoid commercial crap, exclude with -"domain:com")
host:name
image:filename
link:URLtext
text:text
title:text (very useful for narrowing)
url:text
10)ASTERISK[*]
Note also the use of the asterisk [*] in
the previous example: it MUST be used after at least 3 characters,
it is valid for up to 5 characters or as an element of a phrase. For
Altavista:
Asterisk (*): after 3 specified characters, matches up to 5 trailing letters.
Question Mark (?): after 3 specified characters, matches exactly one more character.
Double Asterisk (**): more flexible, as it matches an unlimited number of trailing characters.
You can also use the wildcards interchangeably and more
than once in the same search string.
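The three wildcards can be mimicked with regular expressions. This is a sketch of the rules as listed above, not Altavista's actual matcher: the function name wildcard_to_regex is made up, and it assumes that "up to 5" includes zero extra characters and that "letters" means any word character.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate *, ? and ** into a regex, following the rules listed above."""
    prefix = re.match(r"[^*?]*", pattern).group()
    # A wildcard is only accepted after at least 3 literal characters
    if len(prefix) < len(pattern) and len(prefix) < 3:
        raise ValueError("a wildcard needs at least 3 leading characters")
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(r"\w*")        # unlimited trailing characters
            i += 2
        elif pattern[i] == "*":
            parts.append(r"\w{0,5}")    # up to 5 trailing characters (assumed: 0 allowed)
            i += 1
        elif pattern[i] == "?":
            parts.append(r"\w")         # exactly one more character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


for pat in ("strateg*", "strateg**"):
    rx = wildcard_to_regex(pat)
    found = [w for w in ("strategy", "strategies", "strategically") if rx.fullmatch(w)]
    print(pat, "->", found)
```

Under these assumptions "strategically" needs six characters after "strateg", so only the double asterisk reaches it.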
11)ARCHIVE
You should archive your useful queries and repeat them over time.
Any query whose resulting URL contains the "cgi-bin" snippet
can be saved and used again later.
Since the results of all queries VARY
WITH TIME (when traffic
is particularly heavy the search engines "cut" the results) you
would be well
advised, for important queries, to repeat them again and again.
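One simple way to keep such an archive is a small log of query URLs with the date and hit count of every run. A minimal sketch: the function name record_run and the example.com URL are hypothetical, and the dates and the second hit count are invented purely for illustration (only 32607 comes from the example in tip 3).

```python
import json


def record_run(archive: dict, query_url: str, hits: int, day: str) -> None:
    """Append one run of a saved query to the archive."""
    archive.setdefault(query_url, []).append({"date": day, "hits": hits})


archive = {}
# Hypothetical saved query URL containing the "cgi-bin" snippet
url = "http://www.example.com/cgi-bin/query?q=%22how+to+search%22"
record_run(archive, url, 32607, "2000-05-01")   # invented date
record_run(archive, url, 30112, "2000-06-01")   # invented date and hit count

# Dump the history so it can be saved to a file and compared later
print(json.dumps(archive, indent=2))
```

Keeping the runs side by side makes it obvious when an engine has "cut" or changed its results for the same query.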
12)STOP WORDS
Stop words are words such as "and" "the" and "or" which search
engines exclude from their searches to make them more effective.
These terms are excluded because they are either extremely common or they
are used by the search engine for performing more specialized
searches. Just think about how many documents on the Web
contain the word "the" and you'll understand how important a good stop words list is for
all search engines.
If you really do want to search for one of these terms, there is an
easy way to work around stop words. By bracketing words in
quotation marks, search engines will look for every word inside
the quotes, in the sequence you specify. Thus, if you wanted to
look for sites with the words search the web
you would use the searchstring "search the web".
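How quoting rescues a stop word can be sketched with a toy query parser. This only illustrates the idea: the three-word stop list and the function name parse_query are assumptions, and real engines use much longer lists and their own parsing.

```python
import re

STOP_WORDS = {"and", "or", "the"}  # tiny illustrative list


def parse_query(query: str) -> list:
    """Split a query into terms: quoted phrases are kept whole,
    unquoted stop words are dropped."""
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase)
        elif word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms


print(parse_query('search the web'))    # ['search', 'web']
print(parse_query('"search the web"'))  # ['search the web']
```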
13)SNOOPING BOLDLY AROUND -1
As you'll learn elsewhere on this site, there are many methods to access
some 'non public' portions of the web.
A quick tip is to look for a file called ROBOTS.TXT in the main directory of your
target site, entering the URL by hand with the following pattern: http://www.targetsite.com/robots.txt
This file is used to tell search engines which directories and files they should
not index on a specific site. Thus anything that has been put inside a 'robots.txt' file
will not be found by your searchqueries. However, once you have seen the names, you can
still type them directly into your browser in order to access the various subdirectories
and pages.
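Turning the Disallow lines of a robots.txt file into URLs you can type into your browser takes only a few lines. A minimal sketch: the function name disallowed_urls and the sample file contents are invented for the illustration, and wildcard patterns inside Disallow values are not handled.

```python
from urllib.parse import urljoin


def disallowed_urls(site: str, robots_txt: str) -> list:
    """Return one absolute URL per non-empty Disallow line."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            urls.append(urljoin(site, value.strip()))
    return urls


# Invented robots.txt contents for the placeholder site
sample = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.htm   # not for the bots
Disallow:
"""

for url in disallowed_urls("http://www.targetsite.com/", sample):
    print(url)
```

An empty Disallow line means "nothing is forbidden", which is why it is skipped.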
14)SNOOPING BOLDLY AROUND -2
Another good idea may be to index (after having "registered" it)
a site you are interested in with a search bot that does not pay too much respect
to the robots exclusion rules. For instance atomz...
you can try registering [there]
a target site you are interested in, or else try
your luck on my own site using the form below :-)
Now check the difference by comparing the results you got with atomz with those you'll
get using my own namazu searchengine:
15)DOWNLOADING FILES FROM BUSY SERVERS
If you are trying to download some (ahem) popular files, you are probably competing
with many other people for access. If you have this option, pick a server in a country where it is
very early in the morning. Alternatively, schedule
the download so that it will be carried out
when the time IN THE STATES or in EUROPE is early in the morning (GMT 05.00 or GMT 12.00)
or, MUCH MUCH better, use an automatic email downloader like downloadslave
instead (see the
accmail section) and spare yourself the hassle :-)
16)PROXIMITY SEARCHES... HIT PAYLOAD EVERY TIME YOU SEARCH!
Real ~S~eekers use proximity operators quite a lot (for obvious - ahem - reasons) as you'll
learn in the advanced sections of my site.
Altavista uses the NEAR command in order to select keywords
within 10 words of each other, which is useful but quite limited.
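What a fixed 10-word NEAR test amounts to can be sketched as follows. This is an illustration, not Altavista's implementation: the function name near is made up, and "within 10 words" is assumed to mean that the two word positions differ by at most 10.

```python
import re


def near(text: str, first: str, second: str, window: int = 10) -> bool:
    """True if the two words occur within `window` word positions of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)


print(near("proximity operators help every serious seeker", "proximity", "seeker"))
```

The window is an ordinary parameter in this sketch, whereas the fixed distance is exactly what makes the Altavista command limited.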
When you seriously work using proximity
searches THERE IS ONLY ONE SEARCH ENGINE FOR YOU: Infoseek, which will allow
you to choose any of the following options... or to combine them... :-)