As you probably already know (else what the
hell are you doing here?) the various advanced techniques you may
use in order to search the web amount to a difficult and
ill-understood art.
Try, for instance, the following:
as you will see, the differences between the results of the two queries seem
inexplicable.
Can you tell me why, when we ELIMINATE the word "money" with -money,
we actually (March 2000) get
MORE hits for the same string "how to
search the www"?
[+"how to search the www" -money] 120 hits
[+"how to search the www"] 118 hits
(Note that variations are possible even during a single day)
These - and other - quirks are
due to the specific algorithms that the search engines use. Thus searching is still
far from being a completely understood science. There is an 'art' aspect (a 'lore'
aspect IMO) that plays a role, as you'll see more often than not.
The imperative of preparing a good advanced query notwithstanding, all
searchers like to try a few "quick searches" to test a search engine or a
query idea. Thus oddities are found.
Typing a few terms into a blank box and seeing what comes up can be great fun, since
every now and then, sifting through a pile of less relevant material, you may even
find some truly
interesting results. More often, something appears that makes you wonder
where it came from. These 'odd' results are at times worth investigating per se,
since they can help you to reverse engineer the algos used by the main search engines.
Note
that this kind of reverse engineering is actively performed by thousands and
thousands of little commercial bastards, whose only aim is to spam each and
every search engine with their pathetic little sites for profit purposes.
Yet even this kind of vermin's activity can be useful for us: some of the
tricks devised by commercial hooligans in order to spam
the search engines can open for us, as you will see, whole horizons of new
and useful techniques that we will use (and spread) in order to ELIMINATE
those very spamming sites when
searching for knowledge.
In fact we can - and will - use those same tricks REVERSED, in
order to cut our queries deep through the spam sites and
catch the little (and more and more rare)
gems we are looking for. Hope you understand what I mean... I'll give an example:
from the very moment you find in a
page images with single pixel width/height -aka webbugs- that are pointing to the
main index page
of a given site (an old Architext trick) you know that you are dealing with evil spammers, and you just need to
filter such crap out
from your result lists with a simple specific filter... Perilli praemium adipiscunt! Eheh :-)
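To make the webbug example less abstract, here is a minimal Python sketch of such a filter. It is only an illustration under my own assumptions, not the filter any real engine or tool uses: the names WebbugDetector, looks_like_spam and filter_results are hypothetical, a "webbug" is taken to be an image with both width="1" and height="1" sitting inside a link, and "main index page" is guessed from a short list of typical paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: these paths are what we call "the main index page" of a site.
INDEX_PATHS = ("", "/", "/index.html", "/index.htm")


class WebbugDetector(HTMLParser):
    """Collects links that wrap a 1x1 image and point at a site's index page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.current_href = None  # href of the <a> we are currently inside
        self.suspects = []        # resolved targets of suspicious links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.current_href = attrs.get("href")
        elif tag == "img" and self.current_href:
            # Assumption: a webbug declares both dimensions as a single pixel.
            tiny = attrs.get("width") == "1" and attrs.get("height") == "1"
            target = urljoin(self.page_url, self.current_href)
            if tiny and urlparse(target).path in INDEX_PATHS:
                self.suspects.append(target)

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


def looks_like_spam(page_url, html):
    """True if the page contains at least one webbug-style link."""
    detector = WebbugDetector(page_url)
    detector.feed(html)
    return bool(detector.suspects)


def filter_results(pages):
    """Keep only the (url, html) pairs that show no webbug link."""
    return [(url, html) for url, html in pages if not looks_like_spam(url, html)]
```

You would feed filter_results the pages behind your result list, already fetched by whatever means you prefer. A real filter would need more heuristics (images sized through style sheets, other index file names), so take this as a starting point only.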
The oddities you'll encounter are due to the fact that search engines have
different defaults and basic features,
and thus the way each one works is
not always intuitive.
Often these different settings
are the culprits that cause those unusual, funny or "false" results.
For example, the default for many Web engines is to OR terms together, then
provide results based on relevancy. This combination produces a retrieval that
has all terms present in the first few hits, and then fewer terms as you move
through your hit list. This explains why the last hits
do not even contain all of your search terms: the terms were ORed
together, so a page holding just one of them still counts as a hit. Unless you specifically ask to AND
terms together, do not trust the reported hit count to reflect
the number of pages that really match your search strategy.
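The OR-plus-relevancy behaviour is easy to reproduce on a toy scale. The following Python sketch is only a model built on my assumptions: "relevancy" is reduced to the number of query terms a document contains, and the names or_search, and_search and the four sample documents are invented for the illustration.

```python
def or_search(docs, terms):
    """Return (doc_id, matched_term_count) for every doc containing ANY term,
    ranked so that docs matching more terms come first."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = sum(1 for t in terms if t in words)
        if matched:
            hits.append((doc_id, matched))
    # sorted() is stable, so ties keep their original order
    return sorted(hits, key=lambda h: -h[1])


def and_search(docs, terms):
    """Return only the docs that contain ALL terms."""
    return [doc_id for doc_id, text in docs.items()
            if all(t in set(text.lower().split()) for t in terms)]


docs = {
    "a": "advanced web search techniques",
    "b": "search the library catalogue",
    "c": "spinning a spider web",
    "d": "cooking recipes",
}
terms = ["web", "search"]

ranked = or_search(docs, terms)   # [("a", 2), ("b", 1), ("c", 1)]
strict = and_search(docs, terms)  # ["a"]
print(len(ranked), "hits with OR,", len(strict), "hit with AND")
```

The first hit has both terms, the following ones only one each, and the "number of hits" is three times what an AND query would give you.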
Another typical default is automatic truncation on each term. So if your search
is for "web search" you will also retrieve documents with the terms
"searching," "searcher," and even "web-spiders" in them.
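Automatic truncation simply means that each term is treated as a prefix, as if you had typed web* search*. A minimal Python sketch, assuming plain prefix matching on whitespace-separated words (real engines may stem or tokenize differently; truncated_match, search_with_truncation and the sample documents are hypothetical):

```python
def truncated_match(term, word):
    """True if `word` starts with `term`, i.e. the term is treated as `term*`."""
    return word.lower().startswith(term.lower())


def search_with_truncation(docs, query):
    """Map each matching doc_id to the sorted words that matched some query term."""
    terms = query.lower().split()
    results = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        matched = sorted({w for w in words for t in terms if truncated_match(t, w)})
        if matched:
            results[doc_id] = matched
    return results


sample_docs = {
    "1": "a searcher at work",
    "2": "web-spiders crawl pages",
    "3": "searching archives",
    "4": "research papers",
}
found = search_with_truncation(sample_docs, "web search")
# found == {"1": ["searcher"], "2": ["web-spiders"], "3": ["searching"]}
```

Note that "research" is not caught: truncation extends a term to the right only, it does not look inside words.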
Another way of explaining "false" results is by determining exactly what the search
engine is searching. Usually, the default is the full text of the document, but sometimes a search
engine also retrieves documents where your search terms appear only in the
URL. An address might include your search term, but the actual document may
not show your term when retrieved. Also never forget the quicksand nature of the web:
you may retrieve a page that
had your term some time ago,
but that has been updated in the meantime, whereby your term disappeared.
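A hit that comes from the address alone is easy to picture with a small Python sketch. It assumes a made-up index where every record stores a url and a text field; find_hits and the two sample records are hypothetical and only show how the choice of searched fields changes what comes back.

```python
def find_hits(records, term, fields=("url", "text")):
    """Return (url, matching_fields) for records where `term` occurs
    in at least one of the searched fields."""
    term = term.lower()
    hits = []
    for rec in records:
        where = [f for f in fields if term in rec[f].lower()]
        if where:
            hits.append((rec["url"], where))
    return hits


records = [
    {"url": "http://example.com/search/tips.html", "text": "Ten tips for better queries"},
    {"url": "http://example.org/lore.html", "text": "How to search the web"},
]

everywhere = find_hits(records, "search")
# [("http://example.com/search/tips.html", ["url"]),
#  ("http://example.org/lore.html", ["text"])]
text_only = find_hits(records, "search", fields=("text",))
# [("http://example.org/lore.html", ["text"])]
```

The first record is a "false" hit in the sense discussed above: the word sits in its address, yet you will not see it anywhere on the page itself.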
Pay close attention to any Web engine documentation to
clarify just how and what it searches. All "searching mysteries" can be
solved if you have enough time and will to do so.