HOW TO SEARCH THE WEB by fravia+
(Based on some original private emailings from +ORC)
Letter 006 - December 1996
This is an excerpt from a newsletter I wrote to explain things to total lamers where I work (I still believed in this in 1996 :-( ). Please excuse the patronising tone.
The Web is immense, and still growing exponentially.
Millions of pages are added, deleted or modified every day, some of them carrying information you may need and/or did not even suspect could exist.
Pages are the most evident information vector of the World Wide Web, but you should not forget that every hour, inside the thousands of "Usenet" groups, millions of people are discussing topics that may greatly interest you.
No search engine can cope with this information quicksand. In order to search the Web well, you'll need a good mix of luck, nose and knowledge. Searching effectively through mountains of junk is an art -per se-, which will be more and more important in the future. This newsletter will -I hope- at least point you in the right direction, and will explain to you how to search the Web by e-mail.
You may search for information online (with a Web browser like Microsoft's Explorer or -better- Netscape's Navigator), but you may also search by e-mail, i.e. by sending a message with your query in order to automatically get, as an answer, the information you are looking for.
I believe that learning how to search the Web for information (and how to "seed" the nuggets of information) is of paramount importance. The following can be pretty helpful even if you routinely use a browser, and it's crucial if you do not have browser access to the Web but only an email address (say at work :-) The "spirit" of the whole Web is free information for everybody.
We'll see here three simple "how to" examples:
1) FETCH A HTTP PAGE PER EMAIL
How to get -emailed- an "http://" address you already know.
2) PERFORM AN EMAIL-QUERY ON THE WEB
How to find -using good search engines- which pages (may) have the information you are looking for
3) PERFORM A QUERY PER EMAIL INSIDE ALL USENET GROUPS
How to find everything that has been said on "your" themes inside some usenet group you do not even know the name of.
1) GETTING AN "http://page" EMAILED TO YOU
(Agora fetching)
All pages on the Web have an "http://" address. The use of such addresses is growing, and hardly any advertising is seen today without one. Recent books even offer their readers an "http://" address where the information printed inside is continuously updated.
You use an AGORA (a "fetch engine") to get the content of a page you already know the address of (you may also use Agora in order to perform queries, fetching "search engine" results... see below).
All AGORA servers accept three main commands:
1) SEND
To: [email protected]
Subject: (nothing here)
send http://www.gateway2000.com/support/techsupt/fb/3000/3047.htm
send http://www.boutell.com/faq/
This will send you the "raw" content of two pages (about Windows 95 and about the WWW).
2) DEEP
To: [email protected]
Subject: (nothing here)
deep http://www.gateway2000.com/support/techsupt/fb/3000/3050.htm
This will send you the "raw" content of a page AND the raw content of all the pages "linked" to this page. Watch it! If there are many links you will get quite a lot of emailings! Do not use the deep command to start with.
3) SOURCE
To: [email protected]
Subject: (nothing here)
source http://www.gateway2000.com/support/techsupt/fb/3000/3047.htm
source http://www.dna.affrc.go.jp/htdocs/Agora/Help.txt
This will send you the HTML code of these two pages, and you'll therefore be able to see them exactly "as they should be" using a copy of Netscape that you have previously installed on your hard disk. You can get the most recent versions of Navigator and/or of Explorer on any CD-ROM magazine cover nowadays.
In my examples here I have been using a Japanese Agora, but there are other Agora servers that you can use (not many though). The difference between them is mainly in their relative speed and in the "geographic area" from which the results of your queries are going to be pulled.
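The three commands above all share one shape: an empty subject and a body with one "command URL" pair per line. Here is a minimal Python sketch of how such a message could be assembled. The function name and the address agora@example.invalid are placeholders of mine (the real server address is not reproduced here), and the sketch only builds the message, it does not send it.

```python
from email.message import EmailMessage

# The three commands described above.
AGORA_COMMANDS = ("send", "deep", "source")


def build_agora_message(agora_address, command, urls):
    """Build (but do not send) an e-mail asking an Agora server for some pages."""
    if command not in AGORA_COMMANDS:
        raise ValueError(f"unknown Agora command: {command!r}")
    msg = EmailMessage()
    msg["To"] = agora_address
    # No Subject header is set: the subject stays empty, as in the examples.
    # One command per line in the body.
    msg.set_content("\n".join(f"{command} {url}" for url in urls))
    return msg


msg = build_agora_message(
    "agora@example.invalid",  # placeholder, replace with a real Agora address
    "send",
    [
        "http://www.gateway2000.com/support/techsupt/fb/3000/3047.htm",
        "http://www.boutell.com/faq/",
    ],
)
print(msg.get_content())
```

Swapping "send" for "source" or "deep" gives the other two kinds of request; remember the warning about "deep" before trying it on a page with many links.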
2) PERFORM AN EMAIL-QUERY ON THE WEB
(Agora querying: Webcrawler, AltaVista & Lycos)
First some important general rules:
A lower-case search will also find matches of capitalised words. For example, bonn will find matches for bonn, Bonn, and BONN. Do not use capital letters in a search unless you know exactly what you are doing: capital letters force an exact case match on the entire word, i.e. boNn will search only for matches of boNn. When a word is found in a Web page or a news article, its case is preserved when it is stored in the index. When you enter a word in a query, therefore, it is always safe, and generally recommended, to type it all in lower case, because lower-case letters indicate a case-insensitive match.
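The case rule can be written down in a few lines. This is only my reading of the rule as stated above, sketched in Python with a function name of my own choosing; it is not the code of any actual search engine.

```python
def word_matches(query_word, indexed_word):
    """Return True if a query word matches a word stored in the index.

    An all-lower-case query word matches regardless of case;
    a query word containing any capital letter must match exactly.
    """
    if query_word == query_word.lower():
        # No capitals typed: case-insensitive match.
        return query_word == indexed_word.lower()
    # At least one capital typed: exact case match on the entire word.
    return query_word == indexed_word


for indexed in ("bonn", "Bonn", "BONN", "boNn"):
    print(indexed, word_matches("bonn", indexed), word_matches("boNn", indexed))
```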
To find the documents most relevant to what you need, construct your query as precisely as you can. The "AltaVista" search engine, for instance, ranks the documents found so the ones matching the most words and phrases in the query are listed first. Even so, you might not find exactly what you want at the head of the list if your search is too general.
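To see why a precise query helps, here is a toy Python model of "the documents matching the most query words come first". It is a deliberately naive illustration of the idea, with made-up sample texts; it is certainly not AltaVista's real ranking algorithm.

```python
def rank_by_matches(query, documents):
    """Order documents so those containing the most query words come first."""
    words = set(query.lower().split())

    def score(text):
        # Number of distinct query words that appear in the document.
        return len(words & set(text.lower().split()))

    # sorted() is stable, so documents with equal scores keep their order.
    return sorted(documents, key=score, reverse=True)


docs = [
    "a page about phenomena in physics",
    "a page about gardening",
    "linguistic phenomena in romance languages",
]
for doc in rank_by_matches("linguistic phenomena", docs):
    print(doc)
```

With the one-word query "phenomena" the physics page and the linguistics page would score the same, which is the "too general" problem in miniature.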
WEBCRAWLER
Here is a first query example using WEBCRAWLER, one of the deepest (and best) search engines on the Net. WebCrawler understands plain English and is programmed with novice users in mind, so you don't need to be a master of Boolean search syntax to unleash its power. (Masters of Boolean syntax can skip to our next Newsletter on "Advanced Searching").
Let's say we are interested in "linguistic phenomena". These should be the contents of our email query:
To: [email protected]
Subject: (nothing here)
Text: send http://webcrawler.com/cgi-bin/WebQuery?linguistic+phenomena
Webcrawler will then send you a list of the first 25 documents matching your query. You'll be able to retrieve the ones that interest you using the ubiquitous Agora system described above.
Webcrawler parameters
&maxHits=25 is the default if omitted. You can choose to view search results 10, 25 or 100 at a time.
If you omit the &summaries=yes variable (short format) you will get the detailed format, which provides titles plus summaries, URLs, numerical relevancy scores, and the option of viewing similar pages for each result returned.
ALTAVISTA
The same search example, this time using AltaVista, the best and quickest search engine on the Net. These should be the contents of your email message:
To: [email protected]
Subject: (nothing here)
Text: send http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q=linguistic+phenomena
AltaVista parameters:
&what=web in order to search web pages
&what=news in order to search inside newsgroups
&fmt=. Standard search (title, URL, first two lines, date and size)
&fmt=c Compact search (title, date and first 30 characters of each document)
&fmt=d Detailed search
&q=image%3Agarag%2A.jpg Search images of garages (yes, you may also search for images), or, at least, jpg images with the name pattern "garag*.jpg" (i.e. which have names beginning with "garag", where the %2A tag represents an asterisk *).
In order to get the next 30 matches add a "start query" value: &stq=30 (start at 30), for instance:
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&stq=30&fmt=.&q=linguistic+phenomena
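Typing these query strings by hand is error-prone, so here is a small Python sketch that assembles them from the parameters listed above. The function name and its argument names are mine; only the parameter names (pg, what, stq, fmt, q) and the base address come from the examples in this letter.

```python
from urllib.parse import urlencode

ALTAVISTA_QUERY = "http://www.altavista.digital.com/cgi-bin/query"


def altavista_url(query, what="web", fmt=".", start=None):
    """Build an AltaVista query URL from the parameters described above."""
    params = [("pg", "q"), ("what", what)]
    if start is not None:
        params.append(("stq", str(start)))  # "start query" value
    params += [("fmt", fmt), ("q", query)]
    # urlencode turns spaces into "+" and escapes characters like ":" and "*".
    return ALTAVISTA_QUERY + "?" + urlencode(params)


print("send " + altavista_url("linguistic phenomena"))
print("send " + altavista_url("linguistic phenomena", start=30))
print("send " + altavista_url("image:garag*.jpg"))
```

Each printed line is a complete Agora command on a single line, ready to be pasted into the body of a message.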
LYCOS
The same search example, this time using LYCOS, one of the oldest (and best) search engines on the Net. These should be the contents of your email message:
To: [email protected]
Subject: (nothing here)
Text: send http://lycos11.lycos.cs.cmu.edu/cgi-bin/flpursuit?first=1\\&maxhits=30\\&minterms=1\\&minscore=0.01\\&terse=standard\\&query=linguistic+phenomena
When the Lycos search engine compares each page to your query, it gives higher scores to pages that contain the words as you typed them in. It also looks for pages that mention these words early on, rather than far down in some sub-section of the site.
The wording of all these queries (each of which MUST be on a single line) may seem pretty complicated at first glance, but you'll only need to paste such a command once, in order to perform your first email query. Later you'll use the "sent items" folder (inside your Exchange program) to fetch templates for all your other queries, i.e. you'll only need to change the search terms.
Usually, you'll "shoot" all your queries in the morning and "collect" your results later. This "ping pong" approach may be slow (compared with the full graphic internet access that the browsers offer) but it has some advantages: for instance, it leaves a "breadcrumbs" trail that can be useful later. It is indeed a "bookworm" approach to the Web, instead of a "butterfly" one.
Searches can (and should) be MUCH more complicated than in the examples above, as we'll see in the next newsletters. Here, for instance, is an example of a "narrowed down" query:
send http://webcrawler.com/cgi-bin/WebQuery?text=linguistic+phenomena+NOT+%28prefix+OR+%22vowel+weakening%22%29
This translates the query: linguistic+phenomena NOT (prefix OR "vowel weakening")
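The %28, %22 and %29 codes are just the parentheses and quotation marks written in URL encoding, and spaces become "+" signs. A short Python sketch of the translation in both directions, using only the standard urllib.parse functions:

```python
from urllib.parse import quote_plus, unquote_plus

readable = 'linguistic phenomena NOT (prefix OR "vowel weakening")'

# Readable query -> the form that goes after "text=" in the URL.
encoded = quote_plus(readable)
print(encoded)

# And back again, to check what an encoded query really says.
print(unquote_plus(encoded))
```

Decoding a query you have pasted from somewhere else is a quick way to check what it really asks for before you send it.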
3) PERFORM A QUERY PER EMAIL INSIDE ALL USENET GROUPS
(Usenet querying: Reference.com)
We saw, with AltaVista's &what=news parameter, one way to query newsgroups (Usenet). Here is another technique, which may fish up quite a lot of relevant results, albeit a little more complicated. It deserves a try though. Reference.com is an email query service focused on Usenet groups, which lets you search through an archive of more than 16,000 newsgroups and a rapidly growing number of publicly accessible mailing lists. The quality and quantity of information on Usenet is impressive.
To access the service by e-mail, you send an e-mail message containing query commands to: [email protected]
The easiest query is:
FIND linguistic AND NOT phonetics OR glossary
Using the correct parameters you can control your query pretty well, and you can even order the service to email you regularly (say twice weekly) all the Usenet messages of the last days in which somebody spoke (anywhere on Usenet) of the topics you are interested in. I have, for instance, many automated queries running there about "Web searching" topics, which allows me to fish up quite interesting results.
Reference.com will send you back a list of messages where your query "phrases" appear. You'll decide which ones may interest you, and you'll be able to get them in extenso by sending a second message to Reference.COM with the relevant ID numbers. You can decide how many messages should be sent to you in the first place, and how many lines of each message should be kept in the list: choose too few and you'll not be sure whether a message really interests you, choose too many and your list will be too huge.
Here are some examples:
1) easy
To: [email protected]
Subject: (nothing here)
Text: FIND Web AND searching
END
2) more complex
To: [email protected]
Subject: (nothing here)
Text: FIND DISPLAY 10 LINES DISPLAY 100 HITS Web AND email OR querying WHERE AGE < 7 DAYS WHERE SUBJECT CONTAINS search
END
3) Even more complex (and automated) queries are possible. In this case you'll have to register with the service (for free) in order to build and develop your automated queries (through easy little scripts). Send an email with "help" as text to Email-Queries@Reference.COM and you'll get full instructions.
Exchange may add a signature to your e-mail messages. Specify "END" as the last word in the body of the message: this prevents the robot on reference.com from interpreting your signature as a query command.
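If you send many such queries, a little helper that assembles the message body saves typing and never forgets the END line. The Python sketch below is built only from the two examples above (FIND, DISPLAY ... LINES, DISPLAY ... HITS, WHERE AGE, WHERE SUBJECT CONTAINS, END); the function and its argument names are my own, and the service's full command grammar may well allow more than this.

```python
def reference_query(expression, lines=None, hits=None,
                    max_age_days=None, subject_contains=None):
    """Assemble the body of a Reference.com e-mail query, ending with END."""
    parts = ["FIND"]
    if lines is not None:
        parts.append(f"DISPLAY {lines} LINES")
    if hits is not None:
        parts.append(f"DISPLAY {hits} HITS")
    parts.append(expression)
    if max_age_days is not None:
        parts.append(f"WHERE AGE < {max_age_days} DAYS")
    if subject_contains is not None:
        parts.append(f"WHERE SUBJECT CONTAINS {subject_contains}")
    # The whole query stays on one line; END keeps a mail signature
    # from being read as another command.
    return " ".join(parts) + "\nEND\n"


print(reference_query("Web AND searching"))
print(reference_query("Web AND email OR querying", lines=10, hits=100,
                      max_age_days=7, subject_contains="search"))
```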
Go ahead, enjoy!
fravia+, December 1996
fravia+ 04 Nov 97