I use wget ( http://www.gnu.org/software/wget/wget.html ) and call it via
a shell script to mirror searchlores.org. I also have a script to move the
images and other sub-directories after wget fetches them, since my mirror
does not sit at the document root.
Here they are:
The mirror script:
-----------------------
#!/bin/sh
cd /qu00l.net/html/s/ || exit 1
mv wget-log.1 wget-log 2>/dev/null   # rotate the last log; harmless if it isn't there yet
wget -r -m -b www.searchlores.org    # recursive mirror, running in the background
-----------------------
A simple sh script: the first command sets the working directory; the second overwrites the old log so wget-logs don't pile up forever (two are plenty for comparison); the third calls wget and tells it to fetch recursively (-r), mirror (-m) and run in the background (-b).
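Comparing those two logs is then a one-liner; a rough sketch (same directory and log names as above, run by hand rather than from the mirror script):
-----------------------
#!/bin/sh
# Show what changed between the previous run (wget-log) and the latest one (wget-log.1).
cd /qu00l.net/html/s/ || exit 1
diff wget-log wget-log.1 | less
-----------------------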
And the moving script:
-----------------------
#!/bin/sh -
cd /qu00l.net/html/s/
cp ./www.searchlores.org/fiatlu/* ./fiatlu/
cp ./www.searchlores.org/images/* ./images/
cp ./www.searchlores.org/pdffing/* ./pdffing/
cp ./www.searchlores.org/realicra/* ./realicra/
cp ./www.searchlores.org/zipped/* ./zipped/
cp ./www.searchlores.org/protec/* ./protec/
-----------------------
This is the only part of the scripts that may need to change: as new child dirs appear on the site, I add them here. The reason is that wget places its output in a directory named www.searchlores.org, whereas on the actual site that directory is /, so everything gets copied one directory up and I don't have to write a crazy script to rewrite all the URLs in all of the pages.
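The same copies could also be driven from a single list, so a new child dir only means adding one more name; a rough sketch of that variant (same paths as above, not what actually runs on the mirror):
-----------------------
#!/bin/sh
# Copy every mirrored child dir one level up so links resolve without rewriting URLs.
cd /qu00l.net/html/s/ || exit 1
for d in fiatlu images pdffing realicra zipped protec; do
    cp ./www.searchlores.org/"$d"/* ./"$d"/
done
-----------------------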
I'm using two scripts instead of one because wget works in the background, so using && (which means 'upon completion of the last command, do the next') won't work: the shell sees that the command has completed when in fact it has merely gone to the background.
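A throwaway illustration of that point (move.sh here is just a stand-in for the moving script above, not a real filename on the mirror):
-----------------------
#!/bin/sh
# With -b the shell gets control back at once, so the part after &&
# would start while files are still arriving:
#   wget -r -m -b www.searchlores.org && ./move.sh
# Without -b, wget stays in the foreground and && really does wait:
wget -r -m www.searchlores.org && ./move.sh
-----------------------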
Wget is a great program, too: it can grab images, single files, whole sites or just sections, and it's very flexible and configurable.
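A couple of one-off examples of that flexibility (illustrative invocations, not taken from the scripts above):
-----------------------
# Grab just the front page:
wget http://www.searchlores.org/
# Grab only the images one level down (-l1 limits depth, -A filters by extension):
wget -r -l1 -A gif,jpg,png http://www.searchlores.org/
-----------------------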
Enjoy :)
Forseti+