perl_es1.htm: Perl@usa.net: How to reverse a "free" service

	[email protected] How to reverse a "free" service	Removing banners
16 July 1999	by [blue]
	Courtesy of Fravia's searchlores.org
fra_00xx 981607 [blue] 1000 AA RB	Woah! I'm very happy to have received this beautiful essay by [blue] This is relevant both for the bots section and the anti-advertisement section. I didn't think, when I began my anti-advertisement section, that so many results would have been possible in such a short time. This web of hours is truly wondrous! The various removing banners techniques described in our Removing banners section will give anyone with an inquiring mind enough material to surf unsulled by vulgar advertisement. [blue]'s essay is a very nice introduction to 'active-perl' activities. I expect from him and from all my anti-banners ingenious friends even more! Let's annoy the banner-morons at least as much as they annoy us!	Bots ~ Anti-advertisement
	There is a crack, a crack in everything That's how the light gets in
Rating	(x)Beginner ( )Intermediate ( )Advanced ( )Expert

Perl is for Internet like assembly for programs: a razor sharp blade, which any reverser should be able to use.

[email protected]
How to reverse a "free" service
Written by [blue]

Introduction

This essay shows how to use the most powerful Internet programming tool, Perl to create our own e-mail client for use with Net@ddress to get our email without browsing all those advertisements pages. By doing this you'll learn how to use Perl to get HTML pages, parse them, POST data and how to use regular expressions to extract valuable information buried in HTML tags.
To understand this you must have at least basic knowledge of Perl, HTML, HTTP and java-script (some OOP knowledge would help with some Perl modules).

Tools required

Java script enabled browser (any browser)
Lynx (not really needed): lynx.browser.org
Perl
- Perl home page: www.perl.org
- Perl for Win32: www.activestate.com
- CPAN (Perl Archives): www.cpan.org

If you have problems getting and installing Perl write to me and I'll help you (or better read the damn docs).

Target's URL/FTP

www.netaddress.com

Program History

None.

Essay

Until March 1999 Net@ddress was a good free e-mail provider for a small account (with a 5-MB quota):

big enough to remain in business after 6 months (at least hopefully)
free pop3 access, so anyone with at least half brain could setup his e-mail client to never see an advertisement

From March they decided to close up the pop3 server (I mean to request a fee for it). What could I do? I hate those never-ending advertisements and frames.

Our target: To get our e-mail in a local file in a mbox-like format.

Warning: This essay is not for Perl or HTML wizards. The program we will write is not optimized. It's just an exercise in both HTML-HTTP and Perl. The wizards should take a look at full source code to see if they find something useful.

1. Basics - Getting pages

I suppose you are familiar with Net@ddress. If not, create a test account, send a few e-mails, read them, print them, feel your target. Looks hard? It is hard using a normal programming language like C, it's not using good tools: shell scripts with netcat, expect, or Perl. We have few tasks to accomplish: get pages, login and extract messages from HTML code. The hurried ones can go directly to full source code.
How do we get the login page? In C you will have to find the IP address, open a socket, write some info (what info?) and read the answer. Here is how you do it in Perl:

# Create the user agent object
$ua = new LWP::UserAgent;
$ua->timeout($timeout);

# Set up a get request. We will use the same get 
# request object for all URLs.
$req_get = new HTTP::Request "GET";

# Send the request and get a response back from the server
to_err "Getting login page $url ... ";

$req_get->url($url);
$response = $ua->request($req_get);
if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message; 
	exit 1;
}

You can, of course open sockets in Perl but it's no use. We can use ready-made components (or modules). At CPAN you can find a huge collection (more than 600 MB) of Perl modules. For our purposes we will use LWP, HTTP::Request::Common, HTML::Entities, and HTML::TokeParser. To use a module just say you need it:

use LWP;
use HTTP::Request::Common;
use HTML::Entities;     # for &lt;
use HTML::TokeParser;

It's like #include <stdio.h> in C. Those modules will give us an easy way to get pages, parse them and post responses.
Let's analyze the above code piece by piece:

# Create the user agent object
$ua = new LWP::UserAgent;
$ua->timeout($timeout);

Every action (whatever is the protocol: HTTP, FTP, FILE, GOPHER, etc.) is taken through a user agent. A user agent performs a request and return a response. Here we create our user agent and set it's timeout to 180 seconds. If timeout expires (during processing a request) the user agent generate an error response.

# Set up a get request. We will use the same get request object for all URLs.
$req_get = new HTTP::Request "GET";

In HTTP protocol two methods are the most used: GET and POST. GET for GETting information and POST for POSTing responses to server.

$req_get->url($url);
$response = $ua->request($req_get);
if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message; 
	exit 1;
}

In variable $url we stored (see full source code) our target address: "http://www.netaddress.com". The line:


$response = $ua->request($req_get);

actually perform the request and return an answer. to_err it's a subroutine which prints a string on STDERR (see full source code).

Try for yourself to make program print the source of the page it just fetched.

2. Logging in - Parsing HTML and POST-ing data

By checking the source we can see how they verify that our browser supports java-script: on form "loginform" they initialize hidden input "java-script" with "none" and on submit, in function "SubmitLogin" called "onSubmit" set it as appropiate so if your browser don't support java-script they get "none" and return other page. For learning we gonna analyze this page and extract necessary values from "loginform". To parse a HTML page we need a "TokeParser" object:

$p = HTML::TokeParser->new(\$response->content) || die $!;

Now get all tokens from HTML page searching for a form named "loginform":

# Search login form
$found = 0;
while ($token = $p->get_tag("form")) {
	my $name = $token->[1]{name} || "-";

	if ($name eq "loginform") {
		$found = 1;

Get "action" (where to post) and method (which should be POST):

$action= $token->[1]{action} || die $!;
$method= $token->[1]{method} || die $!;
if ( $method ne "POST" ) {
	to_err "\n\nError - Login Form method is not POST ($method)\n\n";
	exit 1;
}

Now get values for all interesting input elements. Note that for "java-script" we don't use the supplied value but instead choose a legal values from "SubmitLogin" function.

$token = $p->get_tag("input");
($token->[1]{name} eq "LoginState") || die $token->[1]{name};
$LoginState = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "SuccessfulLogin") || die $token->[1]{name};
$SuccessfulLogin = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NewServerName") || die $token->[1]{name};
$NewServerName = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "java-script") || die $token->[1]{name};
$java-script = "JScript1.0";

$token = $p->get_tag("input");
($token->[1]{name} eq "DomainID") || die $token->[1]{name};
$DomainID = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NA31site") || die $token->[1]{name};
$NA31site = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NA31port") || die $token->[1]{name};
$NA31port = $token->[1]{value};

last;
	}

}

if ($found != 1) {
	to_err "\n\nError - Can't find LoginForm\n\n";
	exit 1;
}

to_err "Ok.\n";

Now we have all the information needed to construct our response to login form.

# Create POST object 
# (this way is more clear than: new HTTP::Request "POST"...)
to_err "Posting login data ... ";

$req_put = POST $url."/".$action, 
		[	LoginState => $LoginState,
			SuccessfulLogin => $SuccessfulLogin,
			NewServerName => $NewServerName,
			java-script => $java-script,
			DomainID => $DomainID,
			NA31site => $NA31site,
			NA31port => $NA31port,
			UserID => $user,
			passwd => $pass,
		];

$response = $ua->request($req_put);

if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message; 
	exit 1;
}

Already familiar: construct a request, pass it to our agent, perform request and get response.

Probably you noticed that after login you are "redirected" a another page (BTW with frames). This is the moment where I used Lynx and it's very nice "trace" option. I save locally a login page and I modified to initialize "java-script" input with "Jscript1.0". I start Lynx with "trace" from this page and made a succesful login. The trick is a "Refresh" header tag which redirects us to the correct page.
To parse header we need a "HeadParser" :-):

$p = HTML::HeadParser->new; 
$p->parse($response->content);
if ( not $entry = $p->header('Refresh') ) {
	to_err "\n\nError - Can't find Refresh in page\n\n";
	exit 1;
}

Now variable "entry" contains value for tag "Refresh". The protection of Net@ddress is based on a random number generated on each login (if someone know how they generate it, please tell me). We can extract this number from value of "Refresh" tag. As you can see from subroutines having this number we can directly access our Inbox. Of course they present it in a frame but we don't need the frame :-). We will see soon what we need to access individual messages.

3. Getting messages - Regular expressions

To access our Inbox we have to extract that number from "Refresh" tag value.

$entry = substr($entry, 6);

Discard few characters from "entry" (what characters ?).

# Now entry is something like 
# http://www.netaddress.com/tpl/Door/?????/Welcome
#
# Inbox folder should be at 
# .../tpl/Mail/????/List?FolderID=-4 and so on
# 

# Bad trick, should use constants
if ($entry =~ m#oor/(.*?)/Wel#) {
	$number=$1;

All the magic is in that ugly line. To understand it you need to take a break and read "perlre" or a documentation on "regular expressions". This line can be read this way: try to mach the string "oor/", followed by any char "." repeated zero or more times "*?", followed by string "/Wel". I told you to learn about regular expressions :-). As a side effect of match Perl store the expression in variable "1".

Now, getting the Inbox page is piece of cake:

$inbox = b_inbox ($number);
$req_get->url($inbox);
$response = $ua->request($req_get);

if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message;
	exit 0;
}

Take a look at Inbox page source. Remember our target? To get our e-mail in a local file in a mbox-like format. Now I'm expecting you to browse a little and find where is the best format of messages.

Stop reading now and try to solve this.

It's in "PrintPreview" with full headers (of course you didn't forgot about headers).

How can we get there? We need message number (which is also the SMTP id). Back to Inbox. See where are the links to messages?. Where are the "Subject" and "From" fields?. I suppose you see we need the values of MR[i].Q if not, take a closer look.

# Nice, :-)
@mails = ($response->content =~ m/^MR\[i\]\.Q='(.*?)';/mg );

By using "@" we request Perl to give us an array with all values it's find. The regular expression "m/^MR\[i\]\.Q='(.*?)';/mg" can be read: considering it a multiline string match globally "mg", at the beginning of line "^" string "MR[i].Q=", followed by any char "." repeated zero or more times "*?", followed by string ";". By using operator "=~" we tell Perl to give us what it match (for ".*?"). After that it's easy to get our messages. The rest of program is for you to analyze and improve.

Can you find a way to fetch one page less ? For those which will answer to this question I'll provide (if they need) more help with the last part of program.
My address is john_smith86(at)usa(dot)com

4. Full source code

#
# Net Pop:	Get Net@ddress e-mail
#
#
#	Written by [blue] <[email protected]>
#

use strict 'subs';

use LWP;
use HTTP::Request::Common;
use HTML::Entities;     # for &lt;
use HTML::TokeParser;


# Constants
$url = "http://www.netaddress.com";
$timeout = 180;

$TrashID = "-1";
$InboxID = "-4";


# Subroutines 
sub to_err;	# Print a string to STDERR
sub b_inbox;	# Build Inbox URL
sub b_pprev;	# Build PrintPreview URL
sub b_delete;	# Build Delete URL

# Stollen from lwp-request 
$progname = $0;
$progname =~ s,.*/,,;		# use basename only
$progname =~ s/\.\w*$//;	# strip extension, if any
to_err "\nThis is $progname using libwww-perl-$LWP::VERSION\n\n";

$user="example";
$pass="password";

to_err "Getting mail for $user.\n\n";

# Create the user agent object
$ua = new LWP::UserAgent;
$ua->timeout($timeout);

# Set up a get request.  We will use the same 
# get request object for all URLs.
$req_get = new HTTP::Request "GET";

# Send the request and get a response back from the server
to_err "Getting login page $url ... ";

$req_get->url($url);
$response = $ua->request($req_get);
if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message; 
	exit 1;
}

to_err "Parsing it ... ";

$p = HTML::TokeParser->new(\$response->content) || die $!;

# Search login form
$found = 0;
while ($token = $p->get_tag("form")) {
	my $name = $token->[1]{name} || "-";

	if ($name eq "loginform") {
		$found = 1;

$action= $token->[1]{action} || die $!;
$method= $token->[1]{method} || die $!;
if ( $method ne "POST" ) {
	to_err "\n\nError - Login Form method is not POST ($method)\n\n";
	exit 1;
}

$token = $p->get_tag("input");
($token->[1]{name} eq "LoginState") || die $token->[1]{name};
$LoginState = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "SuccessfulLogin") || die $token->[1]{name};
$SuccessfulLogin = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NewServerName") || die $token->[1]{name};
$NewServerName = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "java-script") || die $token->[1]{name};
$java-script = "JScript1.0";

$token = $p->get_tag("input");
($token->[1]{name} eq "DomainID") || die $token->[1]{name};
$DomainID = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NA31site") || die $token->[1]{name};
$NA31site = $token->[1]{value};

$token = $p->get_tag("input");
($token->[1]{name} eq "NA31port") || die $token->[1]{name};
$NA31port = $token->[1]{value};

last;
	}

}

if ($found != 1) {
	to_err "\n\nError - Can't find LoginForm\n\n";
	exit 1;
}

to_err "Ok.\n";

# Create POST object (this way is more clear 
# than: new HTTP::Request "POST"...)
to_err "Posting login data ... ";

$req_put = POST $url."/".$action, 
		[	LoginState => $LoginState,
			SuccessfulLogin => $SuccessfulLogin,
			NewServerName => $NewServerName,
			java-script => $java-script,
			DomainID => $DomainID,
			NA31site => $NA31site,
			NA31port => $NA31port,
			UserID => $user,
			passwd => $pass,
		];

$response = $ua->request($req_put);

if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message; 
	exit 1;
}

# Parsing answer to loggin
to_err "Parsing answer ... ";

$p = HTML::HeadParser->new; 
$p->parse($response->content);
if ( not $entry = $p->header('Refresh') ) {
	to_err "\n\nError - Can't find Refresh in page\n\n";
	exit 1;
}

$entry = substr($entry, 6);
# Now entry is something like 
# http://www.netaddress.com/tpl/Door/?????/Welcome
#
# Inbox folder should be at .../tpl/Mail/????/List?FolderID=-4 and so on
# 

# Bad trick, should use constants
if ($entry =~ m#oor/(.*?)/Wel#) {
	$number=$1;
} else {
	to_err "\n\nError - Bad redirect address $entry\n\n";
	exit 1;
}

to_err "Ok ($number)\n";

to_err "Getting Inbox page ... ";

$inbox = b_inbox ($number);
$req_get->url($inbox);
$response = $ua->request($req_get);

if ( not $response->is_success) {
	to_err "\n\nError\n\n";
	to_err $response->message;
	exit 0;
}

to_err "Ok.\n";

# Nice, :-)
@mails = ($response->content =~ m/^MR\[i\]\.Q='(.*?)';/mg );

if ($#mails == -1 ) {
	to_err "\nYou have no mail.\n\n";
	exit 0;
}

to_err "\nYou have ";
to_err $#mails+1;
to_err " messages.\n\n";

foreach $msg (@mails) {
	to_err "Getting message $msg ... ";

	$amsg = b_pprev($number);
	$amsg .= "Q=".$msg.":".$InboxID;
	$amsg .= "&Headers=True";
	$amsg .= "&FolderID=".$InboxID;
	$amsg .= "&Sort=Date";

	$req_get->url($amsg);
	$response = $ua->request($req_get);

	if ( not $response->is_success) {
		to_err "\n\nError\n\n";
		to_err $response->message; 
		exit 1;
	}

	to_err "Parsing it ... ";

	# Get the body of message (everything is 
        # between <body> and </body>	
($junk, $body) = ( $response->content =~ m/<body(.*?)>(.*?)<\/body>/msg );
	
	# HTML spaces -> /dev/null
	$body =~ s/  //misg;
#
# Ugly !!!
# match:
#	any string of zero or more spaces and TABS 
#	AND NOT followed by </tr>
#	AND NOT followd by <br>
#	AND followed by <
#	AND followed by A string of any char EXCEPT <>
#	AND followed by >
#	AND followed by a string of space and TAB
#	AND followed by NEW-LINE
# Remove it (substitute with empty string)
# 
$body =~ s/([ \t]*)(?!<\/tr>)(?!<br>)<([^<>]*?)>([ \t]*)(\n?)//misg;

	# Remove empty lines + -> one or more !
	$body =~ s/([ \t]*)(\n)+//msg;

	# Replace <BR> and </TR> with NL
	$body =~ s/[ \t]*<br>/\n/misg;
	$body =~ s/[ \t]*<\/tr>/\n/misg;

	decode_entities($body);

	print "$body\n\n";

	to_err "Ok.\n";
	to_err "Deleting message $msg ... ";
	
	# Create POST object (this way is more clear than: new HTTP::Request "POST"...)
	$req_put = POST b_delete($number), 
		[	Q => $msg,
			N => "",
			F => $TrashID,
			FolderID => $InboxID,
			Sort => "Date",
		];


	$response = $ua->request($req_put);

	if ( not $response->is_success) {
		to_err "\n\nError\n\n";
		to_err $response->message; 
		exit 1;
	}

	to_err "Ok.\n";

}


# ----- Subroutines

# Print a string to STDERR
sub to_err {
        die "to_err: Need an argument !" unless $_[0];

	print STDERR $_[0];
}

#Build Inbox URL
sub b_inbox {
        die "b_inbox: Need an argument !" unless $_[0];
	
	return $url."/tpl/Mail/".$_[0]."/List?FolderID=-4";

}

# Build PrintPreview URL
sub b_pprev {
        die "b_pprev: Need an argument !" unless $_[0];
	
	return $url."/tpl/Message/".$_[0]."/PrintPreview?";

}

# Build Delete URL
sub b_delete {
        die "b_delete: Need an argument !" unless $_[0];
	
	return $url."/tpl/Message/".$_[0]."/Move";


}

Final Notes

You see a small example of how to use Perl to build you own tools. Perl is not easy to learn (especially regular expressions) but very powerful.

You can read carefully the "Subscriber Agreement". They don't mention that you are bound to see those advertisements, so you shouldn't see them.

Ob Duh

I wont even bother explaining you that you should BUILD your own bots and tools to avoid (or destroy :-) all those commercial banner-clicking infernos you are compelled to slurp. I will bother telling you, though, that you should send a copy of those same bots and tools to good old fravia+
Building our weapons on each other's shoulders we'll transform the next Millennium in an advertisement-free aera!