An Open Source Proxy Checker

essay by Don Quijote, 24 Mar 2000

The actual working Open Source Proxy Checker can be found at http://ospc.cjb.net/.

Table of contents

  1. Appetizer
  2. First Course
  3. Main Course
  4. Dessert

 
Appetizer

Objective of this essay

To get you involved with PHP3, and to give you yet another tool to implement whatever you want, as long as it's web related and doesn't exceed the inherent limits of PHP3.

Why yet another proxy checker?

What about the proxy checker features?

What about the PHP3 features?

I still haven't finished The Book. Check back later: I will include a www.amazing.com link here (yes! to earn some more money with yet another covert click-through advertisement ;-)

Tools you're going to use

PHP3: Get yourself a copy of the PHP3 sources, binaries and the manual from their website.
Time: Show me the language that can be mastered in only 30 minutes ;-) But don't worry.
Web space: Not strictly necessary, but just in case, check out Altern and XodoX.

History

There's a lot one could write here, but it's best to send the interested reader to the PHP3 site to read the full story of PHP/FI.

[...]The name of this first package was Personal Home Page Tools, which later became Personal Home Page Construction Kit.

At the same time I started playing with databases and wrote a tool to easily embed SQL queries into web pages. It was basically another CGI wrapper that parsed SQL queries and made it easy to create forms and tables based on these queries. This tool was named FI (Form Interpreter).

PHP/FI version 2.0 is a complete rewrite of these two packages combined into a single program. It has now evolved to the point where it is a simple programming language embedded inside HTML files.
The original acronym, PHP, has stuck.[...]

Just as a side note: version 2 was full of bugs and poorly documented.
With version 3, PHP has evolved into a serious alternative to other server side scripting languages.
The beta of version 4 looks even better!

Architecture of this proxy checker

There will be two pages hosted on my current free web provider. The first one contains the HTML frontend and the full PHP3 code. The second one is just a test page that will be fetched through the proxy and verified by the PHP code of the first page.

Frontend

The page will show the typical title, short introduction, some references and finally, a form where the user enters the proxy address and port to check. The form is dispatched with a Submit button.
At the bottom of the page will be yet another typical closing note, together with the date of last modification and perhaps even a copyright notice.
Somewhere in between are some links to the relevant RFCs, to the PHP3 source code, and to Fravia's searchlores site.
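The form itself could be produced along these lines. This is a minimal sketch written for a modern PHP (7 or later) rather than PHP3, and the field names proxy and port, as well as the helper name proxyForm(), are assumptions made up for illustration; the real OSPC page may use different ones.

```php
<?php
// Hypothetical helper: renders the proxy-check form described above.
// Field names "proxy" and "port" are assumptions, not taken from OSPC.
function proxyForm(string $action, string $proxy = '', string $port = '80'): string
{
    // Escape every value that ends up inside an HTML attribute.
    $a = htmlspecialchars($action, ENT_QUOTES);
    $p = htmlspecialchars($proxy, ENT_QUOTES);
    $n = htmlspecialchars($port, ENT_QUOTES);

    return "<form method=\"post\" action=\"$a\">\n"
         . "Proxy address: <input type=\"text\" name=\"proxy\" value=\"$p\">\n"
         . "Port: <input type=\"text\" name=\"port\" value=\"$n\" size=\"5\">\n"
         . "<input type=\"submit\" value=\"Submit\">\n"
         . "</form>\n";
}

// Usage: print an empty form that posts back to the current script.
// echo proxyForm($_SERVER['SCRIPT_NAME'] ?? '');
```

Escaping the values matters because the form is re-displayed with whatever the user typed in.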

Backend

The first task is to find out whether this page is being loaded for the first time or as a submitted form.
In the first case, nothing needs to be done, except, perhaps, showing the user which environment strings his browser passed to us.
The second case is much more interesting. Here the input is the address of the proxy to check, together with its port.
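That first-visit/submitted-form decision might look roughly like this. It is a sketch for modern PHP, which no longer turns form fields into globals, so it reads them from an array such as $_POST; the field names proxy and port and the function name readProxyRequest() are hypothetical.

```php
<?php
// Hypothetical helper: decides between "first visit" and "form submitted".
// Returns null on a first visit, otherwise [host, port].
// Throws if the submitted values cannot be used.
function readProxyRequest(array $post): ?array
{
    if (!isset($post['proxy'])) {
        return null;                        // first visit: just show the form
    }
    $host = trim((string)$post['proxy']);
    $port = (int)($post['port'] ?? 80);     // default to port 80 if omitted
    if ($host === '' || $port < 1 || $port > 65535) {
        throw new InvalidArgumentException('bad proxy address or port');
    }
    return [$host, $port];
}

// Usage inside the page:
// $target = readProxyRequest($_POST);
// if ($target === null) { /* show the form */ }
// else { [$host, $port] = $target; /* test the proxy */ }
```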

 
First Course

Plenty of introductory example code is already on the web; check the links section. There's no need to code yet another Hello World proggie, but for the sake of it, and as a homage to Kernighan & Ritchie, here we go...

Create this document and upload it to your PHP3-enabled server under the name hello.php3. Then call it from your browser and see what happens :-)

<html>
<body>
Preparing everything for the Hello World launch...
<br>
<?php
  printf("Hello World! Say hello to our %s visitor.<br>",$REMOTE_ADDR);
?>
That's all!
</body>
</html>
That was easy, wasn't it? No need to deal with file permissions, special paths, or anything else commonly associated with getting Perl scripts to run.
Besides, that code inside the PHP tags just looks like common C code :-)
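One caveat if you try this on a recent PHP: hello.php3 relies on $REMOTE_ADDR being a global variable, which only works while register_globals is on, and that setting was removed in PHP 5.4. A sketch of the same greeting for a modern interpreter, reading the address from $_SERVER (the function name helloLine() is made up for this example):

```php
<?php
// Hypothetical helper: builds the Hello World line from a server array.
function helloLine(array $server): string
{
    $addr = $server['REMOTE_ADDR'] ?? 'unknown';
    // Escape the value before writing it into HTML.
    return sprintf("Hello World! Say hello to our %s visitor.<br>",
                   htmlspecialchars($addr, ENT_QUOTES));
}

// In a page: echo helloLine($_SERVER);
```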

 
Main Course

You did read every single line up to this point, didn't you?

Peculiarities

(it's not an illness ;)

Let's assume for a moment that the reader knows how to read C code, 'cause that alleviates the task of explaining the more occult twists of PHP3.
Here follows a short list of differences from C and other peculiarities:

  1. PHP is a server side scripting language. Please don't expect any interactive features as known from Java or JavaScript. Those cute tricks done with onMouseOver event handlers run on the client's machine. PHP scripts always run on the server. Each and every interaction will always go through a page fetch, a page refresh or a form submission.
    There are even some nasty tricks which can be implemented in PHP3, aided by some nice features of Apache servers configured through the .htaccess file. But that is stuff for another essay.
  2. PHP code is embedded anywhere in the normal HTML flow, enclosed within <?php and ?> tags. At the start, in the middle or at the end, it's all the same. The code is executed by the PHP module and replaced by whatever output it produces (with the print() and printf() functions) during that execution.
  3. Variables are easy to recognize, as all of them are prefixed with $. You never declare a variable as a string, char, int or real; what's more, each variable can be converted to another type, mostly depending on the context. But don't worry, I did not use any typecasting in the sources.
    A nice feature of PHP is that all environment strings are automagically available as variables. The same holds for submitted form fields.
  4. Associative arrays are a nice extension to any language. Let me try to explain this feature with the following example:
    Instead of creating two arrays, one for Christian names and one for the matching last names, we just create a single array, using the last names as indices and filling each slot with the corresponding Christian name:
    /* 'outdated' style */
    $christname[0]='John';
      $lastname[0]='Doe';
    $christname[1]='Don';
      $lastname[1]='Quijote';
    
    /* 'slick' style */
    $name['Doe']='John';
    $name['Quijote']='Don';
    Yes, I know, don't tell me that this doesn't work when you have a huge tribe of Doe brothers or Quijotes.
    Another way to initialize associative arrays is by using the array keyword. You may look it up in the Quick Reference section at the PHP site.

    There are many ways to iterate through these associative arrays. I use this code snippet:

    reset($name);
    while(list($last,$first)=each($name)) print("Hello $first $last\n");
  5. Surely you have noticed that even strings are parsed, and any variables found inside them are replaced by their values. Nice, isn't it?
    This only applies to "double quoted" strings. You can embed newlines (\n) as well as most other escape codes known from C.
    Single quoted strings are neither checked for variables nor transformed, and are probably processed a bit faster.
  6. Some words regarding variable scope. Variables are always local to the surrounding function. To access global variables, you need to create an explicit reference to them with a global $variable; line.
    Another advantage (or disadvantage, for the cunning old C devils) is the free use (and misuse) of variables: there is no way to declare them before actual use. Reminds me of good old BASIC times.
    There's also no int main() entrypoint. You just start coding with the actual code, no need for bureaucratic crap ;-)
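The associative-array example from item 4 will not run unchanged on a current PHP: each() was deprecated in PHP 7.2 and removed in PHP 8.0. A sketch of the same loop using foreach (available since PHP 4), with the array built in one statement via array(); the function name greetAll() is made up for this example:

```php
<?php
// Iterate an associative array: the last name is the key, the first name the value.
function greetAll(array $name): string
{
    $out = '';
    foreach ($name as $last => $first) {   // key => value
        $out .= "Hello $first $last\n";    // double quotes: variables are expanded
    }
    return $out;
}

// array() initialises the same structure as the 'slick' style assignments.
$name = array('Doe' => 'John', 'Quijote' => 'Don');
echo greetAll($name);   // Hello John Doe / Hello Don Quijote (one per line)
```

Keys are unique, which is exactly why the huge tribe of Doe brothers breaks this scheme: assigning $name['Doe'] a second time simply overwrites John.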
Ok. Are you still following the text, or are you desperately looking for a quick way out of this maze?
The list can go on and on for hours, days and weeks, but in the end, the PHP code still keeps looking just like C code. And that's something I appreciate :-)
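Items 5 and 6 of the list above can be seen side by side in a tiny sketch (modern PHP; the variable and function names are made up for illustration): a function only sees a global variable after declaring it with global, and only double-quoted strings expand variables.

```php
<?php
$visitor = 'Don';                 // a global variable

function withoutGlobal(): string
{
    // Here $visitor is a fresh, undefined local variable;
    // the ?? operator supplies a default without a warning.
    return 'Hello ' . ($visitor ?? 'nobody');
}

function withGlobal(): string
{
    global $visitor;              // explicit reference to the global variable
    return "Hello $visitor";      // double quotes: $visitor is expanded
}

echo withoutGlobal(), "\n";       // Hello nobody
echo withGlobal(), "\n";          // Hello Don
echo 'Hello $visitor', "\n";      // single quotes: printed literally
```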

Core Code

Here is the stripped down source of the actual working engine of the proxy checker. Full sources don't belong here. They are available by following a link in the footer of that page.
<?php // Open Source Proxy Checker, severely stripped down
  error_reporting(0);  // no error reporting
  if(isset($PATH_INFO)&&($PATH_INFO=='/feedback.txt')) { // asked to behave like a mirror?
    Header('Content-Type: text/plain');
    $headers=getallheaders();
    reset($headers);
    while(list($header,$value)=each($headers)) print("$header === $value\n");
  } else { // testing the proxy
    $SockAdr='proxy.spaceproxy.com';
    $SockPort=80;
    $HL=array(
      'User-Agent' => 'Mozilla/4.0 (Windows 98;US) Opera 3.62 [en]',
      'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/png, */*',
      'Pragma' => 'no-cache',
      'Connection' => 'keep-alive');
    print('<html><body><h1>Stripped down version of OSPC</h1>');
    printf('Sending request to proxy %s, port %s<br><pre>',$SockAdr,$SockPort);
    print("  GET http://$HTTP_HOST$SCRIPT_NAME/feedback.txt HTTP/1.1\n");
    print("  Host: $HTTP_HOST:80\n");
    reset($HL);
    while(list($header,$value)=each($HL)) print("  $header: $value\n");
    printf("</pre>Connecting to %s, port %s...<br>",$SockAdr,$SockPort);
    flush(); // force flushing of "printf" buffers.
    $fp=fsockopen($SockAdr,$SockPort);
    if($fp) { // you are reading this code? Wow, I'm proud of you!
      // start outputting our header
      fputs($fp,"GET http://$HTTP_HOST$SCRIPT_NAME/feedback.txt HTTP/1.1\r\n");
      fputs($fp,"Host: $HTTP_HOST:80\r\n");
      reset($HL);
      while(list($header,$value)=each($HL)) fputs($fp,"$header: $value\r\n");
      fputs($fp,"\r\n"); // terminate header structure by appending a blank line
      while(!feof($fp)) {
        $line=fgets($fp,4096); // I suppose no line will be longer than this
        if($line=="\r\n") break; // reached end of header
      } // end while
      if(!feof($fp)) {
        print('<br>Receiving...<pre>');
        while(!feof($fp)) {
          $line=trim(fgets($fp,4096));
          list($name,$value)=split(' === ',$line,2);
          if((strlen($name)>0)&&(strlen($value)>0)) print("  $name: $value\n");
        } // end while() (end of "file" reached)
        print("</pre>That's all!");
      } // end if
      fclose($fp);
    } // end if: end processing with a valid socket handle ($fp)
    print('</body></html>');
  } // end else: testing proxy
?>
I know, the code is ugly as hell, but what else can you do when you want to pack some power into fewer than 50 lines without becoming a candidate for a code obfuscation contest!

Functional View

You've probably already guessed that I'm not a native English speaker (nor writer), but that won't stop you, will it?

Anyway, let's take a quick look at a typical run of the previous script:

Stripped down version of OSPC

Sending request to proxy proxy.spaceproxy.com, port 80
  GET http://dq.linuxave.net/stripped.php3/feedback.txt HTTP/1.1
  Host: dq.linuxave.net:80
  User-Agent: Mozilla/4.0 (Windows 98;US) Opera 3.62 [en]
  Accept: image/gif, image/x-xbitmap, image/jpeg, image/png, */*
  Pragma: no-cache
  Connection: keep-alive
Connecting to proxy.spaceproxy.com, port 80...

Receiving
  Accept: image/gif, image/x-xbitmap, image/jpeg, image/png, */*
  Host: dq.linuxave.net
  Pragma: no-cache
  User-Agent: Mozilla/4.0 (Windows 98;US) Opera 3.62 [en]
  Via: 1.1 - (DeleGate/6.1.0)
That's all!

Let's take that Core Code apart, and analyse what's supposed to be going on under the PHP hood.

The indentation already gives some nice hints about the overall structure: an if statement splits the whole code into two sections.

The variable $PATH_INFO is defined when the URL of this page contains additional path components right after the script's name.
If this code was stored on a server as http://somewhere.com/stripped.php3 and we call up this URL:
http://somewhere.com/stripped.php3/test,
then $PATH_INFO will be /test.

The first part of this script just tells the client browser to expect plain text instead of the default HTML, avoiding an unfitting HTML rendering of a purely textual feedback page.
getallheaders() returns an array of all the request headers: those bits of information sent by your browser, through a proxy, that finally arrive at a greedy logging bot on some web server. You'd better watch out what you're willing to send. IP disguise is not the only thing to do when you care at least a bit about anonymity.
The next two lines just traverse this array ($headers) and print out this information.
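Pulled out into a function, the mirror half might look like this. It is a sketch for modern PHP (foreach instead of reset()/each()); the name mirrorHeaders() is hypothetical, and the header array is passed in rather than fetched with getallheaders(), so the sketch runs outside a web server too. Each header becomes a "Name === value" line, the separator that the checking half later splits on.

```php
<?php
// Hypothetical helper: formats request headers the way the mirror page prints them.
function mirrorHeaders(array $headers): string
{
    $out = '';
    foreach ($headers as $header => $value) {
        $out .= "$header === $value\n";
    }
    return $out;
}

// In the page itself:
// header('Content-Type: text/plain');
// echo mirrorHeaders(getallheaders());
```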

So much for that first and easy part. Let's have a look at the second part.
From the previous discussion, you should be able to figure out everything until that flush() line. Hint: Variable definitions, $HL being an array, some print statements starting to render the HTML page.

All print output will eventually arrive at your client's browser, but first it has to be sent by the server. This process is normally buffered to avoid sending too many small packets of info.
If, for some reason, you want to push the already buffered info to the client, you use the flush() function.

The most important part of this script is IMO the fsockopen() functionality: PHP tries to open a socket to a certain port ($SockPort) on a certain address ($SockAdr). You can also express that address in IPv4 format (4 groups of non-negative decimal numbers lower than 256, separated by dots).
I/O to this socket is done with fgets() and fputs().
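If you want to know whether the user typed a dotted-quad IPv4 address or a host name, PHP can check that for you. A small sketch using filter_var() (available since PHP 5.2, so not in PHP3); the function name isDottedQuad() is made up for this example:

```php
<?php
// True only for four decimal numbers 0..255 separated by dots.
function isDottedQuad(string $addr): bool
{
    return filter_var($addr, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}
```

fsockopen() accepts either form anyway. It also takes optional error-code, error-message and timeout arguments, e.g. fsockopen($SockAdr, $SockPort, $errno, $errstr, 10), which are worth using when error_reporting(0) silences everything else.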

I'm not that good at casting my own code into english words, so bear with me ;-)

I'm sending out an HTTP request, via the proxy, to fetch a certain page (http://$HTTP_HOST$SCRIPT_NAME/feedback.txt) from the host $HTTP_HOST, port 80. You should know that 80 is the port commonly associated with HTTP (web) traffic.
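The request can also be assembled in one place before it goes into fputs(). This is a sketch for modern PHP; buildProxyRequest() is a hypothetical helper name. A request sent to a proxy carries the absolute URL in its request line, every line ends in CR-LF, and an empty line ends the header block. Unlike the core code, the sketch sends the Host header without ":80", since the port may be omitted when it is the default.

```php
<?php
// Hypothetical helper: builds a GET request suitable for sending to an HTTP proxy.
function buildProxyRequest(string $host, string $path, array $headers): string
{
    $req  = "GET http://$host$path HTTP/1.1\r\n";   // absolute URL for the proxy
    $req .= "Host: $host\r\n";
    foreach ($headers as $name => $value) {
        $req .= "$name: $value\r\n";
    }
    return $req . "\r\n";                           // blank line: end of header
}

// Usage, mirroring the core code:
// $fp = fsockopen($SockAdr, $SockPort);
// fputs($fp, buildProxyRequest($_SERVER['HTTP_HOST'],
//                              $_SERVER['SCRIPT_NAME'] . '/feedback.txt', $HL));
```

One design note if you adapt this: sending Connection: close instead of keep-alive asks the server to close the socket after its reply, so a read loop waiting for feof() is guaranteed to end.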

Q: All nice and fun. But what the heck are $HTTP_HOST and $SCRIPT_NAME supposed to be??

A: They are part of the environment variables. Please write yourself a little PHP script to print out the contents of these variables among some more info:

<?php
  phpinfo();
?>
The HTTP request is terminated by sending a CR-LF pair on an otherwise empty line. Right after that, the engine dives into a while() loop, reading the reply line by line until there's nothing more available on that socket (tested via the boolean value of !feof($fp)) or the end of the header is reached.
Inside that loop we read all lines belonging to the HTTP header, discarding them on our way. The end of the header is signalled by a CR-LF alone on a line.

The next block of code fetches the remaining data on that socket (the HTTP body, i.e. the mirrored header list), splits each line at ' === ' and prints out the non-empty name/value pairs.
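Both reading loops fit into one function that works on any stream, for example the socket returned by fsockopen(). A sketch for modern PHP: split() was removed in PHP 7.0, so explode() does the splitting, and the name readFeedback() is made up for this example.

```php
<?php
// Hypothetical helper: skips the HTTP response header, then collects
// "Name === value" lines from the body into an associative array.
function readFeedback($fp): array
{
    // 1. Discard the status line and header lines up to the blank line.
    while (($line = fgets($fp, 4096)) !== false) {
        if (rtrim($line, "\r\n") === '') {
            break;
        }
    }
    // 2. Parse the body; lines without the separator (such as the
    //    chunk-size lines of a chunked reply) are skipped.
    $seen = [];
    while (($line = fgets($fp, 4096)) !== false) {
        $parts = explode(' === ', trim($line), 2);
        if (count($parts) === 2 && $parts[0] !== '' && $parts[1] !== '') {
            $seen[$parts[0]] = $parts[1];
        }
    }
    return $seen;
}
```

Because it only needs a readable stream, you can try it without any proxy by writing a canned reply into php://temp and rewinding it.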

That's all!!

Not quite. There are many cute little functions making up the spicy flavor of PHP and helping at every corner. You should look them up in that Quick Reference section on the PHP site. Or even better, start downloading sources, binaries and manuals and install PHP on your machine and begin experimenting right away, you won't regret it!

Open Paths

The actual Open Source Proxy Checker is just an experiment dealing a bit with HTTP and another bit with PHP.
There are far too many ways to enhance this application. Some time ago, when I started this stuff, I wanted the script to highlight the differences between the outgoing and the incoming stream, pointing out the dangers of exposing some environment variables in cleartext, etc.
That did not happen.

Sources are available, lucky you, so you can take them and enhance the app, adding your own bells and whistles. Please be so kind as to mail me about any use you make of them or any changes you make.

 
Dessert

Final Notes

For those still wondering about the relationship of this essay with www.searchlores.org, let me tell you: PHP3 is a powerful language to build searching bots, spiders, redirectors, dynamic site mirroring, retrieval engines, link checkers, port scanners, mailers, secured or disguised web sites, brute force password crackers, and so on.

All these examples run on a server, not on your own machine, so they are not limited by the bandwidth of your dial-up telephone line.
This narrow-bandwidth line is only used to trigger the execution of the PHP3 script.

PHP3 has a very wide field of application, yet most of what is known deals only with the commercial crap, i.e., web-based shopping carts and related stuff.

PHP3 is also very helpful when implementing any client/server protocol. I wrote a client to retrieve files from a CVS server, implementing the whole client-server CVS handshaking in PHP3.

Hopefully there will be more essays dealing with these scripting languages. Besides PHP3 there's also Perl, Rebol and Python. Each of them has its own advantages and disadvantages, a shallower or steeper learning curve, a procedural or object-oriented approach. In the end, it's just a matter of personal taste which language you prefer.

I can't resist the temptation of including some fly outs ;-)

You might provide me with some of your feedback through Fravia's messageboard (preferred way of sharing knowledge) or through my email (dq at altern dot org).

Good Netiquette

Are you expecting a long preaching sermon? Not this time! Just stop. Think for a minute: do you really want to lose the possibility of experimenting with your PHP3 code on those free web servers, or do you want to screw up everything by running bandwidth-intensive and/or dangerous scripts?
Keep a low profile and don't mess too much with your provider. I know it's possible, but don't do it ;-)

Relevant Links

Let me close this essay with the following hyperlinked bibliography, and remember to search for similar stuff through the many search engines.


~~~~~~ O ~~~~~~