Building A PHP Proxy Judge

Sometimes the best way to learn about an unfamiliar subject is by doing. This weekend I held my own version of a Facebook Hackathon and built a proxy judge.

What's a Proxy?

First, the basics. Just what is a proxy? It's a computer on a network that acts as a middleman, relaying requests from one machine to another.

On the Internet, there are hundreds of thousands of open proxy servers available for public use. They are generally used to mask your real IP address for anonymity, and to cache publicly available content.

Where To Find Open Public Proxy Servers?

There are a few ways to find proxy servers on the Internet.

There are websites that maintain lists of proxy addresses. Google "proxy lists" and you will find plenty of sites listing all sorts of proxies, organized by type, location and speed. Some lists are updated in real time while others tend to be stale.

You can also find them with an IP scanner that takes an IP range and a set of ports to probe. Windows programs like Angry IP Scanner or SuperScan attempt to connect to each address and port and report back its status. You then take the generated list and feed it into a proxy judge such as Charon to rate it.
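If you would rather stay in PHP than use a Windows scanner, the same idea can be sketched in a few lines. This is a minimal sketch, not the author's tool: expand_ip_range() and port_is_open() are hypothetical names, and the 192.0.2.x addresses used in the usage note are documentation-only examples. An open port only makes an address a candidate; the proxy judge still has to confirm that it actually relays HTTP.

```php
<?php
/**
 * Hypothetical helper: expand an IPv4 range and a list of ports into
 * "host:port" candidates that can be pasted into a proxy judge.
 * This only builds the list; it does not contact any host.
 */
function expand_ip_range(string $startIp, string $endIp, array $ports): array
{
    $start = ip2long($startIp);
    $end = ip2long($endIp);
    // Reject malformed addresses or a reversed range.
    if ($start === false || $end === false || $start > $end) {
        return [];
    }

    $candidates = [];
    for ($ip = $start; $ip <= $end; $ip++) {
        foreach ($ports as $port) {
            $candidates[] = long2ip($ip) . ':' . (int)$port;
        }
    }
    return $candidates;
}

/**
 * Hypothetical probe: returns true if a TCP connection to host:port
 * succeeds within $timeout seconds.
 */
function port_is_open(string $host, int $port, float $timeout = 2.0): bool
{
    $fp = @fsockopen($host, $port, $errNo, $errStr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}
```

For example, expand_ip_range('192.0.2.1', '192.0.2.2', [3128, 8080]) yields 192.0.2.1:3128, 192.0.2.1:8080, 192.0.2.2:3128 and 192.0.2.2:8080; you would then keep only the entries for which port_is_open() returns true.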

You can also lease private proxy servers. Vendors will sell you a batch of IPs for around $1-$2 per IP per month.

Types of Proxies

Open public proxies come in three flavors:

  1. Transparent
  2. Anonymous
  3. Elite

When we talk about anonymity, we mean how well a proxy hides itself (and you) from the server it connects to, so that the request can't be identified as coming through a proxy. This is usually determined by looking at the HTTP headers the proxy adds to the request it forwards.

With elite proxies, no header fields are added that indicate the request went through a proxy.

With anonymous proxies, you will normally see HTTP_VIA or VIA fields indicating that the request passed through a proxy.

With transparent proxies, you will typically see HTTP_X_FORWARDED_FOR (or similar X_FORWARDED) header fields containing a list of the IP addresses used in the connection, which can include your own IP address.

Judge, Judge, Who Is The Proxy Judge?

Let's pose a problem. Suppose you are handed a blind list of IP addresses, some of which are proxies and some of which are not. For the addresses that are proxies, can you determine what type each one is?

A proxy judge is a program that takes a batch of IP addresses as input, looks at each one individually and determines its type. The proxy judge can be written in PHP and run on a public Internet server. A second script, the proxy tester, runs on the same or a different server and gathers proxy information on the judge's behalf. That information is sent back to the proxy judge, which shows some statistics and can optionally store the results in a database for later use.
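The judge's first job is to turn that batch into something it can loop over. Below is a minimal sketch, assuming the batch arrives as text with one host:port entry per line (the form the judge code later splits with explode(":", ...)). parse_proxy_list() is a hypothetical name, not part of the original script.

```php
<?php
/**
 * Hypothetical helper: turn the judge's textarea contents into a list of
 * [host, port] pairs, one per line. Blank lines and lines that are not
 * in host:port form (or have an out-of-range port) are skipped.
 */
function parse_proxy_list(string $text): array
{
    $proxies = [];
    // \R matches \r\n, \n or \r, so lists pasted from Windows or Unix both work
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($host, $port) = explode(':', $line, 2);
        // Accept only integer ports in the valid TCP range.
        $port = filter_var($port, FILTER_VALIDATE_INT,
            ['options' => ['min_range' => 1, 'max_range' => 65535]]);
        if ($host === '' || $port === false) {
            continue;
        }
        $proxies[] = [$host, $port];
    }
    return $proxies;
}
```

Validating up front keeps malformed lines from ever reaching fsockopen(), so the judge's error output only reports proxies that actually failed.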

Pictorially, it looks something like this:
[Figure: Proxy judge]

The Proxy Test Component

In this article I'm not going to provide the full code; it is far too long to display and would distract from the discussion. Instead, I will show pieces of it to give you some ideas on how to build your own.

First, we must decide which proxy-related server variables to monitor. Under Apache, PHP exposes incoming request headers as $_SERVER keys prefixed with HTTP_, so we can look for these keys:

$server_tests = array(
'CLIENT_IP',
'HTTP_CLIENT_IP',
'FORWARDED',
'FORWARDED_FOR',
'FORWARDED_FOR_IP',
'HTTP_PROXY_CONNECTION',
'HTTP_X_FORWARDED',
'HTTP_X_FORWARDED_FOR',
'HTTP_FORWARDED',
'HTTP_FORWARDED_FOR',
'HTTP_FORWARDED_FOR_IP',
'X_FORWARDED',
'X_FORWARDED_FOR',
'X_HTTP_FORWARDED_FOR',
'HTTP_VIA',
'VIA'
);

You will find that some of these are not normally present in PHP's $_SERVER array. That's OK. Proxy vendors have used many different header names to identify themselves, and these are the common ones.
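The reason keys like HTTP_VIA and HTTP_X_FORWARDED_FOR work is the CGI-style convention PHP follows under Apache: a request header's name is upper-cased, its dashes become underscores, and it gets an HTTP_ prefix. The un-prefixed names in the list (VIA, FORWARDED and so on) would only show up if something in the server environment set them. A tiny sketch of the mapping; header_to_server_key() is a hypothetical helper, not a PHP built-in.

```php
<?php
/**
 * Hypothetical helper: the $_SERVER key PHP uses (under Apache/CGI-style
 * variables) for an incoming request header such as "X-Forwarded-For".
 */
function header_to_server_key(string $headerName): string
{
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}
```

For example, header_to_server_key('X-Forwarded-For') returns 'HTTP_X_FORWARDED_FOR', which is why that key appears in $server_tests.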

What we want to do is test for the existence of each of these in our proxy tester and build an output string to send back to the proxy judge.

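// the judge passes the originating IP in the query string (?ip=...)
$orig_ip = isset($_GET['ip']) ? $_GET['ip'] : '';
$out = "|MyProxyJudge|REMOTE_ADDR={$_SERVER['REMOTE_ADDR']}|ORIG_IP={$orig_ip}|SERVER_ADDR={$_SERVER['SERVER_ADDR']}|";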

The output starts out as a string of pipe-delimited tokens. The first token is a signature showing that the connection succeeded and that the response really came from our proxy tester. The second field, REMOTE_ADDR, holds the address of the proxy server (not the judge). The third field holds the originating IP address (you, the person who invoked the judge script). The fourth field contains the server address of the machine running the proxy test script.

Why do we need all this information? Good question. It helps us determine whether any of the servers along the way are revealing our route.

If none of the server variables are set, we declare the proxy elite, because nothing is revealing our route.
If at least one of them is set, we classify it as anonymous. If the originating IP address is found in any of the set server variables, we have been traced, so the proxy is transparent.

You may have different ideas about the levels of anonymity, so feel free to adjust these rules. For the most part, though, these are the variables you can test and use to build your categorization.

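$bFailedTest = false;   // true if any proxy-revealing variable is present
$bTransparent = false;  // true if our originating IP leaks through
foreach ($server_tests as $test) {
    if (isset($_SERVER[$test])) {
        $bFailedTest = true;
        $out .= $test . "=" . $_SERVER[$test] . "|";
        // use !== because strpos() returns 0 when the IP is at the very start
        // of the value (the usual case for X-Forwarded-For), and 0 == false
        if ($orig_ip !== '' && strpos($_SERVER[$test], $orig_ip) !== false) {
            $bTransparent = true;
        }
    }
}

// the last token is the proxy type
if ($bTransparent) {
    $out .= "transparent";
} elseif ($bFailedTest) {
    $out .= "anonymous";
} else {
    $out .= "elite";
}
echo $out;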

Notice that in the output stream, the last token is the type of proxy. We'll see how this fits together in the Proxy Judge component.
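Because every field between the signature and the final type token is a KEY=value pair, the judge could also parse the tester's output into a structured array instead of just echoing it. A minimal sketch, assuming the body has already been separated from the proxy's HTTP headers; parse_tester_output() is a hypothetical name, and a header value that itself contains "|" would throw the split off.

```php
<?php
/**
 * Hypothetical parser for the proxy tester's output, e.g.
 * "|MyProxyJudge|REMOTE_ADDR=...|ORIG_IP=...|SERVER_ADDR=...|HTTP_VIA=...|anonymous"
 * Returns the signature, the KEY=value fields and the proxy type,
 * or null if the body does not look like tester output.
 */
function parse_tester_output(string $body): ?array
{
    $tags = explode('|', trim($body));
    // A valid body starts with "|", so $tags[0] is empty and at least
    // the signature and the type must follow it.
    if (count($tags) < 3 || $tags[0] !== '') {
        return null;
    }

    $fields = [];
    // Everything between the signature and the last token is KEY=value.
    foreach (array_slice($tags, 2, -1) as $tag) {
        $parts = explode('=', $tag, 2);
        if (count($parts) === 2) {
            $fields[$parts[0]] = $parts[1];
        }
    }

    return [
        'signature' => $tags[1],
        'fields'    => $fields,
        'type'      => $tags[count($tags) - 1],
    ];
}
```

With the fields in an array, the judge can compare REMOTE_ADDR against the proxy it connected to, or store each field in its own database column.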

The Proxy Judge Component

The proxy judge is an HTML page with a form containing a textarea. The user pastes in a list of proxies in host:port form and submits it to the judge. Each proxy is parsed into host and port, a socket is opened to it, and an HTTP GET request asks the proxy to fetch the proxy test page:

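// each entry is expected to be host:port
list($host, $port) = explode(":", trim($proxy), 2);

// open a socket connection to the proxy on the given port
$fp_sock = @fsockopen($host, (int)$port, $err_no, $err_string, $response_timeout);
if ($fp_sock === false) {
    echo "{$proxy} is bad. Could not open socket ({$err_string}).<br/>";
    return 'bad';
}

// set the read timeout in case the proxy bumbles around or the connection is slow
stream_set_timeout($fp_sock, $response_timeout);

// Ask the proxy to fetch the proxytest page. $url_check should be an absolute
// URL (http://host/path) because we are talking to a proxy; the Host header
// assumes the test page lives on the same server as the judge.
// Note: the proxy sees the judge server's address, not the browser's, so
// REMOTE_ADDR only matches what the proxy forwards if the judge runs locally.
$originating_ip = $_SERVER['REMOTE_ADDR'];
$command = "GET {$url_check}?ip=" . urlencode($originating_ip) . " HTTP/1.0\r\nHost: {$_SERVER['SERVER_NAME']}\r\n\r\n";
fwrite($fp_sock, $command);

// read what the proxy returned in 4K chunks, giving up if a read times out
// (otherwise feof() can keep returning false and the loop never ends)
$response = "";
while (!feof($fp_sock)) {
    $response .= fread($fp_sock, 4096);
    $info = stream_get_meta_data($fp_sock);
    if ($info['timed_out']) {
        break;
    }
}
$info = stream_get_meta_data($fp_sock);

// close the door
fclose($fp_sock);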
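The $response read from the socket contains the proxy's status line and response headers ahead of the tester output. Splitting them apart would let the judge reject proxies that answer with an error page (a 403 or 502, for example) before looking for the signature. A minimal sketch; split_http_response() is a hypothetical helper, and it assumes the proxy separates headers from body with a blank CRLF line, as HTTP/1.x requires.

```php
<?php
/**
 * Hypothetical helper: split the raw bytes read from the proxy socket into
 * the HTTP status code, the header lines and the body (the tester output).
 * Returns null if the response is not a recognizable HTTP/1.x reply.
 */
function split_http_response(string $raw): ?array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return null;
    }
    $head = substr($raw, 0, $pos);
    $body = substr($raw, $pos + 4);

    // The status line looks like "HTTP/1.0 200 OK".
    $lines = explode("\r\n", $head);
    if (!preg_match('#^HTTP/\d\.\d (\d{3})#', $lines[0], $m)) {
        return null;
    }

    return [
        'status'  => (int)$m[1],
        'headers' => array_slice($lines, 1),
        'body'    => $body,
    ];
}
```

A judge using this might treat anything other than status 200 as a bad proxy and pass only the body on to the signature check.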

If the proxy didn't time out (see $info['timed_out']) and did return a response, we parse the output we built in the proxy tester and display it:

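// the tester output starts with "|", so $tags[0] holds the proxy's HTTP
// headers and $tags[1] should be the signature
$tags = explode("|", $response);
$num_tags = count($tags);

// show every token; escape it since it came from an untrusted proxy
foreach ($tags as $tag) {
    echo htmlspecialchars($tag) . "<br/>";
}

// we didn't get the signature back
if (!isset($tags[1]) || $tags[1] != $response_signature) {
    echo "{$proxy} is bad. Signatures did not match in response.<br/>";
    return 'bad';
}

// the proxy type is the last token (transparent, anonymous or elite)
$proxyType = trim($tags[$num_tags - 1]);
echo $proxyType;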

Summary

You've seen a glimpse of how to build a proxy judge. Feel free to take these ideas and build your own. One way to expand on this is to add database code that keeps track of each proxy IP address and maintains a history of its usefulness over time. If you own a website that is getting hit by spam, you can compare this list against the IP addresses on comments to determine whether a comment came from a real IP or a proxy. By doing this, you can reject comments that are hiding behind a proxy.
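For the comment-spam idea, the check itself can be as simple as a hash lookup. A minimal sketch, assuming you have kept the judged proxies as host:port strings; build_proxy_index() and is_known_proxy() are hypothetical names. Keep in mind that a proxy's outgoing address (the REMOTE_ADDR the tester reports) can differ from the address it listens on, so it may be worth indexing both.

```php
<?php
/**
 * Hypothetical helper: build a lookup table of proxy hosts from "host:port"
 * entries (for example, rows saved by the judge) so commenter IPs can be
 * checked in constant time.
 */
function build_proxy_index(array $proxyEntries): array
{
    $index = [];
    foreach ($proxyEntries as $entry) {
        $host = explode(':', trim($entry), 2)[0];
        if ($host !== '') {
            $index[$host] = true;
        }
    }
    return $index;
}

// True if the commenter's address is one of the known proxy hosts.
function is_known_proxy(string $commenterIp, array $proxyIndex): bool
{
    return isset($proxyIndex[$commenterIp]);
}
```

Building the index once and reusing it avoids scanning the whole proxy list for every comment.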
