1 - Introduction

The nohodo network was built to meet the needs of large clients in the financial sector. Their anonymization requirements demanded a high level of technical sophistication at a reasonable price, a combination that was not available from commercial anonymization providers. This wiki is for developers who want to use the nohodo network for their proxy needs.

2 - Basic Operating Principles

The nohodo proxy network is configured as two layers of proxy servers. The top layer consists of head nodes, which are the proxy servers addressed by clients. The head nodes relay client requests through a second layer of proxy servers, called child nodes, which relay the requests to their final destination: the resource identified by the URL in the client’s original request. The point of this two-layer architecture is to give clients the ease of addressing a single proxy server, while allowing them to send requests to a target host many times in rapid succession without provoking a defensive response from that host that could block further requests. The head nodes can do this because they have access to many child nodes, so successive requests to the same host can be routed through different child nodes.

Note that the head node must behave the same way when a burst of requests to one host (say, 100) comes from different clients, because the client IPs are invisible to the target host; all it sees is the IP of the nohodo child node that ultimately sends the request. In the nohodo proxy network there are few head nodes and many child nodes. All of the head nodes work this way, sending client requests to the same host through a series of child nodes so as to obscure the fact that they come from the same source, with the goal of preventing a host server from blocking any child node IP due to excessive use.

2.1 Child Node Cycling

When a client makes multiple requests, the nohodo head nodes automatically rotate child node proxies without requiring input from the user. It is important to note that the head nodes are rate limited by nohodo to protect the child nodes from being blocked by the targeted host. This document refers to this rate limit as the nap time: the interval that must pass before a child node is reused for the same host. Suppose the nap time is set to 90 seconds and there are 5,000 child nodes for a given host. The head node could then allow no more than 5,000 requests to that host in any 90-second period, because once all nodes in the pool have been used, the next request would have to go through a node that was already used within that period.
Note that if more than one client were sending requests to the same host, they would all share this limit: in this example, it is not the case that each client would get 5,000 requests to that host every 90 seconds. The limit applies to the total of all requests sent by all clients to the same host.
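
The shared limit can be illustrated with a toy model. This is only a sketch of the rule as described above, not the head node's actual implementation; the class name `NodePool` and its method are hypothetical, and the model assumes request times never go backwards.

```python
from collections import deque


class NodePool:
    """Toy model of child node cycling for a single target host."""

    def __init__(self, node_count, nap_time):
        self.nap_time = nap_time
        # Queue of (ready_at, node_id); the front is always the node
        # that has been resting the longest.
        self.queue = deque((0.0, node_id) for node_id in range(node_count))

    def request(self, now):
        """Return ('ok', node_id) or ('429', seconds_to_wait) at time `now`."""
        ready_at, node_id = self.queue[0]
        if ready_at > now:
            # Every node is still napping: refuse and report the hold time.
            return ("429", ready_at - now)
        self.queue.popleft()
        self.queue.append((now + self.nap_time, node_id))
        return ("ok", node_id)


pool = NodePool(node_count=5000, nap_time=90)
# All clients share the pool, so 5,000 requests at t=0 exhaust it ...
results = [pool.request(now=0.0) for _ in range(5000)]
# ... and the next request, from any client, is refused until a node wakes up.
refused = pool.request(now=30.0)
print(refused)  # ('429', 60.0)
```

The model makes no distinction between clients, which is the point: the pool is consumed by the total of all requests to the host.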

2.1.2 Important Header Considerations

By default, nohodo does not pass or strip the user-agent header. nohodo asks all clients to pass a modern user-agent with their requests. There are no specific requirements for the user-agent header but nohodo has seen hosts block a request because of an outdated user-agent or no user-agent.
By default, nohodo does not pass or strip cookies. Some hosts block requests if certain cookie parameters are not passed. Please determine whether or not cookies are required before sending requests to nohodo.

2.1.3 429 Errors

When a head node sees a request that could only be completed by violating the nap time of all available child nodes, it refuses to send it on and instead immediately sends an HTTP response to the client with a status code of 429 and a reason string of ‘Whoa, Nelly!’. This informs the user that their request has been rejected due to the minimum cycle limit. Clients may repeat the request at a later time. Whenever a 429 status message is sent, the server includes a custom header, ‘nohodo-Hold-Time’, whose value is the number of seconds that must elapse before a request to the given host will be allowed.
nohodo clients are not charged for requests that receive an HTTP 429 response, because the request never reaches a child node and so is never sent to the target host.
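
A client can honor the hold time with a small retry loop. The following is a minimal sketch, not an official client: the function name `get_with_hold` and its parameters `send`, `max_attempts`, and `max_hold` are assumptions made for illustration. `send` is any callable that takes a URL and returns an object with `status_code` and `headers` attributes, such as a wrapper around `requests.get`.

```python
import time


def get_with_hold(send, url, max_attempts=3, max_hold=300, sleep=time.sleep):
    """Call send(url); on a 429, wait for nohodo-Hold-Time seconds and retry.

    Gives up and returns the 429 response if attempts run out or if the
    advertised hold time exceeds max_hold seconds.
    """
    response = None
    for attempt in range(max_attempts):
        response = send(url)
        if response.status_code != 429:
            return response
        # Seconds that must elapse before the host can be requested again.
        hold = int(response.headers.get("nohodo-Hold-Time", "0"))
        last_attempt = attempt + 1 == max_attempts
        if last_attempt or hold > max_hold:
            return response
        sleep(hold)
    return response
```

With the `requests` library, `send` could be `lambda u: requests.get(u, proxies=proxies, headers=headers, timeout=5.0)`, using the `proxies` and `headers` values from the Python example in section 4.1. The `max_hold` cap keeps the client from sleeping on very large hold values.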

2.2 Sticky Addressing

Normally if the server receives a series of requests to the same host it will change the child node it uses for each request. Although this child node cycling is one of the main features of the nohodo network, it can cause problems with some hosts.
For example, a website that uses a shopping cart style interface over a database may require a series of requests to obtain some data, and furthermore the host may require that all requests in the series come from the same IP address. Child node cycling would not work for this host. To deal with hosts like this, the client may use a feature called ‘sticky’ addressing. A client makes a sticky request by including a custom header, ‘nohodo-Child-Node’, that specifies the node id of a particular child node to use for the request. By specifying the same node id for a series of requests, the client can meet the same-source-IP requirement of hosts like the one described above.

Because this mode of child proxy use violates the basic child node cycling rules, there are additional rules that govern the use of the nohodo-Child-Node header. One is that a client must only use node ids received in a response to a prior non-sticky request. Another is that there is a limit to the number of consecutive sticky requests that the server will allow.

2.2.1 Sticky Node Id Usage

Whether sticky or non-sticky, every successful request elicits a response that contains a nohodo-Child-Node header with the node id of the child proxy used to complete the request. Therefore, to achieve a series of requests that all have the same source IP, a client may make an initial non-sticky request, read the nohodo-Child-Node header from the response, and then include that header in all subsequent requests. In fact, this method of using sticky requests is required and enforced by the server. Sticky requests that do not observe this requirement may fail with a 429 error. Clients should consider sticky requests to be an extension of the normal non-sticky request mechanism, and always use the feature this way.
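
The sequence described above can be sketched as follows. This is an illustration under stated assumptions, not a reference client: the function `sticky_series` and the callable `send(url, headers)` are hypothetical, with `send` returning an object that has `status_code` and `headers` attributes.

```python
def sticky_series(send, urls):
    """Fetch urls through one child node.

    The first request is non-sticky so the server picks the node; the
    node id from its response is then sent with every later request.
    """
    if not urls:
        return []
    first = send(urls[0], {})  # non-sticky: no nohodo-Child-Node header
    responses = [first]
    node_id = first.headers.get("nohodo-Child-Node")
    if first.status_code == 429 or node_id is None:
        return responses  # nothing to stick to
    for url in urls[1:]:
        response = send(url, {"nohodo-Child-Node": node_id})
        responses.append(response)
        if response.status_code in (418, 429):
            break  # bad node id, or the sticky limit was reached
    return responses
```

Stopping on 418 or 429 avoids sending further sticky requests that the server would refuse anyway; the client would then start a new series with a fresh non-sticky request.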

2.2.2 Sticky Throttling

As described above, the reason for child node cycling is to avoid an overuse of a host by one child proxy that could get it blocked. Sticky addressing overrides this cycling in two ways. The first is that it allows the same child proxy to be used multiple times in a row. The second is that it does not enforce a nap time between those requests.
Yet the need to avoid having child proxy servers blocked by a host remains. The idea for sticky requests is to relax the non-sticky throttling requirements for a short time, but to enforce them eventually. So, to limit the number of consecutive sticky requests, the server counts sticky requests as they come in and allows them only up to a number that can be set on a per-host or a global basis. When the limit is reached, the server returns a 429 error for all further sticky requests to the same node id. Once the sticky limit is reached, it can only be cleared by a non-sticky request that happens to be assigned to that node id. Therefore, strictly speaking, the sticky limit never times out: further sticky requests to the node will fail until a non-sticky request to that node succeeds. (And of course, a non-sticky request cannot be directed to a particular child node by the client.)

With regard to nap time, for sticky requests it is enforced in an aggregate way. Before a node used for sticky requests can be used for a non-sticky request, a period of time must pass from the last non-sticky request that is equal to the nap time times the total number of requests on that node id. For example, if the nap time is 90 seconds, a node used for 10 sticky requests following the initial non-sticky request (total of 11 requests) would be on hold for a total of 990 seconds from the time of the initial non-sticky request.
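
The aggregate nap time in this example is a simple product, shown here as a worked calculation. The function name `sticky_hold_seconds` is hypothetical and only restates the rule given above.

```python
def sticky_hold_seconds(nap_time, sticky_requests):
    """Seconds, counted from the initial non-sticky request, before the
    node may serve another non-sticky request."""
    total_requests = 1 + sticky_requests  # the initial non-sticky request counts too
    return nap_time * total_requests


print(sticky_hold_seconds(90, 10))  # 990
```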

2.2.3 429 Errors

When the server sends a 429 due to a sticky request exceeding the sticky limit, the nohodo-Hold-Time value is set to 2147483647, which is the maximum value for a signed 32-bit integer. The intent is to convey that the node will never become available for another sticky request in this session. Note: since a node used for sticky requests accumulates multiple nap times, it is possible that a non-sticky request that arrives when all nodes are on hold could return a nohodo-Hold-Time value that exceeds the configured nap time value. (This could only happen if all child nodes in the pool were on hold from a series of sticky requests.)
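
A client can tell the two kinds of 429 apart by checking for the sentinel value. This is a minimal sketch; the names `STICKY_LIMIT_HOLD` and `interpret_hold_time` are hypothetical.

```python
STICKY_LIMIT_HOLD = 2**31 - 1  # 2147483647, the value sent when the sticky limit is hit


def interpret_hold_time(header_value):
    """Classify a nohodo-Hold-Time header value from a 429 response."""
    hold = int(header_value)
    if hold >= STICKY_LIMIT_HOLD:
        # Waiting will not help; start over with a non-sticky request.
        return ("sticky-limit", None)
    return ("wait", hold)


print(interpret_hold_time("42"))          # ('wait', 42)
print(interpret_hold_time("2147483647"))  # ('sticky-limit', None)
```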

2.2.4 418 Errors

If the client sends an invalid node id in a nohodo-Child-Node header the server will respond with a 418 status code and a reason string of “Bad Child Node ID”. The response will include a nohodo-Child-Node header with the invalid value that was received.

3 - HTTP Usage and Extensions

3.1 HTTP Status Codes

Every HTTP or HTTPS request receives a response that includes a status code. There are five standard response classes, identified by the first digit of the code. Where possible, nohodo uses standard HTTP response status codes in ways that are consistent with the specifications outlined in Section 6 of IETF RFC 7231. The definitions of the standard status codes listed below were taken from Section 6 of IETF RFC 7231. Additional status codes and additional information can be found on the IETF website.

Status codes

Status | Reason | Use Cases
200 | OK | The request has succeeded.
301 | Moved Permanently | The target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs.
302 | Found | The target resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client ought to continue to use the effective request URI for future requests.
403 | Forbidden | The server understood the request but refuses to authorize it. A server that wishes to make public why the request has been forbidden can describe that reason in the response payload (if any).
404 | Not Found | The origin server did not find a current representation for the target resource or is not willing to disclose that one exists. A 404 status code does not indicate whether this lack of representation is temporary or permanent.
407 | ProxAuth Required | The client request contains either no Proxy-Authorization header or one with invalid credentials, and the client IP is not whitelisted.
418* | Bad Child Node ID | The client request includes a nohodo-Child-Node header with an invalid child node ID.
429* | Whoa, Nelly! | Client requests to a particular host exceed the set throttle rate. For sticky requests, the count of consecutive requests exceeds the set limit.
443* | tunnel | CONNECT (https) requests are typically followed (in the same connection) by a series of encrypted request/response exchanges. Since nohodo can't see inside the encrypted tunnel, we identify requests inside the tunnel with a 443 response code. During collection, the client's spider receives an encrypted response with the actual response code. We consider 443s to be successful requests.
497* | n/a | Indicates that the client closed the connection before a response was received.
498* | n/a | Indicates that the nohodo child proxy server closed the connection before a response was received.
499* | n/a | Indicates that the nohodo head node timed out waiting to receive input before a valid response message was received.
500 | Internal Server Error | The server encountered an unexpected condition that prevented it from fulfilling the request.
502 | Bad Gateway | The server, while acting as a gateway or proxy, received an invalid response from an inbound server it accessed while attempting to fulfill the request.
503 | Service Unavailable | The server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay.
504 | Gateway Timeout | The server, while acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to access in order to complete the request.

3.2 Custom Headers

JKonnect implements some features through custom header fields. These headers are described below.

3.2.1 nohodo Child Node Header

The server recognizes a custom header in requests that identifies the nohodo child node that should be used as the proxy for the request.
The format is:

nohodo-Child-Node: <child node id>

Example:

nohodo-Child-Node: 1234

If an invalid value is given, the server fails the request and replies with a 418 status code and the reason string ‘Bad Child Node ID’. The server may also return a 429 status code if the request is invalid for another reason. The child node header is also present in every reply sent by the server, where it identifies the nohodo child node that was used to forward the request. (The header is included in replies even when there was no such header in the corresponding request.)

3.2.2 nohodo Hold Time Header

This header is included with every 429 response. For non-sticky requests the value is an integer that specifies how many seconds must elapse before an access to the same host may be successful.

When the 429 status code is the result of a sticky request, the value is always 2147483647 (the maximum signed 32-bit integer value).

nohodo Hold Time Examples:

   nohodo-Hold-Time: 7 (non-sticky)
   nohodo-Hold-Time: 42 (non-sticky)
   nohodo-Hold-Time: 2147483647 (sticky)

3.3 Response Messages

Normally, responses returned by JKonnect will have been sourced by the target host and will include status codes in the 2xx and 3xx ranges. But target hosts may also return responses with 4xx or 5xx status codes. In those cases the client can distinguish between host responses and JKonnect error responses by examining the “Server” header, which for JKonnect responses will contain the string “JProxyKonnect”.
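
The check on the “Server” header can be sketched in a few lines. The function name `response_origin` is hypothetical; `headers` is any mapping of response header names to values, such as the `headers` attribute of a `requests` response.

```python
def response_origin(headers):
    """Return 'proxy' if the response was generated by JKonnect, else 'host'."""
    server = headers.get("Server", "")
    return "proxy" if "JProxyKonnect" in server else "host"


print(response_origin({"Server": "JProxyKonnect"}))  # proxy
print(response_origin({"Server": "nginx"}))          # host
```

A client would typically apply this only to 4xx and 5xx responses, to decide whether an error came from the proxy network or from the target host.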

An example JKonnect error response:

HTTP/1.1 418 Bad Child Node ID
Server: JProxyKonnect
Content-Type: text/html
Content-Length: 0
Connection: close
nohodo-Child-Node: 123456789


3.3.1 Response Body

Most response messages sourced by JKonnect contain no body. When a response does contain a body it will consist of an HTML page with the status code and reason string in the title and the page body, along with the name of the server.

An example JKonnect response with a body:

HTTP/1.1 429 Whoa, Nelly!
Server: JProxyKonnect
Content-Type: text/html
Connection: close
Content-Length: 121
nohodo-Hold-Time: 2147483647
nohodo-Child-Node: 1

<html><head><title>429 Whoa, Nelly!</title></head><body><h1>JProxyKonnect</h1>
<h2>429 Whoa, Nelly!</h2></body></html>

4 - Code Examples

4.1 Python


import requests
from requests.exceptions import RequestException

# Placeholder credentials and head node address; replace with your own.
user = 'user'
password = 'password'
address = 'headnode_address'
port = 'headnode_port'

proxy = "http://{}:{}@{}:{}".format(user, password, address, port)

url = 'http://exampleurl.com/'

# Route both http and https requests through the head node.
proxies = {
    'http': proxy,
    'https': proxy,
}

# Send a modern User-Agent; fill in 'Cookie' if the target host requires cookies.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0',
    'Cookie': '',
}

try:
    r = requests.get(url, proxies=proxies, headers=headers, timeout=5.0)
except RequestException as exc:
    # Covers connection errors, timeouts, and proxy failures.
    raise SystemExit("Request failed: {}".format(exc))

request_line = "--------------------------------------------------\nProxy: {}\nRequested URL: {}\nHTTP Response Code: {}\n\nHeaders: {}\n\n".format(proxy, url, r.status_code, r.headers)

print(request_line)

# Save the response body as UTF-8.
with open('test.html', 'wb') as html:
    html.write(r.text.encode('utf-8'))

# Append the request summary to a running log.
with open("test_results.txt", "a+") as f:
    f.write(request_line)

4.2 Perl


# Page Grab
use strict;
use warnings;
use WWW::Mechanize;
use Try::Tiny;

# Placeholder credentials and head node address; replace with your own.
my $uid = "user";
my $pwd = "pass";
my $ip_address = "headnode_address";
my $port = "headnode_port";
my $page = 0;

my $url = "http://exampleurl.com";

# WWW::Mechanize dies on HTTP errors by default, so get() is wrapped in try/catch.
my $mech = WWW::Mechanize->new;
my $prox = "http://" . $uid . ":" . $pwd . "\@" . $ip_address . ":" . $port;
$mech->proxy('http', $prox);

# Print the status and save whatever content was returned.
sub save_page {
    print $ip_address . " ---- Status =_" . $mech->status() . "\n";
    $page++;
    open(my $outf, ">", "file" . $page . ".htm") or die "Cannot open output file: $!";
    print $outf $mech->content();
    close($outf);
}

try {
    save_page() if $mech->get($url)->is_success();
}
catch {
    # The request failed; record the error status and body anyway.
    save_page();
};