Scaling and Securing the Content Proxy

As more and more widgets use the proxy service described in Chapter 5, this single component will become the greatest scalability bottleneck of your entire web portal project. It is not unusual to spend a significant amount of development resources improving the scalability, reliability, availability, and performance of the content proxy. This section describes some of the challenges you will face when taking such a proxy component live to millions of users.

Maintaining Speed

Widgets call a proxy service to fetch content from an external source. The proxy service makes the call, downloads the response on the server, and then transmits the response back to the browser. There are two latencies involved here: between the browser and your server, and between your server and the destination. If the response payload is large, say 500 KB, then almost 1 MB of transfer takes place during the call. So, you need to put a limit on how much data transfer you allow through the proxy (see the example below). The HttpWebResponse class has a ContentLength property that tells you how much data is being served by the destination, so you can check whether it exceeds the maximum limit you are willing to accept. If widgets request a large amount of data, it not only slows that specific request but also other requests on the same server, because the server's bandwidth is occupied during the megabyte transfer.

Servers generally have 4 Mbps, 10 Mbps, or, if you can afford it, 100 Mbps connectivity to the Internet. At 10 Mbps, you can transfer about 1 MB per second. So, if one proxy call is occupied transferring megabytes, there is no bandwidth left for other calls, and bandwidth costs go sky high. Moreover, during the large transfer, one precious HTTP worker thread is occupied streaming megabytes of data over a slow Internet connection to the browser. If a user is on a 56 Kbps ISDN line, a 1 MB transfer will occupy a worker thread for about 150 seconds.

Putting a limit on how much data you will download from external sources via HttpWebRequest

HttpWebResponse response = request.GetResponse() as HttpWebResponse;

if (response.StatusCode == HttpStatusCode.OK)
{
    int maxBytesAllowed = 512 * 1024; // 512 KB

    if (response.ContentLength > maxBytesAllowed)
    {
        response.Close();
        throw new ApplicationException("Response too big. " +
            "Max bytes allowed to download is: " + maxBytesAllowed);
    }
}

Sometimes external sources do not send a Content-Length header, so there is no way to know how much data you are receiving unless you download the entire byte stream until the server closes the connection. This is the worst-case scenario for a proxy service because you have to download up to your maximum limit and then abort the connection. The algorithm below shows a general approach for dealing with this problem, followed by a C# sketch of it.

An algorithm for downloading external content safely

Get the content length from the response header.
If the content length is present:
    Check whether it is within the maximum limit.
    If it exceeds the maximum limit, abort.
If the content length is not present:
    While there are more bytes available to download:
        Read a chunk of bytes, e.g., 512 bytes.
        Count the total number of bytes read so far.
        If the count exceeds the maximum limit, abort.
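
A minimal C# sketch of that algorithm follows. The 512 KB cap and the 512-byte chunk mirror the figures mentioned above; the class name, method name, and overall structure are illustrative, not code from this chapter.

// A minimal sketch of the algorithm above. The 512 KB cap and the 512-byte
// chunk size mirror the figures in the text; names are illustrative.
using System;
using System.IO;
using System.Net;

static class SafeDownloader
{
    private const int MaxBytesAllowed = 512 * 1024;   // 512 KB

    public static byte[] Download(HttpWebResponse response)
    {
        // If the destination sends a Content-Length header, check it up front
        if (response.ContentLength > MaxBytesAllowed)
        {
            response.Close();
            throw new ApplicationException("Response too big.");
        }

        using (Stream stream = response.GetResponseStream())
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[512];
            int totalRead = 0, read;

            // When Content-Length is absent (-1), this loop still enforces the cap
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                totalRead += read;
                if (totalRead > MaxBytesAllowed)
                {
                    response.Close();   // abort the connection
                    throw new ApplicationException("Response exceeded the download limit.");
                }
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}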

Connection management

Every call to the proxy opens an HTTP connection to the destination, downloads the data, and then closes the connection. Setting up an HTTP connection is expensive because of the network latency involved in establishing a connection between your server and the destination. If you are making frequent calls to the same domain, such as Flickr.com, it helps to maintain an HTTP connection pool, just like an ADO.NET connection pool.

You should keep connections open and reuse them when frequent requests go to the same external server. However, an HTTP connection pool is very complicated to build because, unlike SQL Servers on fast private networks, external servers are on the Internet, loaded with thousands of connections from all over the world, and reluctant to hold a connection open for long. They are eager to close an inactive connection as soon as possible. So, it becomes quite challenging to keep HTTP connections open to frequently requested servers that are busy with other clients.
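
In .NET, you don't have to hand-roll such a pool: the framework's ServicePoint layer reuses keep-alive connections per host. Below is a minimal sketch of tuning it for one busy destination; the host, limits, and timeouts are illustrative assumptions, not recommendations from this section.

// A minimal sketch of reusing HTTP connections to one busy destination via
// .NET's built-in ServicePoint layer. All values here are illustrative.
using System;
using System.Net;

static class ConnectionReuseSketch
{
    public static void Configure()
    {
        ServicePoint flickr = ServicePointManager.FindServicePoint(
            new Uri("http://www.flickr.com/"));

        flickr.ConnectionLimit = 12;               // parallel connections allowed to this host
        flickr.MaxIdleTime = 10 * 1000;            // close connections idle longer than 10 seconds
        flickr.ConnectionLeaseTimeout = 60 * 1000; // recycle even active connections after a minute

        // Requests created with KeepAlive = true will then reuse these connections
        HttpWebRequest request = WebRequest.Create("http://www.flickr.com/") as HttpWebRequest;
        request.KeepAlive = true;
    }
}

Note that this trades off against the advice below to close connections as soon as a call finishes; you would reserve connection reuse for a handful of heavily used destinations.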

DNS resolution

DNS resolution is another performance obstacle. If your server is in the U.S. and a web site you want to connect to has its DNS in Australia, it will take about one second just to resolve the web site's IP. DNS resolution happens on each and every HttpWebRequest. There is no built-in cache in .NET that remembers the host's IP for some time.

You can benefit from DNS caching if there is a DNS server in your data center, but even that flushes the IP out after about an hour. So, it pays to maintain your own DNS cache. A static, thread-safe dictionary with the domain name as the key and the IP as the value will do. When you open an HttpWebRequest, instead of using the URI that was passed to you, replace the domain name in the URI with the cached IP and then make the call. But remember to send the original domain as the Host header's value.
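
Here is a minimal sketch of such a cache. The class name, the ConcurrentDictionary, and the use of Dns.GetHostAddresses are assumptions for illustration; note that the HttpWebRequest.Host property used to preserve the original host header is available only on .NET 4.0 and later.

// A minimal sketch of a static, thread-safe DNS cache keyed by host name.
// The class name, dictionary type, and helper are illustrative assumptions.
using System;
using System.Collections.Concurrent;
using System.Net;

static class DnsCache
{
    private static readonly ConcurrentDictionary<string, string> HostToIp =
        new ConcurrentDictionary<string, string>();

    public static HttpWebRequest CreateRequest(Uri originalUri)
    {
        // Resolve once and reuse the IP for later requests to the same host
        string ip = HostToIp.GetOrAdd(originalUri.Host,
            host => Dns.GetHostAddresses(host)[0].ToString());

        // Replace the host name in the URI with the cached IP
        UriBuilder builder = new UriBuilder(originalUri) { Host = ip };
        HttpWebRequest request = WebRequest.Create(builder.Uri) as HttpWebRequest;

        // Send the original domain as the Host header so virtual hosting still works
        // (the Host property is available on .NET 4.0 and later)
        request.Host = originalUri.Host;

        return request;
    }
}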

The HttpWebRequest class has some parameters that can be tweaked for the performance and scalability of a proxy service. For example, the proxy does not need keepalive connections; it can close connections as soon as a call is finished. In fact, it must do so, or the server will run out of TCP sockets under heavy load.

A server can handle a maximum of 65,535 TCP connections at a time. However, your application's limit is smaller than that because other applications running on the server also need free TCP sockets. Besides closing connections as soon as you are finished, you need to set a much lower Timeout value on HttpWebRequest. The default is 100 seconds, which is far too high for a proxy that needs to serve content to a client within seconds. So, if an external service does not respond within 3 to 5 seconds, you can give up on it.

Every second you add to the timeout value increases the risk of worker threads being jammed. ReadWriteTimeout is another property, used when reading data from the response stream. The default is 300 seconds, which is too high; it should be as low as 1 second. If a Read call on the response stream gets stuck, not only is an open HTTP connection stuck but so is a worker thread on the ASP.NET pool. Moreover, if a response to a Read request takes more than a second, that source is just too slow and you should probably stop sending future requests to it (see the example below).

Optimizing the HttpWebRequest connection for a proxy

HttpWebRequest request = WebRequest.Create("http://... ") as HttpWebRequest;

request.Headers.Add("Accept-Encoding", "gzip");
request.AutomaticDecompression = DecompressionMethods.GZip;
request.AllowAutoRedirect = true;
request.MaximumAutomaticRedirections = 1;
request.Timeout = 15000;
request.Expect = string.Empty;
request.KeepAlive = false;
request.ReadWriteTimeout = 1000;

Most web servers now support gzip compression of the response. Gzip compression significantly reduces the response size, and you should always use it. To receive a compressed stream, you need to send the Accept-Encoding: gzip header and enable AutomaticDecompression.

The header tells the source to send a compressed response, and the property directs HttpWebRequest to decompress the compressed content. Although this adds some CPU overhead, it significantly reduces bandwidth usage and the time it takes to fetch content from external sources.

For text content such as JSON or XML, where there is a lot of repeated text, you can see a 10 to 50 times speed gain while downloading such responses.

Avoiding Proxy Abuse

When someone uses your proxy to anonymously download data from external sources, it's called proxy abuse. Just like widgets, any malicious agent can download content from external sources via your proxy. Someone can also use your proxy to generate malicious hits on external servers.

For example, a web site can download external content through your proxy instead of fetching it directly, because it knows it will benefit from all the optimization and server-side caching you have done. So, anyone can use your site as their own external content cache server to save on DNS lookup time, benefit from connection pooling to your proxy servers, and bring down your server with the additional load.

This is a hard problem to solve. One easy defense is to limit the number of requests per minute or per day from a specific IP to your proxy service.

Another idea is to check cookies for a secure token that you generate and send to the client side. The client sends that secure token back to the proxy server to identify itself as a legitimate user, but this can easily be misused if someone figures out how to get the token. Putting a limit on the maximum content length is another way to prevent large data transfers. A combination of all these approaches can save your proxy from being overused by external web sites or malicious clients. However, you will always remain somewhat vulnerable to misuse. You just have to pay the additional hardware and bandwidth cost that goes into misuse and make sure you always have extra processing power and bandwidth to serve your own needs.
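
A minimal sketch of the per-IP limit mentioned above follows. The one-minute window, the limit of 60 requests, and the class name are assumptions for illustration, not figures from this section.

// A minimal sketch of a per-IP request counter with a one-minute window.
// The limit, the window, and the class name are illustrative assumptions.
using System;
using System.Collections.Concurrent;

static class ProxyRateLimiter
{
    private const int MaxRequestsPerMinute = 60;   // assumed limit, tune for your traffic

    private static readonly ConcurrentDictionary<string, int> Counters =
        new ConcurrentDictionary<string, int>();
    private static DateTime windowStart = DateTime.UtcNow;
    private static readonly object WindowLock = new object();

    public static bool IsAllowed(string clientIp)
    {
        lock (WindowLock)
        {
            // Reset all counters when the one-minute window rolls over
            if ((DateTime.UtcNow - windowStart).TotalMinutes >= 1)
            {
                Counters.Clear();
                windowStart = DateTime.UtcNow;
            }
        }

        // Count this request and compare against the per-minute limit
        int count = Counters.AddOrUpdate(clientIp, 1, (ip, current) => current + 1);
        return count <= MaxRequestsPerMinute;
    }
}

In the proxy handler, you would then reject the call (for example, with an HTTP 403) whenever IsAllowed(Request.UserHostAddress) returns false.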

Defending Against Denial-of-Service Attacks

The proxy service is the single most vulnerable service on the whole project. It’s so easy to bring down a site by maliciously hitting a proxy that most hackers will just ignore you, because you aren’t worth the challenge.

Here’s one way to bring down any site that has a proxy service:

  1. Create a web page that accepts an HTTP GET call.
  2. Make that page sleep for as long as possible.
  3. Create a small client that hits the proxy to make requests to that web page. Every call to that web page will make the proxy wait for a long time.
  4. Find the proxy's timeout and make the page sleep just long enough that the proxy always times out on each call (this may take some trial and error).
  5. Spawn 100 threads from your small client and make a call to the proxy from each thread to fetch content from that slow page. You will have 100 worker threads stuck on the proxy server. If the server has two processors, it will run out of worker threads and the site will become nonresponsive.

You can take this one step further by sleeping until the timeout minus one second. After that sleep, start sending a small number of bytes to the response as slowly as possible. Find the ReadWriteTimeout value of the proxy on the network stream.

This prevents the proxy from timing out on the connection. Just when it is about to give up, it starts getting some bytes and does not abort the connection. Because it receives bytes within the ReadWriteTimeout, it does not time out on the Read calls. This way, you can make each call to the proxy go on for hundreds of seconds until the ASP.NET request times out. Spawn 100 threads and you will have 100 requests stuck on the server until they time out. This is the worst-case scenario for any web server.

To prevent such attacks, you need to restrict the number of requests allowed from a specific IP per minute, hour, and day. Moreover, you need to decrease the ASP.NET request timeout in machine.config; for example, you can set it to 15 seconds so that no request is stuck for more than 15 seconds, including calls to the proxy (see the configuration below).

The machine.config setting for ASP.NET request timeout; set it as low as you can

<system.web>
  ...
  <httpRuntime executionTimeout="15"/>
  ...
</system.web>

Another way to bog down your server is to produce unique URLs and make your proxy cache them. For example, anyone can make your proxy hit the MSDN feed with a query string of 1 appended and keep incrementing that number to make each URL unique. No matter what is added to the query string, the same feed is returned. But because you use the URL as the cache key, the proxy caches the large response returned from MSDN against each key. So, if you hit the proxy with query strings 1 to 1,000, there will be 1,000 identical copies of the MSDN feed in the ASP.NET cache.

This puts pressure on the server's memory, and other items get purged from the cache. As a result, the proxy starts making repeated requests for those lost items and becomes significantly slower.

One way to mitigate this is to store such items with a CacheItemPriority of Low, which prevents more important items from being purged out of the cache. Moreover, you can maintain another dictionary that stores the content's MD5 hash as the key and the URL as the value. Before storing an item in the cache, calculate the content's MD5 hash and check whether it is already in the dictionary. If it is, this content is already cached, regardless of the URL. So, you can get the originally cached URL from the hash dictionary and then use that URL as the key to fetch the cached content from the ASP.NET cache.
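
A minimal sketch of that hash-to-URL dictionary follows. The class and method names, the 10-minute expiration, and the dictionary type are assumptions for illustration; the cache priority is set to Low as suggested above.

// A minimal sketch of de-duplicating cached content by its MD5 hash.
// Names, the 10-minute expiration, and the structure are illustrative.
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.Caching;

static class DeduplicatingCache
{
    // Maps the MD5 hash of the content to the URL it was first cached under
    private static readonly ConcurrentDictionary<string, string> HashToUrl =
        new ConcurrentDictionary<string, string>();

    // Stores the content and returns the canonical URL to use as the cache key
    public static string Store(string url, string content)
    {
        string hash = Md5Of(content);
        string canonicalUrl = HashToUrl.GetOrAdd(hash, url);

        // Only cache the content if identical bytes aren't already cached under another URL
        if (canonicalUrl == url)
        {
            HttpRuntime.Cache.Insert(url, content, null,
                DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration,
                CacheItemPriority.Low, null);   // Low priority so important items survive purges
        }
        return canonicalUrl;
    }

    private static string Md5Of(string content)
    {
        using (MD5 md5 = MD5.Create())
            return Convert.ToBase64String(
                md5.ComputeHash(Encoding.UTF8.GetBytes(content)));
    }
}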

