Tuesday, January 31, 2006

Tuning Apache, part 1

There was a link on Digg a couple of days ago to an article about how to tune Apache so as to survive a Slashdotting. After reading it through, I came to the conclusion that the author had no idea what he was talking about. Not only did he admit that he had never experienced the "Slashdot Effect", but his advice was just plain wrong. I offered a few comments there, but I figured that I should elaborate on a few of them here. I'll post each major configuration topic as a new blog entry, and today's entry is about HTTP's Keep-Alive feature.

A brief history of Keep-Alives
The original HTTP protocol did not allow keep-alives, which meant that a connection was made to the server for each file that needed to be downloaded. This was a very inefficient method of doing things, especially since web pages typically had several files that needed to be downloaded in order to be properly displayed. Why was it inefficient? For two reasons:
  1. Each connection requires a three-packet handshake (SYN, SYN-ACK, and ACK) before any data can be sent. This adds at least one full round-trip of latency to every file downloaded, which obviously slowed things down.
  2. Due to the nature of TCP, which underlies HTTP, a connection gets "faster" the longer it stays open (TCP's slow-start algorithm ramps throughput up gradually). By continuously opening and closing new connections, HTTP could never fully utilize the available bandwidth.
The designers of HTTP recognized this weakness in the protocol and corrected it in the next version: HTTP/1.1 made persistent connections ("keep-alives") the default, so a client could keep a connection to the web server open indefinitely, or at least as long as the server permitted. Although this somewhat went against HTTP's original design goal of being "stateless", it allowed the protocol to overcome its speed and overhead problems.

A brief introduction to Apache
Now let's examine how Apache works. When you start Apache, a main "coordinator" process is created. This main process is responsible for accepting incoming connections and passing them off to "worker" processes that it creates. These workers then read users' requests and send back responses. Once a worker is done servicing a user's requests, it reports back to the main process and then waits for a new connection to be handed to it.

Apache and Keep-Alives
So, in theory, keep-alives are a great thing. They allow web clients and servers to fully utilize their available bandwidth, and they reduce latency by eliminating the overhead of frequently opening new connections. In a perfect world, you would want Apache's KeepAliveTimeout setting to be "infinity", so that web clients maintain a connection to the web server for as long as possible, and thus everything on your web site loads as fast as possible.

Apache allows you to configure its behavior in regard to keep-alives through a few options in its configuration file:
  • KeepAlive: either On or Off, depending on whether Apache should allow connections to be used for multiple requests
  • KeepAliveTimeout: how long, in seconds, Apache will wait on an open connection for another request after the previous one has been answered, before closing it
  • MaxKeepAliveRequests: how many total requests a client can issue across a single connection
  • MaxClients: the total number of worker processes that Apache will allow at any given time
The default Apache configuration file sets KeepAlive to be on, with a KeepAliveTimeout of 15 seconds and MaxKeepAliveRequests of 100. The MaxClients setting is set to 150.
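Put together, the relevant part of the stock httpd.conf looks roughly like this (a sketch of the defaults just described; exact values vary by distribution):

```apache
# Keep-alive behavior (stock httpd.conf, approximately)
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15

# Worker limit (prefork MPM)
MaxClients 150
```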

Apache meets its match
Unfortunately, nothing in life is free, not even keep-alives. Each client connection ties up an Apache worker process (either a newly created one or an idle one) for as long as it stays open. A worker can only handle one connection at a time, and with the default settings each connection lingers for up to 15 idle seconds after its last request. Apache will create a new worker process for each new connection until it hits the MaxClients limit of 150. Thus, the cost of a keep-alive is one worker process held for the duration of the KeepAliveTimeout.

Now imagine what happens when 1,000 web clients try to access your web site at the same moment (e.g. when it first shows up on Slashdot). The first 150 clients will successfully connect, because Apache will create workers to service their requests. However, those clients do not immediately leave; after they've downloaded your page, they will hold their connections open for up to 15 seconds until your server forces them closed. The next 850 clients will be unable to reach the web server, as all of the available Apache worker processes will be tied up, idling on the connections to the first 150 clients. Some of those 850 clients will queue up and wait for an available Apache process to service their request, but most will give up.
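To put rough numbers on this, here's a back-of-the-envelope sketch (it assumes everyone arrives at once and each client holds its worker for the full timeout, ignoring actual transfer time):

```python
import math

# Assumed figures from the scenario above
clients = 1000           # simultaneous visitors
max_clients = 150        # Apache's worker limit
keep_alive_timeout = 15  # seconds each worker is held per client

# Workers free up in "batches" of max_clients every keep_alive_timeout seconds
batches = math.ceil(clients / max_clients)
worst_case_wait = (batches - 1) * keep_alive_timeout

print(f"served immediately: {max_clients}")
print(f"queued or turned away: {clients - max_clients}")
print(f"worst-case wait for the last batch: {worst_case_wait} s")
```

A ninety-second wait is far longer than most visitors will tolerate, which is why most of those 850 clients simply give up.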

Perhaps some readers are wondering why you wouldn't just increase the MaxClients setting to something high enough to handle your peak load, like 2000 or something. This is a very bad idea; you can increase Apache's MaxClients, but only at your own peril. Because each Apache process consumes a bit of memory, you can only fit a certain number in memory before the web server begins to violently thrash, swapping things between RAM and the hard drive in a futile attempt to make it work. The result is a totally unresponsive server; by increasing MaxClients too high, you will have caused your own demise. I will talk about how to figure out a good value for MaxClients in a future post, but a good rule of thumb might be to divide your total RAM by 5 megabytes. Thus, a server with 512 megabytes of RAM could probably handle a MaxClients setting of 100. This is probably a somewhat conservative estimate, but it should give you a starting point.
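Expressed as a tiny helper (the 5 MB per worker figure is just my rule-of-thumb assumption; measure your own worker processes' memory usage to refine it):

```python
def estimate_max_clients(total_ram_mb, per_worker_mb=5, reserved_mb=0):
    """Conservative MaxClients estimate: usable RAM divided by the
    approximate memory footprint of one Apache worker process."""
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // per_worker_mb, 1)

# A server with 512 MB of RAM, per the example above:
print(estimate_max_clients(512))  # about 100 (102, rounded down)
```

The optional reserved_mb parameter lets you set aside memory for the OS, a database, and so on before dividing up the rest.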

A partial solution
So how do you fix the problem, other than by adding many gigabytes of RAM to the server? One easy way to get around this limitation is to either reduce the KeepAliveTimeout to a mere second or two, or else to simply turn KeepAlive off completely. I have found that turning it down to 2 seconds seems to give the client enough time to request all of the files needed for a page without having to open multiple connections, yet allows Apache to terminate the connection soon enough to be able to handle many more clients than usual.
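In httpd.conf, that change is just:

```apache
KeepAlive On
KeepAliveTimeout 2    # down from the default 15 seconds

# or, to disable persistent connections entirely:
# KeepAlive Off
```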

It's interesting to note what the major Apache-based web sites allow in terms of keep-alive timeouts. In my (very brief) experiments, it seems that CNN, Yahoo, craigslist, and Slashdot don't permit keep-alives at all, while the BBC has a very short keep-alive timeout of under 5 seconds. On the other hand, several other major Apache-based sites do use a large keep-alive timeout (Apple, CNET, etc.); they may have decided that they would prefer to take the resource hit so that they can have the fastest web sites possible.
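If you want to repeat the experiment yourself, here's one way to probe a site. This is a hypothetical helper built on Python's standard http.client module, applying the usual HTTP/1.x persistence rules: HTTP/1.1 connections stay open unless the server sends "Connection: close", and HTTP/1.0 connections stay open only if the server sends "Connection: keep-alive".

```python
import http.client

def connection_is_persistent(http_version, connection_header):
    """Decide whether a response leaves the connection open.
    http_version is 10 or 11, as reported by http.client."""
    value = (connection_header or "").lower()
    if "close" in value:
        return False
    if http_version >= 11:        # HTTP/1.1: persistent by default
        return True
    return "keep-alive" in value  # HTTP/1.0: only if explicitly granted

def probe(host, path="/"):
    """Fetch one page and report whether the server allows keep-alive."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection could be reused
        return connection_is_persistent(resp.version,
                                        resp.getheader("Connection"))
    finally:
        conn.close()

if __name__ == "__main__":
    print("keep-alive allowed:", probe("www.example.com"))
```

Note that a single probe only tells you whether keep-alive is permitted at all; measuring the timeout itself requires holding the connection open and timing when the server closes it.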

Of course, this isn't a perfect solution. It would be nice to have both high performance and long-lived client connections. Apache 2.2, from what I understand, includes an experimental module (the event MPM) that hands idle keep-alive connections off to a dedicated thread, allowing them to be handled very efficiently. If it turns out to work well, it could be a near-perfect solution to the problem. It does have its drawbacks (it is a threaded MPM, which is not recommended if you use PHP), but it could be incredibly useful in some situations.
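For reference, tuning that MPM in Apache 2.2 looks roughly like this (a sketch only; it assumes httpd was built with the event MPM, and the thread counts are illustrative, not recommendations):

```apache
# httpd.conf fragment for a build using the experimental event MPM
<IfModule mpm_event_module>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
```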


At 9:24 PM, Anonymous Anonymous said...

An interesting read, thx

At 10:05 PM, Anonymous Anonymous said...

Good info. I run a proxy site that just gets pwned during the day, so I'm always looking for ways to increase performance or keep the server from crashing. Thanks.

At 11:55 PM, Anonymous Anonymous said...

Very well written article, and I very much agree. Having run servers with a constant 1000-2000 active connections, I've pulled every trick in the book to increase performance.

You definitely have it right on the money with a keep-alive timeout of 2 seconds.

In the testing I've done over the years, I've found that the extra CPU overhead of initiating that many more connections with keep-alive off is no good. Having keep-alive on, at about 2 seconds, is appropriate for exactly the reason you point out.

I would, however, like to counter your mention at the end about a threaded MPM. For the last 1.5+ years I've been running Apache 2.0.x with the threaded worker MPM. Because it's threaded, I've found it to use less RAM, be more efficient, and handle far more simultaneous connections than a non-threaded MPM.
It's true that there have been issues in the past with the worker MPM (and with having Perl, PHP, etc. also be threaded). This is no longer an issue. I run FreeBSD servers, and all ports are now thread-safe; they will notice that you've compiled a threaded Apache and compile themselves threaded as well.

A few more pointers for a faster server:
Disable ExtendedStatus unless you're actually debugging. Same goes for mod_info.

HostnameLookups off is good (particularly if you're not running a local DNS server)

Setting Options -Indexes FollowSymLinks
saves a few I/O reads (Apache doesn't have to check whether each path component is a symlink; it just goes ahead and traverses it).

Finally, setting a more reasonable Timeout value than 300 will help save the RAM that a slow modem user ties up when you're really getting bombarded (try 30 seconds or less).

At 6:23 AM, Anonymous Anonymous said...

The main issues for a lot of sites are:

They are dynamic: solved in many cases by batch updates producing static web pages.

Not enough bandwidth: striking a balance between overhead/CPU/memory and enabling output compression makes a big impact when there is a lot of text.

It also helps to move permanent or semi-permanent high-bandwidth content (images, templates, etc.) to a separately hosted site.

At 6:40 AM, Anonymous Anonymous said...

Thanks, nice article. I look forward to reading the rest of your posts about tuning Apache.

One of the biggest improvements I made to my LAMP server was to farm off all static files (mostly images but also some javascript, CSS files, etc) to a customised version of thttpd (make sure it is a version that supports keep alive).

This can handle thousands of requests per second as a single lightweight process, and frees up Apache for handling the dynamic PHP/MySQL stuff.

At 7:57 AM, Anonymous Anonymous said...

Nice little article. I am trying out some of your tips on a heavily loaded server I have, hoping to see some improvement at this point.

At 9:00 AM, Anonymous Anonymous said...

Hey, nice tips! I've tried a few of these lately on my web server, from this great article. I will try reducing my keep-alive and timeouts :)

Thanks for the tips.

At 4:32 PM, Blogger Marcos said...

Linux 2.6 has an experimental feature to deal with the Slashdot effect caused by keep-alive. It makes processes start smaller, so one could run a bigger number of them.

I just failed to locate it in the configuration of linux-2.6.12, but it was there in 2.6.9. I don't know if it was dropped or if it is now mandatory.

At 10:54 AM, Anonymous Anonymous said...

Hey, great tips and tricks. This post (and the upcoming ones) are great. I'd figured part of this out by myself, but I was scared to reduce keep-alive, thinking the Apache Foundation would know better than me by setting it to 15 seconds. After reading this article, I will definitely drop it on my high-bandwidth servers :)

Now all i need to do is find this trackball link :)

At 2:00 PM, Anonymous Anonymous said...

If you set up a Squid reverse proxy in front of Apache, you could enable keep-alive without the drawbacks, with a bit of tuning.

At 10:32 PM, Anonymous Anonymous said...

Thanks for this excellent description of the problem. Concise & well written. I look forward to reading your next articles.

At 12:37 PM, Anonymous Anonymous said...

Very insightful. Thanks for posting - I look forward to further posts!

At 3:46 PM, Anonymous Anonymous said...

Very well written. I can only think of one small point you missed.

You didn't say anything about StartServers, MinSpareServers, MaxSpareServers, MaxClients and MaxRequestsPerChild.

If your web server is running other processes, these numbers are very important and will require a lot more study.

But if your server is intended to run only Apache (a good example is if you have a Java middleware server on another system and Apache is only a front end), then your StartServers should be a big number, your MinSpareServers should be zero (0), and your MaxSpareServers should be the same as your StartServers.

This starts up all your servers when you start Apache and keeps them ready for your users. It also eats all your memory, but you don't care about other processes. This way you are not starting and killing lots of system services.

Some processes will still come and go: the MaxRequestsPerChild directive tells Apache to kill a process after it has served that many requests. This is to fix "bad programming" on the Apache developers' part; if a process is leaking memory, recycling it will recover the RAM. If you trust your Apache server, make this number really big.

OK, now you have an almost perfectly tuned Apache server.

Who Am I? I run Hertz.com.

At 5:22 PM, Anonymous Anonymous said...

Good article, though it is not so much about tuning Apache.
But I have to say: good work.

At 5:24 PM, Anonymous Anonymous said...

Yeah !

Nice, good to know that someone is writing about Apache tuning. Thanks.

At 2:18 PM, Anonymous Anonymous said...

Thanks for the article. While reading it, I suddenly realized the re-use of TCP connections probably caused the problems I am having with mod_limitipconn. It was quite an interesting read! =)

At 4:01 PM, Anonymous Anonymous said...

Very good article!

At 2:03 PM, Anonymous Anonymous said...

We've had an apache 1.3 box handling about 1700 connections, and the load was so high because of KeepAlive.

Once we turned off KeepAlive, reduced Timeout from 300 to 50 seconds, increased the spare servers, and reduced MaxRequestsPerChild, the server has been running beautifully ever since.

I've noticed that Apache performance is a hit-and-miss thing. It all depends on the type of applications running on the server.

At 10:48 PM, Anonymous Anonymous said...

It really held my interest. Waiting for your next article.

At 11:32 PM, Blogger Unknown said...

Well written article; waiting for your next part. (Linux Network Care)

At 3:53 PM, Anonymous Anonymous said...

Interesting. So you're saying that Apache still can't handle more than one connection per process? And that no one there has ever heard of select/poll, etc.? Then I will never change my distro; it is likely the only one that can have 500 connections open with only 20 processes...

At 2:51 PM, Anonymous Anonymous said...

compliments, very useful

Vincenzo (Rome, IT)

At 3:15 AM, Anonymous Anonymous said...

Excellent article. I was looking for something else, though, but got to know some cool stuff. Thanks, mate.

At 6:18 AM, Anonymous Anonymous said...

Very nice explanation of keep-alive.
Many thanks.

At 1:43 AM, Blogger kchr said...

Great stuff, thanks for sharing!

At 9:15 AM, Anonymous Anonymous said...

Good info and a nice article!
With KeepAlive on, multiple requests on a connection will be handled one at a time, or "sequentially". If 30 pages are requested, will this sequential processing (like a queue) hurt performance? On the other hand, without KeepAlive, 30 requests could be handled by 30 connections/threads simultaneously, which could be faster. Have you considered this situation?

Thx, Q.Xie

At 11:22 AM, Anonymous Anonymous said...

Hi, Great post.

You mentioned in your first post that you'd be doing other posts on this topic, but I don't see any on your blog. I'd be really interested in hearing your take on tuning MaxRequestsPerChild and the other thread-tuning parameters.

At 1:18 AM, Anonymous Anonymous said...

Very well written article. In fact, I have found it very useful; I reduced KeepAliveTimeout to less than 5 seconds.

But what if I have, say, a number of web servers behind a load balancer? Would we still see the same performance improvement as in the single-server scenario? Can we really measure it?

Please throw some light on this. Again, thanks a lot for such a wonderful article :)

At 7:20 PM, Blogger Seun Osewa said...

Use a lightweight server like nginx to serve your images. Better still, place nginx in front of Apache and use it as a reverse proxy; it will use HTTP/1.1 to speak to the clients.

At 3:02 AM, Anonymous Anonymous said...

Thanks for the wonderful story. Apache is really a breakthrough; it makes web development much more convenient.

At 11:02 AM, Anonymous Anonymous said...

Very hot stuff! Thanks a lot

At 5:12 AM, Anonymous Anonymous said...

thanks for the help


At 12:09 AM, Anonymous Anonymous said...

Thank You for the great lesson,
I'm waiting for part 2 :-)

At 4:38 PM, Anonymous Anonymous said...

thanks. good info for me

At 10:01 AM, Blogger Unknown said...

Hey, great article! You really nailed down how keep-alives can improve or destroy a server. Thanks.

Empulse Group
Empulse Hosting

At 9:51 AM, Blogger GuyReviews said...

Hi to all,

I am facing a problem with my current server. The MaxClients limit on my server is only 150,

but my average number of online users is more than 250.

Can anybody please tell me how to increase the MaxClients limit?

Current server Configuration:
QTY Hardware Component
1 Seagate \ 146GB:SAS:15K \ ST3146855SS
1 Dell \ 9G Drive Controller - SAS/SATA - budget raid \ SAS 6/iR
2 Intel \ 2.4 GHz \ Xeon 5530 (Quad Core)
6 Generic \ 1024 MB \ DDR3 1333 ECC
1 Dell \ 1333 FSB Dual Xeon \ PowerEdge R710

Installed Software
- CentOS Enterprise Linux - OS ES 5.0
- cPanel, Inc. cPanel STABLE

At 5:13 AM, Blogger Unknown said...

Just read your post. Well written, but I have some confusion that I hope you can clear up.

the default Apache configurations :

KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 15

StartServers 5
MinSpareServers 1
MaxSpareServers 20
ServerLimit 256
MaxClients 256

OK, so my understanding was:
Initially the server will create 5 child processes, and after that it can grow up to 256.
Now, if 256 clients try to access my server at a time, it should be all right (according to the Apache docs).
But, as you said, if 1,000 clients try to access my server at a time when the MaxClients setting is 256, the remaining 744 will be held up for a while, and in the error_log file I will see an error like "[error] server reached MaxClients setting, consider raising the MaxClients setting".
So it means the server has reached its maximum number of child processes, and clients will see "server timed out" when they try to access my server.

So it's a problem, is it not?

Here is my confusion: you did not say anything about increasing the MaxClients limit if you think the server will get 1,000 concurrent requests at a time, but to me it sounds like setting keep-alive to 2 seconds can overcome this issue.

Can it really overcome this issue without increasing the MaxClients setting to something like 1000?

Again, I have never dealt with 1,000 concurrent hits at a time before, so I'm hoping you can clear up this confusion.

At 3:21 PM, Blogger icewalker said...

Thanks man.. Nice article.

At 4:36 PM, Blogger Abhinav.Singh said...

Great summary, I must say; looking forward to your posts.

However, this post in short also summarizes why the big giants of the web use custom event-based servers: the per-connection overhead can be kept small by avoiding the general Apache overhead.

Some time back I wrote about how to use libevent for such cases: http://tinyurl.com/y9hzye6

Libevent v2 also provides a method which allows the server to hold the incoming request as long as possible.

A combo of Apache and a custom server will let you handle any kind of load you might get :D

At 1:00 PM, Blogger Andy said...

Thanks for the explanation. I look forward to part 2...

