Thursday, March 24, 2011

Supporting long polling in lighttpd with fastcgi

Long polling (or Comet) is a technique (or hack) designed to support asynchronous server-to-client messaging; in other words, push notification from the server to the browser. Because HTTP is a client-initiated protocol, it has no provision for an event that the server initiates. That's where long polling comes in. Basically it involves a server that never closes the socket (in other words, never completes the response to the request) and a client that keeps the socket open and continues to read from it.

This allows the client to continue to receive data from the server. From the HTTP perspective it is as if the response never finished. The big win, of course, is that the server is now able to push messages back to the client asynchronously over HTTP. But since this is a bit of a hack, the pushed messages fall outside the bounds of the HTTP protocol, meaning that an "out-of-band" protocol specific to the application needs to be designed for this communication.
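To make the "out-of-band" protocol idea concrete, here is a minimal sketch of one possible framing scheme. The wire format (a decimal length, a newline, then the payload) and the names frame_encode and frame_decode are my own assumptions for illustration, not anything lighttpd or fastcgi prescribes. The point is only that the client needs some way to tell where one pushed message ends and the next begins.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format: "<decimal length>\n<payload bytes>". */

/* Encode one message into out.
 * Returns the number of bytes written, or -1 if it does not fit. */
int frame_encode(const char *payload, char *out, size_t out_size) {
    size_t len = strlen(payload);
    int n = snprintf(out, out_size, "%zu\n%s", len, payload);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}

/* Decode one message from the first buf_len bytes of buf.
 * On success the payload is copied (NUL-terminated) into payload and the
 * number of bytes consumed is returned. Returns 0 if the frame is still
 * incomplete, -1 if it is malformed or too large for payload. */
int frame_decode(const char *buf, size_t buf_len,
                 char *payload, size_t payload_size) {
    const char *nl = memchr(buf, '\n', buf_len);
    if (nl == NULL)
        return buf_len > 9 ? -1 : 0;   /* length header not complete yet */

    size_t header_len = (size_t)(nl - buf);
    if (header_len == 0 || header_len > 9)
        return -1;

    size_t len = 0;
    for (size_t i = 0; i < header_len; i++) {
        if (buf[i] < '0' || buf[i] > '9')
            return -1;
        len = len * 10 + (size_t)(buf[i] - '0');
    }

    if (len + 1 > payload_size)
        return -1;                     /* no room for payload plus NUL */
    if (buf_len - header_len - 1 < len)
        return 0;                      /* body not fully received yet */

    memcpy(payload, nl + 1, len);
    payload[len] = '\0';
    return (int)(header_len + 1 + len);
}
```

A length prefix is only one option; newline-delimited text or JSON objects would do equally well as long as both ends agree on it.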

So, I've been looking at what it would take for lighttpd plus fastcgi to support this configuration. The good news is that it isn't a whole lot.



Each request that is received by lighttpd and passed through to the fastcgi is handled by a simple fastcgi accept call that blocks in a forever loop waiting for the next event (see code snippet below). After the event (request) is received and processed the response is printed to STDOUT and the fastcgi returns back to the wait loop.

That's all well and good for a single request/response pair, but what would need to change with this configuration to support long-polling? A few things. The connection needs to be held on to, processing needs to be in place to keep reading and writing messages on that connection, and finally, the system needs to be set up to handle other incoming requests in the meantime.

A skeleton of the standard fastcgi processing loop basically looks like the following:

while (FCGI_Accept() >= 0) {

  //do some work....

  //write response to stdout
  
  //complete response with a \r\n\r\n

  //return back to top of loop and wait for next request
}

The skeleton to support long-polling would be changed to:

while (FCGI_Accept() >= 0) {

  //do some work....

  //write response to stdout

  //complete response with a \r\n\r\n

  FCGI_Flush(stdout);

  while (_long_polling == true && _messages == true) {
    //push message to client

    FCGI_Flush(stdout);

    //process response from client

    //check for more server messages
  }
  //done handling the long-polling event
  //return back to top of loop and wait for next request
}

As you can see, there's an inner loop that holds on to the connection (and assumes that the client is holding on via an AJAX connection or equivalent). That provides the "pipe" for communication back to the client. Note that the FCGI_Flush() call is required (as far as I can tell, in the fcgi_stdio.h API the actual function is FCGI_fflush(), which is what fflush(stdout) maps to). Otherwise lighttpd will buffer the messages being sent from the server until the buffer exceeds some maximum size or the connection closes. In our case we want each message to be sent as soon as it is complete, and the only way to do this is to tell lighttpd that the buffer needs to be flushed, i.e. sent back to the client.
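The write-then-flush step of the inner loop can be sketched as a small helper. This is only an illustration: push_pending is a hypothetical name, the one-message-per-line format is an assumption, and the code is written against plain stdio so it can be tried without a web server. My understanding is that fcgi_stdio.h redefines FILE, fputs, fputc and fflush to its own wrappers, so the same function should behave the same way when that header is included instead of stdio.h, but treat that as an assumption to verify.

```c
#include <stdio.h>
#include <stddef.h>

/* Write each pending message on its own line and flush after every one,
 * so that no message sits in the output buffer waiting for the next.
 * Returns the number of messages pushed, or -1 on a write or flush error
 * (for example, when the client has gone away). */
int push_pending(FILE *out, const char *const *messages, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (fputs(messages[i], out) == EOF || fputc('\n', out) == EOF)
            return -1;
        if (fflush(out) != 0)
            return -1;
    }
    return (int)count;
}
```

Inside the long-polling loop this would be called as push_pending(stdout, queue, queued) each time new server messages show up. A return of -1 is a reasonable signal to leave the inner loop and go back to waiting for the next request.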

The one other piece that is required is concurrent access support. A long-polling request ties up its fastcgi process for as long as the connection stays open, and lighttpd will hold any further request until a fastcgi process becomes available.

If you have only one fastcgi process, you will process connections sequentially, and the second request will be blocked until your one lonely fastcgi decides it has finished with the first long-polling request. That could be a long time; that's why it's called long polling.

Now, there are two ways to set up fastcgi processes within lighttpd: either let lighttpd do this work for you via the fastcgi.conf file, where "max-procs" can be configured, or do it manually via spawn-fcgi.standalone. In short, either lighttpd manages the processes or you do.

What I found is that, for the purposes of handling concurrent access, it makes no difference which method you use. Lighttpd doesn't start processes on demand; it starts the configured "max-procs" number of processes up front. Period. Just as you would if you spawned them yourself. So, given that there is no difference, my preference is to manage the processes myself and keep a global count of the active long-polling connections, to ensure that there's always one fastcgi process free to take a new connection.
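The bookkeeping behind that global count might look something like the sketch below. The struct and function names are hypothetical, and the sketch only models the admission policy. Since the fastcgi processes are separate, a real setup would have to keep the counter somewhere they can all see it (shared memory or a lock file, for instance), which is left out here.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for long-polling slots. */
struct poll_slots {
    int max_procs;     /* number of fastcgi processes that were spawned */
    int active_polls;  /* processes currently held by long-polling clients */
};

/* Claim a slot for a new long-polling request. Refuses if granting it
 * would leave no process free for ordinary requests. */
bool poll_slots_acquire(struct poll_slots *s) {
    if (s->active_polls + 1 >= s->max_procs)
        return false;
    s->active_polls++;
    return true;
}

/* Give a slot back when a long-polling connection ends. */
void poll_slots_release(struct poll_slots *s) {
    if (s->active_polls > 0)
        s->active_polls--;
}
```

With three processes, two long-polling requests are admitted and a third is refused, so one process always stays free. A refused request could be answered right away with a "try again later" message rather than being held open.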
