Untwisting The Event Loop

Have you ever wondered why your Rails application is so memory hungry when it is not even really trying to fully utilize your CPUs? To saturate your CPUs, you have to have a large number of Thin (or Mongrel or whatever) instances. Why is that? We all know that the Ruby interpreter is unable to utilize more than one CPU (or no more than one CPU at a time in the case of 1.9), but why can't Ruby (or maybe it's Rails?) utilize the processors efficiently? Let's look for an answer to this question.

First off, what happens in a typical Rails action? The Rails framework does some request mapping and routing, which is CPU bound (if we consider memory latency negligible). Then a few requests are sent to the database to retrieve some data, after which a rendering process, which is CPU bound as well, takes place.

def show
 @user = User.find(params[:id]) #db access
 @events = Events.find(:all) #another db access
 render :action => :show #rendering
end

The problem here comes with the database part of the action. Calls to the database will block processing till results get back from the DBMS. During that time, Rails will be frozen and will not try to do anything else till the call ends. The good news is that threads can help here (even Ruby's green threads). A blocked thread will give way to other threads till it is back in the ready state, thus filling those slots with some useful processing. Sounds good enough?

NO!

Sadly Rails is NOT thread safe; you cannot use threads to do parallel processing in Rails. "So why not use something like Merb?" I hear you say. Well, Merb and threads will be able to interleave CPU operations and help with the time spent on IO in something like fetching data from some other service, but it won't save you when you do database IO. This is simply because of the fact that calling C extensions blocks the whole Ruby interpreter. Yes, you read it correctly the first time. Nothing can be scheduled while a native call is being issued. Since database drivers are mostly C extensions, they suffer from this. Your nice SELECT statement keeps the whole Ruby interpreter on hold till it is finished.

But there must be a solution to this. We cannot all be left high and dry with interpreters eating our memory and not really using our CPUs.

Enter EventMachine and AsyMy

For those who are not in the loop of events (pun intended), there happens to be another approach to this problem: event-based (read asynchronous) IO. In this mode of operation, you request an IO operation and tell the event loop what to do when the request is fulfilled (either fully or partially). An excellent library for event handling exists for Ruby, which is Francis' EventMachine (used internally by the Thin server and the evented flavour of Mongrel). But still, using EventMachine does not magically solve all our problems. The question that keeps popping up is "what to do with database access?". AsyMy to the rescue! AsyMy, written by Thomas Ptacek, is an evented driver for MySQL that operates in an asynchronous fashion. A quick example will look like:

connection.execute('SELECT * from events') do |headers,data|
 # do something with headers and data
 pp headers
 pp data
end

AsyMy is still at a very early stage. The performance is horrible (as it is based on the extremely slow pure Ruby MySQL driver) and it comes with many rough edges (I was not able to run INSERTs and UPDATEs without hacking it, and I am still not able to run the callbacks for those). Nevertheless, this is a formidable achievement on the road to a very fast single threaded implementation.
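To make the "request an operation and tell the loop what to do later" idea concrete, here is a toy reactor built only on the standard library. It is a minimal sketch, not how EventMachine or AsyMy work internally: the TinyReactor class and its on_readable and run methods are names made up for this illustration, and a pipe stands in for the database socket.

```ruby
# A toy reactor (hypothetical, for illustration only): callbacks are
# registered per IO object and fired once IO.select reports the IO as readable.
class TinyReactor
  def initialize
    @handlers = {}
  end

  # Remember what to do when data shows up on this IO.
  def on_readable(io, &callback)
    @handlers[io] = callback
  end

  # Block in select until something is ready, then run its callback once.
  def run
    until @handlers.empty?
      ready, = IO.select(@handlers.keys)
      ready.each do |io|
        callback = @handlers.delete(io)
        callback.call(io.read_nonblock(1024))
      end
    end
  end
end

reactor = TinyReactor.new
reader, writer = IO.pipe   # the pipe plays the role of a database connection
received = []

reactor.on_readable(reader) { |data| received << data }
writer.write("row 1")      # pretend the "database" has answered
reactor.run                # the loop notices the data and fires the callback
```

The caller never waits on the read itself. It only states what should happen with the data, and the loop decides when that code runs.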

Here's what our action would look like if there were an AsyMy adapter for ActiveRecord:

#this is probably wrong but it can illustrate
#the twisted nature of evented programming
def show
 User.find(params[:id]) do |result_set|
     @user = result_set
     Events.find(:all) do |result_set|
         @events = result_set
         @events.each do |event|
             event.owner = @user
             if event != @events.last
                 event.save
             else
                 event.save do |ev|
                     render :action => :show
                 end
             end
         end
     end
 end
end

We had to twist the function flow to be able to make use of the evented nature of the new driver. Instead of flowing normally, the logic is scattered across the different callbacks. This is one of the areas where event based programming makes you change the way you think about program flow. It is a hurdle for many developers and a show stopper for some. No wonder the event library for Python is called Twisted.

Why not untangle this with Fibers?

Fibers are lightweight concurrency primitives introduced in Ruby 1.9. How lightweight? Well, they don't come at zero cost, but in long running requests the weight they add can be negligible. Fibers provide a form of cooperative (rather than preemptive) concurrency inside a single thread (you cannot pass fibers between threads; you have been warned). Fibers enjoy the ability to pause and resume like continuations, but they don't suffer from the memory leaks that continuations have. When we use this feature wisely we can unwind the action code above to look like this:

def show
 @user = User.find(params[:id])
 @events = Events.find(:all).each do |event|
     event.owner = @user
     event.save
 end
end

Huh? This is the normal action code we are used to. Well, using fibers we can do this and still do things under the hood in an evented way.

To make things clear, we need to illustrate Fibers with an example:

require 'fiber'

fiber = Fiber.new do
 # do something
 Fiber.yield :another_thing
 # do yet another thing
end

yielded = fiber.resume # => runs the fiber till the yield,
                    # returns the yielded value (:another_thing)
                    # and pauses the fiber where it is
fiber.resume # => resumes the fiber from the point where it was paused
fiber.resume # => nothing left to run, raises a FiberError
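One detail worth spelling out is that values travel in both directions: whatever is passed to resume becomes the return value of the pending Fiber.yield inside the fiber. The sketch below only demonstrates that hand-off; the log array and the symbols are made up for the illustration.

```ruby
require 'fiber'

log = []

fiber = Fiber.new do |first|
  log << "started with #{first}"
  answer = Fiber.yield(:waiting)   # pause and hand :waiting to the resumer
  log << "resumed with #{answer}"  # answer is whatever the next resume passed in
  :finished                        # the block's value is returned by the last resume
end

first_result  = fiber.resume(:go)      # runs up to Fiber.yield
second_result = fiber.resume(:result)  # continues after Fiber.yield
```

Here first_result is :waiting and second_result is :finished, while the fiber itself saw :go and then :result. This is the mechanism that lets a callback deliver query results into a paused fiber.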

Let's see how this can be useful for dispatching controller actions (this code would preferably live in the server itself):

Fiber.new do
 Dispatcher.dispatch(controller,action,req,res)
 send_response res
end.resume

Inside the action we call the find method repeatedly. This method could be implemented like this:

class DataStore
 def find(*args)
     query = construct_query(*args)
     fiber = Fiber.current # grab the current fiber
     conn.execute(query) do |headers, data|
         fiber.resume convert_to_objects(data)
     end
     Fiber.yield # pause this fiber; returns the value later passed to fiber.resume
 end
end

This way, whenever the code reaches a find call, it will pass the query to the db driver, return immediately and pause the current fiber, giving room for other requests to be processed. Once the data comes back from the db server, the callback is run and it resumes the fiber (passing to it the result of the query). The result gets passed back to the caller of find and the original action method continues till completion (or till it is paused again by another find call).
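The whole round trip can be tried out without a database. What follows is a sketch under loud assumptions: FakeConnection is a hypothetical stand-in for an evented driver (it just queues callbacks, and run_pending plays the part of the event loop), and this DataStore takes a raw query string instead of building one.

```ruby
require 'fiber'

# Hypothetical stand-in for an evented driver: execute returns at once and
# queues the callback so that a "reactor" can fire it later.
class FakeConnection
  def initialize
    @pending = []
  end

  def execute(query, &callback)
    @pending << [query, callback]
    nil
  end

  # Fire the queued callbacks as if results had arrived from the server.
  def run_pending
    until @pending.empty?
      query, callback = @pending.shift
      callback.call("rows for #{query}")
    end
  end
end

class DataStore
  def initialize(conn)
    @conn = conn
  end

  def find(query)
    fiber = Fiber.current
    @conn.execute(query) { |data| fiber.resume(data) }
    Fiber.yield # pause here; returns whatever the callback passes to resume
  end
end

conn  = FakeConnection.new
store = DataStore.new(conn)
trace = []

# Each "request" runs in its own fiber, as the dispatcher sketch suggests.
%w[users events].each do |table|
  Fiber.new do
    trace << "request #{table}: query sent"
    rows = store.find("SELECT * FROM #{table}")
    trace << "request #{table}: got #{rows}"
  end.resume
end

trace << "both requests parked"
conn.run_pending # results "arrive" and each fiber picks up where it left off
```

Both requests send their queries before either receives a result, yet each request body reads like ordinary blocking code.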

Charles Jolley implemented a similar thing here. It is called Pipelined and, while it is more obtrusive than the approach described above, it still has the advantage of being optional. Pipelined uses continuations and hence is available to Ruby 1.8 (and Rails).

I am still ironing out and tying things together (and doing lots of benchmarks) and I would like to tell you that I have ditched AsyMy for now for another alternative which I will attempt to discuss in detail in another blog post.

Written By:

Muhammad A. Ali (oldmoe.blogspot.com)
