31
Oct 11

Distributed locking made easy

There are various situations in which one would use a simple mutex to ensure mutually exclusive access to a shared resource. This is usually very simple to accomplish using your favorite language's library, but that constrains you to mutual exclusion within a single process. Even mutual exclusion across processes on a single machine is rather straightforward: usually you just lock a resource (e.g. a file) and wait for the lock to be released. You can use that as an IPC mutex.
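As a rough illustration of the single-machine case, here is a minimal sketch of a file-based IPC mutex using Ruby's `File#flock`. The helper name `with_file_lock` and the lock file path are made up for this example, and it assumes a platform where `flock` provides advisory locks (e.g. Linux or macOS).

```ruby
require 'tmpdir'

# Hypothetical helper: runs the block while holding an exclusive advisory
# lock on lock_path. Other processes calling it with the same path wait.
def with_file_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |file|
    file.flock(File::LOCK_EX)   # blocks until the lock is free
    begin
      yield
    ensure
      file.flock(File::LOCK_UN) # release even if the block raises
    end
  end
end

lock_path = File.join(Dir.tmpdir, "my-app.lock") # assumed path, pick your own
result = with_file_lock(lock_path) { :done }
```

Note that the lock is advisory: it only protects against other processes that also take the lock on the same file.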

But what if one needs a distributed mutex to allow mutual exclusion amongst distributed clients? In that case, the mutex has to offer various guarantees, as with any shared state. We know shared state is hard to reason about and leads to a lot of bug-prone software. Shared distributed state is much harder still: it requires distributed consensus, all while operating in an unreliable network environment. There are various distributed consensus algorithms, Paxos being one of the more widely used ones.

But you can deploy a distributed locking service without having to implement consensus yourself. Apache ZooKeeper offers distributed synchronization and group services. Mutexes/locking are just one of the things you can do with ZooKeeper. ZooKeeper is rather low level, so implementing a distributed lock, although trivial, requires some boilerplate.

Recently we needed a distributed lock service to ensure that only one person in our organization is performing a particular systems activity at any given point in time. We implemented it on top of a homegrown tool written in Ruby. The example below is in Ruby, though the API calls would translate to any language…

require 'zookeeper'

class Lock
  def initialize(host, root="/my-app")
    @zk = Zookeeper.new(host)
    @root = root
  end
  
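  # Runs the given block while holding the lock for `app`.
  # Calls timeout_callback instead if the lock is not acquired in time.
  def with_lock(app, timeout, timeout_callback, &block)
    new_lock_res = @zk.create(:path => "#{@root}/#{app}-", :sequence => true, :ephemeral => true)
    unique_lock_path = new_lock_res[:path]
    begin
      if get_lock(unique_lock_path, timeout)
        yield
      else
        timeout_callback.call
      end
    ensure
      # Always remove our node, even on timeout or error, so later waiters are not blocked.
      @zk.delete(:path => unique_lock_path)
    end
  end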

  private
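  # Returns true once our node is the lowest-numbered child of the root.
  def get_lock(unique_lock_path, timeout)
    lock_key = unique_lock_path.gsub(/^#{Regexp.quote(@root)}\//, '')

    # Re-check up to 5 times; each wait below may take up to `timeout` seconds.
    5.times do
      # A plain sort is only correct while all nodes under the root share one app prefix.
      children = @zk.get_children(:path => @root)[:children].sort
      return true if children.first == lock_key

      idx = children.index(lock_key)
      return false if idx.nil? # our node is gone, e.g. the session expired

      # Watch only the node just ahead of ours, then wait for it to change.
      watcher = Zookeeper::WatcherCallback.new {}
      stat_res = @zk.stat(:path => "#{@root}/#{children[idx - 1]}",
                          :watcher => watcher)
      if stat_res[:stat].exists
        return false unless wait_until(timeout) { watcher.completed? }
      end
    end
    false
  end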

  def wait_until(timeout=10, &block)
    time_to_stop = Time.now + timeout
    until yield do
      if Time.now > time_to_stop
        return false
      end
      sleep 0.1
    end
    return true
  end
end

The Lock class is used like this:

    Lock.new("localhost:2181").with_lock(
      "myapp",
      5, ## Timeout in seconds
      lambda { ## Timeout callback
        abort("Couldn't acquire lock.  Timeout.") }
    ) do
      ## Do whatever you want here
    end

The details of the algorithm are outlined here.
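Roughly, the steps the Lock class above follows are: create an ephemeral sequential node, list the children of the root, take the lock if your node has the lowest sequence number, and otherwise watch the node just ahead of yours and re-check when it changes. The decision step can be sketched without a ZooKeeper server at all. The helper `next_step` and the sample node names below are made up for illustration; the sketch assumes the 10-digit, zero-padded sequence suffix that ZooKeeper appends to sequential nodes.

```ruby
# Hypothetical helper: given the children of the lock root and our own
# node name, decide what a client should do next.
def next_step(children, my_node)
  # Order by the numeric sequence suffix (the last 10 characters).
  ordered = children.sort_by { |name| name[-10..-1].to_i }
  idx = ordered.index(my_node)
  return [:missing, nil] if idx.nil?   # our node no longer exists
  return [:acquired, nil] if idx.zero? # lowest sequence number holds the lock
  [:watch, ordered[idx - 1]]           # wait on the node just ahead of us
end

children = %w[myapp-0000000012 myapp-0000000010 myapp-0000000011]
first_in_line = next_step(children, "myapp-0000000010")
last_in_line  = next_step(children, "myapp-0000000012")
```

Watching only the predecessor, rather than the root, means that releasing a lock wakes up a single waiting client instead of all of them.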

Of course, before you use it, you must install ZooKeeper and create the root path (/my-app by default in the Lock class above).

Also, please note that I have removed the access control part from the example. Before using this in production, I strongly encourage you to read this.
