Posts Tagged: ruby


31 Oct 11

Distributed locking made easy

There are various situations when one would use a simple mutex to ensure mutual exclusion on a shared resource. This is usually very simple to accomplish using your favorite language's library, but that limits you to single-process mutual exclusion. Even single-machine mutual exclusion is rather straightforward: usually you just lock a resource (e.g. a file) and wait for the lock to be released. You can use that as an IPC mutex.
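On a single machine, one way to get such an IPC mutex in Ruby is `File#flock`: every cooperating process opens the same lock file and takes an exclusive lock on it. Below is a minimal sketch, assuming a hypothetical lock-file path (`LOCK_PATH`) and a hypothetical helper name (`with_file_lock`). It makes a non-blocking attempt so the caller learns immediately that someone else holds the lock. It only coordinates processes on one host that share a local filesystem.

```ruby
require 'tmpdir'

# Hypothetical lock-file location; all cooperating processes must agree on it.
LOCK_PATH = File.join(Dir.tmpdir, "myapp.lock")

# Runs the block while holding an exclusive flock on the lock file.
# Returns the block's value, or nil if another holder already has the lock
# (LOCK_NB makes the attempt non-blocking instead of waiting).
def with_file_lock(path = LOCK_PATH)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |file|
    return nil unless file.flock(File::LOCK_EX | File::LOCK_NB)
    begin
      yield
    ensure
      file.flock(File::LOCK_UN) # release explicitly; closing the file would also release it
    end
  end
end
```

Dropping `File::LOCK_NB` makes `flock` block until the current holder releases the lock, which gives the "wait for the lock" behaviour.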

But what if one needs a distributed mutex to provide mutual exclusion among distributed clients? In that case the mutex has to offer various guarantees, as with any shared state. We know shared state is hard to reason about and leads to a lot of bug-prone software; shared distributed state is much harder, requiring guaranteed distributed consensus, all while operating over an unreliable network. There are various distributed consensus algorithms, Paxos being one of the more widely used ones.

But you can deploy a distributed locking service without having to implement one yourself. Apache ZooKeeper offers distributed synchronization and group services; mutex/locking is just one of the things you can do with it. ZooKeeper is rather low level, so implementing a distributed lock, although trivial, requires some boilerplate.

Recently we needed a distributed lock service to ensure that only one person in our organization performs a particular systems activity at any given point in time. We implemented it on top of a homegrown tool written in Ruby. The example below is in Ruby, though the API calls would translate to any language…

Usage of this Lock class looks like this:

    Lock.new("localhost:2181").with_lock(
      "/myapp",
      5, ## Timeout in seconds
      lambda { ## Timeout callback
        abort("Couldn't acquire lock.  Timeout.") },
      lambda {
        # Do whatever you want here while holding the lock
      }
    )

The details of the algorithm are outlined here.
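In the lock recipe from the ZooKeeper documentation, each client creates an ephemeral, sequential child node under the lock path. The client whose node has the lowest sequence number holds the lock. Every other client watches only the node immediately before its own, so a release wakes a single waiter instead of the whole herd. The sketch below covers just that decision step in plain Ruby, with no ZooKeeper client involved. The `lock_decision` helper and the `lock-…` node names are hypothetical.

```ruby
# Decision step of the ZooKeeper lock recipe, simulated without a client.
# children: names returned by listing the lock path, e.g. "lock-0000000002".
# mine:     name of the ephemeral sequential node this client created.
# Returns [:acquired, nil] or [:wait, name_of_node_to_watch].
def lock_decision(children, mine)
  # Sequential znodes end in a zero-padded counter; order by it.
  seq = ->(name) { name[/\d+\z/].to_i }
  ordered = children.sort_by(&seq)
  index = ordered.index(mine) or raise ArgumentError, "#{mine} not among children"
  return [:acquired, nil] if index.zero?

  # Watch only the immediate predecessor, not the lowest node.
  [:wait, ordered[index - 1]]
end

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
p lock_decision(children, "lock-0000000001")
p lock_decision(children, "lock-0000000003")
```

When the watched node goes away (its owner released the lock, or its session died and the ephemeral node was removed), the recipe has the client list the children again and rerun this decision, rather than assume it now holds the lock.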

Of course, before using it you must install ZooKeeper and create the root path /myapp.

Also, please note that I have removed the access control part from the example. Before using this in production, I strongly encourage you to read this.


13 Dec 10

Rails custom validation before ActiveRecord typecasting

The Rails 3 validation framework, extracted from ActiveRecord and now part of ActiveModel, is pretty sleek. As with anything else in Rails (consider it good or bad), it offers sensible defaults. Most things in Rails are also easily configurable/customizable. Validations are one such thing, but I think they need more documentation.

I ran into a problem trying to validate a date field (defined in ActiveRecord) as:

Rails also doesn’t have a built-in date validation, but it’s easy enough to either provide your own validator method or create a reusable validator. I opted for the latter, as I might need to reuse the same validator across the site.

So running this with an invalid date value, say 11111111111, which gets parsed by the Date.parse method, didn’t yield an error; in fact, the validator was never even invoked.

After a bit of digging, I figured out that it has to do with the ActiveRecord lifecycle: it typecasts the assigned value to the column type defined in the migration. Typecasting an invalid value like the one above causes ActiveRecord to set the value to nil (not sure why yet; I’ll dig into the source code later). Because the validator isn’t invoked on nil values unless you specifically tell it not to allow nil with :allow_nil => false, the validate_each method is bypassed. You can see the logic for yourself in the validate method.
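The lifecycle described above can be modelled in a few lines of plain Ruby. This is a simplified stand-in, not ActiveRecord's actual code. `typecast_date` and `run_each_validator` are hypothetical helpers. The nil skip mirrors the behaviour observed in this post rather than Rails' exact implementation.

```ruby
require 'date'

# Hypothetical stand-in for the typecast to a date column:
# unparseable input silently becomes nil.
def typecast_date(raw)
  Date.parse(raw.to_s)
rescue ArgumentError # newer Rubies raise Date::Error, a subclass of ArgumentError
  nil
end

# Hypothetical stand-in for the each-validator loop: nil values are skipped,
# so the validation block never sees the bad input.
def run_each_validator(value, skip_nil: true)
  return :skipped if value.nil? && skip_nil
  yield value
end

cast = typecast_date("2011-02-30") # February 30th does not exist
result = run_each_validator(cast) { |date| date.is_a?(Date) ? :valid : :invalid }
# result is :skipped, so no validation error is ever recorded
```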

To fix this issue, you have to access the raw value that ActiveRecord keeps around after the typecast (exposed through methods such as read_attribute_before_type_cast). I turned this into a reusable class: any validator that needs to operate on the raw (uncast) value just inherits from it. Here is the class…

Now the DataValidator above just needs to inherit from RawEachValidator and you’re all set.
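The original class isn't reproduced here, so the following is only a plain-Ruby sketch of the idea, runnable without Rails. `EachValidatorSketch` is a hypothetical, stripped-down stand-in for ActiveModel::EachValidator, and `FakeRecord` is a hypothetical model. The method names `read_attribute_for_validation` and `read_attribute_before_type_cast` are borrowed from ActiveModel/ActiveRecord for familiarity. The date validator is called DateValidator here. The raw subclass changes only one thing: which value it hands to validate_each.

```ruby
require 'date'

# Hypothetical, stripped-down stand-in for ActiveModel::EachValidator.
class EachValidatorSketch
  def initialize(*attributes)
    @attributes = attributes
  end

  def validate(record)
    @attributes.each do |attribute|
      value = read_value(record, attribute)
      next if value.nil? # nil values are skipped, as observed in the post
      validate_each(record, attribute, value)
    end
  end

  def validate_each(_record, _attribute, _value)
    raise NotImplementedError, "subclasses implement validate_each"
  end

  private

  # Default: validate the typecast value.
  def read_value(record, attribute)
    record.read_attribute_for_validation(attribute)
  end
end

# Same loop, but reads the raw, pre-typecast value instead.
class RawEachValidator < EachValidatorSketch
  private

  def read_value(record, attribute)
    record.read_attribute_before_type_cast(attribute)
  end
end

# A date validator only has to inherit from RawEachValidator.
class DateValidator < RawEachValidator
  def validate_each(record, attribute, value)
    Date.parse(value.to_s)
  rescue ArgumentError
    record.errors[attribute] << "is not a valid date"
  end
end

# Hypothetical model: keeps the raw input, casts it to a Date, and turns
# unparseable input into nil, like the typecast described above.
class FakeRecord
  attr_reader :errors

  def initialize(attrs)
    @raw = attrs
    @errors = Hash.new { |hash, key| hash[key] = [] }
  end

  def read_attribute_before_type_cast(name)
    @raw[name]
  end

  def read_attribute_for_validation(name)
    Date.parse(@raw[name].to_s)
  rescue ArgumentError
    nil
  end
end

record = FakeRecord.new(due_on: "2011-02-30")
DateValidator.new(:due_on).validate(record)
p record.errors[:due_on]
```

In a real app, check how your Rails version's EachValidator reads attribute values before relying on the override point used in this sketch.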


25 Nov 09

Merge sort implementation and performance in Scala and Ruby

I’m not trying to turn this into a language A vs. B debate; it’s just that something interesting happened last night. I’m trying to learn both Scala and Ruby. I’m a bit more enthused by Scala at this point, mostly because I somewhat prefer static typing and I’m a longtime Java guy. Then again, I’m also a longtime Perl guy, so I’m not choosing Scala over Ruby out of a lack of experience with dynamic languages.

With that said, the best way to learn a language's idioms is to start actually implementing something. A sample application is good, but because of frameworks and the relatively low business logic/algorithm ratio of such sample apps (at least the ones I can conceive of), you mostly spend time learning the framework's idioms rather than the language's.

So I thought I’d start by implementing some algorithms in these languages. I decided on merge sort, an O(n log n) divide-and-conquer algorithm. See the link for more info.
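For reference, the O(n log n) bound follows from the usual divide-and-conquer recurrence: two half-size sorts plus a merge that is linear in the number of elements, with c an arbitrary constant per-element cost. The bound only holds as long as the merge step really is linear.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + c\,n, \qquad T(1) = c \\
     &= c\,n \log_2 n + c\,n \qquad (n \text{ a power of two}) \\
     &= O(n \log n)
\end{aligned}
```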

Let’s start off with the code. Again, this is not written in an OO and/or reusable manner; it is mostly procedural/functional code for trying out functional idioms.

Scala code:

  import scala.util.Random
  import scala.collection.mutable.ListBuffer

  // App replaces the deprecated Application trait
  object MergeSort extends App {

    println("Starting...");

    val list = initListWith(1000)

    val start = System.currentTimeMillis()
    println("Sorted " + merge_sort(list.toList).length + " elements.")
    val end = System.currentTimeMillis()
    println("Total time: " + (end - start))

    def merge_sort(list:List[Int]): List[Int] = {
      if (list.length <= 1) {return list}
      var (left: List[Int], right: List[Int]) = list splitAt Math.round(list.length/2.0).toInt
      left = merge_sort(left); right = merge_sort(right)
      if(left.last > right.head)
        merge(left, right);
      else
        left ::: right
    }

    def merge(l:List[Int], r:List[Int]):List[Int] = {
      var result:ListBuffer[Int] = new ListBuffer()
      var left = l; var right = r;
      while(left.length > 0 && right.length > 0) {
        if(left.head <= right.head) {
          result += left.head
          left = left.tail
        } else {
          result += right.head
          right = right.tail
        }
      }
      if(left.length > 0)
        result.toList ::: left
      else
        result.toList ::: right
    }

    def initListWith(limit:Int):List[Int] = {
      val list:ListBuffer[Int] = new ListBuffer()
      val randGen = new Random()
      (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
      return list.toList
    }

  }

And Ruby code:

  def merge_sort(*list)
    return list if list.size <= 1

    result = []
    middle = (list.size/2.0).round
    left = list[0, middle]
    right =  list[middle..-1]

    left = merge_sort(*left)
    right = merge_sort(*right)

    if left[-1] > right[0]
      result = merge(left, right);
    else
      result = left + right
    end
    return result
  end

  def merge(left, right)
    result = Array.new()
    while left.size > 0 && right.size > 0
      if left.first <= right.first
        result << left.slice!(0)
      else
        result << right.slice!(0)
      end
    end

    if left.size > 0
      result.concat(left)
    else
      result.concat(right)
    end
    return result
  end

  list = (1..1000).collect {|i| rand(1000000); }

  puts "Starting..."
  start = (1000 * Time.now.to_f).to_i
  puts "Sorted " + merge_sort(*list).size.to_s + " elements."
  end_time = (1000 * Time.now.to_f).to_i
  total_time = end_time - start
  puts "Total time: #{total_time}"
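As an aside, Ruby's standard library can do the millisecond bookkeeping that the code above does by hand with `Time.now.to_f`. Below is a minimal sketch using `Benchmark.realtime`, which returns elapsed seconds as a Float. The `elapsed_ms` helper is hypothetical, and the `Array#sort` call is only a placeholder for `merge_sort(*list)`.

```ruby
require 'benchmark'

# Hypothetical helper: wall-clock milliseconds the given block takes, as an Integer.
def elapsed_ms
  (Benchmark.realtime { yield } * 1000).round
end

data = Array.new(10_000) { rand(1_000_000) }
ms = elapsed_ms { data.sort } # placeholder for merge_sort(*data)
puts "Total time: #{ms}"
```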

So everything works: both perform the sort correctly. But here is the weird part. Scala compiles to JVM bytecode, just like Java, and JVM code is supposed to be super fast. That’s the assumption I’ve always made, based on various readings as well as empirical data. Now, check out these benchmarks…

  $ scalac merge_sort.scala && scala MergeSort
  Starting...
  Sorted 10000 elements.
  Total time: 374
  $ ruby merge_sort.rb
  Starting...
  Sorted 10000 elements.
  Total time: 138

I’m not timing the compilation/startup phase, as you can see from the code. With 10K records, Ruby performs almost 3 times as fast. With 100K records:

  $ scalac merge_sort.scala && scala MergeSort
  Starting...
  Sorted 100000 elements.
  Total time: 31213
  $ ruby merge_sort.rb
  Starting...
  Sorted 100000 elements.
  Total time: 7681

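Ruby is about 4 times faster than Scala in this case. Note that neither number grows the way an O(n log n) sort should: 10 times more elements should take roughly 12.5 times longer (10 × log 100000 / log 10000 = 10 × 5/4), yet Scala took about 83 times longer and Ruby about 56 times longer.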

I’m not focusing on Ruby here, since Scala is what I’m concerned about. I used a List for most operations that didn’t require in-place modification, and a ListBuffer for mutable lists, which advertises constant-time append/prepend operations. I’m not sure what I’m missing. I’m going to play with it a bit more to find the culprit, but again, I’m not a Scala expert and not very familiar with its collections library and the ins and outs of each implementation.

Any ideas?

** Update **

So after Maxim’s comment, I updated the Scala code to use

  while(left.lengthCompare(0) > 0 && right.lengthCompare(0) > 0)

vs.

  while(left.length > 0 && right.length > 0)

All I can say is, Scala #$@%^ing rocks. So the culprit is the length calculation: left and right are immutable Lists, and List.length walks the entire list on every call, which makes the merge loop quadratic. That makes perfect sense.

Using lengthCompare(l:Int):Int gets rid of the brute force length calculation. Here is what I get from the docs…

Result of comparing length with operand l. This method is used by matching streams against right-ignoring (…,_*) patterns.

I’m going to dig into the source in a bit to figure out how it does this comparison, but my guess is that if, say, we’re comparing against length 0, all it needs is to reach the first element of the list to know that the length is greater than 0 (a positive result), with no need to compute the full length.
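That guess matches how a singly linked list behaves. Scala's List is a cons list, so `length` has to walk every node, while a comparison against n can stop after at most n + 1 nodes. To stay in one language for the new examples, here is a Ruby analogue. `Node`, `cons_list`, `list_length` and `length_compare` are hypothetical illustrations, not Scala's source or any library API. Each function also reports how many nodes it visited.

```ruby
Node = Struct.new(:head, :tail)

# Builds a singly linked list from an array; nil is the empty list.
def cons_list(array)
  array.reverse.reduce(nil) { |tail, value| Node.new(value, tail) }
end

# Like List.length: must walk every node. Returns [length, nodes_visited].
def list_length(node)
  count = 0
  until node.nil?
    count += 1
    node = node.tail
  end
  [count, count]
end

# Like List.lengthCompare(n): stops as soon as the answer is known.
# Returns [comparison (-1, 0 or 1), nodes_visited].
def length_compare(node, n)
  visited = 0
  while visited <= n
    return [visited <=> n, visited] if node.nil?
    visited += 1
    node = node.tail
  end
  [1, visited] # more than n nodes exist; no need to count the rest
end

big = cons_list((1..100_000).to_a)
p list_length(big)       # walks all 100000 nodes
p length_compare(big, 0) # touches a single node
```

For the same reason, `nonEmpty`/`isEmpty`, which only look at the first node, are the usual way to write that while condition in Scala.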

Did I say culprit? Let me say BIG culprit. Here are the benchmark results now…

Scala:

  $ scalac merge_sort.scala && scala MergeSort
  Starting...
  Sorted 100000 elements.
  Total time: 387

Ruby:

  $ ruby merge_sort.rb
  Starting...
  Sorted 100000 elements.
  Total time: 7667

Wow, what a drastic improvement.

*** Disclaimer: This means nothing in terms of benchmarking! Please don’t waste your time disqualifying this benchmark; I know it means very little in this small context, and nothing without a proper statistical (power of a test) measurement. Benchmarks are crap unless properly and fairly compared. This post is not about benchmarking languages; rather, I wanted to see how the various idioms and libraries perform while learning both languages. It turned out to be very useful, as a small recommended tweak in how I used Scala’s collections library completely turned things inside out. ***

Any Ruby folks out there care to comment?


30 Jul 09

Choosing a web development framework/toolkit

I’m sure I’m not the only lunatic who spends many hours well into the night thinking about web frameworks, but then again, maybe I am. This is all exacerbated by the fact that I work for startups, so my requirements are much different from those of, say, someone working for an established corporation with various standards and practices in place. I left the corporate world 4 years ago and haven’t looked back. I love the dynamics of startup environments, and my personality fits very well with their culture and pace.

Some of the questions I battle with are: which framework should I use for this new project? Am I using the right framework for my current project? Do the framework and the language it’s written in support writing applications in a powerful, flexible, fast, scalable way? Many of the criteria I just listed are less about the framework than about the design and architecture of your applications and infrastructure, but frameworks can make such a desirable architecture easier or harder to achieve.

The issue is less pronounced for non-web applications, mostly because any Turing-complete language can perform the same job as any other; the only questions are the programmer’s preference, proficiency, and the availability of some framework/abstraction to make life a bit easier. In some languages, writing these abstractions is a breeze, and in others they aren’t explicitly needed because language idioms eliminate some or lots of the boilerplate.

But web development is so complex these days that simple abstractions are not enough. Anyone who thinks otherwise either hasn’t created a serious web application or possesses some information that I’d be willing to pay for :-) Sure, with knowledge of HTTP and some gateway protocol, whether it’s CGI, Java Servlets, WSGI, etc…, one can do almost anything that’s possible on the web, but that’s a pretty bad criterion to have in the age of ever more complex applications/features. One doesn’t want to rewrite everything from scratch. Take authentication/authorization: although many applications have a fairly custom authn/authz scheme, 80%+ of what’s needed is boilerplate that neither I nor any experienced application developer cares to reinvent. I’d rather be doing more challenging things; not sure about you.

So many frameworks hide some amount of boilerplate from the developer behind abstractions. Right now, there seem to be two schools of thought among web frameworks:

  1. I’ve done this many times before, trust me, you don’t need anything else. I extracted this; I didn’t make it up. The generic concepts I extracted from the 20 applications I’ve developed with this framework are all you’ll ever need. Here is my convention, shoved down your throat. You think you need XYZ? No, dumb ass, you just have no clue and your brain carries all the baggage of a previous framework. Come on, open your mind, do you really need XYZ? You do? Well #$%@ you, go use someone else’s framework.

  2. We don’t know what the developer wants; he might want bar or foo or barfoo or foobar or foobarbarfoobarbar, or whatever they wake up and desire that day. We’ll come up with abstractions that can be extended by other abstractions. But wait, what if the developer wants to extend those other abstractions? Oh, well, we must allow them to do so, so here are some more abstractions. Before you get started, here are 50 different things you must do, with XML or code bootstrapping.

The second camp sounds nicer, more sane in some instances, and you’re overwhelmed with a flexibility that makes you believe nothing is impossible. But that’s far from the truth.

I’ve mostly used #2 frameworks, since over many years of development I’ve built up quite a few conventions of my own. No, it’s not that I’m a closed-minded old-timer who is scared of change; I actually love change so much that I find myself spending many unproductive nights hacking on something in a completely different language/framework, exploring unpopular technologies, etc… But my conventions have grown empirically, and I’m not easily swayed to go back to 10 years ago, when I had no experience, and relearn from the same mistakes I’ve already made, just to come to most likely the same conclusions. People do this all the time, in every field, and I don’t have grant money or a big corporation blindly paying for my wasted time.

I’m pretty big on DDD and OO and all the abstractions that come with them, and I need a framework that lets me use them without forcing me to mix relational and OO concepts through a weak ORM, no ORM, or an ORM chosen for me. I am also more than capable of deciding whether I need a Repository data layer and not just a plain simple DAO layer. I know what I’m talking about, at least I think I do, so I don’t need any 20-year-old telling me that using recordsets that masquerade as domain objects is just fine. Maybe for you, but I have a different opinion. Your to-do list app may do just fine that way, but not my software, which might start off with 10 domain concepts and grow larger as functionality is added. Ok, enough of my discontent with 20-year-old programmers; many of them are brilliant, and they’ll just have to learn (or not) as time goes on.
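To make the Repository-versus-recordset distinction concrete: a repository speaks in plain domain objects and hides how they are stored, so the domain class itself knows nothing about persistence. A minimal sketch with hypothetical names (`Account`, `InMemoryAccountRepository`). A SQL-, ORM- or key/value-backed repository would expose the same `save`/`find` methods.

```ruby
# Plain domain object: no persistence or web concerns.
class Account
  attr_reader :id, :owner, :balance

  def initialize(id:, owner:, balance: 0)
    @id = id
    @owner = owner
    @balance = balance
  end

  def deposit(amount)
    raise ArgumentError, "amount must be positive" unless amount.positive?
    @balance += amount
    self
  end
end

# Repository: a collection-like interface expressed in domain terms.
# This in-memory version suits tests and prototypes; other implementations
# could wrap SQL, an ORM, or a key/value store behind the same methods.
class InMemoryAccountRepository
  def initialize
    @store = {}
  end

  def save(account)
    @store[account.id] = account
  end

  def find(id)
    @store.fetch(id) { raise KeyError, "no account #{id}" }
  end
end

repo = InMemoryAccountRepository.new
repo.save(Account.new(id: 1, owner: "alice"))
account = repo.find(1).deposit(50)
repo.save(account)
```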

Now, with all that’s mentioned above, I am also very aware of over-engineering, and aren’t we all so good at it? So when I’m not in the mood to over-engineer something, I’d like a simple way of accomplishing a task, without all the enterprise application pattern abstractions, etc… Sometimes I just want to create a quick prototype, start off simple, then grow it if needed into a production-ready piece of software. Eventually, through constant refactoring, I’ll add the necessary patterns/abstractions as my requirements grow or change.

With all the above rant, I have yet to find a framework that can do this. Rails maybe comes close, with only one problem: ActiveRecord is very limited and sucks. Besides the mapping limitations, you also bind your domain objects to the relational model and are constrained to modeling your domain in an unnatural way. This might be fine if you’re building a small app in isolation, but it’s huge technical debt if you’re building a service model on top of a persistence model that might be used/reused by other domains or services, etc… I want a consistent, coherent domain that is available as a service to other services, like webapps, background processes, etc… Good luck doing this with Rails. It is true that when I’m first starting my app this might not be a requirement, but like any other startup company, we have a vision, and if that vision is realized, we’d rather not have that much technical debt to pay before we can move forward.

So after all the ranting above, what is it that I want? Here is a list. It’s not comprehensive, just the things I can think of at this time. I’ll update it as I think of anything else.

  1. Support for MVC (most frameworks have pretty decent support for that now)
  2. Extensible MVC (I need to be able to extend the way controllers and views operate. Some frameworks do it by convention and limit you to a set of popular conventions.)
  3. Allows you to build your domain in isolation. (I want my domain model to be completely decoupled from any web technologies, persistence, etc… Just a plain OO domain model)
  4. Gives you very flexible persistence options. (I might decide to use a fully featured ORM (ActiveRecord is not fully featured), or I might want to use SQL, or heck, I might want to do both, for efficiency or to scratch a morning itch; who cares about the reason, please let me choose. Oh, and one more thing: what if I don’t want to use a SQL database at all? I might want a native XML store or, better yet, a key/value store. Even if it’s just to piss someone off, I want to do this, and it should be pretty easy to accomplish. I’m not asking for a mapper for these stores; just don’t bind your framework to some relational store, making the work of turning this dependency off a 5-hour chore.)
  5. Supports AJAX (I should be able to easily render JSON or XML views without much plumbing or lots of mappings and annotations. The authentication and forms support should also expose some form of AJAX-compliant interface, so that forms and authentication can be done using AJAX if I choose. I should be able to easily parse submitted data into some data structure and validate/synchronize it with the domain model.)
  6. Bindings (All frameworks have some sort of bindings, and some of them are limiting. I don’t want to create command objects just to bind the data and then sync it with my domain model. If I have domain object graph(s), I should be able to bind them in the view layer. Bindings should be customizable. In Spring MVC, for example, you can only bind one input control to a field or set of fields in the command object, but what if I want to bind 3 input fields that collectively represent one field in the domain object? I’m out of luck, unless I use JavaScript to first serialize those input fields into one field. That really sucks.)
  7. Support for RESTful, stateless, and other web concepts in a straightforward way. (I want to be able to configure every part of HTTP and the web and make the application work, look, and interact my way, compatible with the web, not your way. Some component-based frameworks make that harder than it should be, for example because they are inherently stateful by default. Some make it hard to support RESTful or custom URI schemes because they transfer state through URL rewriting. None of these problems exist in some frameworks, like Rails, Spring MVC, Grails, etc…, so I know it’s possible.)
  8. Validations (Most have fully fledged validation support, but I wouldn’t say it can’t be made a bit easier. I do like Spring’s flexible validation support.)
  9. Forms (This is a big one. Can you provide a flexible way of creating forms and layouts? I mean seriously, we’re developing forms today the same way we developed forms 15 years ago. Every other aspect of development has moved on, but we’re still doing bullshit HTML form controls. XForms is a way out, but the lack of browser support and the hard-to-integrate support from vendors like Orbeon and Chiba make the standard useless. Can we either embrace it or come up with something else? Am I the only one who gets an anxiety attack every time I think about creating yet another interactive form that doesn’t do anything much differently than the form I created 4 months ago for a different project, yet I either have to copy and paste all the cruft or start from absolute scratch? Wow, that’s sad IMO.)
  10. Scalability (I know this one again is not up to the framework, but as I mentioned before, the framework can make it easier or harder to achieve. For example, inherently stateful frameworks that require either session affinity or replication of session state make it very hard to scale horizontally. Yes, I know you can scale with the replication tools available out there, but synchronous replication is not linearly scalable, so any such framework makes it harder. There are many other criteria that can make one framework more scalable than others, but in general, statelessness, stability, and speed make it easier to tune for scalability.)
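Item 6's example, three input fields that together make up one domain value, can be expressed as a small custom binder. This is a hypothetical sketch, not any framework's API. The param names (`birth_date_year`, `birth_date_month`, `birth_date_day`) and the `bind_date` helper are invented for illustration. The binder returns either a Date or nil.

```ruby
require 'date'

# Hypothetical binder: combines three submitted fields into one Date value.
# params is a Hash of strings, as a form submission would provide.
def bind_date(params, prefix)
  parts = %w[year month day].map { |part| params["#{prefix}_#{part}"] }
  return nil if parts.any? { |p| p.nil? || p.strip.empty? }

  year, month, day = parts.map { |p| Integer(p, 10) } # base 10, so "08" is 8
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
rescue ArgumentError # non-numeric input
  nil
end

params = { "birth_date_year" => "1980", "birth_date_month" => "02", "birth_date_day" => "29" }
p bind_date(params, "birth_date")
```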

Ok, that’s it for now. I need to vent before my brain allows me to think of the other things I’ve encountered on my never-ending framework journey. I’d really love for someone to just say, hey, you’re wrong, there is such a tool (or tools), here it is. I’d be eternally grateful. Many will say that it’s useless to complain: if I see a need, I should help create the framework or functionality I think is needed. I wish I had more time; until then, I’ll continue to grunt and develop my own inner frameworks to make things easier. One day, if I have time, maybe I’ll devote some of it to making an existing open source framework better. I had more time years ago and contributed to quite a few open source projects, and I’m truly starting to miss it. I still occasionally submit a patch or two to a framework I’m working with after fixing an issue or adding a feature, but at times I’m in such a hurry to move on to the next task that I don’t have the time to package or generalize it enough to make it useful for everyone else.

Right now, I’m working with Grails after starting a project in Spring MVC and not being able to deliver functionality as quickly as I wanted. I’ll have to live with some issues I found with Grails when I was using it about 4 months ago, like the fact that you must use Hibernate or GORM, crappy Groovy stack traces, etc… Hey, there is always something one might not like, but I really like Grails and am hoping that now that it’s in the hands of SpringSource, they’ll spruce up the documentation to be more like Spring’s awesome documentation and clean it up a bit.

Update: I wanted to elaborate a bit on Grails with regard to an isolated domain model. Grails does allow you to create and isolate your domain model and its persistence; unfortunately, you have to twist its arm if it’s anything outside of GORM. You don’t have to add classes to the domain directory, but then wtf is it there for? Also, it would be nice if the Grails team provided a way to specify which classes in the domain directory are persistent and which aren’t. I mean, a domain model != persistent entities. So transient classes and other domain artifacts should also be grouped together. Personally, putting them into src/groovy sucks, because I have to navigate two directories to look at what’s supposed to be a coherent domain model.