November, 2009


26
Nov 09

Apache Ivy resolving local maven artifacts

Just submitted a fix to GRAILS-5327. This bug was really bothering me, mostly because ivy’s API documentation is non-existent. Even the user docs are begging to be enriched with more clarity and content.

Either way, for those of you facing the same issue. The problem is that sometimes you use maven to publish your modules to your local repository usually in ~/.m2/repository directory. Ivy has a FileSystemResolver which allows one to resolve to the file system using the maven2 patterns. It can be set up very easily, but the problem I ran into is that although ivy resolves the local dependencies, it does nothing about its transitive dependencies. WTF? Why would I even want to use ivy then. So I went looking, online through posting, bad documentation, and javadocs to no avail. There was one posting which specified that one should use the pom ending vs. [ext]. So my previous artifact pattern…

${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]

should be changed to

${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).pom

I did this, but still not go. After trying a few combinations, I finally figured it out. Notice, the above artifact pattern is bolded. This is because ivy also has an ivy pattern. Because I’m not really interested in diving into ivy’s source (I seldom use it) and docs suck, I just took a dumb down brute force approach, let’s try a few combination and see how ivy behaves. Well, one of them worked. Basically, you must set both ivy and artifact patterns…

localFileSystemMavenResolver.addIvyPattern("${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).pom")
localFileSystemMavenResolver.addArtifactPattern("${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]")

Why? I have no freaking idea. I’m not even going to guess, since even the posts by the core ivy team on the post linked above, seem to be stabbing in the dark. “try this…” they say. Well, it works. Maybe some day, I’ll have the interest and the time to look at ivy source, for now, I’m happy with using maven for most java projects and Grails’ new dependency management (using ivy for dep resolution) for my Grails projects.

Hope this helps anyone battling ivy. I would think a dependency resolution library that claims it can do for ant what maven does with dependency management, would at least properly and transitively resolve maven dependencies without too much poking around. I mean, are there any viable non-maven repositories out there? I really don’t think so.


25
Nov 09

Merge sort implementation and performance in Scala and Ruby

I’m not trying to turn this into language A vs. B debate, its just that something interesting happened last night. I’m trying to learn both Scala and Ruby. I’m a bit more enthused by Scala at this point, mostly because I somewhat prefer static typing and I’m a long time Java guy. Now, I am also a long time Perl guy, so I’m not necessarily choosing Scala over Ruby because of lack of experience with dynamic languages.

So, with this said, the best way to learn language idioms is to start actually implementing something. A sample application is good, but due to frameworks and relatively low business logic/algorithm ration to these sample apps, at least the ones I can conceive, you mostly spend time learning the framework idioms vs. actual languages ones.

So I thought I’d start by implementing some algorithms in these languages. I decided on Merge Sort. It’s a O(n log n) divide and conquer algorithm. See the link for more info.

Let’s start off with the code. Again, this is not done in an OO and/or reusable manner, this is mostly procedural/functional code to test mostly functional idioms.

Scala code:

  import java.math._
  import scala.util._
  import scala.collection.mutable._
  object MergeSort extends Application  {

    println("Starting...");

    val list = initListWith(1000)

    val start = System.currentTimeMillis()
    println("Sorted " + merge_sort(list.toList).length + " elements.")
    val end = System.currentTimeMillis()
    println("Total time: " + (end - start))

    def merge_sort(list:List[Int]): List[Int] = {
      if (list.length <= 1) {return list}
      var (left: List[Int], right: List[Int]) = list splitAt Math.round(list.length/2.0).toInt
      left = merge_sort(left); right = merge_sort(right)
      if(left.last > right.head)
        merge(left, right);
      else
        left ::: right
    }

    def merge(l:List[Int], r:List[Int]):List[Int] = {
      var result:ListBuffer[Int] = new ListBuffer()
      var left = l; var right = r;
      while(left.length > 0 && right.length > 0) {
        if(left.head <= right.head) {
          result += left.head
          left = left.tail
        } else {
          result += right.head
          right = right.tail
        }
      }
      if(left.length > 0)
        result.toList ::: left
      else
        result.toList ::: right
    }

    def initListWith(limit:Int):List[Int] = {
      val list:ListBuffer[Int] = new ListBuffer()
      val randGen = new Random()
      (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
      return list.toList
    }

  }

And Ruby code:

  def merge_sort(*list)
    return list if list.size <= 1

    result = []
    middle = (list.size/2.0).round
    left = list[0, middle]
    right =  list[middle..-1]

    left = merge_sort(*left)
    right = merge_sort(*right)

    if left[-1] > right[0]
      result = merge(left, right);
    else
      result = left + right
    end
    return result
  end

  def merge(left, right)
    result = Array.new()
    while left.size > 0 && right.size > 0
      if left.first <= right.first
        result << left.slice!(0)
      else
        result << right.slice!(0)
      end
    end

    if left.size > 0
      result.concat(left)
    else
      result.concat(right)
    end
    return result
  end

  list = (1..1000).collect {|i| rand(1000000); }

  puts "Starting..."
  start = (1000 * Time.now.to_f).to_i
  puts "Sorted " + merge_sort(*list).size.to_s + " elements."
  end_time = (1000 * Time.now.to_f).to_i
  total_time = end_time - start
  puts "Total time: #{total_time}"

So all works great, they both perform the operation correctly. But here is the weird part. JVM bytecode (Scala is very similar to Java) is supposed to be super fast. That’s the assumption I’ve always made on various readings as well as empirical data. Now, check out these benchmarks…

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 10000 elements. Total time: 374 $ ruby merge_sort.rb Starting... Sorted 10000 elements. Total time: 138

I’m not timing the compilation/startup phase, as you can see from the code. In the case of 10K records, ruby performs almost 3 times as fast. It the case of 100K records:

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 31213 $ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7681

Ruby is 4 times faster than Scala in this case, of course the growth is relative to the (n log n) performance.

I’m not focusing on Ruby here, since Scala is what I’m concerned about. I used a List for most operations that didn’t require in place modifications. I used a ListBuffer for mutable lists, which advertises constant time append/prepend operations. I’m not sure what I’m missing. I’m going to play with it a bit more to see where the culprit is, but again, I’m not a Scala expert, though not very familiar with its collections library and the ins/outs of each implementation.

Any ideas?

** Update **

So after Maxim’s comment, I updated Scala to use

while(left.lengthCompare(0) > 0 && right.lengthCompare(0) > 0)

vs.

while(left.length > 0 && right.length > 0)

All I can say, Scala #$@%^ing rocks. So it seems like the length calculation on a mutable collection is the culprit, which makes perfect sense.

Using lengthCompare(l:Int):Int gets rid of the brute force length calculation. Here is what I get from the docs…

Result of comparing length with operand l. This method is used by matching streams against right-ignoring (…,_*) patterns.

I’m going to dig into the source to figure out how it does this comparison in a bit, but I can only guess that if say, we’re comparing it to length 0, all it needs is to get to the first element in the collection to figure out that the result is false, no need to actually get the full length.

Did I say culprit, well, let me say, BIG culrpit. Here are the benchmark results now…

Scala:

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 387

Ruby:

$ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7667

Wow, what a drastic improvement.

*** Disclaimer: This means nothing in terms of benchmarking! Please don’t waste your time disqualifying this benchmark, because I know it really means nothing in this small context as well as it means nothing without _power of a test measurement. Benchmarks are crap unless properly and fairly compared. This post is not about benchmarking languages, rather I wanted to see how the various idioms and libraries perform while learning both languages. It turned out to be very useful, as a small recommended tuning in Scala’s collections lib completely turned things inside out_ ***

Any ruby folks out there care to comment?