04
Jan 10

Heap Sort in Scala

Heap sort is one of the more efficient sorting algorithms, being that it sorts in constant space (not including the memory space of course to store the n elements). Elements are sorted in place, though the constant space O(1). Total space is O(n) of course if you consider the storage of the elements in memory or on disk.

There is lots of literature on heap sorts out there, so I’m not going to go into it in more details. The reason it’s sometimes preferred over say insertion sort or quick sort, is that the worst case (which one always wants to account for, is O(n log n), which is far better than quadratic O(n2) worst case performance of both insertion and quick sort algorithms.

Here is an implementation of quick sort in Scala. Please note, I wrote this with readability in mind, not conciseness. Sometimes they go hand in hand, but at times, as I o’ so learned with Perl over the years, that’s not true in all cases. A good balance between conciseness and readability should be reached and that comes only with experience 🙂

There are only 3 methods that matter here, rest are just auxiliaries (I highlighted the important methods). The code here can be run independently and serves as a benchmark, as you can run it by specifying the size of the initial array (generated randomly) and the # of benchmarks to run. The benchmark is run, best and worst are disposed and the mean of the rest is displayed.


import scala.util._
import scala.collection.mutable._
import scala.util.Random

object HeapSort {

  private val randGen = new Random()

  def main(args: Array[String]) {
    val arraySize = args.first.toInt
    val repeat = args(1).toInt
    var times = perform(repeat, arraySize)
    println("Average: "+averageOf(times)+" ms for "+times.size+" runs.")
  }

  private def heapSort(arr:Array[Int]):Array[Int] = {
    buildHeap(arr)
    (arr.length-1 until 0 by -1).foreach( i => {
      swap(arr, 0, i)
      heapify(arr, 0, i)
    })
    arr
  }

  def buildHeap(arr:Array[Int]) {
    ((arr.length/2.0D).floor.toInt-1 until -1 by -1).foreach( i => heapify(arr, i, arr.length) )
  }

  def heapify(arr:Array[Int], idx:Int, max:Int) {
    val l = left(idx)
    val r = right(idx)
    var largest = if (l < max && arr(l) > arr(idx)) l else idx
    largest = if (r < max && arr(r) > arr(largest)) r else largest
    if (largest != idx) {
      swap(arr, idx, largest)
      heapify(arr, largest, max)
    }
  }

  private def parent(idx:Int):Int = (idx/2.0D).floor.toInt
  private def left(idx:Int):Int = 2*idx+1
  private def right(idx:Int):Int = (2*idx)+2

  def swap(s:Array[Int], i: Int, j: Int):Unit = {
    val v = s(i);
    s(i) = s(j);
    s(j) = v
  }


  private def perform(times:Int, initListSize:Int):ListBuffer[Int] = {
    val benchmarks:ListBuffer[Int] = new ListBuffer[Int]
    (0 until times).foreach(idx => {
      val arr:Array[Int] = initArrayWith(initListSize)
      val start = System.currentTimeMillis()
      heapSort(arr)
      val end = System.currentTimeMillis()
      benchmarks += (end-start).toInt
    });
    benchmarks
  }

  def initArrayWith(limit:Int):Array[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    return list.toArray
  }

  def averageOf(benchmarks:ListBuffer[Int]):Long = {
    var sortedBm:Array[Int] = benchmarks.toArray
    heapSort(sortedBm)
    var sum:Int = 0;
    val sumFunc = (t:Int) => sum += t
    // Get rid of best and worst if we ran it more than twice
    if (sortedBm.length > 2)
      sortedBm.slice(1,sortedBm.size-2).foreach(sumFunc)
    else
      sortedBm.foreach(sumFunc)
    return sum/sortedBm.length
  }
}

You can run this code by saving it to HeapSort.scala file and from command line…

scala -i HeapSort.scala -e ‘HeapSort.main(Array(“400000”, “10”))’

This will run the file through Scala’s interpreter and execute a benchmark 10 times by sorting a random array of 400,000 elements with randomly chosen values from 1 to 1 million.


27
Dec 09

Insertion Sort in Scala

Here is an implementation of insertion sort using Scala. Learning the ins/outs of Scala idioms and collections.

<

div>

import scala.util._
import scala.collection.mutable._</p>

<p>object InsertSort {</p>

<p>def main(args: Array[String]) : Unit = {
    val list:ListBuffer[Int] = initListWith(args.first.toInt)
    println(&quot;Before: &quot;+list)
    println(&quot;After: &quot;+insertSort(list))
  }</p>

<p>private def insertSort(list:ListBuffer[Int]):ListBuffer[Int] = {
    (1 until list.length).foreach( (i:Int) =&gt; {
      rewindCompare(list, i)
    })
    list
  }</p>

<p>private def rewindCompare(list:ListBuffer[Int], idx:Int):Unit = {
    val value = list(idx)
    def swap(i: Int, j: Int) { val v = list(i); list(i) = list(j); list(j) = v }
    for (revIdx &lt;- idx until 0 by -1) {
      if (value &lt; list(revIdx-1)) swap(revIdx, revIdx-1)
      else return
    }
  }</p>

<p>private def initListWith(limit:Int):ListBuffer[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i =&gt; list += randGen.nextInt(1000000) )
    return list
  }
}

Compile and run as…

$ scalac InsertionSort.java
$ scala InsertionSort 20

Outputs:

Before: ListBuffer(548915, 963275, 872581, 284656, 52299, 261918, 222533, 288234, 340859, 546783, 428115, 659633, 450991, 693520, 15495, 108540, 402193, 48479, 701302, 269521)

After: ListBuffer(15495, 48479, 52299, 108540, 222533, 261918, 269521, 284656, 288234, 340859, 402193, 428115, 450991, 546783, 548915, 659633, 693520, 701302, 872581, 963275)

Your output will vary, as the initial list which is sorted is generated randomly.


26
Nov 09

Apache Ivy resolving local maven artifacts

Just submitted a fix to GRAILS-5327. This bug was really bothering me, mostly because ivy’s API documentation is non-existent. Even the user docs are begging to be enriched with more clarity and content.

Either way, for those of you facing the same issue. The problem is that sometimes you use maven to publish your modules to your local repository usually in ~/.m2/repository directory. Ivy has a FileSystemResolver which allows one to resolve to the file system using the maven2 patterns. It can be set up very easily, but the problem I ran into is that although ivy resolves the local dependencies, it does nothing about its transitive dependencies. WTF? Why would I even want to use ivy then. So I went looking, online through posting, bad documentation, and javadocs to no avail. There was one posting which specified that one should use the pom ending vs. [ext]. So my previous artifact pattern…

${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]

should be changed to

${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).pom

I did this, but still not go. After trying a few combinations, I finally figured it out. Notice, the above artifact pattern is bolded. This is because ivy also has an ivy pattern. Because I’m not really interested in diving into ivy’s source (I seldom use it) and docs suck, I just took a dumb down brute force approach, let’s try a few combination and see how ivy behaves. Well, one of them worked. Basically, you must set both ivy and artifact patterns…

localFileSystemMavenResolver.addIvyPattern("${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).pom")
localFileSystemMavenResolver.addArtifactPattern("${repoPath}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]")

Why? I have no freaking idea. I’m not even going to guess, since even the posts by the core ivy team on the post linked above, seem to be stabbing in the dark. “try this…” they say. Well, it works. Maybe some day, I’ll have the interest and the time to look at ivy source, for now, I’m happy with using maven for most java projects and Grails’ new dependency management (using ivy for dep resolution) for my Grails projects.

Hope this helps anyone battling ivy. I would think a dependency resolution library that claims it can do for ant what maven does with dependency management, would at least properly and transitively resolve maven dependencies without too much poking around. I mean, are there any viable non-maven repositories out there? I really don’t think so.


25
Nov 09

Merge sort implementation and performance in Scala and Ruby

I’m not trying to turn this into language A vs. B debate, its just that something interesting happened last night. I’m trying to learn both Scala and Ruby. I’m a bit more enthused by Scala at this point, mostly because I somewhat prefer static typing and I’m a long time Java guy. Now, I am also a long time Perl guy, so I’m not necessarily choosing Scala over Ruby because of lack of experience with dynamic languages.

So, with this said, the best way to learn language idioms is to start actually implementing something. A sample application is good, but due to frameworks and relatively low business logic/algorithm ration to these sample apps, at least the ones I can conceive, you mostly spend time learning the framework idioms vs. actual languages ones.

So I thought I’d start by implementing some algorithms in these languages. I decided on Merge Sort. It’s a O(n log n) divide and conquer algorithm. See the link for more info.

Let’s start off with the code. Again, this is not done in an OO and/or reusable manner, this is mostly procedural/functional code to test mostly functional idioms.

Scala code:

  import java.math._
  import scala.util._
  import scala.collection.mutable._
  object MergeSort extends Application  {

    println("Starting...");

    val list = initListWith(1000)

    val start = System.currentTimeMillis()
    println("Sorted " + merge_sort(list.toList).length + " elements.")
    val end = System.currentTimeMillis()
    println("Total time: " + (end - start))

    def merge_sort(list:List[Int]): List[Int] = {
      if (list.length <= 1) {return list}
      var (left: List[Int], right: List[Int]) = list splitAt Math.round(list.length/2.0).toInt
      left = merge_sort(left); right = merge_sort(right)
      if(left.last > right.head)
        merge(left, right);
      else
        left ::: right
    }

    def merge(l:List[Int], r:List[Int]):List[Int] = {
      var result:ListBuffer[Int] = new ListBuffer()
      var left = l; var right = r;
      while(left.length > 0 && right.length > 0) {
        if(left.head <= right.head) {
          result += left.head
          left = left.tail
        } else {
          result += right.head
          right = right.tail
        }
      }
      if(left.length > 0)
        result.toList ::: left
      else
        result.toList ::: right
    }

    def initListWith(limit:Int):List[Int] = {
      val list:ListBuffer[Int] = new ListBuffer()
      val randGen = new Random()
      (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
      return list.toList
    }

  }

And Ruby code:

  def merge_sort(*list)
    return list if list.size <= 1

    result = []
    middle = (list.size/2.0).round
    left = list[0, middle]
    right =  list[middle..-1]

    left = merge_sort(*left)
    right = merge_sort(*right)

    if left[-1] > right[0]
      result = merge(left, right);
    else
      result = left + right
    end
    return result
  end

  def merge(left, right)
    result = Array.new()
    while left.size > 0 && right.size > 0
      if left.first <= right.first
        result << left.slice!(0)
      else
        result << right.slice!(0)
      end
    end

    if left.size > 0
      result.concat(left)
    else
      result.concat(right)
    end
    return result
  end

  list = (1..1000).collect {|i| rand(1000000); }

  puts "Starting..."
  start = (1000 * Time.now.to_f).to_i
  puts "Sorted " + merge_sort(*list).size.to_s + " elements."
  end_time = (1000 * Time.now.to_f).to_i
  total_time = end_time - start
  puts "Total time: #{total_time}"

So all works great, they both perform the operation correctly. But here is the weird part. JVM bytecode (Scala is very similar to Java) is supposed to be super fast. That’s the assumption I’ve always made on various readings as well as empirical data. Now, check out these benchmarks…

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 10000 elements. Total time: 374 $ ruby merge_sort.rb Starting... Sorted 10000 elements. Total time: 138

I’m not timing the compilation/startup phase, as you can see from the code. In the case of 10K records, ruby performs almost 3 times as fast. It the case of 100K records:

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 31213 $ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7681

Ruby is 4 times faster than Scala in this case, of course the growth is relative to the (n log n) performance.

I’m not focusing on Ruby here, since Scala is what I’m concerned about. I used a List for most operations that didn’t require in place modifications. I used a ListBuffer for mutable lists, which advertises constant time append/prepend operations. I’m not sure what I’m missing. I’m going to play with it a bit more to see where the culprit is, but again, I’m not a Scala expert, though not very familiar with its collections library and the ins/outs of each implementation.

Any ideas?

** Update **

So after Maxim’s comment, I updated Scala to use

while(left.lengthCompare(0) > 0 && right.lengthCompare(0) > 0)

vs.

while(left.length > 0 && right.length > 0)

All I can say, Scala #$@%^ing rocks. So it seems like the length calculation on a mutable collection is the culprit, which makes perfect sense.

Using lengthCompare(l:Int):Int gets rid of the brute force length calculation. Here is what I get from the docs…

Result of comparing length with operand l. This method is used by matching streams against right-ignoring (…,_*) patterns.

I’m going to dig into the source to figure out how it does this comparison in a bit, but I can only guess that if say, we’re comparing it to length 0, all it needs is to get to the first element in the collection to figure out that the result is false, no need to actually get the full length.

Did I say culprit, well, let me say, BIG culrpit. Here are the benchmark results now…

Scala:

$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 387

Ruby:

$ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7667

Wow, what a drastic improvement.

*** Disclaimer: This means nothing in terms of benchmarking! Please don’t waste your time disqualifying this benchmark, because I know it really means nothing in this small context as well as it means nothing without _power of a test measurement. Benchmarks are crap unless properly and fairly compared. This post is not about benchmarking languages, rather I wanted to see how the various idioms and libraries perform while learning both languages. It turned out to be very useful, as a small recommended tuning in Scala’s collections lib completely turned things inside out_ ***

Any ruby folks out there care to comment?


26
Oct 09

Java multiple class loaders issue

I was facing a rather subtle bug building a decision engine using Drools. Not to get into details of drools here, but the following mostly applies to many rules engines. The basic architecture of the decisions engine is that rules are written using a drools-based rule DSL. These rules are then applied to “facts”, which in the context of java, are just POJOs. These POJOs can be defined using standard java classes, or using Drools declare facilities which allow you to declare fact types in the actual rules file and then query it using JavaBean like helper utilities in the java application. In our decisions engine, we allow to define types either using drools declare and/or using groovy to define the fact object, which is then parsed, defined, and linked by GroovyClassLoader at runtime, making it available to the working memory of the drools engine as well as its rule parser.

Everything seemed like it was working using Drools defined facts using declare. I was able to dynamically create and query the fact types using reflection. This was great, as we can inject different rules and deploy functionality on the fly. I then created a groovy class, parsed it and injected into the drools package builder as well as its working memory.

Code I used to test the rule inference on the fact type.

def domainClass = '''
  package com.buycentives.types
  import com.buycentives.incengine.*
  class Input {
    String name
    int age
    boolean valid
  }
'''
Class inputClass = new GroovyClassLoader().parseClass(domainClass)
def input = inputClass.newInstance()
input.name = "Ilya"
input.age = 25

/// Here we inject the fact and fire the rules, see the following java code snippet

// This code was originally failing
assertEquals input.valid, true

Here is the code I used to assemble the working memory. This code is more modularized in my code and I tried to compile it into a snippet, though I might of missed something and some object instantiations are not shown.

ClassLoader loader = new GroovyClassLoader()
PackageBuilderConfiguration config = new PackageBuilderConfiguration();
config.setClassLoader(loader);

KnowledgeBuilder kBldr = KnowledgeBuilderFactory.newKnowledgeBuilder(config);
kBldr.add(ruleResource, ResourceType.DRL);

if (kBldr.hasErrors())
  throw new RuntimeException("Unable to compile rules. " + kBldr.getErrors().toString());

KnowledgeBase kb = KnowledgeBaseFactory.newKnowledgeBase(new RuleBaseConfiguration(loader));
kb.addKnowledgePackages(kBldr.getKnowledgePackages());
StatelessKnowledgeSession kSession = kBase.newStatelessKnowledgeSession();
kSession.execute(factObject);

The rule that was being asserted

package com.buycentives.incengine.rules

import java.util.Date
import com.buycentives.types.Input

rule "Assert for age"
  when
    $input: Input( age >= 20, age < 30 )
  then
    $input.setValid(true);
end

To make the story short, the rule never fired, although I verified that the Input object instance was injected into the working memory, though never changing the fact’s valid property. I beat my head against the wall over and over and over and finally I realized what was going on. It’s java’s class loader caveat. It’s not really a bug, rather a feature I guess. It’s not really documented in the javadocs, but I’ve seen references to this caveat before.

When you load the same class using two different class loaders, inside the jvm, they are considered two different classes, although logically they are the same, therefore failing the Class.isAssignableFrom and instanceof conditional test.

So the following test would fail…

def domain = "class Test {}"

GroovyClassLoader loader1 = new GroovyClassLoader()
Class clazz1 = loader1.parseClass(domain)

GroovyClassLoader loader2 = new GroovyClassLoader()
Class clazz2 = loader2.parseClass(domain)

assertTrue clazz1.isAssignableFrom(clazz2)

So the lesson here, beware of injecting class loaders, especially when you’re comparing classes that were loaded with a different class loader than the class loader of the thread that’s inferring equality.

Also, beware of GroovyClassLoader.parseClass method. It parses, defines, and links the class, but if you then use it again on the same class, the resulting classes are different.

The following test fails…

GroovyClassLoader loader = new GroovyClassLoader()
Class clazz1 = loader.parseClass(domain)
Class clazz2 = loader.parseClass(domain)

assertTrue clazz1.isAssignableFrom(clazz2)

Also, this link from “Java Specialist Newsletter” is a good source of some interesting class loader information.

Page 7 of 13« First...56789...Last »