Posts Tagged: scala


10 Feb 10

Avoid using nulls in Scala

Scala’s handling of nulls mixed with implicit conversions is quite tricky. I learned this the hard way today, and it took hours to figure out what was going on. At first I thought it was a bug, but then someone pointed out how implicit conversions affect null method parameters.

The bottom line is: DO NOT USE NULLs unless you are dealing with Java libraries and have no choice. Use Option instead, with Some(...) or None.

The problem is best described with code…

  def checkNullOrEmpty(v:Seq[Any]):Boolean = {
    println("Class:"+v.getClass)
    return (v != null) && !v.isEmpty
  }

  case class Race(val event:String, val protocol:String) {
    println("Event:"+event+", protocol:"+protocol)
    assert(checkNullOrEmpty(event))
    assert(checkNullOrEmpty(protocol))
  }

  val t = new Race(null, null)

Pasting the above into the REPL yields the following result…

Event:null, protocol:null
Class:class scala.collection.immutable.WrappedString
java.lang.NullPointerException
    at scala.Proxy$class.toString(Proxy.scala:29)
    at scala.collection.immutable.WrappedString.toString(WrappedString.scala:22)
    at scala.collection.immutable.StringLike$class.length(StringLike.scala:48)
    at scala.collection.immutable.WrappedString.length(WrappedString.scala:22)
    at scala.collection.IndexedSeqLike$class.isEmpty(IndexedSeqLike.scala:81)
    at scala.collection.immutable.WrappedString.isEmpty(WrappedString.scala:22)
    at .checkNullOrEmpty(<console>:6)
    .....

So why is an NPE thrown on the line return (v != null) && !v.isEmpty?

So let’s look further into the output. When a Race instance is created, the constructor values are initialized to null. Inside the constructor we print them out and verify that the values are in fact null. But by the time we get to the checkNullOrEmpty method call, the class of v is WrappedString and the object is no longer null. In Java the call to getClass would fail, as v would be null; in Scala, v has been converted to a WrappedString.

This happens through Scala’s implicit conversions. The checkNullOrEmpty method expects a Seq, but Seq is not a superclass or trait of String. So when we create the Race instance and specify both event and protocol as null, they are still String types with a null reference. Passing event or protocol as an argument to checkNullOrEmpty triggers an implicit conversion. Why? In Java the compilation would fail, since the Seq type is not part of String’s inheritance hierarchy, but in Scala it succeeds, as Scala finds an implicit conversion method to convert String to WrappedString. This method is defined in the Predef object, which is imported by default into all Scala classes. Predef extends the LowPriorityImplicits class, which defines this implicit conversion: implicit def wrapString(s: String): WrappedString = new WrappedString(s). So Scala decides that the best way to convert the String type into Seq[Any] is via this implicit conversion, and it wraps the null value in a WrappedString.
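
Here is a minimal repro of the same trap, outside the Race example (behavior as in the 2.8-era library shown in the stack trace above):

  val s:String = null
  val seq:Seq[Char] = s   // the compiler inserts Predef.wrapString(s)

  println(seq == null)    // false -- the WrappedString wrapper itself is not null
  println(seq.isEmpty)    // NPE -- isEmpty delegates to the wrapped String, which is null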

This causes two issues. First, the null check no longer works: the object is not null, because it’s now an instance of WrappedString, so (v != null) is true. Since that passes, the right-hand side of the && operator is evaluated, and !v.isEmpty throws the NPE, as the underlying wrapped String value is null.

I’m not sure whether this is a bug, a feature, or whether there is simply no real consensus on how null should be handled, but as you can see it causes issues and should be avoided: avoid null and use Option instead. If you are using Java libs that return nulls, wrapping the return value in Option is a good idea before proceeding any further.
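
For example, Option’s companion apply method maps null to None, which makes a clean seam at the Java boundary. A minimal sketch (fromJava stands in for any null-returning Java call):

  def fromJava(raw:String):Option[String] = Option(raw)  // Option(null) == None

  assert(fromJava(null) == None)
  assert(fromJava("race") == Some("race"))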


6 Feb 10

Implementing bloom filter with a murmur hash function

Recently I read a blog post by Jonathan Ellis about bloom filters. Jonathan works on Cassandra and thus has lots of empirical recommendations on their implementation. Cassandra uses bloom filters extensively for performance optimization. A bloom filter is a space-efficient probabilistic data structure used to test whether an element belongs to a set. It’s similar to a standard hashtable, except that it’s space-efficient and doesn’t store the actual value; rather, it hashes the keys into a bit array. The reason it’s a probabilistic data structure is that it allows false positives, but not false negatives. This means that to answer the question of whether an element x belongs to a set S (x ∈ S), a bloom filter returns true or false. A false (doesn’t exist) is guaranteed to be accurate, but a true (exists) has some probability of being a false positive.

So why would one use such an algorithm? Say you store records on disk. Someone requests a particular record and you proceed to seek it. This is usually an expensive operation for high-throughput systems or systems with limited resources, so before invoking it you can first check whether the record exists. If it does exist, you can then proceed to retrieve it through the more intensive operation. Because there is a small probability of a false positive, you might still find (through the resource-intensive operation) that it doesn’t exist. So in an environment where you’re servicing many requests for records that might not exist, you can reduce the number of expensive operations and answer such requests in constant time O(1).

Jonathan’s blog post provides some great information about how to implement a very effective bloom filter. One of the most important considerations is the hash function. To lower the probability of false positives, one must use a hash function that distributes hashes evenly across the hash space. Jonathan recommends the murmur hash algorithm, one of the most efficient hash functions available, with great performance and a low collision rate.

Another thing done to reduce hash collisions, and in turn false positives, is that you don’t just set the bit for a single hash function’s result; you do so several times (5 is referred to a lot in the literature and seems to be a sweet spot). This means you take a key, calculate 5 hashes (using 5 different hash algorithms, or a single-hash-algorithm strategy I’ll discuss below), and set the bit for each one of these hashes in the bit array. Answering the question of whether the key exists does the reverse: calculate the 5 hashes and check that they are all set in the bit array. If any of the 5 aren’t set, you can be assured it doesn’t exist.
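
For context, the standard bloom filter analysis (not from Jonathan’s post) quantifies this: with m bits, k hash functions, and n inserted keys, the false positive probability is approximately

  p \approx \left(1 - e^{-kn/m}\right)^k, \qquad \text{minimized at } k = \frac{m}{n}\ln 2

which is where the “around 5 hashes” rule of thumb comes from for typical m/n ratios.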

So let’s look at some code. Below is the implementation of bloom filter in Scala. It relies on a murmur hash implementation which I won’t list, but you can view/download it here.

  import scala.collection.mutable.BitSet

  class BloomFilter(capacity:Int, hashable:Hashable, hashCount:Int = 5) {

    private val buckets:BitSet = new BitSet(capacity)
    private val hashFunc = hashable.hashes(hashCount)(capacity) _

    def addValue(value:String) {
      hashFunc(value).foreach( buckets += _ )
    }

    def exists_?(value:String):Boolean = {
      for ( i <- hashFunc(value) ) if (!buckets.contains(i)) return false
      return true
    }
  }

  trait Hashable {
    def hashes(hashCount:Int)(max:Int)(value:String):Array[Int]
  }

  class MurmurHashable extends Hashable {
    import com.cobrio.algorithms.{MurmurHash => MH}
    def hashes(hashCount:Int)(max:Int)(value:String):Array[Int] = {
      // Only two real murmur invocations; the remaining hashes are
      // simulated via gi(x) = (h1(x) + i*h2(x)) mod m (Kirsch & Mitzenmacher)
      val hash1 = MH.hash(value.getBytes, 0)
      val hash2 = MH.hash(value.getBytes, hash1)
      ( for ( i <- 0 until hashCount) yield Math.abs((hash1 + i * hash2) % max) ).toArray
    }
  }

The code above should be pretty self-explanatory, but let’s take a look at the hashing strategy. We calculate 5 hashes (the default) for the key being stored, although we only ever invoke the murmur algorithm twice (see the hashes method in MurmurHashable above). Adam Kirsch and Michael Mitzenmacher wrote a paper titled Less Hashing, Same Performance…, which shows that a hashing technique that simulates additional hash functions beyond two can increase the performance of bloom filters without any significant loss in the false positive probability. To summarize the math in the paper, the formula is: gi(x) = (h1(x) + i·h2(x)) mod m, where m is the number of buckets in the bloom filter, h1 and h2 are the two calculated hashes, and i ranges from 0 up to k − 1, where k is the number of hashes we want to generate.

Here is how you’d use the above bloom filter…

  val bloom = new BloomFilter(2000, new MurmurHashable())
  bloom.addValue("Ilya Sterin")
  bloom.addValue("Elijah Sterin")

  assert(bloom.exists_?("Ilya Sterin"))
  assert(bloom.exists_?("Elijah Sterin"))
  assert(!bloom.exists_?("Don't Exist"))
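
And to tie this back to the disk-seek motivation, typical usage is a guard in front of the expensive operation. A hedged sketch (Record and expensiveDiskLookup are hypothetical placeholders, not part of the filter):

  case class Record(key:String, data:String)

  def expensiveDiskLookup(key:String):Option[Record] = None  // stands in for a real disk seek

  def findRecord(bloom:BloomFilter, key:String):Option[Record] =
    if (!bloom.exists_?(key)) None       // definitely absent, answered in O(1)
    else expensiveDiskLookup(key)        // a rare false positive may still come back empty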

3 Feb 10

The start of the Scala journey (concurrency and idiomatic Scala rant)

I’ve been following Scala off and on for about 2 years now, mostly in spurts; I liked the language, but due to workload and other priorities I never had the time to take it for a full ride. Well, over the last 2 weeks, I decided to take the full plunge. Full, meaning I’m taking a highly concurrent production application which powers a very critical component of our system, and rewriting it in Scala. I’m doing this for more than just fun. This application has grown from a very cleanly architected one to one that is still rather nicely designed, but has accumulated a lot of technical debt. With everything I’ve learned about Scala, I think I can redesign it to be cleaner, more concise, and probably more scalable.

The other big reason I’m looking to give Scala a shot is its Actor-based concurrency. I’ve worked with Java’s threading primitives for many years and have accumulated a love/hate relationship with them. The JSE 5 concurrency package brought some nice gems to my world, but it didn’t eliminate the fact that you’re still programming to the imperative model of shared-state synchronization. Scala actors hide some of the ugliness of thread synchronization, though they don’t eliminate the issue completely. Because Scala is a mix of imperative and functional language, and because actors are implemented as a library, nothing stops one from running into the same issues as with more primitive thread state-sharing operations (i.e. race conditions, lock contention, deadlocks/livelocks). Basically, if you’re using actors as just an abstraction layer over old practices, you’ll be in the same boat you started in with Java. With all of that said, unlike Java, Scala provides you the facilities for designing cleaner and more thread-safe systems thanks to its functional programming facilities. Mutable shared state is the leading cause of non-determinism in concurrent Java applications, so immutability and message passing are a way into a more deterministic world.

I’ve also looked at other concurrent programming models, like STM/MVCC. STM is the basis of concurrent programming in Clojure, and it’s a different paradigm from Actors. STM is a simpler model if you’re used to programming against old imperative threading, as it abstracts you from concurrency primitives by forcing state modifications to occur in a transactional context. When this occurs, the STM system takes care of ensuring the state modifications happen atomically and in isolation. In my opinion this system suits the multi-core paradigm very well and allows a smoother transition. The problem with it, at least in the context of MVCC, is that for each transaction, a copy of each data structure being modified is made for the purposes of isolation (the copying implementation is system dependent; some might be more efficient than others), and you can already see an issue: for a system that has to handle numerous concurrent transactions involving many objects, this can become a bottleneck, and the creation of copies can overburden the system’s memory and performance. There are some debates about that in the STM world, mostly involving finding the sweet spot for such systems, where the cost of MVCC is less relevant than the cost of constant synchronization through locking.

The Actor model is different: it works in terms of isolated objects (actors), all working in isolation through message passing. No actor can modify or query the state of another, short of requesting such an operation by sending a message to that particular actor. In Scala you can break that model, as you can send around mutable objects, but if you are to really benefit from the Actor model, you should avoid doing that. Actors lend themselves well to concurrent applications that not only span multiple cores, but can also easily be scaled to multiple physical nodes. Because the messages being passed are immutable data structures that can be easily synchronized and shared, the underlying actor system can move these messages between physically dispersed actors just as it can between actors within the same physical memory space.
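
As a small taste of the model, here is a minimal sketch using the scala.actors library of that era (Add is a made-up message type for illustration):

  import scala.actors.Actor._

  case class Add(x:Int, y:Int)

  val calc = actor {
    loop {
      react {
        case Add(x, y) => reply(x + y)  // no shared state, just immutable messages
      }
    }
  }

  println(calc !? Add(2, 3))  // synchronous send-and-wait, prints 5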

So the world of concurrency is getting more exciting with these awesome paradigms. One thing to remember is that there is no one-size-fits-all concurrency model, and I don’t see any one of the above becoming the de-facto standard any time soon. There is a sweet spot for each, so one should learn the ins and outs of each model.

Now that I’ve got concurrency out of the way, let’s get back to the actual syntax of Scala. Scala is very powerful (at least compared to Java). This power comes with responsibility. You can use Scala to write beautiful, concise programs, or you can use it to write obscure, illegible programs that no one, including the original author, will be able to comprehend. Personally, I prefer the power and think I can handle the responsibility. I’m a long-time Perl programmer (from way before I started programming Java), and I’ve seen (and even written at times) programs that Larry Wall himself wouldn’t be able to comprehend.

Scala comes with operator overloading, and when not used judiciously, that power alone can be responsible for the illegibility of a system. This is one of the major reasons why languages like Java decided not to include it. Personally, I think operator overloading can be a beautiful addition to an API. It can make writing DSLs easier and using them more natural. Again, this power is great in the hands of experienced and responsible programmers.
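
A quick illustration of the appeal, with a made-up two-dimensional vector type (in Scala, operators are just method names, and the usual precedence rules still apply):

  case class Vec(x:Double, y:Double) {
    def +(o:Vec):Vec = Vec(x + o.x, y + o.y)
    def *(s:Double):Vec = Vec(x * s, y * s)
  }

  val v = Vec(1, 2) + Vec(3, 4) * 2  // * binds tighter than +, so v == Vec(7.0, 10.0)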

After having experienced great power (Perl) and great restraint (Java), I’m leaning more towards power (who wouldn’t? :-) On one hand, it’s nice to be able to read and comprehend anyone’s Java program, even when it’s not nicely written; on the other hand, it’s a pain trying to write a program while jumping through hoops because of the various constraints. In a perfect AI world, the compiler would infer the capabilities of the programmer and restrict its facilities accordingly, in some way as to not offend anyone :-) So if a novice is inferred: ah, there goes the operator overloading and implicit conversions, etc. But for now, I’d rather have a powerful tool to use when I write software, and Scala seems to push the right buttons for me at this point.

I’m going to start off a series of posts, beginning with this one, about my experiences with Scala.

Here is a little something I came up with a few hours ago. Our software has some limited interoperability with a SQL database and requires a light abstraction. We chose not to use any 3rd-party ORM or SQL abstraction, mostly because a dependency on those abstractions doesn’t really provide any benefit for our limited use of SQL. So I developed a simple SQL variant abstraction layer, which allows us to execute SQL queries defined in a SQLVariant implementation. Moving from one database to another just requires implementing the SQLVariant interface to provide the proper abstraction. I initially wrote this in Java, and although it was decent, it required quite a bit more code and didn’t look as concise as I wanted. One issue was PreparedStatement and its interface for placeholder bindings. How would one bind Java’s primitive and wrapper types as placeholders, and how would the SQLVariant know which PreparedStatement.bind* method to call? I resorted to an enumeration which defines these operations, and reflection to invoke them. I’m basically sidestepping static typing in a place where I’m not sure I really want or have to. Here is the Java implementation.

I got rid of a few methods, specifically those dealing with resultset, statement, and connection cleanup, as they don’t really emphasize my point here.

  import java.lang.reflect.Method;
  import java.sql.*;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  public abstract class SqlVariant {

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(NotificationType... types);

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType... types);

    public abstract SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result);

    private abstract static class SqlStatement<T> {

      protected String sql;
      protected List<BindParameter> bindParams = new ArrayList<BindParameter>();
      protected PreparedStatement stmt;

      public SqlStatement(String sql) {
        this.sql = sql;
      }

      public SqlStatement addBindParam(BindParameter param) {
        bindParams.add(param);
        return this;
      }

      public String getSql() {
        return sql;
      }

      public List<BindParameter> getBindParams() {
        return Collections.unmodifiableList(bindParams);
      }

      protected PreparedStatement prepareStatement(Connection conn) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql);
        for (int bindIdx = 0; bindIdx < bindParams.size(); bindIdx++) {
          BindParameter p = bindParams.get(bindIdx);
          try {
            Method m = stmt.getClass().getMethod(p.type.method, Integer.TYPE, p.type.clazz);
            m.invoke(stmt, bindIdx + 1, p.value);
          }
          catch (Exception e) {
            throw new RuntimeException("Couldn't execute method: " + p.type.method + " on " + stmt.getClass(), e);
          }
        }
        return stmt;
      }

      public abstract T execute(Connection conn) throws SQLException;
    }

    public static final class SqlSelectStatement extends SqlStatement<ResultSet> {

      public SqlSelectStatement(String sql) {
        super(sql);
      }

      @Override
      public ResultSet execute(Connection conn) throws SQLException {
        return prepareStatement(conn).executeQuery();
      }
    }

    public static final class SqlUpdateStatement extends SqlStatement<Boolean> {
      public SqlUpdateStatement(String sql) {
        super(sql);
      }

      @Override
      public Boolean execute(Connection conn) throws SQLException {
        stmt = prepareStatement(conn);
        return stmt.execute();
      }
    }


    public static final class BindParameter<T> {
      private final BindParameterType type;
      private final T value;

      public BindParameter(Class<T> type, T value) {
        this.type = BindParameterType.getTypeFor(type);
        this.value = value;
      }

      public BindParameter(BindParameterType type, T value) {
        this.type = type;
        this.value = value;
      }
    }

    private static enum BindParameterType {
      STRING(String.class, "setString"),
      INT(Integer.TYPE, "setInt"),
      LONG(Long.TYPE, "setLong");

      private Class clazz;
      private String method;

      private BindParameterType(Class clazz, String method) {
        this.clazz = clazz;
        this.method = method;
      }

      private static BindParameterType getTypeFor(Class clazz) {
        for (BindParameterType t : BindParameterType.values()) {
          if (t.clazz.equals(clazz)) {
            return t;
          }
        }
        throw new IllegalArgumentException("Type: " + clazz + " is not defined as a BindParameterType enum.");
      }
    }
  }

Now, here is how one would implement the SQLVariant interface. The implementation below is in Groovy. I choose Groovy when I have to do lots of string interpolation, which somehow Java and Scala refuse to support. The code was shortened to demonstrate the bare minimum.

  class MySqlVariant extends SqlVariant {

    @Override
    public SqlVariant.SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType[] types) {
      SqlVariant.SqlSelectStatement stmt = new SqlVariant.SqlSelectStatement("SELECT ...")
      for (NotificationType t : types)
        stmt.addBindParam(new SqlVariant.BindParameter(String.class, t.name().toUpperCase()))
      return stmt;
    }

    @Override
    public SqlVariant.SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result) {
      SqlVariant.SqlUpdateStatement stmt = new SqlVariant.SqlUpdateStatement("INSERT INTO ....")
      result.notifications?.each { Notification n ->
        stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, n.id))
        stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, result.intervalId))
      }
      return stmt
    }

    ......
  }

I started reimplementing the above in Scala and ran across a very powerful and beautiful Scala feature: implicit conversions. They allowed me to truly abstract the SQLVariant implementations from any binding-specific knowledge, through an implicit conversion facility that normally only dynamically typed languages provide. Scala gives us this ability, but also ensures static type safety of implicit conversions during compilation.

Another wonderful feature is lazy vals, which let us cleanly implement the lazy evaluation that we (Java programmers) are so used to achieving by initializing a member field to null and then checking it on the initial accessor call before instantiating. If you’ve seen code like the below a lot, you’ll rejoice to find out that you no longer have to do this in Scala.

  public class SomeClass {
    private SomeType type;

    public SomeType getSomeType() {
      if (type == null) type = new SomeType(); // Often more complex than this
      return type;
    }
  }

The above, besides not being ideal, is also error prone: if type is used anywhere else in SomeClass and you don’t go through the accessor method, it could still be uninstantiated. You must enforce the use of the accessor through convention, or deal with the fact that the field might be null. This is no longer the case in Scala, as the language handles lazy instantiation for you.
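
A direct Scala translation of the snippet above collapses to a one-liner (SomeType here is just a placeholder for whatever expensive-to-build type you have):

  class SomeType  // placeholder

  class SomeClass {
    // Evaluated once, on first access, whichever code path touches it first
    lazy val someType:SomeType = new SomeType
  }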

Note: I still allow the client data-access abstractions to work with a raw JDBC ResultSet returned from the SQLVariant. I don’t see this as an issue at this point, first because these abstractions are SQL specific, and also because ResultSet is a standard interface for any JDBC SQL interaction. Here is my concise Scala implementation. I’m still learning, so this might change as I get more familiar with Scala idioms and start writing more idiomatic Scala code.

  import javax.sql.DataSource
  import java.sql.{ResultSet, Connection, PreparedStatement}
  import com.bazusports.chipreader.sql.SqlVariant.{SqlSelectStatement, BindingValue}

  abstract class SqlVariant(private val ds: DataSource) {

    def retrieveConfigurationStatementFor(eventTag: String): SqlSelectStatement

    protected final def connection: Connection = ds.getConnection
  }

  object SqlVariant {

    trait BindingValue {def >>(stmt: PreparedStatement, idx: Int): Unit}

    // This is how implicit bindings happen.  This is beauty, we can now
    // bind standard types and have the compiler perform implicit conversions
    implicit final def bindingIntWrapper(v: Int) = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int) = {stmt.setInt(idx, v)}
    }

    implicit final def bindingLongWrapper(v: Long) = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int) {stmt.setLong(idx, v)}
    }

    implicit final def bindingStringWrapper(v: String) = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int) {stmt.setString(idx, v)}
    }

    abstract class SqlStatement[T](conn: Connection, sql: String, params: BindingValue*) {

      // Ah, another beautiful feature, lazy vals.  Basically, it's
      // evaluated on initial call.  This is great for the
      // so common lazy memoization technique, of checking for null.
      protected lazy val statement: PreparedStatement = {
        val stmt:PreparedStatement = conn.prepareStatement(sql)
        params.zipWithIndex.foreach { case (v, i) => v >> (stmt, i + 1) }  // bind each param at its 1-based JDBC index
        stmt
      }

      def execute(): T
    }

    class SqlUpdateStatement(conn: Connection, sql: String, params: BindingValue*)
            extends SqlStatement[Boolean](conn, sql, params: _*) {
      def execute() = statement.execute()
    }

    class SqlSelectStatement(conn: Connection, sql: String, params: BindingValue*)
            extends SqlStatement[ResultSet](conn, sql, params: _*) {
      def execute() = statement.executeQuery()
    }
  }

  /* Implementation of the SQLVariant */

  class MySqlVariant(private val dataSource:DataSource) extends SqlVariant(dataSource) {

    def retrieveConfigurationStatementFor(eventTag: String) =
      new SqlSelectStatement(connection,  "SELECT reader_config FROM event WHERE tag = ?", eventTag)

  }

And the obligatory unit test, using the oh-so-awesome Scala Specs framework.

  object MySqlVariantSpec extends Specification {
    val ds = getDataSource();

    "Requesting a configuration statement for a specific event" should {
      "return a SqlSelectStatement with properly bound parameters" in {
        val sqlVariant:SqlVariant = new MySqlVariant(ds)
        val stmt:SqlSelectStatement = sqlVariant.retrieveConfigurationStatementFor("abc")
        stmt must notBeNull
        // .... Other assertions go here
      }
    }
  }

Although I’ve barely scratched the tip of the iceberg, I hope this helps you see some of what Scala has to offer. More to come as I progress.


4 Jan 10

Heap Sort in Scala

Heap sort is one of the more efficient sorting algorithms, in that it sorts in constant space (not counting, of course, the memory needed to store the n elements). Elements are sorted in place, thus the constant O(1) auxiliary space. Total space is O(n), of course, if you count the storage of the elements themselves in memory or on disk.

There is lots of literature on heap sort out there, so I’m not going to go into it in more detail. The reason it’s sometimes preferred over, say, insertion sort or quick sort, is its worst case (which one always wants to account for): O(n log n), which is far better than the quadratic O(n²) worst-case performance of both the insertion and quick sort algorithms.
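
To make the bound concrete (standard analysis, not specific to the code below): building the heap is linear, and each of the n − 1 extractions re-heapifies along a path of at most logarithmic height, so

  T(n) = \underbrace{O(n)}_{\text{buildHeap}} + \underbrace{(n-1)\cdot O(\log n)}_{\text{extract + heapify}} = O(n \log n)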

Here is an implementation of heap sort in Scala. Please note, I wrote this with readability in mind, not conciseness. Sometimes they go hand in hand, but at times, as I oh so learned with Perl over the years, they don’t. A good balance between conciseness and readability should be reached, and that comes only with experience 🙂

There are only 3 methods that matter here (heapSort, buildHeap, and heapify); the rest are just auxiliaries. The code here can be run independently and serves as a benchmark: you run it by specifying the size of the initial array (generated randomly) and the number of benchmark runs. The benchmark is run, the best and worst runs are discarded, and the mean of the rest is displayed.


import scala.collection.mutable._
import scala.util.Random

object HeapSort {

  private val randGen = new Random()

  def main(args: Array[String]) {
    val arraySize = args(0).toInt
    val repeat = args(1).toInt
    val times = perform(repeat, arraySize)
    println("Average: "+averageOf(times)+" ms for "+times.size+" runs.")
  }

  private def heapSort(arr:Array[Int]):Array[Int] = {
    buildHeap(arr)
    (arr.length-1 until 0 by -1).foreach( i => {
      swap(arr, 0, i)
      heapify(arr, 0, i)
    })
    arr
  }

  def buildHeap(arr:Array[Int]) {
    ((arr.length/2.0D).floor.toInt-1 until -1 by -1).foreach( i => heapify(arr, i, arr.length) )
  }

  def heapify(arr:Array[Int], idx:Int, max:Int) {
    val l = left(idx)
    val r = right(idx)
    var largest = if (l < max && arr(l) > arr(idx)) l else idx
    largest = if (r < max && arr(r) > arr(largest)) r else largest
    if (largest != idx) {
      swap(arr, idx, largest)
      heapify(arr, largest, max)
    }
  }

  private def parent(idx:Int):Int = (idx-1)/2  // 0-based heap; unused here, but this is the correct parent formula
  private def left(idx:Int):Int = 2*idx+1
  private def right(idx:Int):Int = (2*idx)+2

  def swap(s:Array[Int], i: Int, j: Int):Unit = {
    val v = s(i);
    s(i) = s(j);
    s(j) = v
  }


  private def perform(times:Int, initListSize:Int):ListBuffer[Int] = {
    val benchmarks:ListBuffer[Int] = new ListBuffer[Int]
    (0 until times).foreach(idx => {
      val arr:Array[Int] = initArrayWith(initListSize)
      val start = System.currentTimeMillis()
      heapSort(arr)
      val end = System.currentTimeMillis()
      benchmarks += (end-start).toInt
    });
    benchmarks
  }

  def initArrayWith(limit:Int):Array[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    return list.toArray
  }

  def averageOf(benchmarks:ListBuffer[Int]):Long = {
    val sortedBm:Array[Int] = benchmarks.toArray
    heapSort(sortedBm)
    // Get rid of best and worst if we ran it more than twice;
    // slice(1, length-1) drops exactly the first and last elements
    val kept = if (sortedBm.length > 2) sortedBm.slice(1, sortedBm.length-1) else sortedBm
    var sum:Int = 0
    kept.foreach(sum += _)
    return sum/kept.length
  }
}

You can run this code by saving it to HeapSort.scala file and from command line…

scala -i HeapSort.scala -e 'HeapSort.main(Array("400000", "10"))'

This will run the file through Scala’s interpreter and execute the benchmark 10 times, each time sorting a random array of 400,000 elements with values chosen randomly below 1 million.


27 Dec 09

Insertion Sort in Scala

Here is an implementation of insertion sort in Scala, written while learning the ins and outs of Scala idioms and collections.

import scala.collection.mutable._
import scala.util.Random

object InsertSort {

  def main(args: Array[String]): Unit = {
    val list:ListBuffer[Int] = initListWith(args(0).toInt)
    println("Before: "+list)
    println("After: "+insertSort(list))
  }

  private def insertSort(list:ListBuffer[Int]):ListBuffer[Int] = {
    (1 until list.length).foreach( (i:Int) => {
      rewindCompare(list, i)
    })
    list
  }

  private def rewindCompare(list:ListBuffer[Int], idx:Int):Unit = {
    val value = list(idx)
    def swap(i: Int, j: Int) { val v = list(i); list(i) = list(j); list(j) = v }
    for (revIdx <- idx until 0 by -1) {
      if (value < list(revIdx-1)) swap(revIdx, revIdx-1)
      else return
    }
  }

  private def initListWith(limit:Int):ListBuffer[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    return list
  }
}

Compile and run as…

$ scalac InsertSort.scala
$ scala InsertSort 20

Outputs:

Before: ListBuffer(548915, 963275, 872581, 284656, 52299, 261918, 222533, 288234, 340859, 546783, 428115, 659633, 450991, 693520, 15495, 108540, 402193, 48479, 701302, 269521)

After: ListBuffer(15495, 48479, 52299, 108540, 222533, 261918, 269521, 284656, 288234, 340859, 402193, 428115, 450991, 546783, 548915, 659633, 693520, 701302, 872581, 963275)

Your output will vary, as the initial list being sorted is generated randomly.