04
Jan 10

Heap Sort in Scala

Heap sort is one of the more efficient sorting algorithms, being that it sorts in constant space (not including the memory space of course to store the n elements). Elements are sorted in place, though the constant space O(1). Total space is O(n) of course if you consider the storage of the elements in memory or on disk.

There is lots of literature on heap sorts out there, so I’m not going to go into it in more details. The reason it’s sometimes preferred over say insertion sort or quick sort, is that the worst case (which one always wants to account for, is O(n log n), which is far better than quadratic O(n2) worst case performance of both insertion and quick sort algorithms.

Here is an implementation of quick sort in Scala. Please note, I wrote this with readability in mind, not conciseness. Sometimes they go hand in hand, but at times, as I o’ so learned with Perl over the years, that’s not true in all cases. A good balance between conciseness and readability should be reached and that comes only with experience 🙂

There are only 3 methods that matter here, rest are just auxiliaries (I highlighted the important methods). The code here can be run independently and serves as a benchmark, as you can run it by specifying the size of the initial array (generated randomly) and the # of benchmarks to run. The benchmark is run, best and worst are disposed and the mean of the rest is displayed.


import scala.util._
import scala.collection.mutable._
import scala.util.Random

object HeapSort {

  private val randGen = new Random()

  def main(args: Array[String]) {
    val arraySize = args.first.toInt
    val repeat = args(1).toInt
    var times = perform(repeat, arraySize)
    println("Average: "+averageOf(times)+" ms for "+times.size+" runs.")
  }

  private def heapSort(arr:Array[Int]):Array[Int] = {
    buildHeap(arr)
    (arr.length-1 until 0 by -1).foreach( i => {
      swap(arr, 0, i)
      heapify(arr, 0, i)
    })
    arr
  }

  def buildHeap(arr:Array[Int]) {
    ((arr.length/2.0D).floor.toInt-1 until -1 by -1).foreach( i => heapify(arr, i, arr.length) )
  }

  def heapify(arr:Array[Int], idx:Int, max:Int) {
    val l = left(idx)
    val r = right(idx)
    var largest = if (l < max && arr(l) > arr(idx)) l else idx
    largest = if (r < max && arr(r) > arr(largest)) r else largest
    if (largest != idx) {
      swap(arr, idx, largest)
      heapify(arr, largest, max)
    }
  }

  private def parent(idx:Int):Int = (idx/2.0D).floor.toInt
  private def left(idx:Int):Int = 2*idx+1
  private def right(idx:Int):Int = (2*idx)+2

  def swap(s:Array[Int], i: Int, j: Int):Unit = {
    val v = s(i);
    s(i) = s(j);
    s(j) = v
  }


  private def perform(times:Int, initListSize:Int):ListBuffer[Int] = {
    val benchmarks:ListBuffer[Int] = new ListBuffer[Int]
    (0 until times).foreach(idx => {
      val arr:Array[Int] = initArrayWith(initListSize)
      val start = System.currentTimeMillis()
      heapSort(arr)
      val end = System.currentTimeMillis()
      benchmarks += (end-start).toInt
    });
    benchmarks
  }

  def initArrayWith(limit:Int):Array[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    return list.toArray
  }

  def averageOf(benchmarks:ListBuffer[Int]):Long = {
    var sortedBm:Array[Int] = benchmarks.toArray
    heapSort(sortedBm)
    var sum:Int = 0;
    val sumFunc = (t:Int) => sum += t
    // Get rid of best and worst if we ran it more than twice
    if (sortedBm.length > 2)
      sortedBm.slice(1,sortedBm.size-2).foreach(sumFunc)
    else
      sortedBm.foreach(sumFunc)
    return sum/sortedBm.length
  }
}

You can run this code by saving it to HeapSort.scala file and from command line…

scala -i HeapSort.scala -e ‘HeapSort.main(Array(“400000”, “10”))’

This will run the file through Scala’s interpreter and execute a benchmark 10 times by sorting a random array of 400,000 elements with randomly chosen values from 1 to 1 million.

Tags: ,

One comment

  1. That’s just what i was looking for. I´m new to Scala programming and while trying to understand the code i learned some interesting ways to do things, like that (N1 until N2).foreach( param => { code }) it’s really useful

    Just one thing, in the end you remove the Best and Worst cases from the bechmarks array, but when you return the average you divide the partial sum by the total length. I think it should be divided by length-2.

    return sum/(sortedBm.length-2)

    Plus, the slice function is returning length-3 items instead of length-2 items, so it should be changed to sortedBm.slice(1,sortedBm.size-1).foreach(sumFunc)

Leave a comment