Posts Tagged: algorithms

May 10

Divide and conquer for exponentiation

Here is an awesome way to demonstrate a divide and conquer algorithm: performing exponentiation. A naive exponentiation algorithm computes x^n by performing n-1 multiplications: x · x · … · x. This has an algorithmic complexity of O(n), which of course scales poorly for any significantly large exponent. That's not even counting the overhead of big-integer arithmetic: multiplying integers beyond the CPU's native word size is slower than staying within the CPU's integer range. Now do that n times and you have a problem.

Logarithmic performance, O(log n), is one of the best common algorithmic complexities there is (outside of constant complexity, of course, which is rare). One can achieve it for exponentiation by repeatedly halving the exponent, which is exactly the structure that divide and conquer solutions expose.

Logarithms grow very slowly compared to their input. To calculate a power such as n^1,000,000 with the naive algorithm, you'd have to perform 999,999 multiplications. With a logarithmic complexity algorithm this drops to ceil(log2(1,000,000)) = ceil(19.93) = 20 steps. Twenty steps, with a few extra operations per step, versus a million multiplications.

Here is an example of both exponentiation algorithms, the logarithmic complexity one and the linear complexity one (called naive), as well as Python's built-in pow() function. Our logarithmic power function and Python's built-in one perform about the same, while the naive linear function truly starts to deteriorate once any reasonably large number is used as the exponent.

_Note: this function is recursive, so you can run out of stack space for very large exponents (you can also easily reimplement it iteratively). The recursion depth is the base-2 logarithm of the exponent, so on a system with a 1024-frame stack limit your exponent would have to exceed 2^1024, or

17976931348623159077293051907890247336179769789423065727343008 11577326758055009631327084773224075360211201138798713933576587 89768814416622492847430639474124377767893424865485276302219601 24609411945308295208500576883815068234246288147391311054082723 7163350510684586298239947245938479716304835356329624224137216

before you run out of stack space._
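As the note says, the recursion is easy to replace with iteration. Here's a minimal sketch of the same square-and-multiply idea (a hypothetical `iter_power`, not part of the benchmarked script below), which walks the bits of the exponent and needs only O(1) stack:

```python
def iter_power(b, e):
    """Iterative binary exponentiation: O(log e) multiplications, O(1) stack."""
    result = 1
    while e > 0:
        if e & 1:         # lowest bit set: fold this power of b into the result
            result *= b
        b *= b            # square the base for the next bit
        e >>= 1
    return result

print(iter_power(2, 10))  # 1024
```

It performs exactly one squaring per bit of the exponent, plus one extra multiplication for each set bit.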

Here is a benchmarked Python implementation; power() at the top is the relevant algorithm.

#!/usr/bin/env python
import time
import sys

def power(b, e):
    """Logarithmic divide/conquer algorithm."""
    if e == 0: return 1
    x = power(b, e // 2)
    if e % 2 == 0: return x * x
    else: return b * x * x

def naive_power(b, e):
    """Linear power algorithm."""
    x = b
    for i in range(1, e):
        x *= b
    return x

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("You must provide a base and an exponent.  (Usage: base exp)")
    base = int(sys.argv[1])
    exp = int(sys.argv[2])
    for func in (power, naive_power, pow):
        print("Benchmarking %s..." % func.__name__)
        bench = []
        for i in range(5):
            start = time.time()
            func(base, exp)
            end = time.time()
            bench.append(end - start)
        print("\tCalculated in: %s" % min(bench))

Running the above to calculate 2^200000:

$ python 2 200000
Benchmarking power...
    Calculated in: 0.0042099952697753906
Benchmarking naive_power...
    Calculated in: 6.078423023223877
Benchmarking pow...
    Calculated in: 0.0041148662567138672

Hmmm, both pow() (Python's built-in power) and power() (our logarithmic version) calculated the power in about 4 milliseconds (the times above are in seconds), while our naive_power() function calculates the same result in 6 seconds.

I tried running the script to calculate 2^1,000,000, which the logarithmic functions calculated in 25 milliseconds; I killed the naive_power() calculation after a few minutes of impatiently waiting for it to complete.

Power to the logarithms!!! 🙂

Feb 10

Implementing bloom filter with a murmur hash function

Recently I read a blog post by Jonathan Ellis about bloom filters. Jonathan works on Cassandra and thus had lots of empirical recommendations on their implementation; Cassandra uses bloom filters extensively for performance optimization. A bloom filter is a space-efficient probabilistic data structure used to test whether an element belongs to a set. It's similar to a standard hashtable, except that it's more space-efficient and doesn't store the actual value; rather, it hashes the keys into a bit array. The reason it's a probabilistic data structure is that it allows false positives, but not false negatives. This means that to answer the question of whether element A belongs to set B (A ∈ B), a bloom filter returns true or false. A false (doesn't exist) is guaranteed to be accurate, but a true (exists) has some probability of being a false positive.

So why would one use such a structure? Say you store records on disk. Someone requests a particular record and you proceed to seek it. This is usually an expensive operation for high-throughput systems or systems with limited resources, but before invoking such an expensive operation you can first check whether the record exists at all. If the record does exist, you can then proceed to retrieve it through the more intensive operation. Because there is a small probability of a false positive, you might still find (through the resource-intensive operation) that it doesn't exist. But in an environment where you're servicing many requests for records that might not exist, you can reduce the number of expensive operations and answer such requests in constant O(1) time.

Jonathan's blog post provides some great information about how to implement a very effective bloom filter. One of the most important considerations is the hash function. To lower the probability of false positives, one must use a hash function that distributes the hashes effectively across the hash space. Jonathan recommends the murmur hash algorithm, one of the most efficient and effective hash functions, with great performance and a low collision rate.

Another thing done to reduce hash collisions, and in turn false positives, is that you don't just turn on the bits of a single hash function's result; rather, you do so several times (5 times is referred to a lot in the literature and seems like a sweet spot). This means that you take a key, calculate 5 hashes (using 5 different hash algorithms, or the single-algorithm strategy I'll discuss below), and set the bit for each one of these hashes in the bit array. Answering the question of whether the key exists does the reverse: calculate the 5 hashes and check that they are all set in the bit array. If any of the 5 isn't set, then you can be assured the key doesn't exist.

So let's look at some code. Below is an implementation of a bloom filter in Scala. It relies on a murmur hash implementation which I won't list, but you can view/download it here.

  import scala.collection.mutable.BitSet

  class BloomFilter(capacity:Int, hashable:Hashable, hashCount:Int = 5) {

    private val buckets:BitSet = new BitSet(capacity)
    private val hashFunc = hashable.hashes(hashCount)(capacity) _

    def addValue(value:String) {
      hashFunc(value).foreach( buckets += _ )
    }

    def exists_?(value:String):Boolean = {
      for ( i <- hashFunc(value) ) if (!buckets.contains(i)) return false
      true
    }
  }

  trait Hashable {
    def hashes(hashCount:Int)(max:Int)(value:String):Array[Int]
  }

  class MurmurHashable extends Hashable {
    import com.cobrio.algorithms.{MurmurHash => MH}
    def hashes(hashCount:Int)(max:Int)(value:String):Array[Int] = {
      // Only two murmur passes; the second is seeded with the first hash.
      val hash1 = MH.hash(value.getBytes, 0)
      val hash2 = MH.hash(value.getBytes, hash1)
      ( for ( i <- 0 until hashCount) yield Math.abs((hash1 + i * hash2) % max) ).toArray
    }
  }
The code above should be pretty self-explanatory, but let's just take a look at the hashing strategy. We calculate 5 hashes (the default) on the key being stored, although we only ever invoke the murmur algorithm twice; look at the hashes method in MurmurHashable above. Adam Kirsch and Michael Mitzenmacher wrote a paper titled Less Hashing, Same Performance…, which shows that a technique simulating additional hash functions beyond two can preserve the performance of bloom filters without any significant loss in the false positive probability. To summarize the math in the paper, the formula is: g_i(x) = h1(x) + i·h2(x) mod m, where m is the number of buckets in the bloom filter, h1 and h2 are the two calculated hashes respectively, and i ranges from 0 up to k − 1, where k is the number of hashes we want to generate.
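The same double-hashing formula can be sketched in a few lines of Python (illustrative only: md5 stands in here for the two murmur results, and `double_hashes` is a hypothetical helper, not the Scala code above):

```python
import hashlib

def double_hashes(value, k, m):
    """Simulate k hash functions from two: g_i(x) = (h1(x) + i*h2(x)) mod m."""
    digest = hashlib.md5(value.encode()).digest()
    h1 = int.from_bytes(digest[:8], 'big')    # stand-in for the first murmur hash
    h2 = int.from_bytes(digest[8:], 'big')    # stand-in for the second, seeded hash
    return [(h1 + i * h2) % m for i in range(k)]

# Five bucket indices for one key, from only one digest computation.
print(double_hashes("Ilya Sterin", 5, 2000))
```

However many simulated functions you ask for, the underlying hash is computed once.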

Here is how you’d use the above bloom filter…

  val bloom = new BloomFilter(2000, new MurmurHashable())
  bloom.addValue("Ilya Sterin")
  bloom.addValue("Elijah Sterin")

  assert(bloom.exists_?("Ilya Sterin"))
  assert(bloom.exists_?("Elijah Sterin"))
  assert(!bloom.exists_?("Don't Exist"))

Jan 10

Heap Sort in Scala

Heap sort is one of the more efficient sorting algorithms, in that it sorts in constant auxiliary space (not including, of course, the memory needed to store the n elements themselves). Elements are sorted in place, hence the constant O(1) auxiliary space; total space is O(n) if you count the storage of the elements in memory or on disk.

There is lots of literature on heap sort out there, so I'm not going to go into it in more detail. The reason it's sometimes preferred over, say, insertion sort or quicksort is that its worst case (which one always wants to account for) is O(n log n), far better than the quadratic O(n^2) worst-case performance of both insertion sort and quicksort.

Here is an implementation of heap sort in Scala. Please note, I wrote this with readability in mind, not conciseness. Sometimes they go hand in hand, but at times, as I oh-so learned with Perl over the years, that's not true. A good balance between conciseness and readability should be reached, and that comes only with experience 🙂

There are only 3 methods that matter here (heapSort, buildHeap, and heapify); the rest are just auxiliaries. The code can be run independently and serves as a benchmark: you specify the size of the (randomly generated) initial array and the number of benchmark runs. The benchmark is run, the best and worst times are discarded, and the mean of the rest is displayed.

import scala.util._
import scala.collection.mutable._

object HeapSort {

  def main(args: Array[String]) {
    val arraySize = args(0).toInt
    val repeat = args(1).toInt
    val times = perform(repeat, arraySize)
    println("Average: "+averageOf(times)+" ms for "+times.size+" runs.")
  }

  private def heapSort(arr:Array[Int]):Array[Int] = {
    buildHeap(arr)
    (arr.length-1 until 0 by -1).foreach( i => {
      swap(arr, 0, i)
      heapify(arr, 0, i)
    })
    arr
  }

  def buildHeap(arr:Array[Int]) {
    (arr.length/2-1 until -1 by -1).foreach( i => heapify(arr, i, arr.length) )
  }

  def heapify(arr:Array[Int], idx:Int, max:Int) {
    val l = left(idx)
    val r = right(idx)
    var largest = if (l < max && arr(l) > arr(idx)) l else idx
    largest = if (r < max && arr(r) > arr(largest)) r else largest
    if (largest != idx) {
      swap(arr, idx, largest)
      heapify(arr, largest, max)
    }
  }

  private def parent(idx:Int):Int = (idx-1)/2
  private def left(idx:Int):Int = 2*idx+1
  private def right(idx:Int):Int = 2*idx+2

  def swap(s:Array[Int], i: Int, j: Int):Unit = {
    val v = s(i)
    s(i) = s(j)
    s(j) = v
  }

  private def perform(times:Int, initListSize:Int):ListBuffer[Int] = {
    val benchmarks:ListBuffer[Int] = new ListBuffer[Int]
    (0 until times).foreach(idx => {
      val arr:Array[Int] = initArrayWith(initListSize)
      val start = System.currentTimeMillis()
      heapSort(arr)
      val end = System.currentTimeMillis()
      benchmarks += (end-start).toInt
    })
    benchmarks
  }

  def initArrayWith(limit:Int):Array[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    list.toArray
  }

  def averageOf(benchmarks:ListBuffer[Int]):Long = {
    var sortedBm:Array[Int] = benchmarks.toArray
    // Drop best and worst runs if we ran more than twice
    if (sortedBm.length > 2) {
      Sorting.quickSort(sortedBm)
      sortedBm = sortedBm.slice(1, sortedBm.length-1)
    }
    var sum:Int = 0
    sortedBm.foreach( sum += _ )
    sum / sortedBm.length
  }
}

You can run this code by saving it to HeapSort.scala file and from command line…

scala -i HeapSort.scala -e 'HeapSort.main(Array("400000", "10"))'

This will run the file through Scala's interpreter and execute the benchmark 10 times, sorting a random array of 400,000 elements with values randomly chosen below 1 million.

Dec 09

Insertion Sort in Scala

Here is an implementation of insertion sort using Scala. Learning the ins/outs of Scala idioms and collections.



import scala.util._
import scala.collection.mutable._

object InsertSort {

  def main(args: Array[String]) : Unit = {
    val list:ListBuffer[Int] = initListWith(args(0).toInt)
    println("Before: "+list)
    println("After: "+insertSort(list))
  }

  private def insertSort(list:ListBuffer[Int]):ListBuffer[Int] = {
    (1 until list.length).foreach( (i:Int) => {
      rewindCompare(list, i)
    })
    list
  }

  private def rewindCompare(list:ListBuffer[Int], idx:Int):Unit = {
    val value = list(idx)
    def swap(i: Int, j: Int) { val v = list(i); list(i) = list(j); list(j) = v }
    for (revIdx <- idx until 0 by -1) {
      if (value < list(revIdx-1)) swap(revIdx, revIdx-1)
      else return
    }
  }

  private def initListWith(limit:Int):ListBuffer[Int] = {
    val list:ListBuffer[Int] = new ListBuffer()
    val randGen = new Random()
    (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
    list
  }
}

Compile and run as…

$ scalac InsertSort.scala
$ scala InsertSort 20


Before: ListBuffer(548915, 963275, 872581, 284656, 52299, 261918, 222533, 288234, 340859, 546783, 428115, 659633, 450991, 693520, 15495, 108540, 402193, 48479, 701302, 269521)

After: ListBuffer(15495, 48479, 52299, 108540, 222533, 261918, 269521, 284656, 288234, 340859, 402193, 428115, 450991, 546783, 548915, 659633, 693520, 701302, 872581, 963275)

Your output will vary, as the initial list which is sorted is generated randomly.

Nov 09

Merge sort implementation and performance in Scala and Ruby

I'm not trying to turn this into a language A vs. B debate; it's just that something interesting happened last night. I'm trying to learn both Scala and Ruby. I'm a bit more enthused by Scala at this point, mostly because I somewhat prefer static typing and I'm a long-time Java guy. Now, I'm also a long-time Perl guy, so I'm not necessarily choosing Scala over Ruby because of a lack of experience with dynamic languages.

So, with this said, the best way to learn a language's idioms is to start actually implementing something. A sample application is good, but due to frameworks and the relatively low business-logic/algorithm ratio of such sample apps, at least the ones I can conceive, you mostly spend time learning framework idioms vs. actual language ones.

So I thought I’d start by implementing some algorithms in these languages. I decided on Merge Sort. It’s a O(n log n) divide and conquer algorithm. See the link for more info.

Let’s start off with the code. Again, this is not done in an OO and/or reusable manner, this is mostly procedural/functional code to test mostly functional idioms.

Scala code:

  import scala.util._
  import scala.collection.mutable._

  object MergeSort extends Application {

    val list = initListWith(1000)

    val start = System.currentTimeMillis()
    println("Sorted " + merge_sort(list).length + " elements.")
    val end = System.currentTimeMillis()
    println("Total time: " + (end - start))

    def merge_sort(list:List[Int]): List[Int] = {
      if (list.length <= 1) return list
      var (left: List[Int], right: List[Int]) = list splitAt Math.round(list.length/2.0).toInt
      left = merge_sort(left); right = merge_sort(right)
      if (left.last > right.head) merge(left, right)
      else left ::: right
    }

    def merge(l:List[Int], r:List[Int]):List[Int] = {
      var result:ListBuffer[Int] = new ListBuffer()
      var left = l; var right = r
      while (left.length > 0 && right.length > 0) {
        if (left.head <= right.head) {
          result += left.head
          left = left.tail
        } else {
          result += right.head
          right = right.tail
        }
      }
      if (left.length > 0) result.toList ::: left
      else result.toList ::: right
    }

    def initListWith(limit:Int):List[Int] = {
      val list:ListBuffer[Int] = new ListBuffer()
      val randGen = new Random()
      (0 until limit).foreach( i => list += randGen.nextInt(1000000) )
      list.toList
    }
  }

And Ruby code:

  def merge_sort(*list)
    return list if list.size <= 1

    middle = (list.size/2.0).round
    left = list[0, middle]
    right = list[middle..-1]

    left = merge_sort(*left)
    right = merge_sort(*right)

    if left[-1] > right[0]
      result = merge(left, right)
    else
      result = left + right
    end
    return result
  end

  def merge(left, right)
    result = []
    while left.size > 0 && right.size > 0
      if left.first <= right.first
        result << left.slice!(0)
      else
        result << right.slice!(0)
      end
    end

    if left.size > 0
      result += left
    else
      result += right
    end
    return result
  end

  list = (1..1000).collect {|i| rand(1000000) }

  puts "Starting..."
  start = (1000 * Time.now.to_f).to_i
  puts "Sorted " + merge_sort(*list).size.to_s + " elements."
  end_time = (1000 * Time.now.to_f).to_i
  total_time = end_time - start
  puts "Total time: #{total_time}"

So all works great; they both perform the operation correctly. But here is the weird part. JVM bytecode (which Scala compiles to, just like Java) is supposed to be super fast. That's the assumption I've always made based on various readings as well as empirical data. Now, check out these benchmarks…

$ scalac merge_sort.scala && scala MergeSort
Starting...
Sorted 10000 elements.
Total time: 374

$ ruby merge_sort.rb
Starting...
Sorted 10000 elements.
Total time: 138

I'm not timing the compilation/startup phase, as you can see from the code. In the case of 10K records, Ruby performs almost 3 times as fast. In the case of 100K records:

$ scalac merge_sort.scala && scala MergeSort
Starting...
Sorted 100000 elements.
Total time: 31213

$ ruby merge_sort.rb
Starting...
Sorted 100000 elements.
Total time: 7681

Ruby is 4 times faster than Scala in this case; of course the growth is relative to the O(n log n) performance.

I'm not focusing on Ruby here, since Scala is what I'm concerned about. I used a List for most operations that didn't require in-place modifications, and a ListBuffer for mutable lists, which advertises constant-time append/prepend operations. I'm not sure what I'm missing. I'm going to play with it a bit more to see where the culprit is, but again, I'm not a Scala expert and I'm not very familiar with its collections library and the ins/outs of each implementation.

Any ideas?

** Update **

So after Maxim's comment, I updated the Scala code to use

while(left.lengthCompare(0) > 0 && right.lengthCompare(0) > 0)

instead of

while(left.length > 0 && right.length > 0)

All I can say is: Scala #$@%^ing rocks. So it seems the full length calculation on a linked list was the culprit, which makes perfect sense.

Using lengthCompare(l:Int):Int gets rid of the brute-force length calculation. Here is what I get from the docs…

Result of comparing length with operand l. This method is used by matching streams against right-ignoring (…,_*) patterns.

I'm going to dig into the source to figure out how it does this comparison in a bit, but I can only guess that if, say, we're comparing against length 0, all it needs is to reach the first element of the collection to know the answer; there's no need to compute the full length.
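That early-exit idea can be illustrated in Python (a hypothetical `length_compare`, not Scala's actual implementation): deciding whether a sequence is longer than l only requires pulling l + 1 elements, so comparing against 0 is instant even on an infinite stream.

```python
from itertools import count, islice

def length_compare(iterable, l):
    """Compare len(iterable) to l, consuming at most l + 1 elements.

    Returns -1, 0, or 1 like a comparator, without computing the full length.
    """
    n = sum(1 for _ in islice(iter(iterable), l + 1))
    return (n > l) - (n < l)

print(length_compare(count(), 0))  # 1 -- infinite stream, decided after one element
```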

Did I say culprit? Well, let me say: BIG culprit. Here are the benchmark results now…


$ scalac merge_sort.scala && scala MergeSort
Starting...
Sorted 100000 elements.
Total time: 387

$ ruby merge_sort.rb
Starting...
Sorted 100000 elements.
Total time: 7667

Wow, what a drastic improvement.

*** Disclaimer: This means nothing in terms of benchmarking! Please don't waste your time disqualifying this benchmark; I know it really means nothing in this small context, and it means nothing without a proper statistical power measurement. Benchmarks are crap unless properly and fairly compared. This post is not about benchmarking languages; rather, I wanted to see how the various idioms and libraries perform while learning both languages. It turned out to be very useful, as a small recommended tuning in Scala's collections lib completely turned things inside out. ***

Any ruby folks out there care to comment?