08
Sep 10

Agile Intifada

This blog post was inspired by a link my friend sent me titled “Agile Ruined My Life” as well as a conversation with my friends/co-workers and just wanting to clear up a few things about my previous “Agile Dilemma” post.

Some of this is taken verbatim from an email exchange between myself and a few friends I greatly respect.

I agree/like agile in theory and practice. I agree and like TDD in theory and practice. Anything that makes it mainstream can be scathed and unfairly criticized. So now that I have this out of the way, and I will clarify it later, the rest of the post won’t be so nice for some. (WARNING: It will sound a bit harsh to some ears, with the hope of being constructive.) With that I wish that if nothing else, the few that read this blog post will reflect upon it and critique constructively.

I wrote the Agile Dilemma post mostly while I was letting off steam about computer science itself becoming extinct in favor of a bunch of bullshit consultants pushing agile and TDD 90% of the time without teaching people how to actually design and write good software. I think lots of failures in our industry are due to the fact that most folks don’t know shit about programming and computer science, they just know a language or a few and eventually learn how to express some intention utilizing these languages. That’s like comparing a native speaker, or someone who’s learned and practiced the art of a foreign language for years, with someone who’s learned how to ask for directions after listening to Rosetta Stone audio. From a business perspective most pointy-haired bosses don’t give a shit. It works and the business development teams can do their “magic”, but from the perspective of a computer scientist (the real software developer) it stinks. It smells of amateur manure (no matter how many tests you write to prove that the action button works).

Now, there are those that go even as far as saying that non-TDD written code is “stone age” or that programmers not practicing agile, TDD, pair-programming and all the other process bullshit are not “good” programmers or shouldn’t be programming. Well, then shut down your linux/unix operating systems, stop using emacs or most other editors, as a matter of fact, stop using 90% of the stuff that you’re currently using, because most of it was written by these “stone age” programmers who didn’t give a fuck about formalities of agile or TDD, but created masterpieces because they were smart, motivated, and knew how to program with common sense.

Before I get into detail, I’d love for people to stop petitioning for turning opinions formed by so called “experts”, into commandments. Just because Martin Fowler, Robert Martin, or name your own Agile Mythology Deity say it is so, doesn’t mean much more than anyone else in the field with experience. They are human just like me and you and form their own opinions just like me and you. The only thing they are better than the average at, is marketing. Yes, they are sales folks with large investments in agile, TDD, etc… in their consulting business, so I don’t think I go overboard by saying that they are a bit biased. Spreading FUD brings them more business. As one of the above links pointed out, Peter Norvig, Linus Torvalds, and hundreds of other brilliant programmers (far more brilliant individually than Fowler and Martin combined), have been programming successfully for decades not using any formal methodologies and techniques and not following TDD and have succeeded beyond their wildest dreams. I’ll take one Norvig over 50 Fowlers any day.

Writing tests is not innovative, it’s been around for decades. It was just never formalized. Yes, there weren’t as many tests written or sometimes none, programmers actually got to decide whether a test was needed or not. Programmers are sometimes overly optimistic, so yes, mistakes were made, code had bugs, stuff was rewritten. Decades later, TDD and Agile at hand, mistakes are made, code still has bugs, stuff still gets rewritten. Now put that on your t-shirt along with your favorite TDD blurb.

So I had an interview about two years ago, where I was asked two questions that, after reading this post, folks should immediately take off their interview questionnaire if they ever want to hire someone good. The first question wasn’t as bad, but it wanted me to recite some design patterns. I’m all too familiar with those, probably more than I want to be at this point, but a lack of familiarity with patterns doesn’t in any way exclude anyone from the “good” programmer group. The second question was the worst: they wanted to know how many lines of test code I write per week. I had to pause and then ask them to ask again, as I was puzzled. WTF does that mean? I don’t know how many lines of non-test code I write per week, and you want me to count/average the amount of tests I write? I mean, I could see it if the question was whether I practice TDD, but lines of fucking test code? Either way, just to clarify that I’m not bitter and that’s not why I’m mentioning this, the interview went very well, we just couldn’t come to terms on salary.

Agile is also not innovative. I’d like to think of agile as an abstract set of empirical ideas (patterns), with implementations left to the people/companies. Lately though, because there is really not much more one can write about the few agile principles, most literature about agile is about concrete agile practices through authors’ experiences. These are all great reads, if only people would read and take them for what they are: “experiences”. We should learn from experiences, not try to recreate them. The agile approach is common sense that has been practiced for decades in different circles. Iterations are common sense, they were practiced for as long as programming existed, tests are also common sense, actually most of agile is just good common sense approaches to building products. Formalizing it actually helped quite a bit, as the industry was able to reason about it and transfer the empirical knowledge to others with less experience. But then, as with anything mainstream, the bureaucrats took over, and it’s IMO been downhill ever since. Actually, some recent attempts at quantifying agile’s success have failed to show it to be any better (within a margin of error) than any other process or no process at all. That’s not to say that it’s not successful, it’s been very successful for many, including me and the companies I worked with. The failures that folks are seeing are due to many reasons, but partly because in most companies agile is practiced as a bureaucratic rule of thumb, without any common sense. Folks are forced to write comprehensive test suites. The “comprehensive” is something inferred and quantified by quality control tools that apply bullshit heuristics. But managers love it, it gives them numbers and pie charts. So programmers (even the ones that love programming as more than just a day job) do whatever it takes to keep the job: they write comprehensive tests and bullshit software.
At the end of the day, who gives a crap if your algorithm is exponential complexity and mine is logarithmic, the tests pass and the sales can go on. But what about folks that actually love what they do (computer science)?

So let’s get back to basics. Learn computer science, algorithms, data structures, language theory and practice. You’re never done learning. Go out and create your own masterpieces. Don’t let the agile Deities fool you into thinking that your software isn’t worthy if you didn’t pass a TDD heuristic or if you don’t hold daily standup meetings. No one knows you and your team better than you. DO WHAT MAKES SENSE FOR YOU AND YOUR TEAM!

DISCLAIMER: No agile enthusiasts were harmed in this experiment, including myself.


05
Aug 10

Powerful multi-method dispatch with Lisp

I’ve been doing quite a bit of Common Lisp lately. I really like it. It’s very powerful and combines a simple syntax with quite powerful abstraction abilities. Abstraction comes in different flavors in Lisp (functional, OO, etc…), but the most powerful one is the macro system. I won’t touch more on that in this post, as I’m quite a beginner and probably don’t even grasp its full power and potential.

Unlike many languages which define the abstractions you can use, Lisp gives you all of them and also allows you to build most abstractions you want that are not a part of the language. Some of these powerful features are also present in other languages, as many languages that came after, took notice of some features, but the combination of all of these and the ability to build your own core language abstractions is what makes Lisp probably the most powerful language out there.

Multimethods are the ability to do runtime method dispatch based on several arguments and their specializations. Multimethods are not supported in single-dispatch languages like Java, but some other languages support them either through libraries or similar facilities (e.g. pattern matching). Although pattern matching is different, in some contexts it can provide similar levels of abstraction/flexibility.

Here is an example of multimethods from my cl-kyoto-cabinet project, where I found quite a use for them.

Update: Thanks to Drewc for pointing out that my original example was single dispatch. I was trying to demonstrate the conciseness of polymorphic method definitions inline with defgeneric and completely overlooked the fact that the method dispatch was only occurring on one specialization. The original version is here. Below is the updated version…

(defclass kc-dbm ()
  ()
  (:documentation "A KC database."))
  
(defclass kc-bdb (kc-dbm)
  ()
  (:documentation "A KC B+ tree database."))

(defclass kc-hdb (kc-dbm)
  ()
  (:documentation "A KC hash database."))

(defgeneric put (db key value &key mode))

(defmethod put ((db kc-bdb) key value &key mode)
  (funcall (put-method-for db mode) key value))

(defmethod put ((db kc-hdb) key value &key mode)
  (funcall (put-method-for db mode) key value))


;; Dispatches on BOTH the database class and the mode keyword.
(defgeneric put-method-for (db mode)
  (:method ((db kc-bdb) (mode (eql :replace))) #'bdbset)
  (:method ((db kc-bdb) (mode (eql :keep))) #'bdbadd)
  (:method ((db kc-bdb) (mode (eql :concat))) #'bdbappend)
  
  (:method ((db kc-hdb) (mode (eql :replace))) #'hdbset)
  (:method ((db kc-hdb) (mode (eql :keep))) #'hdbadd)
  (:method ((db kc-hdb) (mode (eql :concat))) #'hdbappend))
  

;; The definitions of kcdbset, kcdbadd, and kcdbappend (omitted here)
;; store key/value pairs with different behaviors based on whether
;; that key already exists.
;; For more info see: http://github.com/isterin/cl-kyoto-cabinet
  
;; Usage...

(defparameter *kchdb* (make-instance 'kc-hdb))
(defparameter *kcbdb* (make-instance 'kc-bdb))

;; Will store some_val under some_key
(put *kchdb* "some_key" "some_val" :mode :replace)
(put *kchdb* "some_key" "some_val" :mode :keep)

(put *kcbdb* "some_key" "some_val" :mode :replace)
(put *kcbdb* "some_key" "some_val" :mode :concat)

Multimethod dispatch is very flexible and concise for accomplishing method-level polymorphism. The same can be accomplished in a less flexible OO language like Java using interfaces and the Strategy pattern, but that usually ends up being more verbose and ceremonious in most non-rudimentary scenarios.

Scala has a similar ability through pattern matching, and so does Erlang.
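
For comparison, here is a rough sketch of how a language without built-in multimethods might approximate dispatch on two arguments with a lookup table. It is written in Python purely for illustration; the class names, the `put_method` decorator and the returned tuples are all hypothetical stand-ins, not part of cl-kyoto-cabinet. Unlike CLOS, this table matches exact types only and does no inheritance-based method selection.

```python
class KcDbm:
    """Stand-in for a generic KC database (hypothetical)."""

class KcBdb(KcDbm):
    """Stand-in for a B+ tree database (hypothetical)."""

class KcHdb(KcDbm):
    """Stand-in for a hash database (hypothetical)."""

# Registry keyed on BOTH the database type and the mode.
_PUT_METHODS = {}

def put_method(db_type, mode):
    """Register a handler for the (db_type, mode) pair."""
    def register(func):
        _PUT_METHODS[(db_type, mode)] = func
        return func
    return register

# The handlers only return a tag so the dispatch is observable;
# a real handler would call into the storage library.
@put_method(KcBdb, "replace")
def bdb_set(db, key, value):
    return ("bdbset", key, value)

@put_method(KcBdb, "keep")
def bdb_add(db, key, value):
    return ("bdbadd", key, value)

@put_method(KcHdb, "replace")
def hdb_set(db, key, value):
    return ("hdbset", key, value)

@put_method(KcHdb, "keep")
def hdb_add(db, key, value):
    return ("hdbadd", key, value)

def put(db, key, value, mode="replace"):
    """Pick the handler from both arguments, then call it."""
    try:
        handler = _PUT_METHODS[(type(db), mode)]
    except KeyError:
        raise TypeError("no put method for %s with mode %r"
                        % (type(db).__name__, mode))
    return handler(db, key, value)
```

Calling `put(KcHdb(), "some_key", "some_val", mode="keep")` selects `hdb_add`, while the same call with a `KcBdb` instance selects `bdb_add`. The registry and decorator are exactly the ceremony that `defgeneric` with inline `:method` clauses makes unnecessary.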


21
Jun 10

Lazy cheap flight calculations with priority queues

There is an interesting problem of utilizing priority queues to figure out the best price combination in a set of flight legs. The problem is as follows:

We need to calculate the cheapest combination of flight legs (connections) for a flight to a particular destination. We’re given N price-ordered sets of flight legs and we need to find the winning combination. Each combination is evaluated for eligibility and either passes or fails, so the cheapest eligible combination isn’t necessarily the cheapest possible combination of prices from the legs. A black box predicate function is consulted to ensure the combination is eligible. This reflects various airline rules, like overlapping times, specials that are only available to certain people, routes, or connections.

Solution: A naive approach for, say, a two leg flight is to construct an (n x m) ordered matrix and evaluate each price-ordered combination through the black box predicate routine until one passes. The problem with this approach is that we unnecessarily construct a full matrix when in many cases one of the first combinations is enough to present the cheapest “valid” price. The key to reducing this is to construct a lazy data structure which will prioritize the cheapest flights and can then be iterated to find one that’s valid. We do so at runtime while constructing matrix combinations. The solution is generalized, so the same can be used for n leg flights.

The algorithm goes something like this…

Construct the first set of combinations which can reflect the cheapest flight. The cheapest combination is always n1 + m1 (the cheapest price of the first leg plus the cheapest price of the second). If that doesn’t pass, the next cheapest candidates are n2 + m1 and n1 + m2. We then continue to n1 + ma and na + m1, where a is incremented until the end of the price list for either leg.
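
One common way to realize this kind of lazy, cheapest-first enumeration for two legs is a best-first search over index pairs backed by a heap. The sketch below is not the implementation from this post; it is a minimal illustration that assumes two plain price lists and a caller-supplied predicate. The names `cheapest_valid_pair` and `is_valid` are made up for the example.

```python
import heapq

def cheapest_valid_pair(leg_a, leg_b, is_valid=lambda combo: True):
    """Return (total, (price_a, price_b)) for the cheapest valid pair, or None."""
    a, b = sorted(leg_a), sorted(leg_b)
    if not a or not b:
        return None
    # Start from the cheapest possible pair: index (0, 0).
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        total, i, j = heapq.heappop(heap)
        combo = (a[i], b[j])
        if is_valid(combo):
            return total, combo
        # Only the two neighbours of a rejected pair can be the next
        # cheapest, so push those instead of building the whole matrix.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return None
```

With `leg_a = [100, 200, 300]` and `leg_b = [50, 150]`, the default predicate accepts the very first pair and nothing else is examined. A predicate that rejects any price under 150 forces the search to walk through a few rejected pairs before it settles on `(200, 150)`.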

The worst case running time is quadratic, O(n^2), but because of the lazy data structure, the algorithm often finishes in near-constant time in practice, depending on how lucky we are that one of the first few combinations yields a “rule valid” price combination.

This problem idea came from reading The Algorithm Design Manual by Steven S. Skiena. I recommend this book for anyone wishing to delve into the world of more advanced algorithm design.

Here is the solution in python. You’ve probably noticed I’ve been using a lot of python. Besides the fact that I like the language, python is an incredibly good language for conveying algorithmic ideas in a concise but very readable way.

The only two functions that matter, are cheapest_price and _pick_combo, the rest are just auxiliary functions used to support an OO structure and running a sample.

  import heapq, random, time

  class Route(object):
      """Lazily finds the cheapest valid price combination across flight legs."""
      def __init__(self):
          self.heap = []
          self.unique = dict()
          self.legs = []
          self.max_leg_len = 0
          self._counter = 0
          self._loop_counter = 0

      def add_leg(self, leg):
          leg.sort()
          self.legs.append(leg)
          leg_len = len(leg)
          if leg_len > self.max_leg_len:
              self.max_leg_len = leg_len

      def cheapest_price(self, pred_func=lambda x: True):
          for i in range(0, self.max_leg_len):
              combo = self._pick_combo(i, pred_func)
              if combo: return combo

      def print_stats(self):
          print("""Legs: %s
  Combos examined: %s
  Loops: %s
  """ % (len(self.legs), self._counter, self._loop_counter))

      def _pick_combo(self, curr_idx, pred_func):
          num_legs = len(self.legs)
          price_combo = [ leg[curr_idx] for leg in self.legs if not curr_idx >= len(leg) ]
          self._add_combo(price_combo)
          cheapest_price = self._eval_price_combo(pred_func)
          if cheapest_price: return cheapest_price
          for idx in range(1, self.max_leg_len-curr_idx):
              for j in range(0, num_legs):
                  if len(self.legs[j]) <= (curr_idx+idx): continue
                  combo = []
                  for k in range(0, num_legs):
                      self._loop_counter += 1
                      if j == k:
                          combo.append(self.legs[k][curr_idx+idx])
                      elif curr_idx < len(self.legs[k]):
                          combo.append(self.legs[k][curr_idx])
                  self._add_combo(combo)

              cheapest_price = self._eval_price_combo(pred_func)
              if cheapest_price: return cheapest_price

      def _add_combo(self, combo):
          self._counter += 1
          if len(combo) == len(self.legs) and str(combo) not in self.unique:
              heapq.heappush(self.heap, combo)
              self.unique[str(combo)] = True

      def _eval_price_combo(self, pred_func):
          for i in range(0, len(self.heap)):
              least_combo = heapq.heappop(self.heap)
              if pred_func(least_combo):
                  print("Winning combo: %s" % [ "%.2f" % l for l in least_combo ])
                  return sum(least_combo)
          return None


  ############### Samples below ##################

  def sample_run(num_legs, pred_func):
      print(("#" * 30) + " Sample Run " + ("#" * 30))
      route = Route()
      for i in range(0, num_legs):
          route.add_leg( [ random.uniform(100, 500) for i in range(0, 100) ] )

      # time.clock() was removed in Python 3.8; perf_counter() replaces it
      start = time.perf_counter()
      price = route.cheapest_price(pred_func)
      calc_time = time.perf_counter() - start

      if price:
          print("Cheapest price: %.2f" % price)
      else:
          print("No valid route found")
      route.print_stats()
      print(("#" * 72) + "\n")

  if __name__ == '__main__':
      sample_run(2, lambda x: True)
      def pred(x):
          for price in x:
              if price < 150: return False
          return True
      sample_run(3, pred)

I haven’t thoroughly tested this for correctness besides numerous runs and some basic validation so let me know if you see anything apparently wrong here.

Running the above yields

    ############################## Sample Run ##############################
    Winning combo: ['103.62', '106.40']
    Cheapest price: 210.03
    Legs: 2
    Combos examined: 1
    Loops: 0

    ########################################################################

    ############################## Sample Run ##############################
    Winning combo: ['150.74', '150.25', '173.95']
    Cheapest price: 474.95
    Legs: 3
    Combos examined: 2852
    Loops: 8523

    ########################################################################

For the first sample run, we use a predicate function which yields True, so we never examine anything other than the first combo n1 + m1. For the second sample, I add a predicate function which only accepts any price combination where all legs are above $150. (Of course this is not anything resembling airline rules, just good enough to simulate some sample cases, where the first n combinations are rejected). In the second sample run, we utilized 3 legs and examined 2852 combinations before coming up with the winning leg combination for the route. Each price within the combination is the smallest possible price above $150 for each leg.


27
May 10

Random points in polygon generation algorithm

I needed to generate a set of random points within a polygon, either convex or concave. The need arose in a geospatial domain where polygons are rather small (on a geo-scale) and wouldn’t span more than, say, 10 miles, so the benefit of employing more complex algorithms to deal with spheroid properties is negligible. Plane geometry is enough to meet this requirement. Point-in-Polygon tests are rather simple and are used to test whether a point lies within a polygon. The test is performed using a ray casting algorithm, which counts the intersections between the polygon’s edges and a ray cast along the x-axis from the point in question; an odd number of crossings means the point is inside.

Another concept is the Minimum Bounding Rectangle (Bounding Box), which is the minimal rectangle needed to enclose a geographical object (e.g. a polygon).

So, one can generate random points within a polygon by…

  1. Generating a bounding box
  2. Generating a point within the bounding box. This is a simple algorithm.
  3. Using Point-in-Polygon to test whether this point exists within the polygon.

Because of the random sampling nature and false positives from step 2, which must be tested in step 3, the above must be performed in a loop until the Point-in-Polygon test passes.
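
The three steps and the retry loop can be sketched in a few lines of plain Python. This is a simplified illustration, not the geo-utils code: it assumes vertices are given as `(x, y)` tuples, uses a textbook even-odd ray casting test, and does nothing special for points that fall exactly on an edge. The function names are made up for the example.

```python
import random

def point_in_polygon(x, y, vertices):
    """Even-odd ray casting: cast a ray towards +x and count edge crossings."""
    inside = False
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(vertices, rng=random):
    """Rejection sampling: draw from the bounding box until a point is inside."""
    xs = [vx for vx, _ in vertices]
    ys = [vy for _, vy in vertices]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            return x, y
```

For an L-shaped (concave) polygon such as `[(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]`, the point `(0.5, 0.5)` tests as inside and `(2, 2)` as outside, so a draw at `(2, 2)` would be rejected and the loop would try again.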

This works quite well for generating test data, as there are no tight bounds on the performance characteristics of random generation. One could also use the above algorithm in production as long as the ratio of polygon area to bounding box area is rather large, which is usually the case for convex polygons. The ratio might be too small for concave polygons, though, causing a more than acceptable number of false positives in step #2.

I’ve implemented this in the geo-utils python package and made it available on github. Feel free to use it and provide any feedback.

To utilize the geo-utils to generate random points within a polygon, you would do the following:

  from vtown import geo
  from vtown.geo.polygon import Polygon


  polygon = Polygon(  geo.LatLon(42.39321,-82.92114),
                      geo.LatLon(42.39194,-82.91669),
                      geo.LatLon(42.39147,-82.91796),
                      geo.LatLon(42.39090,-82.91974),
                      geo.LatLon(42.39321,-82.92114))

  point = polygon.random_point()

The above polygon is generated using lat/lon coordinates, but you can generate them using simple x/y coordinates with geo.Point(x,y)

Here are some code snippets from the implementation. I only pasted the relevant parts. For boilerplate and relevant data structures, see the geo-utils package.

class BoundingBox(object):

    def __init__(self, *points):
        """Compute the minimal x and y intervals that enclose the given points."""
        xmin = ymin = float('inf')
        xmax = ymax = float('-inf')
        for p in points:
            if p.x < xmin: xmin = p.x
            if p.y < ymin: ymin = p.y
            if p.x > xmax: xmax = p.x
            if p.y > ymax: ymax = p.y
        self.interval_x = Interval(xmin, xmax)
        self.interval_y = Interval(ymin, ymax)

    def random_point(self):
        x = self.interval_x.random_point()
        y = self.interval_y.random_point()
        return Point(x, y)

class Polygon:
    ## __init__ omitted here...

    def contains(self, point):
        seg_counter = private.SegmentCounter(point)
        for i in range(1, len(self.points)):
            line = Line(*self.points[i-1:i+1])
            if seg_counter.process_segment(line):
                return True
        return seg_counter.crossings % 2 == 1

    def random_point(self):
        bb = BoundingBox(*self.points)
        while True:
            print("GENERATING RANDOM POINT...")
            p = bb.random_point()
            if self.contains(p):
                return p

class SegmentCounter(object):

    def __init__(self, point):
        self.point = point
        self.crossings = 0

    def process_segment(self, line):
        p, p1, p2 = self.point, line.point1, line.point2
        if p1.x < p.x and p2.x < p.x:
            return False

        if (p.x == p2.x and p.y == p2.y):
            return True

        if p1.y == p.y and p2.y == p.y:
            minx = p1.x
            maxx = p2.x
            if minx > maxx:
                minx = p2.x
                maxx = p1.x
            if p.x >= minx and p.x <= maxx:
                return True
            return False


        if ((p1.y > p.y) and (p2.y <= p.y)) \
                or ((p2.y > p.y) and (p1.y <= p.y)):
            x1 = p1.x - p.x
            y1 = p1.y - p.y
            x2 = p2.x - p.x
            y2 = p2.y - p.y

            # needs "import numpy"; for a 2x2 matrix this equals x1*y2 - x2*y1
            det = numpy.linalg.det([[x1, y1], [x2, y2]])
            if det == 0.0:
                return True
            if y2 < y1:
                det = -det

            if det > 0.0:
                self.crossings += 1

        # The point is not on this segment; the caller inspects self.crossings.
        return False

17
May 10

Divide and conquer for exponentiation

Here is an awesome way to demonstrate a divide and conquer algorithm performing exponentiation. A naive exponentiation algorithm for x^n performs n-1 multiplications, as x × x × … × x. This has an algorithmic complexity of O(n), which of course scales poorly for any significantly large n. This does not even include the overhead of multiplying integers beyond the CPU’s native integer range, which is slower than staying within it. Now, do that n times and you have a problem.

Logarithmic performance O(log n) is one of the best common algorithmic complexities there is (outside of constant complexity of course, which is rare). We can calculate a power in logarithmic time by halving the exponent at each step, which is the hallmark of divide and conquer solutions.

Logarithms grow very slowly compared to the size of the input, so for calculating a power of, say, n^1000000 with the naive algorithm, you’d have to perform 999,999 multiplications. With a logarithmic complexity algorithm this drops to ceil(log2(1000000)) = ceil(19.93) = 20 steps. That is 20 steps, with a few extra operations per step, compared to 1 million multiplications.
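
As a quick sanity check of that arithmetic, the small sketch below counts how many times an exponent can be halved before it reaches zero and compares that with the logarithm. The helper name `halving_steps` is made up for this illustration.

```python
import math

def halving_steps(e):
    """Count how many integer halvings it takes for e to reach 0."""
    steps = 0
    while e > 0:
        e //= 2
        steps += 1
    return steps

# Both expressions give the step count for an exponent of one million.
steps = halving_steps(1000000)
log_bound = math.ceil(math.log2(1000000))
```

Here both `steps` and `log_bound` come out to 20, against the 999,999 multiplications of the naive loop.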

Here is an example of both exponentiation algorithms, the logarithmic complexity one and the linear complexity one (called naive), as well as the built in python pow() function. Both our logarithmic power function and python’s built in one perform the same, whereas the naive linear function starts to truly deteriorate once any reasonably large number is used as the exponent.

_Note: this function is recursive, so you can run out of stack space for very large exponents (you can also easily reimplement it iteratively). On a system with a stack limit of 1024 frames, this would mean your exponent would have to be above 2^1024 or

17976931348623159077293051907890247336179769789423065727343008 11577326758055009631327084773224075360211201138798713933576587 89768814416622492847430639474124377767893424865485276302219601 24609411945308295208500576883815068234246288147391311054082723 7163350510684586298239947245938479716304835356329624224137216

before you run out of stack space._
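
For reference, one way the same divide and conquer idea might be written without recursion is to walk the bits of the exponent, squaring the base at each step. This is a sketch under the assumption of a non-negative integer exponent, and `power_iterative` is a name invented for this example rather than part of the benchmark script.

```python
def power_iterative(base, exponent):
    """Exponentiation by squaring without recursion (exponent must be >= 0)."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent > 0:
        if exponent & 1:      # lowest bit set: fold the current base in
            result *= base
        base *= base          # square for the next bit
        exponent >>= 1        # move on to the next bit
    return result
```

It performs the same O(log n) number of steps as the recursive version, but the loop uses constant stack space, so the recursion limit no longer matters.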

Here is a benchmarked python implementation. The relevant algorithm part is highlighted.

#!/usr/bin/env python
import math
import time
import sys

def power(b, e):
    """logarithmic divide/conquer algorithm"""
    if e == 0: return 1
    x = power(b, e // 2)  # integer division keeps e exact for huge exponents
    if e % 2 == 0: return pow(x, 2)
    else: return b * pow(x, 2)

def naive_power(b, e):
    """linear power algorithm"""
    x = b
    for i in range(1, e):
        x *= b
    return x

def perform(name, base, exp, pfunc):
    print("%s: %d^%d: %d" % (name, base, exp, pfunc(base, exp)))

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("You must provide a base and an exponent.  (Usage: exp.py base exp)")
    base = int(sys.argv[1])
    exp = int(sys.argv[2])
    for func in (power, naive_power, pow):
        print("Benchmarking %s..." % func.__name__)
        bench = []
        for i in range(0,5):
            start = time.time()
            ans = func(base, exp)
            end = time.time()
            bench.append(end-start)
        print("\tCalculated in: %s" % min(bench))

Running the above to calculate 2^200000

$ python exp.py 2 200000
Benchmarking power…
    Calculated in: 0.0042099952697753906
Benchmarking naive_power…
    Calculated in: 6.078423023223877
Benchmarking pow…
    Calculated in: 0.0041148662567138672

Hmmm, both pow() (python’s built in power) and power() (logarithmic complexity) calculated the power in 4 millis (above is in seconds) and our naive_power() function calculates the same result in 6 seconds.

I tried running the script to calculate 2^1000000, which the logarithmic functions calculated in 25 milliseconds, and I killed the naive_power() calculation after a few minutes of impatiently waiting for it to complete.

Power to the logarithms!!! :-)
