May, 2010

May 10

Random points in polygon generation algorithm

I needed to generate a set of random points within a polygon, including convex and concave. The need arouse in a geospatial domain where polygons are rather small (on a geo-scale) and wouldn’t span more than say 10 miles, though the benefit of employing more complex algorithms to deal with spheroid properties are negligible. Plane geometry provided enough to meet this requirement. Point-in-Polygon tests are rather simple and are used to test whether a point exists in a polygon. The test is performed using a Ray casting algorithm which test the intersections of a ray across the x-axis starting from the point in question.

Another concept is the Minimum Bounding Rectangle (Bounding Box), which is the minimal rectangle needed to enclose a geographical object (i.e. polygon).

So, one can generate random points within a polygon by…

  1. Generating a bounding box
  2. Generating a point within the bounding box. This is a simple algorithm.
  3. Using Point-in-Polygon to test whether this point exists within the polygon.

Because of the random sampling nature and false positives from step 2, which must be tested in step 3, the above must be performed in a loop until the Point-in-Polygon test passes.

This works quite well for generating test data, as there are no tight bounds on the performance characteristics of random generation. One could also use the above algorithm in production as long as the ration of polygon to bounding box is rather large, which is usually the case for convex polygons. The ratio might be too small convex polygons, though causing a more than acceptable number of false positives in step #2.

I’ve implemented this in the geo-utils python package and made available on github. Feel free to use and provide any feedback.

To utilize the geo-utils to generate random points within a polygon, you would do the following:

  from vtown import geo
  from vtown.geo.polygon import Polygon

  polygon = Polygon(  geo.LatLon(42.39321,-82.92114),

  point = polygon.random_point()

The above polygon is generated using lat/lon coordinates, but you can generate them using simple x/y coordinates with geo.Point(x,y)

Here are some code snippets from the implementation. I only pasted the relevant parts. For boilerplate and relevant data structures, see the geo-utils package.

class BoundingBox(object):

    def __init__(self, *points):
        """docstring for __init__"""
        xmin = ymin = float('inf')
        xmax = ymax = float('-inf')
        for p in points:
            if p.x < xmin: xmin = p.x
            if p.y < ymin: ymin = p.y
            if p.x > xmax: xmax = p.x
            if p.y > ymax: ymax = p.y
        self.interval_x = Interval(xmin, xmax)
        self.interval_y = Interval(ymin, ymax)

    def random_point(self):
        x = self.interval_x.random_point()
        y = self.interval_y.random_point()
        return Point(x, y)

class Polygon:
  ## __init__ omitted here...

  def contains(self, point):
        seg_counter = private.SegmentCounter(point)
        for i in range(1, len(self.points)):
            line = Line(*self.points[i-1:i+1])
            if seg_counter.process_segment(line):
                return True
        return seg_counter.crossings % 2 == 1

  def random_point(self):
        bb = BoundingBox(*self.points)
        while True:
            print("GENERATING RANDOM POINT...")
            p = bb.random_point()
            if self.contains(p):
                return p

class SegmentCounter(object):

    def __init__(self, point):
        self.point = point
        self.crossings = 0

    def process_segment(self, line):
        p, p1, p2 = self.point, line.point1, line.point2
        if p1.x < p.x and p2.x < p.x:
            return False

        if (p.x == p2.x and p.y == p2.y):
            return True

        if p1.y == p.y and p2.y == p.y:
            minx = p1.x
            maxx = p2.x
            if minx > maxx:
                minx = p2.x
                maxx = p1.x
            if p.x >= minx and p.x <= maxx:
                return True
            return False

        if ((p1.y > p.y) and (p2.y <= p.y)) \
                or ((p2.y > p.y) and (p1.y <= p.y)):
            x1 = p1.x - p.x
            y1 = p1.y - p.y
            x2 = p2.x - p.x
            y2 = p2.y - p.y

            det = numpy.linalg.det([[x1, y1], [x2, y2]])
            if det == 0.0:
                return True
            if y2 < y1:
                det = -det

            if det > 0.0:
                self.crossings += 1

May 10

Divide and conquer for exponentiation

Here is an awesome way to demonstrate divide and conquer algorithm performing exponentiation. Naive exponentiation algorithms xn would perform n-1 multiplications as n x n … x n-1. This has an algorithmic complexity of O(n) which of course scales poorly for any significantly large number. This is not even including the overhead of performing integer multiplication beyond CPUs capacity is slower than staying within the CPU integer range. Now, do that n times and you have a problem.

Logarithmic performance O(log n) is one of the best common algorithmic complexities there is (outside of constant complexity of course, which is rare). One can achieve calculating power by utilizing the power of logarithms, which are clearly apparent in divide and conquer problem solutions.

Logarithms grow very slow compared to number of inputs, though for a calculating a power of say n1000000, with the naive algorithm, you’d have to perform 999,999 multiplications. With a logarithmic complexity algorithm this drops to log21000000 = ceil(19.93) = 20 steps. 20 steps with a few extra operations for step compared to 1million multiplications.

Here is an example of both exponentiation algorithms, the logarithmic complexity and linear complexity (called naive), as well as built in python pow() function. Both our logarithmic power function and python’s built in one perform the same, where the naive linear function starts to truly deteriorate once any reasonable number is used as the exponent.

_Note: this function is recursive though you can run out of stack space for very large exponents (you can also easily reimplement it as recursion). On a system with a 1024 stack limit, this would mean your exponent would have to be above 21024 or

17976931348623159077293051907890247336179769789423065727343008 11577326758055009631327084773224075360211201138798713933576587 89768814416622492847430639474124377767893424865485276302219601 24609411945308295208500576883815068234246288147391311054082723 7163350510684586298239947245938479716304835356329624224137216

before you run out of stack space._

Here is a benchmarked python implementation. The relevant algorithm part is highlighted.

#!/usr/bin/env python
import math
import time
import sys

def power(b, e):
    """logarithmic divide/conquer algorithm"""
    if e == 0: return 1
    x = power(b, math.floor(e/2))
    if e % 2 == 0: return pow(x, 2)
    else: return b * pow(x, 2)

def naive_power(b, e):
    """linear power algorithm"""
    x = b;
    for i in range(1, e):
        x *= b
    return x

def perform(name, base, exp, pfunc):
    print("%s: %d^%d: %d" % (name, base, exp, pfunc(base, exp)))

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("You must provide a base and an exponent.  (Usage: base exp)")
    base = int(sys.argv[1])
    exp = int(sys.argv[2])
    for func in (power, naive_power, pow):
        print("Benchmarking %s..." % func.__name__)
        bench = []
        for i in range(0,5):
            start = time.time()
            ans = func(base, exp)
            end = time.time()
        print("\tCalculated in: %s" % min(bench))

Running above to calculate 2200000

$ python 2 200000
Benchmarking power…
    Calculated in: 0.0042099952697753906
Benchmarking naive_power…
    Calculated in: 6.078423023223877
Benchmarking pow…
    Calculated in: 0.0041148662567138672

Hmmm, both pow() (python’s built in power) and power() (logarithmic complexity) calculated the power in 4 millis (above is in seconds) and our naive_power() function calculates the same result in 6 seconds.

I tried running the script to calculate 21000000, which calculated using logarithmic functions in 25 milliseconds and I killed the naive_power() calculation after a few minutes of impatiently waiting for it to complete.

Power to the logarithms!!! 🙂