17
May 10

Divide and conquer for exponentiation

Here is an awesome way to demonstrate divide and conquer algorithm performing exponentiation. Naive exponentiation algorithms xn would perform n-1 multiplications as n x n … x n-1. This has an algorithmic complexity of O(n) which of course scales poorly for any significantly large number. This is not even including the overhead of performing integer multiplication beyond CPUs capacity is slower than staying within the CPU integer range. Now, do that n times and you have a problem.

Logarithmic performance O(log n) is one of the best common algorithmic complexities there is (outside of constant complexity of course, which is rare). One can achieve calculating power by utilizing the power of logarithms, which are clearly apparent in divide and conquer problem solutions.

Logarithms grow very slow compared to number of inputs, though for a calculating a power of say n1000000, with the naive algorithm, you’d have to perform 999,999 multiplications. With a logarithmic complexity algorithm this drops to log21000000 = ceil(19.93) = 20 steps. 20 steps with a few extra operations for step compared to 1million multiplications.

Here is an example of both exponentiation algorithms, the logarithmic complexity and linear complexity (called naive), as well as built in python pow() function. Both our logarithmic power function and python’s built in one perform the same, where the naive linear function starts to truly deteriorate once any reasonable number is used as the exponent.

_Note: this function is recursive though you can run out of stack space for very large exponents (you can also easily reimplement it as recursion). On a system with a 1024 stack limit, this would mean your exponent would have to be above 21024 or

17976931348623159077293051907890247336179769789423065727343008 11577326758055009631327084773224075360211201138798713933576587 89768814416622492847430639474124377767893424865485276302219601 24609411945308295208500576883815068234246288147391311054082723 7163350510684586298239947245938479716304835356329624224137216

before you run out of stack space._

Here is a benchmarked python implementation. The relevant algorithm part is highlighted.

#!/usr/bin/env python
import math
import time
import sys

def power(b, e):
    """logarithmic divide/conquer algorithm"""
    if e == 0: return 1
    x = power(b, math.floor(e/2))
    if e % 2 == 0: return pow(x, 2)
    else: return b * pow(x, 2)

def naive_power(b, e):
    """linear power algorithm"""
    x = b;
    for i in range(1, e):
        x *= b
    return x

def perform(name, base, exp, pfunc):
    print("%s: %d^%d: %d" % (name, base, exp, pfunc(base, exp)))

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("You must provide a base and an exponent.  (Usage: exp.py base exp)")
    base = int(sys.argv[1])
    exp = int(sys.argv[2])
    for func in (power, naive_power, pow):
        print("Benchmarking %s..." % func.__name__)
        bench = []
        for i in range(0,5):
            start = time.time()
            ans = func(base, exp)
            end = time.time()
            bench.append(end-start)
        print("\tCalculated in: %s" % min(bench))
]]>

Running above to calculate 2200000

$ python exp.py 2 200000
Benchmarking power…
    Calculated in: 0.0042099952697753906
Benchmarking naive_power…
    Calculated in: 6.078423023223877
Benchmarking pow…
    Calculated in: 0.0041148662567138672

Hmmm, both pow() (python’s built in power) and power() (logarithmic complexity) calculated the power in 4 millis (above is in seconds) and our naive_power() function calculates the same result in 6 seconds.

I tried running the script to calculate 21000000, which calculated using logarithmic functions in 25 milliseconds and I killed the naive_power() calculation after a few minutes of impatiently waiting for it to complete.

Power to the logarithms!!! 🙂


12
Apr 10

Python and mod_wsgi on Mac OS X Snow Leopard (oy vey)

I’ve been dabbling with Python Turbogears in the last week. Turbogears is a great framework so far. My biggest like is the loose coupling, allowing you to choose the best component for the job. I’ll write a more detailed blog about the other RAD web development frameworks out there sometime this week, but my biggest dislike of say Rails, Django, and some others, is how they tie you into using their integrated components. You can of course hackishly disable their use/dependency, but not without losing many other features of the framework. In my experience, these type of frameworks are great at getting something up quickly, but they suck when it comes to long term scalability and growth, as you basically end up rewriting the framework to integrate other 3rd party or your own components to the point that it marginalizes the benefits of the framework.

I found out that building Python on a Mac (OS X Snow Leopard) is nearly impossible. You can definitely compile it, but I needed it compiled to work with mod_wsgi, and various modules. I also needed to compile mod_wsgi against a particular version of apache, which required me to compile python as a universal binary to support i386 and x86_64 architectures. That’s where it started to get painful. Snow Leopard is distributed with Python 2.6.1 and supports a 3-way architecture (ppc7400, i386, and x86_64), but the latest python release is 2.6.5, so I tried to compile it. I used a myriad of options…

./configure --prefix=/usr/local/python-2.6.5 \
--enable-framework=/usr/local/python-2.6.5/frameworks \
MACOSX_DEPLOYMENT_TARGET=10.6 \
--enable-universalsdk=/Developer/SDKs/MacOSX10.6.sdk \
--with-universal-archs=3-way

I also tried to use intel for –with-universal-archs. Both CCFLAGS and LDFLAGS were being set correctly in the Makefile. I even tried setting them explicitly, with no luck: Python’s executable was compiling in all architectures except 64-bit. I wasn’t able to find any references to any such issue anywhere in the user forums and/or on the web. Every reference I saw in compiling python in 64-bit, I tried, with no luck. Evidentally, the Python distributed with Snow Leopard was compiled using some special compile procedures by Apple due to the fact that some packages lack 64-bit compatibility. I couldn’t find any reference to this procedure, nor did I really want to engage is such activity. Come on Python folks, WTF??? Can you either provide compilation instructions, or distribute MacPython as a universal binary including x86_64?

I ended up resorting (unhappily) to using Snow Leopard’s distributed Python and Mac’s web-sharing apache. I compiled mod_wsgi with:

./configure --with-apxs=/usr/sbin/apxs \
--with-python=/System/Library/Frameworks/Python.framework/Versions/2.6/bin/python

and voila, we have a working mod_wsgi install. I really dislike the fact that I wasted days trying to get it to work with Python 2.6.5 and custom apache install, but at least I have something that works now and hopefully won’t slow me down any more.

I’m loving python, as I have sporadically for quite a while, but I’m really missing the JVM with its problem-free installs, jar/war archives, and things just working.


02
Apr 10

Agile dilemma

The time has come to truly retire the word agile. I like the word. I love the idea of agility. But I’m starting to dislike how it’s starting to constrain software development and how it’s endlessly used in software literature these days for the lack of anything else to talk about I guess.

What was supposed to be as phrased in the agile manifesto:

Individuals and interactions over processes and tools

has become infested with these non-developer type parasites who at any sign of anything having a chance of making mainstream, inject their bureaucratic venom into it forcing a self preservation act by most smart developers. That act is to run for the hills.

So what happens next? Well, its a never ending lifecycle. Agile will reincarnate as something very similar to where its origins once lie, but in smaller development circles, until it’s crippled again. This cycle will repeat indefinitely. This cycle is not only apparent in software development lifecycle methodologies, but also in programming languages, frameworks, etc…

Software development is as much art as it is science and engineering. I think when some hear engineering, they think bridge engineering or what today has almost become a ubiquitous term for anyone doing anything. In engineering a bridge, one can reproduce rather predictable set of results. In developing software, one can’t. Engineering is as American Engineers’ Council for Professional Development put it: The creative application of scientific principles to design… Creative is the keyword here. Not constrained by process the kills creativity. We as professionals should be able to discerns these things and stop applying absolutes to the creative process. Just because we have successfully applied processes in the manufacturing discipline, doesn’t mean that same processes or processes at all, can be applied to software engineering. Sorry folks, no software factories for you.

Useful precise estimation is a delusion, and the sooner we realize it, the faster we can get back to developing quality software. Process is disadvantageous. All designed to give the businesses a warm fuzzy feeling of certainty around something that’s not certain. So why are there some that can give good estimates? There are only two reasons. One is that they are lucky. No really, they are lucky at predicting the unpredictable. This happens, trust me. Just ask someone who has a financial advisor and trusts them to set up their financial investment portfolio. Do you trust your financial advisor? If you do, please read Fooled by Randomness and The Black Swan. Also, there is actually one other reason for good estimation. Sorry developers for exposing this to your managers. This reason is overestimation. Yes, you heard it right. You overestimate, then you finish your task early, but instead of looking good by marking it as done early, which will in turn screw up estimation reports at the end of the sprint by showing that you chronically overestimate, you choose not to and though find something else to do to kill time or start on another task in case that one turns out to be underestimated. At the end of the project, you have accumulated so much time, that your friends have to ignore you on facebook and un-follow you on twitter.

One might say it doesn’t matter, as long as the overall estimates are close. Well, whatever makes you sleep better at night. Just know, software estimation is not science, it’s gambling.

Ah, one other thing, mostly related to XP. My favorite topic, Pair Programming. Sorry folks, this one’s not really for me. There are many issues with it. First, it’s disruptive in the way I think and to my personal creative process (I’m sure it’s not just me either). When I think, I don’t want anyone interrupting my train of thought just because they have an idea. Great, but if your idea doesn’t pan out, my idea just got lost in the shuffle, thanks for the context switch partner. In any other field that requires critical thinking, people work alone a lot (maybe not all the time). Just ask all the researchers. They come up with their best ideas/research and when ready, share those with colleagues or other researchers. They don’t gather up in a room and participate in committee based research. That would never work. One might say it’s because they are egotists. That might be true, but that’s not necessarily a bad quality being that it drives them to outperform someone else. At the end of the day, most innovations come from these weird acting egotists.

In pair programming, we have to be on the same schedule. When my brain doesn’t click, I like to go for a walk or pace around the room to get me into the creative process again. A lot of my ideas come not when I’m sitting at the desk writing code. Now, do I have to coordinate that schedule with my pairing partner? Should we hold hands and skip together? What about if I want to use the bathroom or grab a cup of tea? Should I be conscious as to ensure our bathroom/tea schedules are in synch? Are you kidding me? What is this, grade school all over again or a software factory (laughably some actually like that term).

Well here it goes. I’m a software artist/developer. I write code because I love to, not only because someone’s paying me big bucks to do it. I get aroused by solution revelations and enjoy compliments when I solve hard problems before others, or that others couldn’t, or in a more elegant/concise way than others. Software development is a creative process. It brings with it the ability to create a masterpiece. It’s also a process that requires critical thinking, reflection, and most of all “quite”. Private offices are great, but sometimes not realistic depending on the company’s financial situation. The office doesn’t have to be private, as long as what ever your arrangement is, it’s quite enough and not distracting for you to do your work without having to context switch every 10 minutes to hear some rant going on around you and/or answer questions. There are time for questions, but not when you’re in the zone.

I’d like to see Michelangelo or da Vinci create masterpieces under the constraints of a process. Better yet, I’d love to see those two pair paint.

We all don’t think or work the same. We’re humans. The key to good software is to hire great smart developers that can hold their own. Keep the teams small. The rest will be worked out as a team not as a process. It’s about common sense and good people, not processes.


21
Mar 10

New look and feel

So I finally bit the bullet and updated the look and feel of the blog. I must say, it was a big royal PITA. I use typepad, and although it served me rather well over the last few years, the template availability and the ability to easily utilize these custom templates is a big downside. Typepad makes available quite a few templates, but they all look like web -1.0 design. They do provide custom advanced templating, but that basically means you have to do it all yourself. You have to create and/or get a template and then customize it by augmenting it with typepad tags, etc… At the end of the day, you basically utilize their API to build a blog. I’m sure some folks don’t mind, but I really don’t want to waste days on doing this. Actually, I’d rather not spend more that 30 minutes on any of this. This is what took me so long to update. I also contemplated moving over to wordpress, which has thousands of attractive templates out of the box and available all over the web free and premium. The only thing that stopped me is the ability be able to import my current blog entries. My understanding is that it’s not that straightforward and rather time consuming, especially if you want to keep the same URLs.

So about 2 days later, I have something that looks quite a bit more decent than before.

Hopefully this will hold off until I can eventually have the time to move to wordpress.


19
Feb 10

Extension-based content negotiation and nested routes with Restlet

I’ve been working with Restlet to expose a RESTful api interface to the data model for one of my projects. Restlet is a super flexible library allowing one to configure and access all the properties of HTTP through a REST-oriented API.

The application bootstrapping options are also super flexible, allowing one to configure how routes are resolved and nest routes for cascading capabilities. I ran into a small caveat when I tried to configure extension based content negotiation. Basically, the idea of extension based content negotiation, is that instead of using “Accept” headers, one can append a mime extension to their request URI to request a particular return format. Say, we have a http://localhost/api/resource uri, one can request xml or json formats by simply doing http://localhost/api/resource.xml or http://localhost/api/resource.json. Of course your resource has to support these formats. The documentation on this type of content negotiation is non-existent. I had to scour a bunch of users group messages and javadocs before I figured it out. I figured I’ll shared if someone else is interested.

My applications is written in Scala, so examples will be provided as such. I’m sure any experienced developer can easily discern the java equivalent.

First, in your application bootstrapping, you must turn on the extensionsTunnel option. Here is my code, which also demonstrates nested routes. Then, in your resource you must conditionally infer the MediaType provided and emit the representation of this resource based on it.

import org.restlet.{Restlet, Application => RestletApplication}
import scala.xml._
//... other imports excluded

class TestApplication extends RestletApplication {
  override def createInboundRoot: Restlet = {

    val apiRouter = new Router(getContext)
    apiRouter.attach("/test", classOf[TestResource])

    val rootRouter = new Router(getContext)
    rootRouter.attach("/api/v1", apiRouter).getTemplate.setMatchingMode(Template.MODE_STARTS_WITH)

    getTunnelService.setExtensionsTunnel(true)

    return rootRouter
  }
}

class TestResource extends ServerResource {

  @Get("xml|json")
  def represent(v:Variant):String = {
    return v.getMediaType match {
          case MediaType.TEXT_XML | MediaType.APPLICATION_XML => <response><message>Hello from Restlet!</message></response>.toString
          case MediaType.APPLICATION_JSON => "{\"message\": \"Hello from Restlet\"}"
        }
  }
}

First, the root router’s matching mode must be set to Template.MODE_STARTS_WITH, otherwise it will try to match based on full absolute uri path and not find any nested resources. So the matching mode is very important in the case where you’re working with nested resources.

Second, you set the extensions tunnel property to true: getTunnelService.setExtensionsTunnel(true). This will turn on the extension tunneling service and perform content negotiation based on the URI’s extension. Note: if an extension is not provided, it will resort to first available representation supported by the resource. It can get more complicated I believe based on other configurations, but this is what happens in the most simple scenario.

Now, with content negotiation on, the resource has to conditionally infer the proper MediaType requested and provide its representation for the MediaType. In Scala this is very elegantly done using the super flexible match/case construct. This construct can be used as Java’s switch statement, but it is way more powerful and allows for advanced pattern matching. As you can see, I check for both xml and json media types and provide the proper representation. The supported media types are handled through @Get annotation. For more info, see Restlet’s annotations and Resource documentation.

Now, accessing the resources yields the following results:

  $ curl http://localhost:8080/api/v1/test.xml
  Hello from Restlet

  $ curl http://localhost:8080/api/v1/test.json
  {"message": "Hello from Restlet"}
Page 5 of 13« First...34567...10...Last »