05
Jan 09
Pragmatic Thinking and Learning
02
Dec 08
Choosing best/efficient algorithm implementation for the task
So as I progress through studying collective intelligence concepts and algorithms, I can't help but think about the most efficient/performant (no such word, but you know what I mean) ways of implementing statistical algorithms that perform calculations on very large data sets. What most programmers proficient in a particular language would do is find a library or implement the algorithm in that particular technology, without looking outside the box. I'd probably implement most of these in some rather efficient language, but wait…
#### Python with mean algorithm Time: 24.5236210823 seconds #####
import time

def mean(inlist):
    # Accumulate in "total" so the built-in sum() is not shadowed.
    total = 0
    for item in inlist:
        total = total + item
    return total / float(len(inlist))

start = time.time()
result = mean([i for i in range(1, 50000000)])
end = time.time()
print("Result: %s, Start: %s, End: %s, Time elapsed: %s\n" % (result,
    start, end, end - start))
#### Python using the R-lang interface (rpy2). It dispatches to R libs behind the scenes. Time: 14.780577898 seconds #####
import rpy2.robjects as robjects
import time

r = robjects.r  # handle to the embedded R session
start = time.time()
result = r.mean(robjects.FloatVector(range(1, 50000000)))
end = time.time()
print("Result: %s, Start: %s, End: %s, Time elapsed: %s\n" %
      (result[0], start, end, end - start))
#### R-lang using the mean function. Time: 0.654 seconds (Yes, that's 654 milliseconds!!!) #####
print(system.time(print(mean(array(1:50000000)))))
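For anyone who wants to repeat this kind of comparison without leaving Python, here is a minimal sketch of a timing harness. It assumes Python 3.8 or later (for statistics.fmean) and uses a much smaller input than the 50 million items above; the function names mean_loop, mean_builtin and time_call are mine, and the timings quoted in this post were not produced with it.

```python
import statistics
import timeit


def mean_loop(values):
    """Hand-written mean, equivalent to the loop-based version above."""
    total = 0
    for item in values:
        total += item
    return total / float(len(values))


def mean_builtin(values):
    """Mean via the built-in sum(), whose loop runs in C."""
    return sum(values) / float(len(values))


def time_call(func, values, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(values), number=1, repeat=repeat))


if __name__ == "__main__":
    data = list(range(1, 1000000))  # assumption: far smaller than the post's input
    for func in (mean_loop, mean_builtin, statistics.fmean):
        print("%-14s %.4f s" % (func.__name__, time_call(func, data)))
```

Taking the minimum of a few runs is a common way to reduce noise from whatever else the machine is doing; the absolute numbers will still vary by machine and interpreter version.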
28
Nov 08
Reflection with generics and annotation introspection
So I ran into an issue today that cost me an hour of scouring documentation on reflection to see how it behaves with generics, only to find no sensible info and instead a few blog posts from folks who experienced similar issues. The basic idea is that reflection and generics, or rather reflection and runtime type binding, don't work as some might expect. Well, at this time of the night, code speaks better than words to me, so here is a small example…
Compiling and running the above with assertions enabled will throw a java.lang.AssertionError. Why, one might ask? Well, that's because although the reflection package seems to locate the method using supertype arguments in place of the actual types, this must yield some different runtime binding of that method, since introspecting for an annotation on that method yields nothing.
I'm not sure what the intention of this is, or whether it's my lack of understanding of the reflection APIs and how they introspect runtime types. I'll do more digging over the weekend and report back if I find something interesting. For now, the only way this can be resolved, unfortunately, is by writing a dynamic search method that finds the right method based on its name and argument types. The types can be supertypes, as we dynamically compare the method's argument types to the required types using the Class#isAssignableFrom(Class) method. Here is a util method that takes care of locating the annotation…
23
Nov 08
AlexBuild refactoring
So we're going through some major refactoring for AlexBuild. We quickly realized that the original syntax didn't have enough provisions to make Alex extensible for 3rd party plugins as well as making it easier for us to add various lifecycle strategies.
Here is what we have so far…
################################################
define project as {
    name: "some_widget",
    version: "0.01-alpha"
}
set property src to "java/src"
set property build_dir to "target/classes"
define dependency ivy://commons-logging.commons-logging version 1.1.1 as logging for all
define dependency commons-lang.commons-lang version LATEST as lang for compile, test, package
define dependency file:///home/user/dependencies/spring-2.5.5.jar as spring-full for compile
compile java from "${src}" to "${build_dir}"
compile groovy from "groovy/src" to "target/groovy_classes"
compile resources from "resources" to "target/classes"
create jar named "CoolWidget.jar"
    from "${build_dir}" and "target/groovy_classes"
{
    manifest: "path/to/MANIFEST",
    include: [ "**/*.class", "**/*.xml", "**/*.properties" ]
}
################################################
The basic idea is that each statement is implemented as an AlexPlug, which is an extensible set of interfaces that allows you to develop and extend how various statements are implemented in particular contexts. Each AlexPlug consumes a set of parameters called AlexParams and a set of arguments called AlexArgs. The difference between the two is that AlexParams is a strongly typed set of parameters injected into the plugin's implementation, while AlexArgs is a JSON-like data structure that lets you provide extensive, loosely typed configuration for plugins. Take a look at the "create jar" statement above.
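To make the params-versus-args split a bit more concrete, here is a purely hypothetical model of it in Python. None of these names (JarParams, create_jar_plan) come from AlexBuild, whose real interfaces are not shown in this post. I am also assuming that the jar name and source directories of "create jar" would be the strongly typed part, and that the braces block is the loosely typed part.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class JarParams:
    """Hypothetical strongly typed parameters for a "create jar" statement."""
    name: str
    sources: List[str]


def create_jar_plan(params: JarParams, args: Dict[str, Any]) -> Dict[str, Any]:
    """Merge typed params with loosely typed args into a plan; no jar is built."""
    return {
        "jar": params.name,
        "sources": list(params.sources),
        # Loosely typed args: every key is optional and falls back to a default.
        "manifest": args.get("manifest"),
        "include": list(args.get("include", ["**/*"])),
    }


if __name__ == "__main__":
    params = JarParams(name="CoolWidget.jar",
                       sources=["target/classes", "target/groovy_classes"])
    args = {"manifest": "path/to/MANIFEST",
            "include": ["**/*.class", "**/*.xml", "**/*.properties"]}
    print(create_jar_plan(params, args))
```

The point of the split in this sketch is that a missing or mistyped field in the typed part is caught up front, while the dictionary can grow new optional keys without changing any signatures.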
I'm hoping to have the grammar redefined tonight with the refactoring complete sometime this week. Once that's done, we can finally do our initial release. I say finally because before this refactoring exercise, the code in the trunk was ready for the first alpha release, though it didn't provide any points of extensibility that didn't require tinkering with the grammar. I believe this small delay is good and we can release the initial version with extensive extension APIs to allow for folks to write AlexPlugs.
More to come….
28
Sep 08
YourKit profiler
Profiling is probably one of the most joyous activities for me these days. Besides the fact that it feels good to have actually gotten to the point where you have something to profile, you now get to see how the software that you thought was so well designed is performing. You basically get to look into the guts of your software, forgetting all the abstractions that were provided to you and all the ones you've layered on top, and actually see how things collaborate to execute the instructions. The best part is that you actually get to see what goes on in all the libraries that you thought were so great (or maybe not so great after you look under the hood). See this JIRA bug, which was a result of profiling.
Well, I've tried a few profilers for Java and one that really stands out is YourKit. Although most profilers have pretty much the same capabilities, one thing that separates them is how intuitive and well polished the interface is. YourKit stands out in this category.
This weekend, I had to profile a few modules that weren't performing well. Although I have a free license of YourKit that we use on the Alex Build project (more on that later), I decided to download the version 8 EAP trial to use it for a project not related to Alex Build. I'll most likely purchase a personal copy of YourKit once this trial runs out, as there are many projects I'd like to use YourKit for, and for the very reasonable price of $499 you can't go wrong.
So I was profiling a TCP/IP application built on Apache MINA which had a thread model defined using Java SE 5 executors. Outside of MINA, the application was scaling far beyond what its throughput was with MINA. MINA, being a very lightweight layer over Java's NIO, should not bog down performance and scalability this much. I saw an exponential performance decrease as throughput requirements increased. So I fired up YourKit and started pushing data into the application. At some point, I took a snapshot and then dove in.
Once you have a snapshot, you pretty much have all the details about the application's runtime at the point in time you took it. I noticed that although the thread model was configured with a fixed thread pool of 20 threads, only one worker thread was being used to process the payloads. This is very easy to see using the "Call stack (view by thread)" interface. It revealed a bigger problem with MINA 1.3, which basically serialized operations even though it claimed to support a thread model, whatever that means. MINA 2.0 changed that to support truly concurrent IO handlers, so I migrated to the new API. The next profile revealed that everything worked as expected and the thread pool was fully utilized.
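If you don't have a profiler handy, a crude version of the same check is to record which worker thread handles each payload. The sketch below is not MINA or YourKit: it uses Python's concurrent.futures as a stand-in for the Java executors, and handle and worker_usage are names made up for this illustration.

```python
import threading
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def handle(payload):
    """Pretend to process a payload; report which worker thread ran it."""
    time.sleep(0.01)  # stand-in for blocking I/O
    return threading.current_thread().name


def worker_usage(pool_size, payload_count):
    """Run payloads through a fixed-size pool and count tasks per worker thread."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        names = list(pool.map(handle, range(payload_count)))
    return Counter(names)


if __name__ == "__main__":
    usage = worker_usage(pool_size=20, payload_count=200)
    print("workers used: %d of 20" % len(usage))
    for name, count in sorted(usage.items()):
        print("  %s handled %d payloads" % (name, count))
```

A pool that reports a single worker doing all the work, as in the first snapshot described above, is a strong hint that something upstream is serializing the tasks.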
The call stack also revealed that 99% of the time was spent in IO, which is not something we can tune any further at this point, so we're probably getting the best throughput we're going to get.
NetBeans has a very decent profiler as well, but fortunately I use IntelliJ and don't plan on switching to NetBeans if I don't have to, even if it's just for profiling. YourKit provides plugins for most IDEs, including IntelliJ and Eclipse.
One other thing I wanted to point out: unlike other commercial profilers, YourKit offers free Open Source licenses to qualified projects. They were very quick and courteous in the process. Thanks YourKit, you make the profiling process very enjoyable.