As I progress through studying collective intelligence concepts and algorithms, I can't help but think about the most efficient/performant (no such word, but you know what I mean) ways of implementing statistical algorithms that perform calculations on very large data sets. What most programmers proficient in a particular language would do is find a library or implement the algorithm in that technology, without looking outside the box. I'd probably implement most of these in some reasonably efficient language too, but wait…

#### Python with mean algorithm Time: 24.5236210823 seconds #####

```python
import time

def mean(inlist):
    # Naive mean: accumulate the elements in a Python-level loop.
    total = 0
    for item in inlist:
        total = total + item
    return total / float(len(inlist))

start = time.time()
result = mean([i for i in range(1, 50000000)])
end = time.time()
print "Result: %s, Start: %s, End: %s, Time elapsed: %s\n" % (result, start, end, end - start)
```
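As a side note, even without leaving pure Python, pushing the summation into the built-in `sum()` (which loops in C) typically shaves a noticeable amount off the hand-rolled loop. This is a quick sketch, not part of the original timings; it also passes the `range` directly instead of materializing a list, which keeps memory flat:

```python
import time

def mean_builtin(inlist):
    # Same arithmetic as the hand-rolled version, but the summation
    # runs inside the C-implemented built-in sum() instead of a
    # Python-level for loop.
    return sum(inlist) / float(len(inlist))

start = time.time()
# range is passed directly; sum() and len() both accept it lazily.
result = mean_builtin(range(1, 50000000))
end = time.time()
print("Result: %s, Time elapsed: %s" % (result, end - start))
```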

#### Python using the R-lang interface (rpy2). It dispatches to R libraries behind the scenes. Time: 14.780577898 seconds #####

```python
import time

import rpy2.robjects as robjects

# Handle to the embedded R instance exposed by rpy2.
r = robjects.r

start = time.time()
result = r.mean(robjects.FloatVector(range(1, 50000000)))
end = time.time()
print "Result: %s, Start: %s, End: %s, Time elapsed: %s\n" % (result[0], start, end, end - start)
```

#### R-lang using the mean function. Time: 0.654 seconds (Yes, that's 654 milliseconds!!!) #####

```r
print(system.time(print(mean(array(1:50000000)))))
```
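For completeness, the usual way to get R-like speed without leaving Python is a vectorized library such as NumPy, where the whole loop runs in compiled code. NumPy was not part of the original timings, so this is only a hedged sketch of what that comparison might look like:

```python
import time

import numpy as np

start = time.time()
# Same 1 .. 49,999,999 sequence as the other benchmarks,
# held as a contiguous float64 array.
data = np.arange(1, 50000000, dtype=np.float64)
result = data.mean()  # vectorized mean, computed in C
end = time.time()
print("Result: %s, Time elapsed: %s" % (result, end - start))
```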