coding, ruby

Map Reduce plus filter using Ruby

The previous two articles were dedicated to JavaScript: mapping, reducing, and filtering somewhat large data, the elegance of the code, and so on.

Going forward, I'd like to evaluate other languages doing the exact same thing.

And I'm curious about performance too.

 

Here is my version of the same code, this time in Ruby (Ruby 2.1.1 was used). Unfortunately, it didn't even run in JRuby 1.7.

 

class Time
	# Convert a Time to integer milliseconds for easy elapsed-time math
	def to_ms
		(self.to_f * 1000.0).to_i
	end
end

total = 300 * 10000 # 3,000,000 records
data = Array.new

total.times do |x|
	data.push({name: 'it', salary: 33 * x})
end

data.push({name: "it", salary: 100})
data.push({name: "acc", salary: 100})

# Time the given block and print the elapsed milliseconds
def timeMe
	start = Time.now.to_ms
	yield
	endtime = Time.now.to_ms
	puts "Time elapsed #{endtime - start} ms"
end

timeMe do
	# Keep salaries of 'it' records (nil otherwise), drop the nils, then sum
	boom = data.map { |j| j[:salary] if j[:name] == 'it' }.compact.reduce(:+)
	puts "and  boom: #{boom} "
end

Result

 

and  boom: 148499950500100
Time elapsed 1194 ms

Not bad, but not as fast as JavaScript, probably because the array contains Hashes rather than one of JavaScript's core types (a plain object).

I'm not entirely sure why JRuby didn't run; I'd love to learn.

Update (April 18, 2014): Java 1.6 with the needed JVM parameters posted results of around 600 ms; JVM 1.8 didn't seem to work at the moment.

On another note, Ruby's syntax is simply lovely:

 

data.map { |j| j[:salary] if j[:name] == 'it' }.compact.reduce(:+)

 


coding, javascript, nodejs

Async map reduce filter using NodeJS and callbacks in parallel

Following up on a series I started earlier:

http://jeveloper.com/map-reduce-is-fun-and-practical-in-javascript/

Writing clean code is indeed paramount in our industry, and we all aspire to be better at it. With the popularization of NodeJS, we face another challenge.

Our first challenge was to process a large set of JSON objects, filter it by the name property, and get a total for that group.

This is the traditional blocking JavaScript way of doing it:

var data = [];

// Build a small sample data set
while (data.length < 100) {
  data.push({name: "it", salary: 33 * data.length});
}
data.push({name: "accounting", salary: 100});
data.push({name: "acc", salary: 100});

// Keep only the "it" records, project out the salary, then sum it up
var sum = data.filter(function (val) {
  return val.name == "it";
})
.map(function (curr) {
  return curr.salary;
})
.reduce(function (prev, curr) {
  return prev + curr;
});

console.log(sum);

I thought, well, this can be done in an asynchronous way. I've had great production use of the 'async' library, which works mainly on NodeJS but also in the browser.

To ramp up the numbers, we'll create 3,000,000 objects.
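The timing code for the blocking run at that scale isn't shown above, so here is a minimal sketch of how it could be measured, assuming the data set mirrors the Ruby version (an extra "it" entry and an "acc" entry, which is what the reported sum implies) and the same +new Date() timing used in the async examples below.

// Sketch: the blocking filter/map/reduce chain, scaled up to 3,000,000 objects
var data = [];
while (data.length < 3000000) {
  data.push({name: "it", salary: 33 * data.length});
}
data.push({name: "it", salary: 100});   // assumed, to match the reported sum
data.push({name: "acc", salary: 100});

var start = +new Date();
var sum = data.filter(function (val) { return val.name == "it"; })
              .map(function (curr) { return curr.salary; })
              .reduce(function (prev, curr) { return prev + curr; });
var diff = +new Date() - start;
console.log(" Finished iterating , took: " + diff + " Sum " + sum);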

> Finished iterating , took: 656 Sum 148499950500100

It took 656 ms. That’s pretty quick.

Here is my implementation using Async. A few comments:

Control is passed using callbacks. In most cases an iterator receives an item and a callback. Filter is a special case: its callback does not follow the typical NodeJS (err, data) pattern.

var async = require('async');

// `data` is the array of objects built above
var start = +new Date();

// filter's iterator callback takes only a boolean, not (err, data)
async.filter(data, function (item, cb) {
  cb(item.name == "it");
}, function (results) {
  // map's iterator callback follows the usual (err, data) pattern
  async.map(results, function (item, cb) {
    cb(null, item.salary);
  }, function (err, results2) {
    // reduce runs its steps in series
    async.reduce(results2, 0, function (memo, item, cb2) {
      // defer each step so the event loop is never blocked
      setImmediate(function () {
        cb2(null, memo + item);
      });
    }, function (err, sum) {
      var end = +new Date();
      var diff = end - start; // time difference in milliseconds
      console.log(" Finished iterating , took: " + diff + " Sum " + sum);
    });
  });
});

Pretty cool, but the numbers… not so good: 9.8 seconds, JEEZ.

 Finished iterating , took: 9835 Sum 148499950500100

Here is the series problem: reduce is executed in series, meaning the final result is accumulated sequentially, and each of the 3,000,000 steps above goes through setImmediate, costing an event-loop turn per element. That's a performance bottleneck.

Don't be alarmed, there is a way, and I absolutely tested it.

var sum = 0;
var start = +new Date();

// Accumulate into a shared variable; cb() signals this item is done
async.each(data, function (item, cb) {
  if (item.name == "it")
    sum += item.salary;
  cb();
}, function (err) {
  var end = +new Date();
  var diff = end - start; // time difference in milliseconds
  console.log(" Finished iterating , took: " + diff + " Sum " + sum);
});

Async’s each is the most commonly used method for executing in parallel.

Result:

Finished iterating , took: 446 Sum 148499950500100

 Much faster!

Async provides a lot of useful methods. Really handy ones are sortBy and eachSeries (which executes in sequence), and the most important method is async.parallel([functions to be executed in parallel], callback).
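For illustration, here is a minimal sketch of async.parallel (the task functions and result strings are made up): each task gets a callback(err, result), and the final callback fires once every task has completed, with the results in task order.

var async = require('async');

// Run two independent tasks at the same time
async.parallel([
  function (cb) {
    setImmediate(function () { cb(null, 'first result'); });
  },
  function (cb) {
    setImmediate(function () { cb(null, 'second result'); });
  }
], function (err, results) {
  if (err) return console.error(err);
  console.log(results); // [ 'first result', 'second result' ]
});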

 

Voila & Thanks

 

 
