Following up with a series i started earlier

http://jeveloper.com/map-reduce-is-fun-and-practical-in-javascript/

Writing clean code is indeed paramount in our industry and we all aspire to be better at it. With popularization of NodeJS we face another challenge

Our first challenge was to process large set of json objects , filter it by name property and get a total for that group.

This is a traditional JavaScript blocking way of doing it.

var data = []

while( data.length < 100) {
   data.push({name: "it", salary: 33*data.length});
}
data.push({name: "accounting", salary: 100});

data.push({name: "acc", salary: 100});
var sum = data.filter(function(val){
	return val.name == "it"
})
.map(function(curr){
	return curr.salary;
})
.reduce(function(prev, curr){
	return prev +curr;
})

console.log(sum);

I thought, well, this can be done in an asynchronous way. I’ve had a great production use of ‘async’ library that works mainly on NodeJS but also in browser.

To ramp up the numbers, we’ll create 3000000 objects.

> Finished iterating , took: 656 Sum 148499950500100

It took 656 ms. That’s pretty quick.

Here is my implementation using Async. Few comments:

Control is passed using callbacks. Iterators in most cases include an object and a callback. Filter is a special case that does not have a typical nodeJS  (err, data) pattern.

async.filter(data, function(item,cb){
	item.name == "it" ? cb(true) : cb(false);
}, function(results){
async.map(results,function(item,cb){
	return cb(null,item.salary);
}, function(err,results2){

async.reduce(results2,0, 

function(memo, item, cb2){
//functions in a series
		setImmediate(function (){
			cb2(null,memo+item); 
		});

},function(err, sum){
		end = +new Date();
      var diff = end - start; // time difference in milliseconds
      console.log(" Finished iterating , took: "+diff + " Sum "+sum);

});

});
});

Pretty cool but the numbers… not so good 9.8 seconds, JEEZ

 Finished iterating , took: 9835 Sum 148499950500100

Here is a series problem: reduce is executed in series, meaning it is sequential in terms of getting the final result, that’s a performance bottleneck.

Don’t be alarmed, there is a way and i absolutely tested it.

async.each(data, function(item,cb){
	if (item.name == "it")
		sum += item.salary;
	cb();

}, function(err){
	end = +new Date();
      var diff = end - start; // time difference in milliseconds
      console.log(" Finished iterating , took: "+diff + " Sum "+sum);
  });

Async’s each is the most commonly used method for executing in parallel.

Result:

Finished iterating , took: 446 Sum 148499950500100

 Much faster!

Async provides a lot of useful methods, one really useful is Sort/Sort By, eachSeries (will execute in sequence) and most important method is Async.parallel([methods to be executed in paralel], callback)

 

Voila & Thanks

 

 

Share This:

Leave a Reply