MongoDB Aggregation

MongoDB has an aggregate operator that helps programmers define complex queries by combining operators such as $group and $match to build a complete result set.

Unfortunately, there aren't many tutorials and examples for this operator, so I decided to explain it in my own words.

 Aggregate is a pipeline

First of all, you should know that aggregate is a pipeline of different operators. So, what does it mean?

The aggregate operator accepts an array of stages:

db.collection.aggregate( [ { <stage> }, ... ] ) 

Each stage contains an operator such as $match, $limit, or $project. When executed, the aggregate operator performs the stages one by one. It's important to know that each stage gets its input from the result set of the previous stage, and the input of the first stage is all the records of the collection.
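As a quick sketch of the shape of a pipeline, it is just an ordered array of stage objects. $match and $group below are real aggregation operators, but the field names (role, total) are made up for illustration:

```javascript
// A pipeline is an ordered array of stage objects.
// The field names (role, total) are hypothetical examples.
var pipeline = [
  { $match: { role: 'admin' } },                        // stage 1: keep only admins
  { $group: { _id: '$role', total: { $sum: 1 } } }      // stage 2: count them
];

// Passed to aggregate, each stage consumes the previous stage's output:
// db.users.aggregate(pipeline);
console.log(pipeline.length); // 2 stages, executed in order
```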

The following diagram illustrates a sample aggregate procedure (source: www.mongodb.org):

aggregation-pipeline.png

 Example

Let’s assume we have a users collection with the following records:

[
  { _id: '55329ec72d3c018764000001', name: 'Steve', role: 'admin', age: 35 },
  { _id: '55329ec72d3c018764000002', name: 'Guillermo', role: 'admin', age: 28 },
  { _id: '55329ec72d3c018764000003', name: 'Roshan', role: 'user', age: 45 }
]

 Example with two stages

The following aggregate operator has two stages, $sort and $limit:

User.aggregate([
  { $sort: { age: -1 } },
  { $limit: 2 }
], function (err, records) {
  //...
});

First of all, the aggregate operator executes the first stage, whose input is all the records of the collection. The $sort operator sorts the users collection by the age field in descending order. The result of the first stage is:

[
  { _id: '55329ec72d3c018764000003', name: 'Roshan', role: 'user', age: 45 },
  { _id: '55329ec72d3c018764000001', name: 'Steve', role: 'admin', age: 35 },
  { _id: '55329ec72d3c018764000002', name: 'Guillermo', role: 'admin', age: 28 }
]

Then the second stage gets the sorted array of users from the previous stage and keeps only the first two records. Finally, the result of the aggregate operator is:

[
  { _id: '55329ec72d3c018764000003', name: 'Roshan', role: 'user', age: 45 },
  { _id: '55329ec72d3c018764000001', name: 'Steve', role: 'admin', age: 35 }
]

You can find more operators for aggregation stages here: http://docs.mongodb.org/manual/reference/operator/aggregation/
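To see why stage order matters, the $sort + $limit example above can be imitated with plain JavaScript array methods. This is only an analogy; MongoDB runs the stages server-side:

```javascript
var users = [
  { _id: '55329ec72d3c018764000001', name: 'Steve', role: 'admin', age: 35 },
  { _id: '55329ec72d3c018764000002', name: 'Guillermo', role: 'admin', age: 28 },
  { _id: '55329ec72d3c018764000003', name: 'Roshan', role: 'user', age: 45 }
];

// Stage 1 ({ $sort: { age: -1 } }): sort descending by age.
var sorted = users.slice().sort(function (a, b) { return b.age - a.age; });

// Stage 2 ({ $limit: 2 }): keep only the first two documents of the sorted result.
var result = sorted.slice(0, 2);

console.log(result.map(function (u) { return u.name; })); // [ 'Roshan', 'Steve' ]
```

Swapping the two stages would instead keep the first two unsorted users and then sort them, which is why the pipeline order is part of the query's meaning.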

 Conclusion

The aggregate operator is a pipeline of stages. It accepts an array of stages, and each stage contains an operator. The first stage gets all the records of the collection and, after performing its operator, passes the result to the next stage.


Scheduled backup for MongoDB

MongoDB is one of the NoSQL pioneers and it has a great community. Nowadays, many startups prefer MongoDB as their main database because of its simplicity. Configuring a scheduled backup for a database is really important: it keeps the latest data somewhere safe so you can restore it in case of a database crash.

In this post I want to introduce a simple open-source tool that I’ve recently published to set up a minimal scheduled backup for MongoDB.

 Dependencies

I built this tool using Node.js, so first of all you need to install it from http://nodejs.org/. I will publish a binary version for all platforms soon.

You don’t need any other dependencies besides a few official Node.js modules; in the next step we will install them using npm.

 Install

I named this tool mongodb-backup and you can clone the repository from GitHub:

https://github.com/afshinm/mongodb-backup

Then go to the project folder and run the following command:

npm install

This command installs all the required dependencies, including the AWS SDK.

 Configuration

This tool has a simple config.js file that defines all the settings needed to run the mongodump command and upload the result to Amazon S3 storage. Here you can find a list of all the options: https://github.com/afshinm/mongodb-backup#configjs
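As a rough, hypothetical sketch of what such a config usually holds, it names the database to dump, the S3 credentials, and a schedule. None of these key names are guaranteed to match the real config.js; check the repository link above for the actual options:

```javascript
// Hypothetical sketch only -- see the repository's config.js for the real key names.
module.exports = {
  mongodb: {
    host: '127.0.0.1',        // host passed to mongodump
    port: 27017,
    db: 'your-db'             // database to back up
  },
  s3: {
    accessKeyId: 'YOUR_AWS_KEY',
    secretAccessKey: 'YOUR_AWS_SECRET',
    bucket: 'your-backup-bucket'
  },
  schedule: '0 3 * * *'       // hypothetical cron expression: every day at 03:00
};
```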

 Go go go

The final step is to run the script. You can start the cron job with the following command:

node index.js start

And if you want to run the script as a daemon, install forever with npm install forever and then run this command:

forever start index.js start

 Conclusion

This version of mongodb-backup works with Amazon S3, but I will publish another version soon that also supports FTP and SFTP for transferring backup files.


MongoDB singleton connection in Node.js

In this post, I want to share a useful piece of source code for making a singleton connection in Node.js. With this code you will always have exactly one connection in your Node.js application, so it will be faster. It’s also useful if you are using a Node.js framework like ExpressJs.

connection.js:

var Db = require('mongodb').Db;
var Connection = require('mongodb').Connection;
var Server = require('mongodb').Server;

//the MongoDB connection
var connectionInstance;

module.exports = function (callback) {
  //if we already have a connection, don't connect to the database again
  if (connectionInstance) {
    callback(connectionInstance);
    return;
  }

  var db = new Db('your-db', new Server('127.0.0.1', Connection.DEFAULT_PORT, { auto_reconnect: true }));
  db.open(function (error, databaseConnection) {
    if (error) throw new Error(error);
    connectionInstance = databaseConnection;
    callback(databaseConnection);
  });
};

And you can simply use it anywhere like this:

var mongoDbConnection = require('./lib/connection.js');

exports.index = function (req, res, next) {
  mongoDbConnection(function (databaseConnection) {
    databaseConnection.collection('collectionName', function (error, collection) {
      collection.find().toArray(function (error, results) {
        //blah blah
      });
    });
  });
};

Now you will have only one connection in your Node.js application.
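The pattern at work here is simple memoization of an async result. Stripped of the MongoDB driver, the same caching logic looks like this; connectToDb below is a stand-in for db.open (and is synchronous here only so the sketch runs anywhere):

```javascript
// Generic sketch of the singleton pattern above, with the MongoDB driver
// replaced by a fake "connect" so it runs without a database.
var connectionInstance;
var connectCount = 0;

function connectToDb(callback) {        // stand-in for db.open
  connectCount++;
  callback(null, { id: 'conn-' + connectCount });
}

function getConnection(callback) {
  if (connectionInstance) {             // reuse the cached connection
    callback(connectionInstance);
    return;
  }
  connectToDb(function (err, conn) {
    if (err) throw new Error(err);
    connectionInstance = conn;          // cache it for later callers
    callback(conn);
  });
}

var first, second;
getConnection(function (c) { first = c; });
getConnection(function (c) { second = c; });
console.log(first === second, connectCount); // true 1 -- only one "connection" made
```

Note that, like the original, this sketch could open two connections if two callers arrive before the first (real, asynchronous) connect finishes; a production version would queue the pending callbacks or cache a promise instead.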

Download the code from Gist:

  • connection.js
  • example-connection.js

Let me know what you think 🙂


Simplified MapReduce

I believe one of the best ways to solve a programming problem is to find a paper or article and read it as a clue. Of course, Wikipedia, BMI and other sources are really helpful, but somehow reading them is a nightmare for me because of the complexity of the explanations. Thus, it’s preferable for me to read a straightforward article to find the clue.

You have a problem you find a paper about it. Now you have one and a half problems. Understanding the paper, and implementing it.

— Amir Mohammad Saied (@gluegadget) February 6, 2014

And now I want to straightforwardly describe one of these usable algorithms; it’s called MapReduce. Perhaps you have heard of it before in Hadoop, MongoDB, or NoSQL discussions.

Here is an introduction to what MapReduce is:

MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

From: http://en.wikipedia.org/wiki/MapReduce

Mainly, MapReduce is used to gather information from massive datasets faster and more easily. The algorithm consists of two main functions, map and reduce. The map function is used to collect data from the input; at this step, it breaks the input into smaller chunks. In the reduce function, we put the map function’s results together to make a single result.

The reduce function is always performed after the map function.

To understand the process better, I’d like to give an example. Suppose we have a news website and each news item is an entity in our database. Each news item has an array of keywords that describe it. The following is a sample news item:

{
  title: 'Hello world!',
  description: 'Hello world! This is the first post from our awesome news portal; we will publish more news here. Thanks.',
  keywords: [
    { word: 'hello', count: 1 },
    { word: 'world', count: 1 },
    { word: 'news', count: 2 },
    { word: 'post', count: 1 }
  ]
}

So, what do we want to do? We have a lot of news items, each with an array of keywords inside. We are going to determine the popular keywords across all news items.

First of all, the map function breaks each news item into smaller pieces: inside the map function, we emit each keyword together with its count. The emit function pushes a new value into a temporary key-value structure; this structure is later used by the reduce function to generate a single value per key.

The following is an example of the map function’s source code:

function () {
  this.keywords.forEach(function (doc) {
    emit(doc.word, doc.count);
  });
}

To understand the map function better, here is an example of its output. When the word “hello” is emitted twice, with counts of 1 and 3, the output will be:

{ "hello": [1, 3] }

And when the word “post” is emitted once, with a count of 2, the output will be:

{ "post": [2] }

Then we have the reduce function. Inside the reduce function we wrap up the map function’s results to create a single value: each keyword with its total count of repetitions across all news items.

The following is the reduce function’s source code:

function (key, values) {
  return Array.sum(values);
}

So, for the first map output above, the result of the reduce function is:

{ "_id": "hello", "value": 4 }

And for the second map output, the result will be:

{ "_id": "post", "value": 2 }

After performing the reduce function, we will have the set of keywords with their total counts of repetition across all news items; that is, the list of popular keywords.
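The whole flow can be simulated in plain JavaScript. The emitted object below collects key/value pairs the way MongoDB’s emit does, and reduceFn plays the role of the reduce function above (the sample counts are made up to match the earlier examples):

```javascript
// Plain-JS simulation of the map/reduce flow described above.
var news = [
  { keywords: [{ word: 'hello', count: 1 }, { word: 'post', count: 2 }] },
  { keywords: [{ word: 'hello', count: 3 }] }
];

// Map phase: emit (word, count) for every keyword of every news item.
var emitted = {};                       // ends up as { hello: [1, 3], post: [2] }
news.forEach(function (item) {
  item.keywords.forEach(function (doc) {
    (emitted[doc.word] = emitted[doc.word] || []).push(doc.count);
  });
});

// Reduce phase: collapse each key's values into a single total,
// like Array.sum(values) in the MongoDB shell.
function reduceFn(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var results = Object.keys(emitted).map(function (word) {
  return { _id: word, value: reduceFn(word, emitted[word]) };
});

console.log(results); // [ { _id: 'hello', value: 4 }, { _id: 'post', value: 2 } ]
```

The real MongoDB implementation distributes the map phase across documents (and, on a cluster, across machines) before reducing, but the key-grouping and summing logic is the same.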

Of course, the above explanation was only a brief look into the MapReduce algorithm. There are a lot of MapReduce implementations, and you can find them in NoSQL databases, MongoDB for instance.
