Python: List and Tuple performance benchmark

Suppose you have two options for implementing a solution in a programming language; what factors are important when choosing between them? I believe one concern for a programmer would be the performance difference between those options.

In this short blog post I’d like to share my simple code and results for a performance benchmark between the Python list and tuple. Both are used to create a sequence of items, with the difference that tuples are immutable and you can’t alter them after initialization.

The following code shows a simple usage of a list and a tuple to create a series of items:

# this is a list, you can alter it in next lines
l = [1, 2, 3, 4, 5]

# this is a tuple and it's immutable
t = (1, 2, 3, 4, 5) 

Please note that you can store items of different data types in both tuples and lists.

My scenario for the performance benchmark between list and tuple is retrieving 2,000,000 random items from a list (or tuple) containing 1,000,000 items.

Here is the source code for the list version; a sketch of the tuple version follows after it:

import time
from random import randint


x = 1000000

demo_list = []

# add items to list
while x > 0:
    demo_list.append(x)
    x = x - 1

start = time.clock()

# find random items from list
y = 2000000
while y > 0:
    item = demo_list[randint(0, 999999)]
    y = y - 1

# print the elapsed time
print (time.clock() - start)
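
The tuple version differs only in how the sequence is built, since a tuple can’t be appended to after creation. Here is a minimal sketch of it, mirroring the list code above (not necessarily the exact code from the repository):

import time
from random import randint


# build a tuple with 1,000,000 items; tuples are immutable,
# so the whole sequence is created in one step
demo_tuple = tuple(range(1000000, 0, -1))

start = time.clock()

# find random items from the tuple
y = 2000000
while y > 0:
    item = demo_tuple[randint(0, 999999)]
    y = y - 1

# print the elapsed time
print (time.clock() - start)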

The following chart illustrates the performance benchmark between list and tuple on Mac OS X 10.9.3 with Python 2.7.5:

[Chart: benchmark between list and tuple]

Elapsed times:

  • Tuple: 5.1217s
  • List: 5.2462s

And it seems tuples are a little bit faster at retrieving items.

You can download the source code for both the list and the tuple version from my GitHub account:
https://github.com/afshinm/python-list-vs-tuple-benchmark

Using Redis as session store for ExpressJS + PassportJS settings

Since all other articles about using Redis as a session store for ExpressJS are out of date due to the latest connect-redis update, I’m going to write this short article to show how to use Redis as a session store for ExpressJS.

In order to use Redis as a session store for ExpressJS, first install the connect-redis package using npm:

npm install connect-redis

Then add connect-redis to your dependencies. The following code shows the relevant content of the app.js file:

var express = require('express');
var RedisStore = require('connect-redis')(express.session);

var app = express();

// use Redis as the session store
app.use(express.session({
  store: new RedisStore({
    host: '127.0.0.1',
    port: 6379
  }),
  secret: 'hey you'
}));

If you’re using PassportJS for user authentication, you can simply add the following two lines to your app.js file, after the session middleware, to enable PassportJS as well:

app.use(passport.initialize());
app.use(passport.session());

That’s it. Now you have PassportJS for authentication and Redis as session store for ExpressJS.

Furthermore, you can use the following code to delete the Redis session on logout:

exports.logout = function (req, res) {
  req.session.destroy();
  res.redirect('/');
};

Why C# is not a good choice for web development?

C# is the main programming language in Iran. I’ve worked with several teams, various projects, and developers with different levels of skill. Earlier, I worked with PHP.

There wasn’t any motivation to migrate from PHP to C# other than the company’s infrastructure. Most web development companies in Iran work with C# and Microsoft technologies, and here is the reason: they don’t want to learn more. Companies prefer to stay at the same level and don’t pay to improve their developers’ skills!

If you ask them, “Why are you using C# for web development?”, I bet they won’t give you an acceptable answer.

I haven’t used C# for my own projects, and I don’t intend to. I prefer to use NodeJS or Python, not only because of their popularity, but because they are scripting languages.

After spending ages with C#-based web apps, I want to tell you something horrible about this language and why it’s not an appropriate choice for web development.

What I explain here is an issue that I have faced several times. Before explaining the problem, let me tell you something about the C# compiler and how it works.

Suppose you have a Service Layer project, like the following:

  • UserService.cs
  • GroupService.cs
  • NewsService.cs

All of the above files are located in the same project (.csproj file). You can use this project as a dependency for other projects, for instance, using the NewsService class to fetch news from the database. When you compile it, the compiler produces a .dll file. This file is used as a dependency by other projects, meaning that in order to change only one method in the NewsService class, you have to replace the whole .dll file, not just one file.

And yes, I know, this is not strictly a problem of C# itself.

OK, here is the difficulty. We have a C#-based web app in our company, and we use MVC for its presentation layer. There are a lot of instances of this app deployed on different machines. One day we realized that there was a performance issue in our app; one of our customers reported it to us. The remedy was to change the logic of a method in the Service Layer.

The change is easy: change the method, build it, and replace the old .dll with the new one. But the new .dll file also carries changes to the signatures and logic of other classes and methods.

I realized that the deployed version of the application was a bit older than the last stable version. So if I replaced the deployed Service Layer with the fixed one, the application would break because of the changes to those other methods.

In this situation you might ask why we don’t have any versioning system; this kind of issue can be solved using SemVer or something similar. My answer is: we got it wrong in the early days of structuring our team, so we missed this part.

But what if we had used Python instead of C#? The fix would have been as easy as changing that method and uploading only one file, without touching the other files. Additionally, changing a C#-based web app’s version is a nightmare for me: conflicts between .dll files, their versions, and so on.

Despite all the great features and nice parts of C#, in my opinion the above reasons make it a bad choice for web development.

At the end of this rant, I want to show you something:

ASP.NET MVC

Is it cool to use a design pattern’s name as the name of a framework?

Become a CreateJS Ninja with "Getting Started with CreateJS" book

Nowadays, using HTML5 technologies to develop web pages has become more commonplace, and consequently HTML5-based libraries have become more popular. CreateJS is one of the popular and successful libraries for creating rich HTML5-based web pages.

 What is CreateJS?

A suite of Javascript libraries & tools for building rich, interactive experiences with HTML5.

CreateJS consists of four different libraries, each responsible for a specific part of HTML5 technology:

  • EaselJS
  • TweenJS
  • PreloadJS
  • SoundJS

By combining all of these parts, you can develop a rich web application easily.

 Getting Started with CreateJS

If you want to start learning CreateJS as fast as possible, the “Getting Started with CreateJS” book from Packt Publishing is a good starting point. It’s a step-by-step tutorial covering all of the CreateJS libraries, and it includes many practical examples that help you learn faster.

[Book cover: Getting Started with CreateJS]

If you want to start learning CreateJS now, this book is the right choice for you.

Read more about the book on the Packt Publishing website.

MongoDB singleton connection in NodeJs

In this post, I want to share a useful piece of source code for making a singleton MongoDB connection in NodeJs. By using it, you will always have a single connection in your NodeJs application, so it will be faster. It is also useful if you are using NodeJs frameworks like ExpressJs.

connection.js:

var Db = require('mongodb').Db;
var Connection = require('mongodb').Connection;
var Server = require('mongodb').Server;
//the MongoDB connection
var connectionInstance;

module.exports = function(callback) {
  //if we already have a connection, don't connect to the database again
  if (connectionInstance) {
    callback(connectionInstance);
    return;
  }

  var db = new Db('your-db', new Server("127.0.0.1", Connection.DEFAULT_PORT, { auto_reconnect: true }));
  db.open(function(error, databaseConnection) {
    if (error) throw new Error(error);
    connectionInstance = databaseConnection;
    callback(databaseConnection);
  });
};

And you can simply use it anywhere like this:

var mongoDbConnection = require('./lib/connection.js');

exports.index = function(req, res, next) {
  mongoDbConnection(function(databaseConnection) {
    databaseConnection.collection('collectionName', function(error, collection) {
      collection.find().toArray(function(error, results) {
        //blah blah
      });
    });
  });
};

Now, you will have only one connection in your NodeJs application.

You can download the code from my Gist as well.

Let me know what you think 🙂

Simplified MapReduce

I believe one of the best ways to solve a programming problem is to find a paper or article about it and read it as a clue. Well, of course Wikipedia, BMI, and other sources are really helpful, but somehow reading them is a nightmare for me because of the complexity of the explanations. Thus, I prefer to read a straightforward article to find the clue.

You have a problem you find a paper about it. Now you have one and a half problems. Understanding the paper, and implementing it.

— Amir Mohammad Saied (@gluegadget) February 6, 2014

And now, I want to straightforwardly describe one of these useful algorithms; it’s called MapReduce. Perhaps you have heard of it before in Hadoop, MongoDB, or NoSQL discussions.

Here is an introduction to what MapReduce is:

MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

From: http://en.wikipedia.org/wiki/MapReduce

Mainly, MapReduce is used to gather information from massive datasets faster and more easily. The algorithm consists of two main functions, map and reduce. The map function is used to collect data from the inputs; at this step, the map function breaks the input into smaller chunks. In the reduce function, we aggregate the map function’s results together to make a single result.

The reduce function is always performed after the map function.

To understand the process better, I’d like to give an example. Suppose we have a news website where each news item is an entity in our database. Each news item has an array of keywords that describes it. The following is a sample news item:

{
  title: 'Hello world!',
  description: 'Hello world! This is the first post from our awesome news portal; we will publish more news here. Thanks.',
  keywords: [{
    word: 'hello',
    count: 1
  }, {
    word: 'world',
    count: 1
  }, {
    word: 'news',
    count: 2
  }, {
    word: 'post',
    count: 1
  }]
}

So, what do we want to do? We have a lot of news items, each with an array of keywords inside. We are going to determine the popular keywords across all news items.

First of all, the map function breaks each news item into smaller pieces. In practice, we emit each keyword and its repetition count inside the map function. The emit function pushes new values into a temporary key-value structure; these values will later be used in the reduce function to generate a single value per key.

The following is an example of the map function’s source code:

function () {
  this.keywords.forEach(function (doc) {
    emit(doc.word, doc.count);
  })
}

To understand the map function better, here is an example of its output. When the word “hello” has been emitted twice, with counts of 1 and 3, the grouped output will be:

{ "hello": [1, 3] }

And when the word “post” has been emitted once, with a count of 2, the output would be:

{ "post": [2] }

Then we have the reduce function. Inside the reduce function we wrap up the map function’s results to create a single value. That single value is a keyword with the total count of its repetitions across all news items.

The following is the reduce function’s source code:

function (key, values) {
  return Array.sum(values);
};

So, the following is the output of the reduce function for the first example:

{ id: "hello", value: 4 }

And for the second map output, the result will be:

{ id: "post", value: 2 }

After performing the reduce function, we will have a set of keywords with their total repetition counts amongst all news items; that is, the list of popular keywords.

Of course, the above explanation is only a brief look at the MapReduce algorithm. There are a lot of MapReduce frameworks, and you can find them in NoSQL databases, MongoDB for instance.
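
For example, here is a minimal sketch of how the map and reduce functions above could be wired together in the MongoDB shell, assuming the news items live in a news collection (the popular_keywords output collection name is just an example):

// run in the MongoDB shell; "news" and "popular_keywords" are assumed names
db.news.mapReduce(
  function () {
    // emit each keyword with its per-item count
    this.keywords.forEach(function (doc) {
      emit(doc.word, doc.count);
    });
  },
  function (key, values) {
    // sum all counts emitted for the same keyword
    return Array.sum(values);
  },
  { out: "popular_keywords" }
);

// the totals end up in the output collection, e.g. { _id: "hello", value: 4 }
db.popular_keywords.find().sort({ value: -1 });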

Programming languages war

I frequently hear a tedious conversation between my colleagues: “PHP is better than foo”, “.NET is better than boo”, and so on. Each time I hear this sort of dialogue, I try to ask for the reasons behind the comparison, but so far no one has given a proper answer, since somehow it’s impossible.

Up till now, I’ve coded with JavaScript, PHP, C#, and a little bit of Python. I’m still a newbie in the programming industry, but at least I’ve had a lot of joint projects with some experts. Being part of those projects taught me how to tackle a problem, how to choose adequate tools or a language, and how to prepare the environment to solve it. In almost all cases, picking the programming language wasn’t the bottleneck; still, we chose it by considering the problem’s parameters. Designing a good architecture and implementing it correctly was the main goal in our projects.

I’d like to point out that, obviously, the programming environment is not the only parameter in building a robust application. The most important parameter for a good result is the knowledge of the programmers, not the facilities of the programming language. I don’t use more than 50% of a programming language’s features during development, and I bet no one else does either.

As time passes, old-fashioned programming languages get retired and new technologies come onto the battlefield. Consequently, knowing a particular programming language well isn’t what matters; it’s better to know the concepts.

DISCLAIMER: The above is just my own opinion, and obviously you don’t necessarily have to agree with it.

Async vs. Sync I/O benchmark in NodeJs

As you know, NodeJs is a non-blocking I/O platform which gives you the ability to write non-blocking, event-based functionality. It has async methods for I/O, but it also provides sync versions of those methods. This means you can write to a file with async/non-blocking methods, and you can do the same with sync methods.

So, in this post I want to show you the difference between using async (non-blocking) I/O and sync I/O. Here I have an HTTP server with simple functionality: it just reads a static file from disk and returns the content of the file to the user for each HTTP request. There are two different ways of reading a file from disk in NodeJs: with fs.open (async) or fs.openSync (sync). A rough sketch of both variants is shown below.
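
Here is a minimal sketch of the two servers, just to illustrate the idea; the repository uses the lower-level fs.open / fs.openSync calls, while this sketch uses fs.readFile / fs.readFileSync for brevity, and the file name and ports are placeholders:

var http = require('http');
var fs = require('fs');

// async version: the event loop stays free while the file is being read
http.createServer(function (req, res) {
  fs.readFile('static.html', function (err, data) {
    if (err) { res.writeHead(500); return res.end(); }
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(data);
  });
}).listen(8081);

// sync version: each request blocks the event loop until the read finishes
// (in the benchmark only one variant runs at a time)
http.createServer(function (req, res) {
  var data = fs.readFileSync('static.html');
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(data);
}).listen(8082);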

The results speak for themselves, as expected. When we read a file in async mode, all the steps of reading a file (stat, open, read, close) are async, which means the reading process doesn’t block the request (lower request time). In sync mode, each step has to wait for the previous step’s result, so it takes longer than async mode.

I used ApacheBench (ab) for these tests, with these parameters:

ab -n 1000 -c 1000 -vhr http://localhost:8081/

And the test system is:

CentOS, Linux 2.6.18-164.el5, NodeJs v0.8.8, 512MB memory, QEMU virtual CPU.

Well, let’s see the results.

Async mode:

Time taken for tests: 3.800 seconds
Requests per second: 263.19 [#/sec] (mean)
Time per request: 3799.512 [ms] (mean)
Time per request: 3.800 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
50% 2667
66% 2682
75% 3752
80% 3752
90% 3761
95% 3765
98% 3765
99% 3765
100% 3765 (longest request)

Sync mode:

Time taken for tests: 4.809 seconds
Requests per second: 207.95 [#/sec] (mean)
Time per request: 4808.944 [ms] (mean)
Time per request: 4.809 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
50% 2418
66% 3152
75% 3585
80% 3827
90% 4320
95% 4551
98% 4712
99% 4760
100% 4809 (longest request)

You can see that in async mode you can process about 263 requests per second, while in sync mode it’s about 208.

I made this test to show the power of async I/O functions in NodeJs, and also to show NodeJs developers that using sync I/O functions is not a good solution to callback hell. There are several better approaches to solving the callback hell problem, so keep using async functions.

You can download and run this test yourself; I made a GitHub repo where you can get the code: https://github.com/afshinm/Async-Sync-IO-benchmark

Migrating Git Repositories

Moving a Git repository from one server to another is a common situation that any developer could face, for example moving repositories from Bitbucket to GitHub or vice versa.

This task is really terrible when you have to move a lot of repositories: you have to manually clone each one and then push it to the target server. Boring.

I wrote a shell script which helps you move repositories from any Git server to another; you simply configure it and then just hit Enter. A sketch of the manual steps it automates is shown below.
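
Under the hood, migrating one repository boils down to a mirror clone followed by a mirror push. Here is a rough sketch of the manual equivalent for a single repository (using the example repository from the configuration below, not migrate.sh itself):

# clone a bare mirror of the source repository
git clone --mirror git@bitbucket.org:afshinm/test.git
cd test.git

# push all branches and tags to the destination repository
git push --mirror git@github.com:afshinm/test.git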

 Configuration

OK, to get started you should clone or download this repository from GitHub: https://github.com/afshinm/git-migrate

After that, you will find two files, migrate.sh and CONFIG. We need the CONFIG file to configure our migration; it contains the from and to servers.

Our CONFIG file looks like this:

repoName1:fromServer1:toServer1
repoName2:fromServer2:toServer2
repoName3:fromServer3:toServer3

Each repository should be on one line, and on each line we have three variables separated by the : character: the repository name, the source (from) Git server, and the destination (to) server. You can choose anything for the name part of the config; it doesn’t matter.

Here you can see an example of the CONFIG file:

test:git@bitbucket.org:afshinm/test.git:git@github.com:afshinm/test.git

In the above example I moved the test repository from Bitbucket to GitHub. Please note that if any variable contains the : character, you should put a backslash before it to prevent a conflict between variables and values.

You can also use both HTTPS and SSH URLs for the from and to servers, but I prefer the SSH form (then you need to create an SSH key and add it to both the from and to servers; see this article).

 Executing

After saving the CONFIG file, everything is ready for the migration. Just type the command below in your shell and press Enter:

./migrate.sh

Then you will see a log of the migration in your shell environment, and you will also notice if there are any errors during the migration.
