Monday, September 8, 2014

Database multi-tenancy using Hibernate 4

With many business applications now being written with the intent of deploying them on the cloud, there is an increasing need to design multi-tenancy into your application.

For the persistence layer, a multi-tenant design offers a few options:

  1. a separate database per tenant
  2. a separate schema per tenant, in the same database
  3. the same schema and database for all tenants, with a discriminator column identifying each tenant's rows in the shared tables

In this article, I am going to talk about implementing options 1 and 2 using the multi-tenancy support in Hibernate 4.

Also, for the database connections, we have a choice between a common connection pool, from which connections are allocated to each tenant request, and a separate connection pool per tenant.
With the common pool approach, each connection needs to be "primed" before use against a tenant's database or schema, using a statement like "USE <tenant_schema>" or "USE <tenant_database>".

Multi-tenancy in Hibernate

For the "separate connection pool per tenant" approach

First, we need to implement a custom ConnectionProvider, as follows:

import java.sql.Connection;
import java.sql.SQLException;

import org.apache.commons.dbcp2.BasicDataSource;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;

public class ConnectionProviderImpl implements ConnectionProvider {

 private final BasicDataSource basicDataSource = new BasicDataSource();

 public ConnectionProviderImpl(String database) {
  // example values; in a real application these should be read from a properties file
  basicDataSource.setDriverClassName("com.mysql.jdbc.Driver");
  basicDataSource.setUrl("jdbc:mysql://localhost:3306/" + database);
  basicDataSource.setUsername("user");
  basicDataSource.setPassword("password");
 }

 public Connection getConnection() throws SQLException {
  return basicDataSource.getConnection();
 }

 public void closeConnection(Connection connection) throws SQLException {
  connection.close();
 }

 public boolean supportsAggressiveRelease() {
  return false;
 }

 public boolean isUnwrappableAs(Class unwrapType) {
  return false;
 }

 public <T> T unwrap(Class<T> unwrapType) {
  return null;
 }
}


Next, we need to extend Hibernate's AbstractMultiTenantConnectionProvider, as follows.
Here I maintain a map of database identifiers to connection providers.
When Hibernate invokes the selectConnectionProvider( ) method with a tenant identifier, we use it to look up the appropriate connection provider in the map.
We also need to implement getAnyConnectionProvider( ), which should return a sensible default connection provider.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.hibernate.engine.jdbc.connections.spi.AbstractMultiTenantConnectionProvider;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;

public class MultiTenantConnectionProvider extends AbstractMultiTenantConnectionProvider {

 private final Map<String, ConnectionProvider> connProviderMap = new HashMap<String, ConnectionProvider>();

 public MultiTenantConnectionProvider() {
  // tenant database names; in a real application, read these from a properties file
  List<String> providerNames = new ArrayList<String>();
  providerNames.add("default_db");
  providerNames.add("tenant_db_1"); // illustrative tenant database name
  for (String providerName : providerNames) {
   connProviderMap.put(providerName, new ConnectionProviderImpl(providerName));
  }
 }

 protected ConnectionProvider getAnyConnectionProvider() {
  System.out.println("inside MultiTenantConnectionProvider::getAnyConnectionProvider");
  return connProviderMap.get("default_db");
 }

 protected ConnectionProvider selectConnectionProvider(String tenantId) {
  ConnectionProvider connectionProvider = connProviderMap.get(tenantId);
  if (connectionProvider == null) {
   connectionProvider = new ConnectionProviderImpl("default_db");
  }
  return connectionProvider;
 }
}


For a common connection pool shared by all tenants

We need to implement Hibernate's MultiTenantConnectionProvider interface, as follows.
Since we are using the same connection pool, in getConnection( ) we need to prime the connection with a SQL statement like 'USE <database>', so that the connection's further use hits the database matching the tenant id.
In releaseConnection( ), we switch back with 'USE <default_db>' as a fallback.

import java.sql.Connection;
import java.sql.SQLException;

import org.hibernate.HibernateException;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;
import org.hibernate.engine.jdbc.connections.spi.MultiTenantConnectionProvider;

public class MultiTenantConnectionProviderWithSingleDBPool implements MultiTenantConnectionProvider {

 private final ConnectionProvider connectionProvider =
   new ConnectionProviderImpl(CurrentTenantIdentifierResolver.DEFAULT_TENANT_ID);

 public Connection getAnyConnection() throws SQLException {
  System.out.println("inside MultiTenantConnectionProvider::getAnyConnection");
  return connectionProvider.getConnection();
 }

 public void releaseAnyConnection(Connection connection) throws SQLException {
  connectionProvider.closeConnection( connection );
 }

 public Connection getConnection(String tenantIdentifier) throws SQLException {
  final Connection connection = getAnyConnection();
  try {
   // prime the connection for the tenant's database
   connection.createStatement().execute( "USE " + tenantIdentifier );
  } catch ( SQLException e ) {
   throw new HibernateException(
     "Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]", e);
  }
  return connection;
 }

 public void releaseConnection(String tenantIdentifier, Connection connection) throws SQLException {
  try {
   // fall back to the default database before returning the connection to the pool
   connection.createStatement().execute( "USE " + CurrentTenantIdentifierResolver.DEFAULT_TENANT_ID );
  } catch ( SQLException e ) {
   throw new HibernateException(
     "Could not alter JDBC connection back to default schema", e);
  }
  connectionProvider.closeConnection( connection );
 }

 public boolean supportsAggressiveRelease() {
  return false;
 }

 public boolean isUnwrappableAs(Class unwrapType) {
  return false;
 }

 public <T> T unwrap(Class<T> unwrapType) {
  return null;
 }
}


Refer to the Hibernate docs for more details.

Now, to persuade Hibernate to use the tenant identifier before any database access, we can either:
  1. manually set the tenant identifier when opening a Hibernate session, or
  2. implement Hibernate's tenant identifier resolver interface

Manually set the tenant identifier on the hibernate session factory
This can be done, in a web application, in any suitable interceptor where the Hibernate session factory is available:
Session session = sessionFactory.withOptions()
        .tenantIdentifier( yourTenantIdentifier )
        .openSession();

The Hibernate tenant identifier resolver can be implemented as follows.
Here I am using a ThreadLocal to hold the tenant identifier. The ThreadLocal itself can be set early in the HTTP request thread, in say, a servlet filter.

// note: this class deliberately shares its name with the Hibernate interface,
// so the interface is referenced below by its fully qualified name
public class CurrentTenantIdentifierResolver
  implements org.hibernate.context.spi.CurrentTenantIdentifierResolver {

 public static final ThreadLocal<String> _tenantIdentifier = new ThreadLocal<String>();
 public static final String DEFAULT_TENANT_ID = "default_db";

 public String resolveCurrentTenantIdentifier() {
  String tenantId = _tenantIdentifier.get();
  if (tenantId == null) {
   tenantId = DEFAULT_TENANT_ID;
  }
  System.out.println("threadlocal tenant id = " + tenantId);
  return tenantId;
 }

 public boolean validateExistingCurrentSessions() {
  return true;
 }
}
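
Since the resolver hinges on a ThreadLocal, here is a self-contained sketch of the ThreadLocal mechanics in isolation; the class and tenant names are illustrative, not part of the Hibernate API:

```java
public class TenantThreadLocalDemo {
    // each request-handling thread sees only the tenant id it set itself
    static final ThreadLocal<String> TENANT = new ThreadLocal<String>();

    static String resolve() {
        String id = TENANT.get();
        return id == null ? "default_db" : id;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(resolve()); // default_db - nothing set on main thread

        Thread worker = new Thread(new Runnable() {
            public void run() {
                TENANT.set("tenant_a");        // e.g. done in a servlet filter
                System.out.println(resolve()); // tenant_a - visible to this thread only
            }
        });
        worker.start();
        worker.join();

        System.out.println(resolve()); // default_db - main thread is unaffected
    }
}
```

In a web application, a servlet filter would call _tenantIdentifier.set( ) at the start of the request and _tenantIdentifier.remove( ) at the end, so a pooled thread never leaks one tenant's id into the next request.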


Finally, the Hibernate configuration for specifying that we are using Hibernate multi-tenancy goes in hibernate.cfg.xml.
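
A sketch of the relevant properties; the com.example package prefix is a placeholder for wherever your implementation classes live:

```xml
<!-- DATABASE for option 1 (separate database per tenant); SCHEMA for option 2 -->
<property name="hibernate.multiTenancy">DATABASE</property>
<property name="hibernate.multi_tenant_connection_provider">com.example.MultiTenantConnectionProvider</property>
<property name="hibernate.tenant_identifier_resolver">com.example.CurrentTenantIdentifierResolver</property>
```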

That's it people,
Happy multi-tenanting :-)


Thursday, April 17, 2014

Using the free OpenShift Java PaaS to run a web application on an online Tomcat

Many times you have a war file that you would like to deploy so that the application becomes available online. A classic use case is using a REST service as a stub for your mobile app or thin-client web app.

OpenShift Online PaaS is an easy solution for such free hosting of a Java web application.

Create a free account at OpenShift Online.
Download their rhc command line tool.
Fire the following command to create one of your three free apps, where the name of the new hosted web app is, for example, restgrails:

rhc create-app restgrails tomcat-7

Go to the OpenShift web console and note down the SSH URL of the git repository.
Set up SSH keys for the OpenShift git repo using the command:

rhc setup

If required, place the keys generated in <user-dir>\.ssh into <git-install-dir>\.ssh.

Fire the following command on your local machine:

git clone <git_url> restgrails

On your local machine, the above git command will create a new directory called restgrails and check out the default web app from the OpenShift git repo.

cd restgrails

Fire the following commands to get rid of the default web app's src and pom.xml, since we will deploy a ready-made war file onto the OpenShift PaaS:

git rm -rf src/ pom.xml

git commit -am "deleted default source code"

Now prepare the ready-made war file for deployment onto OpenShift:

copy the war file to restgrails/webapps/
rename the war file to ROOT.war  -- this is important if you want to map your webapp to the OpenShift online app URL

git add .
git commit -am "added new root.war to webapps"
git push --force

The above git push command will trigger a deployment of the war file into the remote Tomcat on OpenShift, and soon the web app will become available online at your app's OpenShift URL.

You can tail the Tomcat server logs with the following command:
rhc tail restgrails

For persistence, OpenShift Online supports databases like MongoDB, MySQL and PostgreSQL, so if your app uses Hibernate, it can easily be configured to use an online database like MySQL or PostgreSQL while running online.

Wednesday, March 26, 2014

Why is enterprise application development different from just programming?

Any monkey can write code that works!
Writing code that fulfills a functionality is a very important first step in the process of developing an enterprise application, but it is only the beginning; the enterprise application is far from done...

So what else is involved in developing an enterprise application?
As it turns out there are several other aspects of development that have to be considered:

  • Is the code unit tested with sufficient test coverage?
  • Is the code thread safe in the face of multiple users accessing it?
  • Is the code transactional?
  • Is the code secure?
  • Is the code easy to debug? Is it instrumented with logging at appropriate levels?
  • Does the code handle exceptions and errors?
  • Is the code optimized for performance and for memory consumption?
  • Is the code resilient? Can it recover from a failure?
  • Is the code easily extensible and maintainable?
  • Is the code portable across deployment environments?
  • Are all the configurable properties externalized and easily changed during deployment?
  • Is the code able to inter-operate with legacy systems to be integrated?

Answering all the above questions to the satisfaction of the non-functional requirements is equally important. Many developers are aware of all these aspects of development, but completely ignore them when asked for estimates of when they can complete a particular functionality or feature of the enterprise application under development :-(

On a side note, even when we as developers decide to write our own reusable component or framework, please take into consideration that all the above aspects need to be built into the component/framework; hence it often makes better sense to use an open-source component that is popular or has withstood the test of time. Don't re-invent the wheel :-)

Taking an application from the developer's machine to production
Taking a functionally complete application from a developer's machine up to production is a long walk. Let's see some of the common things that need to be done:

  • automated integration tests especially for regression testing
  • scripts for automated checkout, packaging and deployment
  • automated configuration of the application as per environment e.g. integration, QC, UAT and prod
  • scripts for application life cycle management - start, stop and restart of the application
  • scripts/tools for application database cleanup and archival
  • scripts/tools for application log files cleanup and archival
  • scripts/tools for monitoring application health and sending alerts
  • scripts/tools for scraping application logs for exceptions, etc. and sending out alerts
  • scripts/tools for performance and memory monitoring of application
  • migrating existing database to new application database + testing app against migrated data
  • scripts for scheduling house keeping work around the application

As can easily be seen, there is a very substantial and important "operations" component that needs to be undertaken for the successful implementation of any application. All these activities can hardly be classified as typical coding. They need to be factored into time estimates as well as workforce planning.

Tuesday, March 25, 2014

When and why to use NODE.JS

Node.js ! 

The new kid on the block has definitely been going mainstream. So you must have heard buzzwords like async IO and server-side JavaScript getting associated with the Node hype. There have been some remarks from the sceptics as well - you can't handle dynamic JavaScript with an immature development workforce, or Node.js is OK for hello-world apps but not necessarily for primetime on the enterprise applications landscape.

Well, here's my attempt at clearing up the mist with regards to Node.js and related frameworks...

When to use Node.js

As very rightly pointed out by the authors of node.js, it is an excellent fit for DIRT-y applications. DIRT stands for Data Intensive, Real Time.

How do Java EE web apps work?

Let's start with a traditional Java EE web application hosted on, say, a Tomcat container. When 1000 users concurrently access the web application, the web container assigns a thread to each user request. Each thread does its processing, passing control through the various application layers - view, controller, service, data access - to finally hit a database or backend, get the relevant data and bring it back to the presentation tier, from where it may be formatted as an HTML response and sent back to the client browser. The hundreds or thousands of threads running in the web container ensure scalability of the application. But each thread makes synchronous method calls and is held up while each sync call completes. The best way to emphasize this is through a database call such as:
ResultData data = personDao.updatePersonDetails(person);
 ( the above call blocks till all DB rows are fetched )

It is not uncommon for many threads in such an application to be waiting for IO, either disk IO or network IO (as in a database call). As a result, one can clearly observe that hardly 10%-20% of CPU is utilized on the web tier; most latency is due to threads waiting on IO.

Can the above latency be reduced?

Enter Node.js with its async IO

Node.js, at its simplest and most fundamental level, prefers non-blocking function calls, through incessant use of callbacks. For example, the above blocking database call can logically be simplified to the following async call:
personDao.updatePersonDetails(person, function(err, numRowsAffected){ ... });
The above call issues the update request to the database driver but does not wait for results; it continues to the next line of execution, and when the database update has been made, the callback, i.e. the anonymous function, gets executed, returning the error if any and the number of rows affected by the update.
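
To make this concrete, here is a small runnable sketch of the callback style; updatePersonDetails and its 50 ms timer are stand-ins for a real async database driver, not an actual DAO API:

```javascript
// stand-in for an async database call: "answers" after 50 ms via the callback
function updatePersonDetails(person, callback) {
  setTimeout(function () {
    // simulate the driver reporting back: no error, one row affected
    callback(null, 1);
  }, 50);
}

updatePersonDetails({ name: 'Alice' }, function (err, numRowsAffected) {
  if (err) throw err;
  console.log('rows affected: ' + numRowsAffected);
});

// this line runs first, before the "database" has answered
console.log('update issued, moving on without blocking');
```

Running it prints the "update issued" line before "rows affected: 1", which is exactly the non-blocking behaviour described above.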

Though the difference between using sync blocking functions versus async functions with callbacks may seem trivial, if we adopt async non-blocking calls as the default way of programming, things soon start adding up.

Instead of having one thread dedicated to each HTTP request, and thousands of such threads, in Node.js we have async functions which do their processing in small bits and never hold up the single thread. The Node.js model starts exhibiting more scalability, due to the non-blocking nature of calls in the processing stack.

Where are the benchmarks?

Well, consider this: the Apache web server uses the multi-threaded model, whereas nginx uses non-blocking IO. Several benchmarks have shown that nginx, with its non-blocking IO, delivers roughly 20% better scalability than the Apache web server. So even for serving static content, we know async IO is going to be more scalable.

Can Node.js leverage multiple cores? Is it fault tolerant?

This was another valid criticism raised against the Node.js platform in the early days of the framework. We need to understand that the Node.js server has only a single thread available for the application's usage (there may be others doing the server's internal housekeeping). Due to this limitation, Node.js processing runs on a single core and is also susceptible to outage, since that one thread crashing brings processing down for all clients. Thankfully, multiple Node.js servers can be run on the same machine with a load balancer up front. Alternatively, Node.js modules like "cluster", "upstart" and "forever" ensure Node.js servers can be scaled horizontally and get features like guaranteed uptime and automatic restart on crash.

Node.js for realtime apps

Out of the acronym DIRT, we already saw how Node.js is a good fit for "data intensive" applications; it's time to talk about one more category, "realtime applications", for which Node.js is also suitable.
Again, let's consider traditional web apps which need to show data that changes frequently at the backend. The traditional HTTP approach is for browser-based JavaScript clients to "poll" the server using "auto-refresh" tags. Apart from loading up the server, this approach is also very unoptimized; instead, newer full-duplex protocols like WebSocket can be used by servers to push content, messages and notifications to client browsers whenever the server-side events occur.

This event-driven approach to pushing from server to client is very compatible with Node.js's async, non-blocking APIs, hence there is excellent support for server push in Node.js modules. This makes Node.js very suitable for web applications that require high interactivity, server push and notifications.

What is Node.js, and how useful is it for building real-life web apps from scratch?

Well, the Node.js core is really an HTTP server and a platform for hosting web apps. The core of Node.js hence does not include features that are directly used in the making of a web application. To create a real-life web application, one needs a framework with many features: the ability to host static content from the server's native file system, user authentication, session and cookie management, templatable views, a navigation and routing framework, binding of models and views, error handling, request body parsing, etc.
To provide all the above functionality, the Node.js core needs relevant modules added on top of it.
Node.js provides a robust module system; npm, which stands for Node Package Manager, helps in installing such Node.js modules.
The most critical module, which provides pluggable middleware, is a module called connect.
On top of connect is a module called express.js, which provides a full-fledged web application framework.
More custom requirements for the web app can be fulfilled by installing appropriate modules freely available via npm.
Similarly, templating engines like jade and ejs can be plugged into frameworks like express.js, thereby enriching the Node.js ecosystem for your web application development.

In subsequent articles, I will be walking through developing a basic node.js web application for doing CRUD (create, read, update, delete) into a mongo database. You will be pleasantly surprised how easy and elegant it is to develop web applications in node.js based frameworks.