Quick notes on YARN (Hadoop 2.0)

Problems we had before YARN:

  • The JobTracker alone is responsible for both managing resources and tracking task progress.
  • Scalability limitation: the maximum cluster size is about 4,000 nodes.
  • The maximum number of concurrent tasks is 40,000.
  • Single point of failure: when the JobTracker fails, the complete job queue is killed and users need to resubmit all their jobs.
  • Restarting is complex.
  • Low resource utilization, because there is no flexibility in sharing and allocating cluster resources.
  • Supports only MapReduce. Iterative applications implemented on top of MapReduce run much slower.

Row and Column (Cell) based security in SSAS Tabular Model

I am working on a BI system for a social care project in a local government authority, where I need to achieve cell-based security in an SSAS Tabular Model.


  • One semantic model needs to be published for all the reporting/analytics needs of the project.
  • No security is required for measures, so everyone who has access to the cube can see all the actual figures.
  • There are end users (report consumers) and a separate reporting team that is responsible for building and publishing ad-hoc reports as the business needs them. For some very sensitive records, only a few members of the reporting team have view permission.
  • Security therefore applies to certain attributes of selected dimensions. E.g. if Person is a dimension, all the measures related to Person are unaffected, but if certain persons are restricted then not every user can see their name or, say, their ethnicity, although it’s OK to see the Person ID.
  • Performance can’t be severely compromised because of security.

SSAS Tabular Model Deployment

I recently defined the deployment procedure for an SSAS Tabular Model on one of the projects I am working on. Here it goes.

Deployment Procedure

As the development team won’t have any access to other environments like Test or PROD, all the deployables will be handed over to the DBA team, who can then carry out the deployment using the procedure described below.

For the sake of simplicity, the deployment folder is named SSASDeploymentFolder here. The name is symbolic; in a real case it would be somewhere on a shared drive, accessible by both the DBA and development teams.

Installing Stinger Technical Preview in HDP 2.0 Sandbox

Yesterday I tried to install Stinger on the Hortonworks HDP 2.0 Sandbox. Below are the steps I followed. I used Sandbox 2 for Hyper-V.

Installing Stinger phase 3 preview

Import the Sandbox 2 VM and make sure that it can access the internet.

Start the VM and log into it using the Alt+F5 keys. Download the Stinger Quickstart Bundle using wget. Remember that the URL is case sensitive.

Big Data: A Revolution That Will Transform How We Live, Work and Think

Recently I read a book, Big Data: A Revolution That Will Transform How We Live, Work and Think, and found it really informative. I would recommend it to anyone curious about big data and its impact. The book does not assume any technology background.

Below are some of the insights that the book provides:

Big data is one of the consequences of a change that is taking place now; the authors describe it as datafication – a concept which refers to taking information about all things under the sun – including ones we never used to think of as information at all, such as a person’s location, the vibration of an engine, or the stress on a bridge – and transforming it into a data format so that it can be quantified. This allows us to use information in new ways, such as in predictive analysis: detecting that an engine is prone to a breakdown based on the heat or vibration that it produces. As a result, we can unlock the implicit, latent value of the information.
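
To make the engine example a little more concrete, here is a minimal sketch of how datafied vibration readings could feed a very simple predictive check. It is not taken from the book: the function name flag_anomalies, the baseline size, the threshold and the sample readings are all assumptions made up for illustration. The idea is to treat the first few readings as "normal" and flag later readings that sit many standard deviations away from that baseline.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return the indices of readings that deviate strongly from the baseline.

    The first `baseline_size` readings are assumed to represent normal
    operation. Both parameters are illustrative assumptions.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)

    flagged = []
    for i, value in enumerate(readings[baseline_size:], start=baseline_size):
        # Skip the check when the baseline has no spread at all.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration amplitudes (arbitrary units), invented for this sketch.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 1.2, 2.5, 2.8]
print(flag_anomalies(vibration))  # prints [7, 8]
```

A real system would use far more data and a proper model, but the principle is the same: once the vibration is recorded as numbers, unusual behaviour can be spotted before the engine actually breaks down.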

