
Extracting Keywords Using Map/Reduce

In my last blog post I discussed using Map/Reduce to find co-authors in PubMed data on HDP for Windows. In this post I will explain how to extract keywords from PubMed abstracts. I am going to use the API provided by BjutCS on CodeProject, which extracts keywords based on entropy difference. For more details, see the article on CodeProject.

I downloaded the PubMed data using the NCBI Entrez Utilities Web Service. I am going to use only three fields (ID, Title and Abstract) for extracting keywords, so I stored them in a tab-delimited text file. Below is a screenshot of the file opened in MS Excel.
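As a rough illustration of how such a file could be read before keyword extraction, here is a minimal Python sketch. It assumes one record per line with the columns in the order ID, Title, Abstract and no header row; the `Record` type, the `read_records` function and the sample row are hypothetical and are not part of the original project.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    """One row of the tab-delimited file (assumed column order)."""
    pubmed_id: str
    title: str
    abstract: str


def read_records(handle) -> Iterator[Record]:
    """Yield a Record for every line that has exactly three tab-separated fields."""
    # QUOTE_NONE keeps quote characters inside abstracts as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield Record(*(field.strip() for field in row))


if __name__ == "__main__":
    # Made-up sample row, used only to show the expected shape of the input.
    sample = io.StringIO("12345\tSample title\tSample abstract text.\n")
    for record in read_records(sample):
        print(record.pubmed_id, "|", record.title)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample; the encoding is also an assumption.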

Continue reading

Installing & Configuring PowerView for SharePoint 2010/2013

After a bit of research on creating Power View reports in SharePoint, I am going to share my experience and list the prerequisites that you must configure on your system before creating Power View reports.

I had the following configuration on the server side:

Continue reading

Finding Co-Authors using Map/Reduce

I was trying to write a map/reduce job for Hadoop using Visual Studio 2012 in an HDP for Windows environment. In search of a suitable practical scenario, I got some PubMed data from http://www.ncbi.nlm.nih.gov/pubmed and decided to find, for each individual author, the co-authors and the number of PubMed articles they published together. Something like below:

PubMedArticle 1              Authors {“A, B, X”}

PubMedArticle 2              Authors {“B, X, Y”}

PubMedArticle 3              Authors {“A, K”}

PubMedArticle 4              Authors {“M”}
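To make the idea concrete, here is a minimal in-memory Python sketch of the map and reduce steps applied to the four example articles. It is not the Visual Studio job described in this post; the function names are hypothetical, and emitting a `None` marker for a sole author is an assumption made so that such authors still appear in the result.

```python
from collections import defaultdict
from itertools import permutations

# The four example articles listed above.
ARTICLES = {
    "PubMedArticle 1": ["A", "B", "X"],
    "PubMedArticle 2": ["B", "X", "Y"],
    "PubMedArticle 3": ["A", "K"],
    "PubMedArticle 4": ["M"],
}


def map_article(authors):
    """Map step: emit (author, co_author) for every ordered pair on one article."""
    unique = sorted(set(authors))
    if len(unique) == 1:
        # Sole author: emit a marker so the author is not lost (assumption).
        yield unique[0], None
    for author, co_author in permutations(unique, 2):
        yield author, co_author


def reduce_pairs(pairs):
    """Reduce step: group by author and count shared articles per co-author."""
    result = defaultdict(lambda: defaultdict(int))
    for author, co_author in pairs:
        counts = result[author]  # touching the key registers sole authors too
        if co_author is not None:
            counts[co_author] += 1
    return {author: dict(counts) for author, counts in result.items()}


def co_author_counts(articles):
    """Run the map step over every article, then reduce all emitted pairs."""
    pairs = (pair for authors in articles.values() for pair in map_article(authors))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    for author, counts in sorted(co_author_counts(ARTICLES).items()):
        print(author, counts)
```

In a real Hadoop job the grouping by author happens in the shuffle phase between map and reduce; here a dictionary stands in for it.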

Continue reading

Partnership with Hortonworks!

Today, Hadoop has become synonymous with big data, as it has been the platform of choice for big data processing. Apache™ Hadoop® is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment.

Hadoop started as an open source initiative (and it still is!) and was soon adopted and nurtured by Yahoo! to support its web applications. Many others then came forward to embrace it. According to Wikipedia, as of 2013, Hadoop adoption is widespread. For example, more than half of the Fortune 50 use Hadoop.

Continue reading