Monthly Archives: August 2013

Hadoop 2.0: YARN and REEF

While Hadoop 1.0 (the basis of the current distributions) is being adopted at increasing speed, Hadoop 2.0 has already made its debut with the bigger promise of overcoming some of Hadoop 1.0's limitations in scalability, cluster utilisation, agility, and data processing without MapReduce.

Hadoop 1.0 does what it promises brilliantly. MapReduce is the backbone of Hadoop 1.0. It is very good for batch processing but not much help for real-time and near-real-time processing. Moreover, for a job to run, it has to be written as, or converted into, a MapReduce job. MapReduce is great for certain types of work but does not fit all of them. In terms of resource management, MapReduce and Hadoop 1.0 do not guarantee full, or even effective, utilisation of the cluster.


Importing CSV file with double quotes using SSIS

Last week I was importing a bunch of CSV files into a database using an SSIS package. The CSV files were error-free and matched my expectations, except that a few address values were enclosed in double quotes and contained commas. I did not know this because the CSV files were huge and there were only a few such instances. I discovered it only after I got some unexpected values in some DB columns.
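
To make the problem concrete, here is a minimal Python sketch of what goes wrong. It is not the SSIS configuration itself, only an illustration outside SSIS, and the sample row, the column names, and the helper names `naive_split` and `parse_rows` are made up for this example. A quoted address that contains a comma is split into two values by a naive comma split, while a quote-aware parser keeps it as one field.

```python
import csv
import io

# Hypothetical sample data: the address is wrapped in double quotes
# and contains a comma, like the problem rows described above.
sample = 'id,name,address\n1,Alice,"12 High Street, Springfield"\n'


def naive_split(line):
    """Split on every comma, ignoring quotes (the behaviour that shifts columns)."""
    return line.split(",")


def parse_rows(text):
    """Parse CSV text with a quote-aware reader (double quote is the default qualifier)."""
    return list(csv.reader(io.StringIO(text)))


data_line = sample.splitlines()[1]
naive = naive_split(data_line)
rows = parse_rows(sample)

print(len(naive))  # one more value than there are columns
print(rows[1])     # the address stays in a single field
```

The naive split yields four values for a three-column row, which is how unexpected values end up in the wrong columns. The quote-aware reader returns three fields and drops the surrounding quotes.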
