Category Archives: Machine Learning

Apache Spark – Big Data Platform for All

Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs on Hadoop (YARN), on Mesos, standalone, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, and S3. Spark powers a stack of high-level tools: Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for building scalable, fault-tolerant streaming applications. These can be combined seamlessly in a single application.

Continue reading

Understanding Sweep Parameters module in Azure Machine Learning

When you train a machine learning algorithm, it is very important to choose the right set of parameters. If you don't understand the ins and outs of the algorithm, choosing and fine-tuning those parameters can be very difficult. Even if you understand the algorithm well, it can be daunting to run many training iterations and evaluate a model with different combinations of parameters; neural networks are a good example.

Microsoft Azure Machine Learning addresses this with a handy module called Sweep Parameters. The module takes an untrained model along with training and validation datasets and generates optimum parameter settings with just a few clicks.
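To make the idea of a parameter sweep concrete, here is a minimal sketch in plain Python. It is not how the Azure module is implemented; it only illustrates the general pattern of training an untrained model once per parameter combination and keeping the setting that scores best on the validation data. The tiny one-feature ridge model, the `l2` parameter, the data points, and every function name are assumptions made up for this illustration.

```python
from itertools import product


def fit_ridge_1d(train, l2):
    """Fit y ~ w * x with an L2 penalty (closed form for a single feature)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + l2)


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


def sweep(param_grid, train, valid):
    """Train once per parameter combination and score each on the validation set.

    Returns (best_params, best_error, all_results).
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        w = fit_ridge_1d(train, **params)          # train with this setting
        results.append((params, mse(w, valid)))    # evaluate on held-out data
    best_params, best_error = min(results, key=lambda r: r[1])
    return best_params, best_error, results


# Hypothetical data: noisy training points, cleaner validation points.
train = [(1.0, 2.3), (2.0, 4.0), (3.0, 6.5), (4.0, 8.3)]
valid = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]
grid = {"l2": [0.0, 0.1, 1.0, 10.0]}

best_params, best_error, results = sweep(grid, train, valid)
print("best parameters:", best_params)
print("validation MSE:", round(best_error, 4))
```

The sketch tries every value in the grid, which is only practical for small grids; with many parameters the number of combinations grows quickly, which is exactly why an automated sweep is useful.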

Continue reading

Understanding Evaluate Model in Microsoft Azure Machine Learning

I think one of the coolest features of Azure Machine Learning is the ability to evaluate different algorithms and choose the right one with just a few mouse clicks. The Evaluate Model module makes it happen.

The official documentation page for the Evaluate Model module can be found here.

Anyone can make sense of its output and decide on the right model, provided they have a basic understanding of the following:

Continue reading