Monthly Archives: March 2015

Using R with Apache Spark

To use R with Spark, we need the SparkR package. Below are the steps to build, install, and use the SparkR package on a Windows system.

Building SparkR on Windows

  • Make sure that R (version >3.1) is installed and that the path to its bin directory has been added to the system PATH variable
    e.g. C:\R\R-3.1.3\bin\x64
  • Download Rtools from the link below
    http://cran.r-project.org/bin/windows/Rtools/
  • Select the components to install
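
The PATH requirement in the first step can be checked with a small script before starting the build. The following is a minimal sketch in Python, not part of the SparkR build itself; the helper name is_on_windows_path is hypothetical, and the R directory is the example path from the step above, which should be replaced with the actual install location.

```python
import ntpath
import os


def is_on_windows_path(directory, path_value):
    """Return True if `directory` is one of the semicolon-separated entries
    in a Windows-style PATH string (case-insensitive, ignoring a trailing
    backslash and surrounding quotes)."""
    def norm(p):
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(norm(entry) == target for entry in entries)


if __name__ == "__main__":
    # Example location from the step above; adjust to your own R install.
    r_bin = r"C:\R\R-3.1.3\bin\x64"
    found = is_on_windows_path(r_bin, os.environ.get("PATH", ""))
    print("R bin directory on PATH:", found)
```

If the script reports False on the Windows machine, the bin directory still needs to be added to PATH, and any open command prompt must be restarted to pick up the change.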


Using IPython and Visual Studio with Apache Spark

To develop Apache Spark applications in IPython and Python Tools for Visual Studio, we need to set the PYTHONPATH environment variable so that it includes the library paths Spark requires.

Setting PYTHONPATH for Spark

  • Open System Properties and, on the Advanced tab, click Environment Variables.
  • Create a new system variable and name it PYTHONPATH.
  • Add the paths below to the value field, separated by semicolons (here c:\spark-1.3.0 is the directory where Spark is installed)
    c:\spark-1.3.0\bin
    c:\spark-1.3.0\python
    c:\spark-1.3.0\python\lib\py4j-0.8.2.1-src.zip
  • Create another system variable and name it SPARK_HOME. Set its value to the Spark installation directory.
    c:\spark-1.3.0
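
The value for the PYTHONPATH field can also be assembled from the Spark directory instead of being typed by hand. The following is a minimal sketch, assuming the directory layout and py4j file name listed above; the helper names spark_python_paths and pythonpath_value are hypothetical and are not part of Spark.

```python
import ntpath
import os
import sys


def spark_python_paths(spark_home, py4j_zip="py4j-0.8.2.1-src.zip"):
    """Return the three Windows paths listed above for a given Spark directory.
    The py4j file name is the one shipped with Spark 1.3.0 and differs
    between Spark releases."""
    return [
        ntpath.join(spark_home, "bin"),
        ntpath.join(spark_home, "python"),
        ntpath.join(spark_home, "python", "lib", py4j_zip),
    ]


def pythonpath_value(spark_home):
    """Join the paths with semicolons, as expected in the value field."""
    return ";".join(spark_python_paths(spark_home))


if __name__ == "__main__":
    # Fall back to the example directory if SPARK_HOME is not set yet.
    spark_home = os.environ.get("SPARK_HOME", r"c:\spark-1.3.0")
    # Make the paths visible to the current interpreter session only.
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.append(path)
    print(pythonpath_value(spark_home))
```

Appending to sys.path only affects the running interpreter, so it is a way to try the paths out in a session; the system variables described above remain the persistent setting.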
