To use R with Spark, we need the SparkR package. Below are the steps to build, install, and use the SparkR package on a Windows system.
Building SparkR in Windows
- Make sure that R (version > 3.1) is installed and that the path to its bin directory is added to the system PATH variable
- Download Rtools from the link below
- Select the components to install
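Before building, it can help to confirm that the R executables are actually reachable through PATH. The snippet below is a minimal sketch of such a check, written in Python only for illustration; the helper name `find_executables` is hypothetical and not part of SparkR or Rtools, and it does not check the R version.

```python
import shutil


def find_executables(names, path=None):
    """Map each executable name to its resolved location, or None if not found.

    `path` defaults to the current PATH environment variable.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # "R" and "Rscript" are the launchers shipped in R's bin directory.
    for name, location in find_executables(["R", "Rscript"]).items():
        print(name, "->", location or "not found on PATH")
```

If either name is reported as not found, the bin directory is probably missing from PATH, or the command prompt was opened before the variable was changed.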
To develop Apache Spark applications in IPython and Python Tools for Visual Studio, we need to set the PYTHONPATH environment variable so that it includes the library paths required for Spark.
Setting PYTHONPATH for Spark
- Go to System Properties and, on the Advanced tab, click Environment Variables.
- Create a new system variable and name it PYTHONPATH.
- Add the paths below to the value field, separated by semicolons (here c:\spark-1.3.0 is the path where Spark is installed).
- Create another system variable and name it SPARK_HOME. Set its value to the Spark installation directory.
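Once both variables are created, a quick sanity check from a new Python session can confirm that they are visible and point to real locations. The following is a minimal sketch under that assumption; the function name `check_spark_env` is hypothetical, and it only verifies that the configured paths exist, not that they are the right ones for Spark.

```python
import os


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the basic checks passed."""
    env = os.environ if env is None else env
    problems = []

    # SPARK_HOME should point to the directory where Spark is installed.
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: " + spark_home)

    # PYTHONPATH entries are separated by os.pathsep (a semicolon on Windows).
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not entries:
        problems.append("PYTHONPATH is not set or empty")
    for entry in entries:
        if not os.path.exists(entry):
            problems.append("PYTHONPATH entry does not exist: " + entry)

    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("\n".join(issues) if issues else "SPARK_HOME and PYTHONPATH look consistent")
```

Environment variables are read when a process starts, so run the check from a freshly opened command prompt or IDE after changing them.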