Using IPython and Visual Studio with Apache Spark

To develop Apache Spark applications in IPython or Python Tools for Visual Studio, set the PYTHONPATH environment variable so that it includes the library paths Spark requires.

Setting PYTHONPATH for Spark

  • Open System Properties and, on the Advanced tab, click Environment Variables.
  • Create a new system variable named PYTHONPATH.
  • Add the paths below to the value field, separated by semicolons (here c:\spark-1.3.0 is the directory where Spark is installed):
    c:\spark-1.3.0\bin
    c:\spark-1.3.0\python
    c:\spark-1.3.0\python\lib\py4j-0.8.2.1-src.zip
  • Create another system variable named SPARK_HOME and set its value to the Spark installation directory:
    c:\spark-1.3.0
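
The settings above can be double-checked from Python before starting Spark. The following is a minimal sketch, not part of the original setup: the helper names `spark_python_paths` and `missing_from_pythonpath` are hypothetical, and the py4j file name is assumed to match the Spark 1.3.0 layout listed above (other Spark versions ship a different py4j version).

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS
import os


def spark_python_paths(spark_home, py4j_zip="py4j-0.8.2.1-src.zip"):
    """Return the three PYTHONPATH entries listed above for a Spark install."""
    return [
        ntpath.join(spark_home, "bin"),
        ntpath.join(spark_home, "python"),
        ntpath.join(spark_home, "python", "lib", py4j_zip),
    ]


def missing_from_pythonpath(spark_home, pythonpath):
    """List the expected entries that are absent from a PYTHONPATH string."""
    # Windows paths are case-insensitive, so compare in lower case.
    present = set(p.strip().lower() for p in pythonpath.split(";") if p.strip())
    return [p for p in spark_python_paths(spark_home) if p.lower() not in present]


if __name__ == "__main__":
    # Fall back to the example install directory if SPARK_HOME is not set.
    home = os.environ.get("SPARK_HOME", r"c:\spark-1.3.0")
    print(missing_from_pythonpath(home, os.environ.get("PYTHONPATH", "")))
```

An empty list means all three entries are present. Environment variables are read when a process starts, so a shell or IDE that was already open before the change may need to be restarted.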

Standalone Spark program in IPython

Start the IPython shell or IPython Notebook and enter the code below. It counts the words in a file, with Spark running in standalone mode.
[code language="python"]
from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="PythonWordCount")

# Path to a text file on the local file system
lines = sc.textFile("c:/spark-1.3.0/CHANGES.txt")

# Split each line into words, pair each word with 1, then sum the counts per word
counts = lines.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)

output = counts.collect()

for (word, count) in output:
    print("%s: %i" % (word, count))

sc.stop()
[/code]
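
To see what the flatMap, map and reduceByKey chain computes, the same steps can be written in plain Python without Spark. This is only an illustrative sketch on an in-memory list of lines; the function name `word_count` is hypothetical, and nothing here runs on a cluster.

```python
from collections import defaultdict
from operator import add


def word_count(lines):
    """Count words in a list of strings, mirroring the Spark pipeline step by step."""
    # flatMap: split every line on single spaces and flatten into one list of words
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with the count 1
    pairs = [(w, 1) for w in words]
    # reduceByKey(add): combine the counts of equal words
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] = add(counts[word], n)
    return dict(counts)


if __name__ == "__main__":
    for word, count in sorted(word_count(["to be or", "not to be"]).items()):
        print("%s: %i" % (word, count))
```

Because the split is on a single space, runs of consecutive spaces produce empty strings, and these are counted as a word of their own. The Spark version behaves the same way, so an entry with an empty key in its output is expected for files with such spacing.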

Standalone Spark program in Visual Studio

To develop Python programs in Visual Studio, install Python Tools for Visual Studio, available from:
https://pytools.codeplex.com

Follow the steps below to run a standalone Spark program in Visual Studio:

  • Create a new Python Application in Visual Studio.
  • In Solution Explorer, right-click Search Path and choose Add PYTHONPATH to Search Path.

Figure: Add PYTHONPATH to Search Path in Visual Studio

  • Enter the code given above and run it. If you are using the Python Interactive window, reset it first so that it picks up the updated search path.

Figure: Python Spark word count program in Visual Studio
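
If the program fails because pyspark cannot be resolved, a quick check from the same interpreter can show whether the search path is the problem. The sketch below is an assumption about how one might diagnose this, not a step from the original walkthrough; `pyspark_status` is a hypothetical helper name.

```python
import os


def pyspark_status():
    """Return (importable, message) describing whether pyspark can be imported."""
    try:
        import pyspark  # resolved through PYTHONPATH or the project search paths
    except ImportError as exc:
        return False, "cannot import pyspark: %s" % exc
    return True, "pyspark found at %s" % os.path.dirname(pyspark.__file__)


if __name__ == "__main__":
    ok, message = pyspark_status()
    print(message)
    if not ok:
        # Show the values the interpreter actually sees, to compare with the setup steps.
        print("SPARK_HOME = %s" % os.environ.get("SPARK_HOME"))
        print("PYTHONPATH = %s" % os.environ.get("PYTHONPATH"))
```

If the import fails and PYTHONPATH prints as None or without the Spark entries, the interpreter was most likely started before the variables were set or the search path was added.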

This Post Has 2 Comments

  1. it doesn’t work…
    unable to resolve “pyspark”

  2. Thanks for this post as I was trying to make python work with spark on visual studio for a while and this post helps me do it in a sec. For info, I am using Visual Studio community 2015
    Works for me!!
    🙂
