Zeppelin ships with Apache Spark built in. The steps below show how to use Scala IDE to run and debug Apache Spark code.
Use the links below to download Spark, Zeppelin, and Scala IDE for Windows:
Sparklet (Apache Spark & Zeppelin Installer for Windows 64 bit)
Scala IDE
- Install Sparklet and make sure Zeppelin and Spark are running properly.
- Run Scala IDE and select a workspace.
- Go to File -> New -> Scala Project, enter a project name, and click Finish.
- Right-click the project folder and go to Build Path -> Add External Archives.
- In the JAR Selection window, go to the Sparklet installation folder. Then open the interpreter -> spark -> dep folder and select the zeppelin-spark-dependencies-0.5.6-incubating.jar file. Click Open to add it to the project.
- Right-click the src folder and select New -> Scala Object.
- Give the object a name (for example, WordCount, to match the code below) and click Finish.
- Type the code below. The program counts the occurrences of each word in the readme.md file and prints the 15 most frequent words.
[code language="scala"]
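import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run Spark locally, using all available cores
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Split each line into words and count the occurrences of each word
    val textFile = sc.textFile("c:/Sparklet/readme.md")
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Sort by count in descending order and print the top 15 words
    val top15 = counts.sortBy(_._2, ascending = false).take(15)
    top15.foreach(println)

    sc.stop() // release the local Spark context when done
  }
}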
[/code]
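If you want to step through the word-count logic in the Scala IDE debugger before involving Spark, the same pipeline can be sketched on plain Scala collections. This is only an illustration, not part of the Sparklet setup: the object name WordCountLocal, the helper topWords, and the sample lines are made up for this example, and groupBy plus sum stands in for Spark's reduceByKey. Ties are broken alphabetically here so the output is deterministic, which the Spark version does not guarantee.
[code language="scala"]
object WordCountLocal {
  // Same steps as the Spark version, applied to an in-memory Seq[String]
  def topWords(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))        // split each line into words
      .map(word => (word, 1))                  // pair each word with a count of 1
      .groupBy(_._1)                           // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word
      .toSeq
      .sortBy { case (word, count) => (-count, word) } // highest count first, then alphabetical
      .take(n)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for the readme.md file
    val sample = Seq("spark and zeppelin", "spark and scala", "spark")
    topWords(sample, 2).foreach(println)
    // prints:
    // (spark,3)
    // (and,2)
  }
}
[/code]
Because this version has no Spark dependency, you can set a breakpoint inside topWords and inspect each intermediate collection directly.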