Apache Zeppelin is a web-based, multipurpose notebook. It supports interactive data analysis along with data ingestion, data discovery, data visualization, and collaboration. In this post I explore some basic data analysis using Zeppelin and Spark.
To run Apache Spark and Zeppelin on Windows, you need to download and install the Sparklet on your Windows system.
Here I am using the sales data (SampleData.csv) for my analysis, the same file I used in my previous Data Visualization blog post.
Below are the steps I followed, with sample code for each.
- Load Data File
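// Read the CSV file as an RDD of lines
val csv = sc.textFile("C:/data/SampleData.csv")
// Split each line on commas and trim whitespace from every field
val headerAndRows = csv.map(line => line.split(",").map(_.trim))
// The first row is the header
val header = headerAndRows.first
// Drop the header row, keeping only the data rows
val data = headerAndRows.filter(_(0) != header(0))
// SampleData must be a case class with seven String fields; its field names become the column names
val sampleData = data.map(p => SampleData(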
p(0),
p(1),
p(2),
p(3),
p(4),
p(5),
p(6)
)).toDF()
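The load step depends on a `SampleData` case class that has to exist before `toDF()` is called, since its field names become the DataFrame column names. The post only names the “OrderDate”, “Region”, “Item” and “Total” columns, so the sketch below is an assumption: `Rep`, `Units` and `UnitCost` are hypothetical field names, the column order is a guess, and the sample rows are made up so the parsing logic can be tried without the CSV file.

```scala
// Hypothetical schema: only OrderDate, Region, Item and Total are named in the post.
// Rep, Units, UnitCost and the column order are assumptions.
case class SampleData(
  OrderDate: String,
  Region: String,
  Rep: String,
  Item: String,
  Units: String,
  UnitCost: String,
  Total: String
)

// Same split / trim / drop-header logic as the load step, applied to
// made-up in-memory lines instead of an RDD.
val lines = Seq(
  "OrderDate,Region,Rep,Item,Units,UnitCost,Total",
  "1/6/2015, East, Jones, Pencil, 95, 1.99, 189.05",
  "1/23/2015, Central, Kivell, Binder, 50, 19.99, 999.50"
)
val parsed = lines.map(_.split(",").map(_.trim))
val head = parsed.head
val rows = parsed
  .filter(_(0) != head(0)) // skip the header row
  .map(p => SampleData(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

rows.foreach(println)
```

In Zeppelin the case class would normally go in its own paragraph, run once before the load step.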
- Show the content of the DataFrame
sampleData.show()
- Count the number of orders
sampleData.count
- Select only one column, e.g. “Item” column
sampleData.select("Item").show()
- Access a Column, e.g. “OrderDate” column
sampleData("OrderDate")
// select a single column
sampleData.select(sampleData("OrderDate")).show()
// select multiple columns, e.g. “OrderDate” and “Total” columns
sampleData.select(sampleData("OrderDate"), sampleData("Total")).show()
- Round a column value, e.g. “Total” column
sampleData.select(sampleData("OrderDate"), round(sampleData("Total"))).show()
- Filter column value, e.g. “Total” column greater than 1000
sampleData.filter(sampleData("Total") > 1000).show()
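Because the load step passes the split strings straight into `SampleData`, every column, including “Total”, is a string, and the comparison above relies on Spark converting the values implicitly. A minimal sketch of making the type explicit, assuming the `sampleData` DataFrame from the load step; the name `typedData` is introduced here only for illustration:

```scala
// Assumes `sampleData` from the load step above.
// Replace the string "Total" column with a double-typed one.
val typedData = sampleData.withColumn("Total", sampleData("Total").cast("double"))

// Check the resulting column types
typedData.printSchema()

// The same filter, now against a numeric column
typedData.filter(typedData("Total") > 1000).show()
```

Values that cannot be parsed as numbers become null after the cast, so this can also help spot malformed rows.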
- Count the number of orders by Region, e.g. “Region” column
sampleData.groupBy("Region").count().show()
- Register Data Frame as a Table
sampleData.registerTempTable("sales")
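On Spark 2.0 and later, `registerTempTable` is deprecated in favour of `createOrReplaceTempView`. The sketch below shows that call, plus querying the registered name from Scala rather than from a `%sql` paragraph. It assumes the `sampleData` DataFrame from above and the `sqlContext` that Zeppelin's Spark interpreter normally provides; `bigOrders` is a name made up for this example.

```scala
// Spark 2.0+: replacement for the deprecated registerTempTable
sampleData.createOrReplaceTempView("sales")

// Query the registered view from Scala; the result is an ordinary DataFrame
val bigOrders = sqlContext.sql(
  "SELECT Region, Item, Total FROM sales WHERE Total > 1000")
bigOrders.show()
```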
- Data Visualization technique
%sql
SELECT * FROM sales
%sql
SELECT Region, Item, Total FROM sales
%sql
SELECT Region, round(sum(Total)) AS RegionalTotal FROM sales
GROUP BY Region
%sql
SELECT Region, Item, round(sum(Total)) AS RegionalTotal FROM sales
GROUP BY Region, Item
ORDER BY Region
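The last query can also be written with the DataFrame API instead of `%sql`. This is a rough equivalent rather than the method used in this post, assuming the `sampleData` DataFrame from the load step; `regionalTotals` is a name chosen for this example, and the explicit cast is there because “Total” was loaded as a string.

```scala
import org.apache.spark.sql.functions.{round, sum}

// Assumes `sampleData` from the load step above.
val regionalTotals = sampleData
  .groupBy("Region", "Item")                        // GROUP BY Region, Item
  .agg(round(sum(sampleData("Total").cast("double")))
    .alias("RegionalTotal"))                        // round(sum(Total)) AS RegionalTotal
  .orderBy("Region")                                // ORDER BY Region

regionalTotals.show()
```

The `%sql` version has the advantage of Zeppelin's built-in charts; the DataFrame version is easier to reuse in later Scala paragraphs.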
I will keep exploring more analysis with Zeppelin and Spark in a Windows environment. Stay tuned!