Data Analysis Using Zeppelin on Windows


Apache Zeppelin is a web-based, multipurpose notebook. It supports interactive data analysis and much more, such as data ingestion, data discovery, data visualization and collaboration. In this post I will explore some basic data analysis using Zeppelin and Spark.

To run Apache Spark and Zeppelin on a Windows system, you need to download and install Sparklet.

Here I am using the sales data (SampleData.csv) for my data analysis; it is the same file I used in my previous Data Visualization blog post.

Below are the steps I followed, with a code sample for each.

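  1. Load Data File
    // Assumed column layout (hypothetical; match it to the CSV header)
    case class SampleData(OrderDate: String, Region: String, Rep: String, Item: String, Units: Int, UnitCost: Double, Total: Double)
    import sqlContext.implicits._  // for .toDF(); Zeppelin may already import this
    val csv = sc.textFile("C:/data/SampleData.csv")
    val headerAndRows = csv.map(line => line.split(",").map(_.trim))
    val header = headerAndRows.first
    val data = headerAndRows.filter(_(0) != header(0))  // drop the header row
    val sampleData = data.map(p => SampleData(p(0), p(1), p(2), p(3), p(4).toInt, p(5).toDouble, p(6).toDouble)).toDF()
  2. Show the content of the DataFrame
    sampleData.show()
  3. Count the number of orders
    sampleData.count()
  4. Select only one column, e.g. the “Item” column
    sampleData.select("Item").show()
  5. Access a column, e.g. the “OrderDate” column
    // select a single column
    sampleData.select(sampleData("OrderDate")).show()
    // select multiple columns, e.g. “OrderDate” and “Total”
    sampleData.select(sampleData("OrderDate"), sampleData("Total")).show()
  6. Round a column's values, e.g. the “Total” column
    import org.apache.spark.sql.functions.round
    sampleData.select(sampleData("OrderDate"), round(sampleData("Total"))).show()
  7. Filter on a column's values, e.g. “Total” greater than 1000
    sampleData.filter(sampleData("Total") > 1000).show()
  8. Count the number of orders by region, using the “Region” column
    sampleData.groupBy("Region").count().show()
  9. Register the DataFrame as a table
    sampleData.registerTempTable("sales")  // Spark 2.x+: createOrReplaceTempView("sales")
  10. Visualize the data with SQL queries in Zeppelin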
    %sql
    SELECT * FROM sales

    (Chart: sales bar chart)

    %sql
    SELECT Region, Item, Total FROM sales

    (Chart: item sales by region)

    %sql
    SELECT Region, round(sum(Total)) AS RegionalTotal FROM sales
    GROUP BY Region

    (Chart: total sales by region)

    %sql
    SELECT Region, Item, round(sum(Total)) AS RegionalTotal FROM sales
    GROUP BY Region, Item
    ORDER BY Region

    (Chart: total item sales, ordered by region)
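
If you want to check the header-skipping logic of the Load Data File step without starting Spark, the same transformations can be sketched on an in-memory `Seq[String]` in plain Scala. This is not the Spark API, and the simplified four-column layout (`orderDate`, `region`, `item`, `total`) as well as the `SalesCsv` and `Sale` names are hypothetical, chosen only for illustration.

```scala
// Plain-Scala sketch of the Load Data File step (no Spark required).
// HYPOTHETICAL: a simplified four-column layout OrderDate, Region, Item, Total.
object SalesCsv {
  final case class Sale(orderDate: String, region: String, item: String, total: Double)

  def parse(lines: Seq[String]): Seq[Sale] =
    if (lines.isEmpty) Seq.empty
    else {
      // Equivalent of csv.map(line => line.split(",").map(_.trim))
      val headerAndRows = lines.map(_.split(",").map(_.trim))
      val header = headerAndRows.head
      // Same rule as the Spark code: drop rows whose first field equals the header's
      val data = headerAndRows.filter(_(0) != header(0))
      data.map(p => Sale(p(0), p(1), p(2), p(3).toDouble))
    }
}
```

For example, `SalesCsv.parse(Seq("OrderDate,Region,Item,Total", "1/6/2016, East , Pencil, 189.05"))` drops the header and returns one `Sale` whose `region` is `"East"`. Like the Spark version, it splits on every comma, so quoted fields that contain commas are not handled.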
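
The select, round and filter steps have direct counterparts on ordinary Scala collections. The sketch below is a plain-Scala analogue, not Spark code; the `SalesColumns` object, its method names and the `1000.0` default threshold (taken from the filter example) are assumptions for illustration.

```scala
// Plain-Scala analogues of the select / round / filter steps on an in-memory Seq.
// HYPOTHETICAL names; this mirrors the DataFrame calls but is not the Spark API.
object SalesColumns {
  final case class Sale(orderDate: String, region: String, item: String, total: Double)

  // Like sampleData.select("Item")
  def items(sales: Seq[Sale]): Seq[String] = sales.map(_.item)

  // Like select(sampleData("OrderDate"), round(sampleData("Total"))).
  // math.round returns the nearest whole number as a Long; halves round towards +infinity.
  def roundedTotals(sales: Seq[Sale]): Seq[(String, Long)] =
    sales.map(s => (s.orderDate, math.round(s.total)))

  // Like sampleData.filter(sampleData("Total") > 1000)
  def bigOrders(sales: Seq[Sale], threshold: Double = 1000.0): Seq[Sale] =
    sales.filter(_.total > threshold)
}
```

Keeping each step as a small pure function makes it easy to compare its output against what the corresponding `show()` call prints in Zeppelin.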
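
The count-by-region step and the GROUP BY queries can likewise be sketched with `groupBy` on a plain Scala `Seq`. Again this only mirrors the Spark and SQL logic; the `SalesByRegion` object and its methods are hypothetical names. Because `ORDER BY Region` leaves the order of items within a region unspecified, the last method also sorts by item so its result is deterministic.

```scala
// Plain-Scala sketch of grouping by region (and by region + item).
// HYPOTHETICAL names; mirrors groupBy("Region").count() and the GROUP BY queries.
object SalesByRegion {
  final case class Sale(region: String, item: String, total: Double)

  // Like sampleData.groupBy("Region").count()
  def ordersByRegion(sales: Seq[Sale]): Map[String, Int] =
    sales.groupBy(_.region).map { case (r, rows) => r -> rows.size }

  // Like SELECT Region, round(sum(Total)) ... GROUP BY Region (whole-number Long result)
  def regionalTotals(sales: Seq[Sale]): Map[String, Long] =
    sales.groupBy(_.region).map { case (r, rows) => r -> math.round(rows.map(_.total).sum) }

  // Like ... GROUP BY Region, Item ORDER BY Region (items also sorted for determinism)
  def regionItemTotals(sales: Seq[Sale]): Seq[(String, String, Long)] =
    sales.groupBy(s => (s.region, s.item)).toSeq
      .map { case ((r, i), rows) => (r, i, math.round(rows.map(_.total).sum)) }
      .sortBy(t => (t._1, t._2))
}
```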

I will keep exploring more analysis with Zeppelin and Spark in a Windows environment. Stay tuned!
