Apache Spark is fast becoming the de facto platform for big data analytics. At the same time, a notebook revolution is under way: data scientists and others who use notebooks simply love them. A notebook provides a browser-based interactive environment in which to write and execute code, view output, make plots and much more. IPython Notebook is no doubt leading this revolution, but it only allows Python code.
Apache Zeppelin is a new entrant to the league. It enables interactive data analytics: one can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. Zeppelin is a web-based notebook server built on the concept of an interpreter that can be bound to any language or data processing backend. It already supports quite a few interpreters, such as Spark, Scala, Python, Hive and Markdown, and many more are yet to come. That means that from a single notebook you can work with different big data platforms and build your analytics solution. Zeppelin aims to cater to all your needs: Data Ingestion, Data Discovery, Data Analytics, Data Visualization & Collaboration. It comes with Spark/Scala as its default interpreter.
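To make the interpreter idea concrete, here is a minimal sketch in Python of how a notebook might route each paragraph to a bound interpreter based on a leading % directive. This is a toy model under our own assumptions, not Zeppelin's actual implementation: the Notebook class, its bind and run methods, and the stand-in interpreters are all hypothetical names made up for illustration.

```python
# Toy model of interpreter binding. NOT Zeppelin's code: every name here
# (Notebook, bind, run, the stand-in interpreters) is hypothetical.
from typing import Callable, Dict, Tuple

# An "interpreter" is modelled as a function from source text to output text.
Interpreter = Callable[[str], str]


class Notebook:
    def __init__(self, default: str) -> None:
        self.default = default  # interpreter used when no %directive is given
        self.interpreters: Dict[str, Interpreter] = {}

    def bind(self, name: str, interpreter: Interpreter) -> None:
        """Bind an interpreter to a directive name, e.g. 'sql' for '%sql'."""
        self.interpreters[name] = interpreter

    def split(self, paragraph: str) -> Tuple[str, str]:
        """Return (interpreter name, body) for one notebook paragraph."""
        text = paragraph.strip()
        if text.startswith("%"):
            parts = text[1:].split(None, 1)  # directive, then the rest
            if parts:
                body = parts[1] if len(parts) > 1 else ""
                return parts[0], body
        return self.default, text

    def run(self, paragraph: str) -> str:
        name, body = self.split(paragraph)
        if name not in self.interpreters:
            raise KeyError(f"no interpreter bound to %{name}")
        return self.interpreters[name](body)


# Stand-in interpreters that only tag their input; real ones would
# hand the text to Spark, a SQL engine, a Markdown renderer and so on.
notebook = Notebook(default="spark")
notebook.bind("spark", lambda src: f"[spark] {src}")
notebook.bind("sql", lambda src: f"[sql] {src}")
notebook.bind("md", lambda src: f"[md] {src}")

print(notebook.run("val x = 1"))
print(notebook.run("%sql select region from sales"))
print(notebook.run("%md # Title"))
```

A paragraph with no directive goes to the default interpreter, which mirrors Spark/Scala being the default in Zeppelin. Supporting a new backend in this sketch is just one more bind call.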
When we started using Zeppelin, we just loved it. Though it is still in its early days, it promises quite a lot. As we work mostly in a Windows environment, we built Zeppelin and Spark and prepared an installer so that anyone in our team can set it up with just a couple of clicks. Then we realised that if the installer is so useful for us, others might find it useful too. So today we are making it available for everyone to download from our website. This beta release of the standalone installer, a distribution we call Sparklet, includes Spark 1.6 and Zeppelin 0.5.6. Spark, Scala and Spark SQL work well with this release, and we will make all the other interpreters work in upcoming releases.
Sparklet can be downloaded from the following URL, which also includes a link to the detailed user guide for downloading and installing it.
Here is a link to the video where the creator of Zeppelin explains how Spark with Zeppelin can be used for the complete data science/advanced analytics life cycle:
We encourage you to try out Sparklet if you are looking at using Spark/Zeppelin in a Windows environment. Please let us know what you think by sending an email with the subject line “Sparklet Feedback” to firstname.lastname@example.org.
This Post Has One Comment
Sthitaprajna Sahoo, 7 Apr 2016
Hello, is there any way to connect this tool to a Hadoop cluster and access other objects like Hive tables, HDFS files, etc.?