Apache Spark – Big Data Platform for All

Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs everywhere: on Hadoop (YARN), on Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, S3 and more. Spark powers a stack of high-level tools, including Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for building scalable, fault-tolerant streaming applications. These tools can be combined seamlessly in a single application.

Spark is engineered from the bottom up for performance. By exploiting in-memory computing and other optimizations, it can run some workloads up to 100x faster than Hadoop MapReduce, and it excels at iterative computation, such as machine learning algorithms that pass over the same data many times. It is currently a top-level Apache project and one of the most active ones.

Spark is written in Scala, but its API comes in several flavours: Scala, Java, Python and now R.

With the recently released DataFrame API, Spark brings simplicity to distributed big data processing for everyone. The API is inspired by data frames in R and Python (pandas), but it is designed from the ground up to support modern big data and data science applications. As an extension of the existing API, DataFrames can scale from kilobytes of data on a single laptop to petabytes on a large cluster. New users who are familiar with data frames in languages like R and Python should feel right at home.

With DataFrames and many other simple yet powerful offerings, Spark has the potential to become the de facto platform for data scientists, analysts and developers to work with big data.

Sumit Mund

Data Solution Architect with more than 15 years of hands-on experience. He has an MSc by Research (in AI) degree and B.Tech degree in Information Technology. He is also a part-time PhD scholar at the University of Huddersfield where his research area includes applications of AI in Finance, particularly in Risk Management (Hedging).
