This course is part of the learning path “Big Data Foundations – Level 2” offered by IBM Cognitive Class. To earn this badge and the certificate, I had to satisfactorily complete the exam.
The course introduced me to the fundamentals of Spark, an open source processing engine built around speed, ease of use, and analytics. It taught me how to use Resilient Distributed Datasets (RDDs) and DataFrames to perform in-memory computing and build applications on top of Spark's built-in libraries.
Through this course, I learned the following about Spark:
- Spark is the way to go for large amounts of data that require low-latency processing, which a typical MapReduce program cannot provide
- it performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining
- it provides in-memory cluster computing for lightning-fast speed
- it supports Java, Python, R, and Scala APIs
- it can handle a wide range of data processing scenarios by combining SQL, streaming, and complex analytics
- it runs on top of Hadoop, Mesos, standalone, or in the cloud, and it can access diverse data sources (HDFS, Cassandra, S3)
Credentials: