Damir Zunic

Big Data Analytics Using Spark

This course is part of the Data Science MicroMasters program offered by the University of California San Diego. To earn the course certificate, I had to complete eight assignments and pass the proctored exam.

The course taught me the following:

  • how to perform statistical analysis of very large datasets that do not fit on a single computer
  • some of the most popular tools for performing this type of analysis:
    • Apache Spark using PySpark
    • XGBoost
    • TensorFlow
  • how to use Spark to minimize bottlenecks in massively parallel computation, and how to understand the underlying computer architecture and programming abstractions
  • how to perform supervised and unsupervised machine learning on massive datasets using Spark's Machine Learning Library (MLlib)
  • how to perform data loading and cleaning using Spark and Parquet
  • how to model data through statistical and machine learning methods to:
    • perform large-scale analysis
    • identify statistically significant patterns
    • visualize statistical summaries
  • how to use these tools in Jupyter notebooks, which combine narrative, code, and graphics to create convincing analytical documents

Credentials:

  • Certificate

Copyright © 2020–2025 - damirzunic.com - Web Design By Damir Zunic