The purpose of the project was to find two movie datasets and perform ETL (Extract, Transform, Load) on them to migrate them to a production database. We are keeping our data warehouse focused on the following questions:
- Does a larger movie budget mean better user and critic ratings?
- Does the size of a movie’s budget impact Facebook likes?
We compiled movie data from two Kaggle datasets. The smaller dataset contains a lot of Facebook information about movies, directors, and actors.
After extraction and transformation, the first dataset was loaded into a MongoDB collection. We then built dictionaries from the dataset containing Facebook likes and used them to update the previously created collection.
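The dictionary-based update described above might look roughly like the sketch below. It is an illustration, not the notebook's actual code: the column names (`movie_title`, `movie_facebook_likes`, `director_facebook_likes`) and both helper functions are assumptions made for this example, and `collection` is assumed to be any object exposing a PyMongo-style `update_one` method.

```python
from typing import Any, Dict, Iterable, Mapping

# Hypothetical column names; the real datasets may use different ones.
TITLE_KEY = "movie_title"
LIKES_KEYS = ("movie_facebook_likes", "director_facebook_likes")


def _is_missing(value: Any) -> bool:
    """Treat None and NaN as missing (NaN is the only value not equal to itself)."""
    return value is None or value != value


def build_likes_lookup(
    records: Iterable[Mapping[str, Any]],
) -> Dict[str, Dict[str, int]]:
    """Map each movie title to a dictionary of its Facebook like counts."""
    lookup: Dict[str, Dict[str, int]] = {}
    for row in records:
        title = str(row[TITLE_KEY]).strip()  # titles often carry stray whitespace
        lookup[title] = {
            key: int(row[key])
            for key in LIKES_KEYS
            if not _is_missing(row.get(key))
        }
    return lookup


def apply_likes(collection: Any, lookup: Mapping[str, Mapping[str, int]]) -> int:
    """Add like counts to matching documents; return how many documents matched."""
    matched = 0
    for title, likes in lookup.items():
        if not likes:
            continue  # nothing to set for this title
        result = collection.update_one({TITLE_KEY: title}, {"$set": dict(likes)})
        matched += result.matched_count
    return matched
```

With Pandas and PyMongo, `records` could come from `df.to_dict("records")` and `collection` from a `pymongo.MongoClient` database handle. Matching on the title alone is a simplification; if titles are not unique, a compound key (for example title plus year) would likely be safer.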
The ETL project notebook contains the detailed ETL code.
- Tools/techniques used: Python, Jupyter Notebook, Pandas, MongoDB, PyMongo
GitHub Repository for this project.