
Damir Zunic


Amazon Customer Vine Reviews

In this analysis, I used Google Colab notebooks to run the entire ETL process in the cloud. PySpark extracted and transformed two large datasets of Amazon reviews (“Kitchen purchase reviews” and “Home purchase reviews”) and loaded them into an AWS RDS PostgreSQL instance. I then used SQL to run a statistical analysis of selected data.
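The extract, transform, and load steps have roughly the shape sketched below. This is not the project's notebook code: to keep the example runnable anywhere, it swaps PySpark for Python's standard csv module and the RDS PostgreSQL instance for an in-memory SQLite database. The sample rows, the table name vine_table, and the choice to drop rows with a missing rating are all assumptions made for illustration; the column names follow the layout of the public Amazon review files as I understand it.

```python
import csv
import io
import sqlite3

# Hypothetical sample in a tab-separated layout (assumed column subset).
SAMPLE_TSV = (
    "review_id\tproduct_id\tstar_rating\thelpful_votes\ttotal_votes\tvine\n"
    "R1\tP1\t5\t3\t4\tY\n"
    "R2\tP2\t\t0\t0\tN\n"
    "R3\tP1\t2\t1\t6\tN\n"
)


def extract(text):
    """Extract: parse tab-separated text into a list of dict rows."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def transform(rows):
    """Transform: drop rows without a rating and cast numeric columns."""
    cleaned = []
    for row in rows:
        if not row["star_rating"]:
            continue  # assumed cleaning rule: skip reviews with no rating
        cleaned.append(
            (
                row["review_id"],
                row["product_id"],
                int(row["star_rating"]),
                int(row["helpful_votes"]),
                int(row["total_votes"]),
                row["vine"],
            )
        )
    return cleaned


def load(conn, records):
    """Load: create the target table if needed and insert the records."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS vine_table (
            review_id     TEXT PRIMARY KEY,
            product_id    TEXT,
            star_rating   INTEGER,
            helpful_votes INTEGER,
            total_votes   INTEGER,
            vine          TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO vine_table VALUES (?, ?, ?, ?, ?, ?)", records
    )
    conn.commit()


# SQLite in memory stands in for the PostgreSQL instance here.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SAMPLE_TSV)))
row_count = conn.execute("SELECT COUNT(*) FROM vine_table").fetchone()[0]
print(row_count)
```

With PySpark, the same three stages would typically be a DataFrame read, a chain of column selections and casts, and a JDBC write, which is what lets the work scale to files with millions of rows.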

Many Amazon shoppers rely on product reviews when deciding what to buy. The Amazon Vine program is an invitation-only club for a small percentage of the most trusted reviewers, selected by Amazon. The program aims to give customers more information, including honest and unbiased reviews. The task was to investigate whether Vine reviews are free of bias and truly trustworthy. Amazon makes the review datasets publicly available, but they are large: a single dataset contains several million rows, which can exceed what an average local machine can handle.
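One plausible way to frame the bias question in SQL is to compare the rating distribution of Vine and non-Vine reviews. The sketch below is an assumption about how such a comparison could look, not the query used in the project. It runs against an in-memory SQLite database rather than PostgreSQL, the six sample rows are invented, and the reviews table with its star_rating and vine columns is a hypothetical, simplified stand-in for the loaded data.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (review_id TEXT, star_rating INTEGER, vine TEXT)"
)

# Invented sample data: vine is 'Y' for Vine reviews, 'N' otherwise.
sample = [
    ("R1", 5, "Y"),
    ("R2", 4, "Y"),
    ("R3", 5, "N"),
    ("R4", 1, "N"),
    ("R5", 5, "N"),
    ("R6", 3, "N"),
]
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", sample)

# Per group: review count, five-star count, five-star share, mean rating.
QUERY = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews,
       ROUND(100.0 * SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS pct_five_star,
       ROUND(AVG(star_rating), 2) AS avg_rating
FROM reviews
GROUP BY vine
ORDER BY vine
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

A noticeably higher five-star share or average rating in the Vine group would hint at a positive bias, although on real data the group sizes differ greatly and any gap should be read with that in mind.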

Copies of the Colab notebooks and SQL query files contain the detailed code and additional descriptions.

  • Tools/techniques used: Apache Spark, PostgreSQL, Google Colab, AWS RDS

GitHub Repository


Copyright © 2020–2025 - damirzunic.com - Web Design By Damir Zunic