Assignment 3

Ramen Time


Due Date: Wednesday, March 16th, 11:59:59 PM

Value: 150 points

GitHub Invite: https://classroom.github.com/a/tImg4OVb

Collaboration: For Assignment 3, collaboration is not allowed; you must work individually. You may still see your TA and come to office hours for help, but you may not work with any other CMSC 210 students. You may post questions on Slack, but you may not post code.

Objectives:

Instructions

The Task

In this assignment, you have been given two datasets. The first is a dataset from The Ramen Rater, a product review website for the hardcore instant ramen enthusiast (or "ramenphile"), with over 2500 reviews to date. This file is located in the project repo at notebooks/ramen-ratings.csv. Each record in the dataset is a single ramen product review. The columns—Brand, Variety (the product name), Country, and Style (Cup? Bowl? Tray?)—are pretty self-explanatory. The "Stars" column indicates the ramen rating, as assessed by the reviewer, on a 5-point scale.

In addition to this dataset, there is a CSV file (notebooks/noodle-consumption) from WINA, the World Instant Noodles Association. It lists the global demand for instant noodles, by country, for each year from 2015 through 2019. The unit in this table is "Millions of Servings".

In this assignment, you will produce a Jupyter Notebook that answers the following questions using Python and Pandas. If a question asks for a data visualization, create it with Seaborn.

  1. What brand, on average, has the highest ratings for instant ramen? And what is that average rating?
  2. What country, on average, has the highest ratings for instant ramen? And what is that average rating?
  3. If I want the best chance of getting good ramen, which of the following product styles ("Cup", "Pack", "Tray", or "Bowl Style") should I pick? (Assume I don't know anything else about the product.) Create a box plot that demonstrates this empirically.
  4. Are ramen ratings normally distributed? Demonstrate your answer graphically.
  5. Do you think there is any correlation between a country's demand for instant noodles and the quality of its instant ramen? Demonstrate your answer graphically.
  6. Extra credit: Does having the word "spicy", "hot", or "chili" in the product name (the "Variety" column) make it more likely that the rating will be higher?
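
As a rough illustration of the Pandas patterns these questions tend to call for, here is a minimal sketch on a tiny invented table. The column names (Brand, Variety, Country, Style, Stars) match the description above, but the rows, the "Unrated" value, and the variable names are assumptions made up for the example; the real file may need different cleaning.

```python
import io

import pandas as pd

# Toy stand-in for notebooks/ramen-ratings.csv. These rows are invented
# for illustration and are NOT taken from the real dataset.
toy_csv = io.StringIO(
    "Brand,Variety,Country,Style,Stars\n"
    "Acme,Spicy Beef,Japan,Cup,4.0\n"
    "Acme,Chicken,Japan,Pack,3.0\n"
    "Noodly,Hot Shrimp,Thailand,Bowl,5.0\n"
    "Noodly,Miso,Thailand,Pack,Unrated\n"
)
toy = pd.read_csv(toy_csv)

# Assumed precaution: coerce ratings to numbers so that any
# non-numeric entry becomes NaN instead of breaking the arithmetic.
toy["Stars"] = pd.to_numeric(toy["Stars"], errors="coerce")

# Mean rating per group; mean() skips NaN values by default.
mean_by_brand = toy.groupby("Brand")["Stars"].mean()
top_brand = mean_by_brand.idxmax()   # label of the highest mean
top_rating = mean_by_brand.max()     # the highest mean itself

# Case-insensitive keyword flag on the product name.
toy["is_spicy"] = toy["Variety"].str.contains("spicy|hot|chili", case=False)
```

The same groupby pattern works with any grouping column, for example "Country" or "Style". Note that a group with only one or two reviews can top an average, so you may want to state how you handled that in your answer.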

Pre-defined Python Library Usage

You may import anything. At all. May the force of all of Python be with you. You will definitely need to use Seaborn, Pandas, and Jupyter Notebooks. Installation instructions can be found in the project's README.

Submitting Your Assignment

We will cover the use of GitHub in class and provide walkthroughs for submitting assignments. The GitHub project already contains a Jupyter notebook called "Ramen-Time.ipynb". Please do all your work in there, and remember to save the notebook contents in JupyterLab.

Coding Standards

Because we are using a notebook rather than a traditional Python file, some sections of the coding standards do not apply here. You can ignore:

However, for each of the five required questions in the assignment, include a markdown cell in the notebook that explicitly answers the question. For example:

Question 2:
The country that has the highest average rating is XYZ. The rating is 9.99.

This cell should immediately follow the Python code (and visualization, if any) you created to answer the question.

Visualization Standards

Make sure that any labels, legends, etc. in your charts are reasonable. A good rule of thumb: if you were including this chart in a research paper, how would you label this data?
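
For instance, a labeled Seaborn chart might be set up along these lines. This is only a sketch on invented data; the label and title wording is an assumption, not a required format.

```python
import pandas as pd
import seaborn as sns

# Invented ratings, used only to demonstrate labeling.
toy = pd.DataFrame(
    {
        "Style": ["Cup", "Cup", "Pack", "Pack", "Tray", "Tray"],
        "Stars": [3.0, 4.0, 3.5, 5.0, 2.5, 4.5],
    }
)

# sns.boxplot returns the Matplotlib Axes it drew on,
# which is where labels and titles are set.
ax = sns.boxplot(data=toy, x="Style", y="Stars")
ax.set_xlabel("Product style")
ax.set_ylabel("Rating (stars, 5-point scale)")
ax.set_title("Instant ramen ratings by product style")
```

Descriptive axis labels with units or scale, plus a title that says what is being compared, usually cover what a reader of a research paper would expect.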

Grading Rubric

Following coding standards: 20 points
Correctly answering question 1: 20 points
Correctly answering question 2: 20 points
Correctly answering question 3: 30 points (15 for the answer, 15 for the chart)
Correctly answering question 4: 30 points (15 for the answer, 15 for the chart)
Correctly answering question 5: 30 points (15 for the answer, 15 for the chart)
Extra credit question: 20 points