Due Date: Wednesday, March 16th, 11:59:59 PM
Value: 150 points
GitHub Invite: https://classroom.github.com/a/tImg4OVb
Collaboration: For Assignment 3, collaboration is not allowed; you must work individually. You may still see your TA and come to office hours for help, but you may not work with any other CMSC 210 students. You may post questions on Slack, but you may not post code.
In this assignment, you have been given two datasets. The first is a dataset from The Ramen Rater, a product review website for the hardcore instant ramen enthusiast (or "ramenphile"), with over 2500 reviews to date. This file is located in the project repo at notebooks/ramen-ratings.csv. Each record in the dataset is a single ramen product review.
The columns—Brand, Variety (the product name), Country, and Style (Cup? Bowl? Tray?)—are pretty self-explanatory.
The "Stars" column indicates the ramen rating, as assessed by the reviewer, on a 5-point scale.
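As a starting point, loading and inspecting the ratings might look like the sketch below. The rows are made up and only mimic the column layout described above. The "Unrated" entry is a hypothetical placeholder to show how a non-numeric rating could be handled; nothing here says whether the real file contains such values.

```python
import io

import pandas as pd

# Made-up rows mimicking the described columns. In the notebook you would read
# the real file instead, e.g. pd.read_csv("ramen-ratings.csv") if the notebook
# sits next to the CSV (an assumption about the repo layout).
sample_csv = io.StringIO(
    "Brand,Variety,Country,Style,Stars\n"
    "Brand A,Spicy Chicken,Country A,Cup,3.75\n"
    "Brand B,Beef Flavour,Country B,Bowl,2.5\n"
    "Brand C,Seafood Special,Country A,Tray,Unrated\n"
)
ratings = pd.read_csv(sample_csv)

# Coerce Stars to numbers; any non-numeric entry becomes NaN instead of raising.
ratings["Stars"] = pd.to_numeric(ratings["Stars"], errors="coerce")

print(ratings.dtypes)
print(ratings["Stars"].describe())
```

Checking `dtypes` right after loading is a cheap way to confirm that the rating column is numeric before computing averages on it.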
In addition to this dataset, there is also a CSV file (notebooks/noodle-consumption) from WINA, the World Instant Noodles Association, which lists the global demand for instant noodles, by country, for each year from 2015 through 2019. The unit in this table is "Millions of Servings".
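If the consumption table turns out to have one row per country and one column per year (an assumption; check the actual file), reshaping it into a long format can make grouping and plotting easier. The sketch below uses made-up country names and numbers, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical wide layout: one row per country, one column per year.
# Values are invented and stand for "Millions of Servings".
demand_wide = pd.DataFrame(
    {
        "Country": ["Country A", "Country B"],
        "2015": [100, 40],
        "2016": [110, 38],
    }
)

# Turn the year columns into rows: one (Country, Year, value) record per line.
demand_long = demand_wide.melt(
    id_vars="Country", var_name="Year", value_name="Millions of Servings"
)
demand_long["Year"] = demand_long["Year"].astype(int)

print(demand_long)
```

The long format has one observation per row, which is the shape Seaborn's plotting functions generally expect.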
In this assignment you will be asked to produce a Jupyter Notebook that provides answers to the following questions by using Python and Pandas. If a data visualization is requested, use Seaborn for this.
You may import anything. At all. May the force of all of Python be with you. You will definitely need to use Seaborn, Pandas, and Jupyter Notebooks. Installation instructions can be found in the project's README.
We will cover the use of GitHub in class and provide walkthroughs for submitting assignments. The GitHub project already contains a Jupyter notebook called "Ramen-Time.ipynb". Please do all your work in there and remember to save the notebook contents in JupyterLab.
Because we are using a notebook rather than a traditional Python file, some sections of the coding standards do not apply here. You can ignore:
However, for each of the five required questions in the assignment, include a markdown cell in the notebook that explicitly answers the question. For example:
Question 2:
The country that has the highest average rating is XYZ. The rating is 9.99.
This cell should immediately follow the Python code (and visualization, if any) you created to answer the question.
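One way to keep the written answer consistent with the code is to compute the values first and copy them into the markdown cell. The sketch below runs on made-up data and only illustrates the pattern of a grouped mean followed by a formatted sentence; it is not the answer to any question in the assignment.

```python
import pandas as pd

# Invented reviews, for illustration only.
reviews = pd.DataFrame(
    {
        "Country": ["Country A", "Country A", "Country B"],
        "Stars": [4.0, 5.0, 3.0],
    }
)

# Average rating per country, then the country with the largest average.
mean_stars = reviews.groupby("Country")["Stars"].mean()
top_country = mean_stars.idxmax()
top_rating = mean_stars.max()

answer = (
    f"The country that has the highest average rating is {top_country}. "
    f"The rating is {top_rating:.2f}."
)
print(answer)
```

Printing the sentence in a code cell just above the markdown cell makes it easy to spot a mismatch if the data or the code changes later.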
Make sure that any labels, legends, etc. in your charts are reasonable. Good rule of thumb: if you were including this chart in a research paper, how would you label this data?
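A minimal sketch of labelling a Seaborn chart is shown below, again with made-up data and hypothetical column names. The `Agg` backend line is only there so the snippet runs outside a notebook; inside Jupyter it can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented numbers, for illustration only.
demand = pd.DataFrame(
    {
        "Country": ["Country A", "Country B"],
        "Servings": [110, 38],
    }
)

ax = sns.barplot(data=demand, x="Country", y="Servings")

# Spell out what is plotted and in which unit, as a paper figure would.
ax.set_title("Instant noodle demand by country (made-up data)")
ax.set_xlabel("Country")
ax.set_ylabel("Demand (millions of servings)")
plt.tight_layout()
```

Stating the unit on the axis label means the chart can be read on its own, without referring back to the surrounding text.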
Following coding standards | 20 points |
Correctly answering question 1 | 20 points |
Correctly answering question 2 | 20 points |
Correctly answering question 3 | 30 points; 15 for the answer, 15 for the chart |
Correctly answering question 4 | 30 points; 15 for the answer, 15 for the chart |
Correctly answering question 5 | 30 points; 15 for the answer, 15 for the chart |
Extra credit question | 20 points |