Assignment 1

Basic Statistical Analysis


Due Date: Wednesday, February 16, 11:59:59PM

Value: 50 points
(5 points for following CMSC 210 Coding Standards,
45 points for overall design, functionality, and completeness)

Collaboration: For Assignment 1, collaboration is not allowed; you must work individually. You may still see your TA and come to office hours for help, but you may not work with any other CMSC 210 students. You may post questions on Discord, but you may not post code.

Github Assignment Invite Link: https://classroom.github.com/a/4wCtxJyn

Github Classroom: https://github.com/umbc-cmsc-210-spring-2022

Objectives

Instructions

Naming

Your assignment file must be called assignment1.py.

Comments & Coding Standards

The Task

This assignment deals with some basic descriptive statistics: specifically, the mean, median, and standard deviation. To compute the standard deviation, calculate the difference between each value and the mean (the deviation) and square it. Then find the average of all of these squared deviations (the variance) and take its square root.
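As a rough illustration of the steps just described, the calculation might be sketched as follows. The function names and the sample list are made up for this example, and the sketch assumes the "average" divides by the number of values (the population form); treat it as one possible reading rather than the required implementation.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def std_dev(values):
    """Return the square root of the average squared deviation from the mean.

    Assumption: divides by len(values), i.e. the population form.
    """
    avg = mean(values)
    squared_deviations = [(v - avg) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))


# Made-up sample values, chosen so the arithmetic is easy to follow by hand:
# mean = 5, squared deviations sum to 32, 32 / 8 = 4, sqrt(4) = 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(sample))  # 2.0
```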

The data that you will be working with is real. It is the weight and brain data from 28 animal species.

Your program will read in one filename supplied at the command line and produce a variety of statistics based on the data present.

Test Data

You are provided with test data files containing final CMSC 201 grades as integer values, all of which are greater than or equal to zero. The files are what are called comma-separated value or .csv files. That is, they consist of values that are separated from each other, or delimited, by commas. These files will be included in your git template.
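Since the values are delimited by commas, each line can be broken into fields with the string method split(). The sketch below is only an illustration: the function name parse_csv_lines and the sample lines are hypothetical, and the real files may be laid out differently.

```python
def parse_csv_lines(lines):
    """Split each non-empty line on commas and return a list of field lists."""
    rows = []
    for line in lines:
        line = line.strip()  # drop the trailing newline and stray spaces
        if line:  # skip blank lines
            rows.append([field.strip() for field in line.split(",")])
    return rows


# Invented contents standing in for a file; the real data will differ.
lines = ["label,x,y\n", "first,10,20\n", "second,30,40\n"]
rows = parse_csv_lines(lines)
header = rows[0]
records = rows[1:]
y_values = [int(record[2]) for record in records]  # convert text to integers
print(header)    # ['label', 'x', 'y']
print(y_values)  # [20, 40]
```

Because an open file can be looped over line by line, the same function could be handed a file object, for example inside a `with open(filename) as data_file:` block.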

Pre-defined Python Library Usage

You will need to import the Python pre-defined math library for this assignment in order to use the sqrt() (square root) function.

You may also import the Python statistics library if you wish. You can use the mean(), median(), and stdev() functions for the purpose of checking the correctness of your functions (i.e. that they get the same results). But remove the calls to these functions and the import statistics statement before turning in your assignment. For this assignment, you may not import any other Python libraries.
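One way to do this kind of check is sketched below. The helper report_match and the sample data are hypothetical. Note that statistics.stdev() divides by n - 1 (the sample standard deviation), while statistics.pstdev() divides by n (the population standard deviation), so which one your function should agree with depends on which formula you implement.

```python
import math
import statistics  # for checking only; remove before turning in


def report_match(label, mine, reference):
    """Print both values and return True if they agree to within rounding."""
    matches = math.isclose(mine, reference, rel_tol=1e-9)
    print(f"{label}: mine={mine} reference={reference} match={matches}")
    return matches


# Made-up sample values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Stand-ins for your own functions (population form, dividing by n).
my_mean = sum(data) / len(data)
my_std = math.sqrt(sum((v - my_mean) ** 2 for v in data) / len(data))

report_match("mean", my_mean, statistics.mean(data))
report_match("std dev vs pstdev", my_std, statistics.pstdev(data))
report_match("std dev vs stdev", my_std, statistics.stdev(data))  # differs: n - 1
```

Comparing with math.isclose() rather than == avoids false alarms caused by tiny floating-point differences.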

User Input

For this assignment, you may assume that the user will enter valid filenames into prompts for input.

If the user enters something other than what you asked for (for example, the name of a file that isn't present), your program may crash. This is acceptable.

Sample Output

The sample output is available as a separate file in your project template. The format of your output does not have to exactly match the sample output, but it should be similar and neat.

Program Design Notes

You are not required to turn a design in for this assignment. However, you will benefit greatly from taking the time to do a proper one!

In CMSC 201, we discussed top-down design: you begin with your main function’s design and break it down into smaller and smaller pieces (functions). This is a good approach. The bulleted items below should be of help. Before coding, take the time to thoroughly think through the program logic.

Implementation and Testing Notes

Take an incremental approach to the implementation and testing of your program. That is, do not use the “big bang” approach of implementing all or large parts of your program before you test it. Test as you go!

A top-down approach can also be taken when implementing and testing your program. (Some people prefer bottom-up, but top-down is recommended.) The bulleted items below should be of help.

You may find that you need to adjust your program design as you implement. That’s natural. However, if you find yourself making major adjustments, you need to go back and rethink your overall design. Don’t worry – it happens!

Other Notes

If you find that your algorithm for a statistical function requires you to sort the list of data, the function must first copy the data to a temporary list before sorting. That way, when the function returns, the original list sent into the function has not been corrupted.

If you do find that you need to sort, you may use the Python list method sort() or the built-in function sorted(). Look at your Python references (discussed in Lecture 02) for how to use them. You may also need to use min() and/or max(), which are allowed.
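The median is a typical case where sorting is needed. The sketch below is one possible approach, not the required one: sorted() returns a new list, so the caller's list keeps its original order. The function name and sample list are made up for illustration.

```python
def median(values):
    """Return the median of a non-empty list without modifying it."""
    ordered = sorted(values)  # sorted() builds a new list; values is untouched
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


scores = [7, 1, 5, 3]
print(median(scores))  # 4.0
print(scores)          # [7, 1, 5, 3], still in the original order
```

If you prefer the sort() method, copy the list first (for example with list(values)) and call sort() on the copy.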

As we have not yet discussed Program Assumptions (part of the file header comment), you may use the following exactly as written in your file header comment. Make sure that you read through the assumptions; they impact how you design your program.


Program Assumptions:
  - Each data file will be a comma-separated file with a single record of
       integer values >= 0
  - Each data file will contain > 2 values
  - Each file's data forms a normal distribution (i.e. unimodal).
  - The user will always enter at least one filename
  - The user will always enter a valid filename (i.e. the file exists)
  - The file will always have three columns
  - The first row will always be the names of the columns

Submitting Your Assignment

We will cover the use of Github in class and provide walk-throughs for submitting assignments.

In the meantime, you can start to develop locally. The source data can be downloaded from the Github project for the assignment.

Grading Rubric

Your computed values will be compared to the sample output OR the output of the statistics library for grading.

Following coding standards 10 points
Correctly reading the csv file 20 points
Correctly computing the median 20 points
Correctly computing the mean 20 points
Correctly computing the standard deviation 30 points