Assignment 1

Basic Statistical Analysis

Due Date: Wednesday, February 16, 11:59:59PM

Value: 50 points
(5 points for following CMSC 210 Coding Standards,
45 points for overall design, functionality, and completeness)

Collaboration: For Assignment 1, collaboration is not allowed; you must work individually. You may still see your TA and come to office hours for help, but you may not work with any other CMSC 210 students. You may post questions on Discord, but you may not post code.

Github Assignment Invite Link: https://classroom.github.com/a/4wCtxJyn

Github Classroom: https://github.com/umbc-cmsc-210-spring-2022

Objectives

To learn to submit assignments through Github Classroom
To refresh your memory regarding Python syntax and proper program design
To become familiar with CMSC 210 Python coding standards

Instructions

Naming

Your assignment file must be called assignment1.py.

Comments & Coding Standards

Include a complete file header comment at the top of your file.
Include a complete function header comment at the top of each function, just below the function header.
Make sure to follow the CMSC 210 Coding Standards.

The Task

This assignment deals with some basic descriptive statistics; specifically, the mean, median, and standard deviation. To compute the standard deviation, calculate the difference between each value and the mean (the deviation) and square it. Then find the average of all of these squared deviations (this average is the variance) and take its square root.
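The procedure described above can be sketched as follows. This is a minimal illustration, not the required design: the function names mean and std_dev are hypothetical, and the code divides by the number of values, as the description says (the population form of the standard deviation).

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def std_dev(values):
    """Return the standard deviation, dividing by the number of values."""
    avg = mean(values)
    # Square each value's difference from the mean.
    squared_differences = [(value - avg) ** 2 for value in values]
    # Average the squared differences, then take the square root.
    return math.sqrt(mean(squared_differences))


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # prints 2.0
```

Dividing by the number of values gives the same result as statistics.pstdev(). The statistics.stdev() function divides by one less than the number of values, so its result is slightly larger on the same data.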

The data that you will be working with is real: weight and brain data for 28 animal species.

Your program will read in one filename supplied at the command line and produce a variety of statistics based on the data present.

Test Data

You are provided with test data files containing final CMSC 201 grades as integer values, all of which are greater than or equal to zero. The files are comma-separated value (.csv) files; that is, they consist of values that are separated from each other, or delimited, by commas. These files will be included in your git template.
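One way to pull a column of numbers out of comma-delimited text, using only string methods, is sketched below. It assumes a header row followed by data rows, as listed in the Program Assumptions. The function name read_column and the sample rows are hypothetical and exist only for illustration.

```python
def read_column(lines, index):
    """Return (column name, list of float values) for one column.

    lines is any iterable of text lines, such as an open file.
    """
    # Split every non-blank line on commas.
    rows = [line.strip().split(",") for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    values = [float(record[index]) for record in records]
    return header[index], values


# Made-up rows for illustration only; the real files come from the template.
sample = ["label,first,second\n", "a,1,10\n", "b,2,20\n", "c,3,30\n"]
print(read_column(sample, 2))  # prints ('second', [10.0, 20.0, 30.0])
```

With a real file, the same function could be given an open file object in place of the sample list, since iterating over a file yields its lines.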

Pre-defined Python Library Usage

You will need to import the Python pre-defined math library for this assignment in order to use the sqrt() (square root) function.

You may also import the Python statistics library if you wish. You can use the mean(), median(), and stdev() functions for the purpose of checking the correctness of your functions (i.e. that they get the same results). But remove the calls to these functions and the import statistics statement before turning in your assignment. For this assignment, you may not import any other Python libraries.
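A check of this kind might look like the sketch below. The function my_mean and the data list are hypothetical stand-ins for your own function and data. math.isclose() is used instead of == because two floating-point results can differ slightly in their last digits.

```python
import math
import statistics  # for checking only; remove before turning in


def my_mean(values):
    """Hypothetical stand-in for your own mean function."""
    return sum(values) / len(values)


data = [3, 1, 4, 1, 5, 9, 2, 6]
matches = math.isclose(my_mean(data), statistics.mean(data))
print("mean matches:", matches)  # prints mean matches: True
```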

User Input

For this assignment, you may assume that the user will enter valid filenames into prompts for input.

If the user enters something other than what you asked for (for example, the name of a file that isn't present), your program may crash. This is acceptable.

Sample Output

The sample output is available as a separate file in your project template. The format of your output does not have to exactly match the sample output, but it should be similar and neat.

Program Design Notes

You are not required to turn a design in for this assignment. However, you will benefit greatly from taking the time to do a proper one!

In CMSC 201, we discussed top-down design. That is, you begin with your main function’s design and break it down into smaller and smaller pieces (functions). This is a good approach. The bulleted items below should be of help. Before coding, take the time to thoroughly think through the program logic.

Understand what the inputs to and the outputs from the overall program are.
Then, pseudo-code the program’s main() function. This will help you to understand the program’s logic at a high level and help you to decide on the functions to implement.
Draw a functional hierarchy chart of the functions that you intend to implement. Include the inputs to and outputs from each.
Think about how to separate program input, processing, and output code.
Look for duplicate code. This could be a sign that a block of code should be implemented as a function.
Look for code that, while only called once, should be pulled out as a separate function. Consider whether making the block into a function would simplify your code, or whether you or someone else might want to use that block of code in another program. Reuse does not simply mean within the same program!
Think about your statistical functions. Can any of them use others?

Implementation and Testing Notes

Take an incremental approach to the implementation and testing of your program. That is, do not use the “big bang” approach of implementing all or large parts of your program before you test it. Test as you go!

A top-down approach can also be taken when implementing and testing your program. (Some people prefer bottom-up, but top-down is recommended.) The bulleted items below should be of help.

After you have finished your program design, implement your main() function. You’ll need to create function “stubs” for any functions that main() calls. Have the stubs simply print a message such as, “Function X called.” You may also need return statements – just return any constant of the appropriate type.
Test main() to make sure that there are no syntax errors (of course!) and that its logic works as expected. Are all function calls working?
Now, implement and test each function one at a time. Begin with functions that deal with user and file input. After all, if these don’t work, the rest of your program certainly won’t!
Move on to functions that deal with output. You’ll need to be able to see if your program is computing the correct values as you build it up.
Last, implement and test the functions that perform computations and other tasks.
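The first step above, a main() function that calls stubs, could start out like the sketch below. All function names and the filename here are hypothetical placeholders; your own design decides the real ones.

```python
def read_data(filename):
    """Stub: will eventually read the values from filename."""
    print("Function read_data called.")
    return [0, 0, 0]  # any constant of the appropriate type


def compute_mean(values):
    """Stub: will eventually compute the mean of values."""
    print("Function compute_mean called.")
    return 0.0


def print_report(mean_value):
    """Stub: will eventually print the formatted statistics."""
    print("Function print_report called.")


def main():
    values = read_data("data.csv")  # placeholder filename
    mean_value = compute_mean(values)
    print_report(mean_value)


main()
```

Running this prints one message per stub, which confirms that main() calls each function in the expected order before any real logic exists.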

You may find that you need to adjust your program design as you implement. That’s natural. However, if you find yourself making major adjustments, you need to go back and rethink your overall design. Don’t worry – it happens!

Other Notes

If you find that your algorithm for a statistical function requires you to sort the list of data, the function must first copy the data to a temporary list before sorting. That way, when the function returns, the original list sent into the function has not been corrupted.

If you do find that you need to sort, you may use the Python list sort() method or the built-in sorted() function. Look at your Python references (discussed in Lecture 02) for how to use them. You may also need to use min() and/or max(), which is allowed.
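As a sketch of sorting without corrupting the caller's list, the hypothetical median function below relies on sorted(), which builds and returns a new list. Copying with list(values) and then calling sort() on the copy would work equally well.

```python
def median(values):
    """Return the median without changing the caller's list."""
    ordered = sorted(values)  # new list; values itself is untouched
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the middle element is the median.
        return ordered[middle]
    # Even count: average the two middle elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [7, 1, 5, 3]
print(median(data))  # prints 4.0
print(data)          # prints [7, 1, 5, 3]
```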

As we have not yet discussed Program Assumptions (part of the file header comment), you may use the following exactly as written in your file header comment. Make sure that you read through the assumptions; they impact how you design your program.


Program Assumptions:
  - Each data file will be a comma-separated file with a single record of
       integer values >= 0
  - Each data file will contain > 2 values
  - Each file's data forms a normal distribution (i.e. unimodal).
  - The user will always enter at least one filename
  - The user will always enter a valid filename (i.e. the file exists)
  - The file will always have three columns
  - The first row will always be the names of the columns

Submitting Your Assignment

We will cover the use of Github in class and provide walk-throughs for submitting assignments.

In the meantime, you can start to develop locally. The source data can be downloaded from the Github project for the assignment.

Grading Rubric

Your computed values will be compared to the sample output OR the output of the statistics library for grading.

Following coding standards	10 points
Correctly reading the csv file	20 points
Correctly computing the median	20 points
Correctly computing the mean	20 points
Correctly computing the standard deviation	30 points