CS239-1 Homework Assignment #1 - April 12, 2007

Homework is due at the beginning of class, Thursday, April 19, 2007

Because these exercises are designed to familiarize you with important statistical concepts, using a statistical program, spreadsheet, or advanced calculator to determine means and standard deviations is OK, but do not use them to calculate confidence intervals or linear regressions. Show your work in all calculations (again, you need not show details of how you calculated means and standard deviations), and list the row and column used in all table lookups.

  1. The star player on UCLA's men's basketball team this year was Arron Afflalo. Below is a list of the number of points he scored in each game this season.

        9    14    14    19    13    13
       12    17    13    14    14    15
       13    22    25    22    17    16
       16    27    24    15    13    20
       14    12     3    22    10    17
       24    17
    
    What was the mean number of points Afflalo scored per game this season? What was the standard deviation of his per-game scores?

    Did Afflalo's scores fit a normal distribution? Show how you made this determination.
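
    Since the instructions above allow a program for means and standard deviations, here is a minimal Python sketch of that step. It assumes the sample (n - 1) standard deviation is the one wanted, and the names `points` and `summarize` are illustrative only. It does not address the normality question.

```python
import statistics

# Points per game, copied from the table in problem 1.
points = [
    9, 14, 14, 19, 13, 13,
    12, 17, 13, 14, 14, 15,
    13, 22, 25, 22, 17, 16,
    16, 27, 24, 15, 13, 20,
    14, 12, 3, 22, 10, 17,
    24, 17,
]


def summarize(samples):
    """Return (count, mean, sample standard deviation) for a list of numbers."""
    n = len(samples)
    mean = statistics.mean(samples)
    # statistics.stdev divides by n - 1 (sample standard deviation);
    # statistics.pstdev would divide by n instead.
    stdev = statistics.stdev(samples)
    return n, mean, stdev


n, mean, stdev = summarize(points)
print(f"n = {n}, mean = {mean:.3f}, sample std dev = {stdev:.3f}")
```

    The same helper can be applied to each column of the data in problem 2.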

  2. The Conquest file system was designed to improve the performance of file accesses. Various metrics can be used to judge its success. One that was used in some tests was how quickly a file system could write from application code to a single large file, measured in Mbytes/second. For this metric, higher is better.

    Below is a set of data actually gathered from the Conquest file system (labelled CFS here) and another file system labelled RAMFS on one particular test. The experimental conditions were identical for both systems. Each entry gives the sequential write throughput, in Mbytes/second, measured in a single test of the system named at the top of its column.

         CFS          RAMFS
         ---          -----
    
        324536.1     305084.7
        323036.4     304907.2
        319687.8     301141.9
        322539.5     282254.6
        322738.1     302881.6
        320273.7     298739.6
        323135.9     297215.4
        321550.4     282558.9
        320469.4     282558.9
    

    Calculate the mean and standard deviation of each system's performance on this benchmark. Determine, for each, if the data appears to be normally distributed, explaining your conclusion.

    Assume, for the moment, that both data series are normally distributed, regardless of your previous answer. Compare the performance of the two file systems. Is one file system better than the other at a 90% confidence level? At a 95% confidence level? Demonstrate why, in each case.
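
    For reference, the general two-sided confidence interval for a mean estimated from n samples is sketched below. This is the standard textbook form, not a method prescribed by this assignment. Here \bar{x} is the sample mean, s is the sample standard deviation, 1 - \alpha is the confidence level (for example \alpha = 0.10 for 90%), and the t quantile with n - 1 degrees of freedom comes from a table lookup. Which quantity to apply it to when comparing the two systems is left for you to justify.

```latex
\bar{x} \;\pm\; t_{[1-\alpha/2;\; n-1]} \, \frac{s}{\sqrt{n}}
```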

  3. For each of the following situations, consider the proper index of central tendency and index of dispersion to use in evaluating the metric and system described.

  4. The Ficus file system allowed multiple replicas of a file to be stored on different machines. When one replica was updated, an immediate attempt was made to propagate the update to all the other replicas. Thus, a write to a Ficus file was potentially much more expensive than a write to a normal file, especially if the time being measured starts at the moment of the write and ends when all replicas have received the update.

    ficusdata.xls contains a spreadsheet of the amount of real elapsed time (measured as described above) that was required to perform a very large write to a file when different numbers of replicas of the file were stored, from one to seven. Several trials were performed for each number of replicas.