CS239-1 Homework Assignment #3 - April 26, 2007

Homework is due at the beginning of class on Thursday, May 3, 2007.

Consider the following situations. In each, what mistake did the person performing the evaluation make?
A. A LAN's ability to handle different levels of voice-over-IP traffic is being tested. The workload used generates packets at appropriate times to emulate typical VoIP traffic patterns for telephone calls. The applied bandwidth of the workload is controlled by increasing the size of the packets.
B. The performance of a remote file server is being tested. A server is loaded with test files of various sizes matching the expected set of files that will actually be stored, organized into directories in ways that seem realistic. The workload applied is a set of requests for test files. Each request is applied to a test file chosen from a distribution derived from traces of similar file servers. The type of each request (R, W, E, directory traversal, etc.) is chosen based on a distribution derived from the same traces.
C. A distributed system serving a typical office environment has a dedicated server to hold all system log data generated by all other machines in the system. These log entries include records of all logins, execution of all programs, remote sessions set up with external machines, file system activities, and failures and other unexpected events. The only activities run on the dedicated machine are related to logging: accepting log messages, writing data into the logs, compressing the logs during idle periods, producing daily log analysis reports, and running backups overnight. Having measured the performance characteristics of all processes that run on this server, an analyst plans to test whether the server can handle logging for ten times as many machines as are currently deployed. He plans to run the test machine in standalone mode. Another machine will create remote logging requests to send to the test machine, generating all characteristics of the log requests (timing, size, type, etc.) based on the distributions observed in the real network. A load generator on the test machine will start all other characteristic processes on the test machine at random intervals, with probabilities derived from the measurements made on the real machine.
D. A company that rents time on its on-site computers for kids to play video games is considering replacing its wired network with a wireless network. Since some of the popular games are playable only if the network's packet loss rate stays within certain bounds, the company is concerned about whether the wireless network will achieve sufficient performance. A loss rate of more than 3% is bad for many kinds of games. An analyst measures the loss rate of the proposed wireless network in the actual location. He reports that the average loss rate is 2%, and suggests the wireless network will do fine for this use.
Seer is a file hoarding system. It downloads replicas of files onto a portable computer while the computer is connected to its home network, with the goal of ensuring that the portable computer will have stored all the files its owner needs to use while it is disconnected from the network. The basic method used is to observe the user's activity, deduce relationships between files, and store all files required to perform certain high level activities (like working on a paper or compiling a program) that the user seems likely to perform in the near future.
A. What metrics should be used to evaluate Seer?
B. What workload should be applied to Seer to test its performance?
C. What instrumentation will be required to gather the data necessary to evaluate Seer?
Should you use event-driven monitors or sampling monitors in the following situations? Why?
A. You need to determine the amount of real memory being used by various processes in an operating system.
B. You need to know how many packets are being handled by a high speed router in a real deployment.
C. In a testing environment, you need to know the queue length of unsatisfied requests at a web server, under both low and high load conditions.
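For reference, the two monitor styles named in this question can be sketched as follows. This is a minimal illustration only, not an answer to any of the parts above: it assumes a simulated trace of (time, queue length) events, and the function names `event_driven_monitor` and `sampling_monitor` are hypothetical.

```python
import bisect


def event_driven_monitor(events):
    """Record one observation per event.

    `events` is a time-ordered list of (time, value) pairs. The number of
    records (and so the monitoring cost) grows with the event rate.
    """
    return list(events)


def sampling_monitor(events, interval, end_time):
    """Read the current value every `interval` time units.

    The number of records depends only on `interval` and `end_time`,
    not on how many events occurred; changes between samples are missed.
    The value is assumed to be 0 before the first event.
    """
    times = [t for t, _ in events]
    samples = []
    for k in range(int(end_time / interval) + 1):
        t = k * interval
        # Index of the most recent event at or before time t.
        idx = bisect.bisect_right(times, t) - 1
        value = events[idx][1] if idx >= 0 else 0
        samples.append((t, value))
    return samples


if __name__ == "__main__":
    # Hypothetical trace: queue length after each arrival or departure.
    trace = [(1, 1), (2, 2), (3, 1), (7, 0)]
    print(event_driven_monitor(trace))
    print(sampling_monitor(trace, interval=5, end_time=10))
```

In this sketch the event-driven monitor sees every change in the trace, while the sampling monitor sees only the state at each probe time.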
You are testing the performance of an open source mail client program. You are especially interested in whether a new anti-spam component of the system that clearly performs some very serious analysis of each incoming mail message will slow down the performance of the mail client unacceptably. So the primary question you are interested in answering is how much load is imposed on the system by the arrival of a message and its processing through the anti-spam component. You've gathered a collection of real emails, both legitimate and spam, to use as a workload. You will feed these messages to the mail client at varying rates and with varying mixes of legitimate and spam messages. You have a dedicated Linux machine to use for your testing, and complete liberty to alter the machine's configuration in any way you want. How should you instrument this experiment? Remember, an important issue in this and any other experiment is the amount of time and effort required to perform it.