We wish to design a model for a generator that creates a workload for the Conquest file system. We have obtained a file system trace and characterize it below by four types of operation: accesses to metadata, accesses to small files, sequential accesses to large files (defined as accessing either the first byte of a large file or the byte immediately following the last byte accessed in that file), and non-sequential accesses to large files (defined as any access to a large file that is not sequential by this definition). The table below shows how often each type of operation was performed and how often each type of operation immediately followed it.
M = metadata (100,000 total operations)
SF = small file (500,000 total operations)
SLF = sequential access to large files (225,000 total operations)
NLF = non-sequential access to large files (50,000 total operations)
                     Next operation
Operation          M        SF       SLF       NLF
M             31,500    65,000     3,000       500
SF            68,000   419,000    10,000     2,000
SLF              400     5,000   208,000    11,600
NLF               99    10,000     4,000    35,900
a) Create a model for this access pattern based on the probability of each kind of operation occurring. (6 points)
b) Design a Markov model for this access pattern. Show both the chart and the state transition diagram. (10 points)
c) Is a Markov model a better model for this workload, or should we use the purely probabilistic model? Why? Is there any piece of information about the system that it would be helpful to know in making this decision? (10 points)
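As a rough illustration of how the two kinds of model in parts a) and b) differ when used as workload generators, the following Python sketch builds both from the counts given above. It is only a sketch under stated assumptions, not a model answer: the function names are hypothetical, and each Markov row is normalized by the sum of its own counts rather than by the stated operation totals, since the SF and NLF rows do not sum exactly to those totals.

```python
import random

# Totals per operation type, copied from the legend above.
TOTALS = {"M": 100_000, "SF": 500_000, "SLF": 225_000, "NLF": 50_000}

# Transition counts copied from the table: COUNTS[current][next].
COUNTS = {
    "M":   {"M": 31_500, "SF": 65_000,  "SLF": 3_000,   "NLF": 500},
    "SF":  {"M": 68_000, "SF": 419_000, "SLF": 10_000,  "NLF": 2_000},
    "SLF": {"M": 400,    "SF": 5_000,   "SLF": 208_000, "NLF": 11_600},
    "NLF": {"M": 99,     "SF": 10_000,  "SLF": 4_000,   "NLF": 35_900},
}


def independent_probabilities(totals):
    """Probability of each operation type, ignoring what came before it."""
    grand_total = sum(totals.values())
    return {op: n / grand_total for op, n in totals.items()}


def transition_matrix(counts):
    """Row-normalize the counts (assumption: divide by each row's own sum)."""
    matrix = {}
    for current, row in counts.items():
        row_total = sum(row.values())
        matrix[current] = {nxt: n / row_total for nxt, n in row.items()}
    return matrix


def generate_independent(probs, length, rng):
    """Draw every operation independently from the overall probabilities."""
    ops = list(probs)
    weights = [probs[op] for op in ops]
    return rng.choices(ops, weights=weights, k=length)


def generate_markov(matrix, start, length, rng):
    """Draw each operation conditioned on the one immediately before it."""
    if length <= 0:
        return []
    sequence = [start]
    while len(sequence) < length:
        row = matrix[sequence[-1]]
        nxt = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        sequence.append(nxt)
    return sequence


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    probs = independent_probabilities(TOTALS)
    matrix = transition_matrix(COUNTS)
    print("independent:", generate_independent(probs, 15, rng))
    print("markov:     ", generate_markov(matrix, "SF", 15, rng))
```

With these counts the independent model gives SLF a probability of 225,000 / 875,000, or about 0.26, whereas the Markov row for SLF gives about 0.92 for another SLF following an SLF. The first generator therefore scatters large-file accesses through the stream, while the second tends to produce them in long runs.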
A. A laptop computer being considered for use by a group of salesmen.
B. A machine intended as the web server for a moderate-sized (but rapidly growing) Internet commerce site.
C. A distributed system that detects the spread of worms by analyzing the number of connections from unique sites initiated per minute.
D. A router to handle traffic internally for a large company. The router is expected to be connected to 2-3 incoming lines at T3 speeds or greater, and will fan out traffic to around two dozen internal local area networks.
E. Determining which of several alternate implementations of TCP will provide the best service for a set of machines that move large scientific data sets across the Internet.
F. A browser plugin that provides users with a color-coded indication of the probability that a particular web site is being used for phishing.
G. A virus-checking program designed to scan an entire disk.
H. A virus-checking program that looks for viruses in email attachments.
I. A new cryptographic algorithm that purports to be faster than AES.
J. Deciding whether a company will achieve reasonable performance if it upgrades from Windows XP to Vista.
A. A multipath routing algorithm finds several paths from source to destination in a network and divides traffic among them in proportion to the load on each path, with the goal of reducing the average delay for a packet. Background traffic in the network is handled by another mechanism. How should we represent the workload for the source-to-destination traffic? (5 points)
B. A distributed hash table implementation of DNS is to be tested in a wide-area environment for both reliability and speed. What parameters should we choose and how should we characterize each? (5 points)
C. A new piece of buffer management code has been added to Linux, with the goal of reducing the average time to allocate buffers for both file system operations and network sends and receives. (5 points)
D. A new algorithm for deciding when to spin down the disk in a laptop computer has been proposed.
i). What parameters would you choose to test whether this algorithm saves more battery power than the existing algorithm? How would you characterize them? (5 points)
ii). What parameters would you choose to test whether this algorithm has a different effect on the performance of applications on the machine? How would you characterize them? (5 points)
iii). Describe why you should use the same or different parameters for these two cases. (5 points)
1). If you were evaluating the system for an article to appear in an industry magazine? (7 points)
2). If you were evaluating the system to be installed in your own office? (7 points)