Background
The goal of this assignment is to learn how to evaluate the performance of network file systems.
Motivation
Network file systems are designed for a wide range of use-cases. Video streaming websites are one such use-case. They require the ability to efficiently stream data to and from many concurrent users. This requires storage of video files and metadata. It is common to store metadata in a database instead of on a network file system; however, we would like you to consider the feasibility of storing metadata on the file system.
Task
Compare the performance of NFS version 4 and Glusterfs version 3.2 for a video streaming website. You should consider what kind of traffic a video streaming website requires. At minimum, you need to consider the implications of:
- Many concurrent users
- (Concurrent) Upload and download of video and metadata
- Movies of any quality and length
You can use any tool you like. Some suggestions (installed on test machines):
- dd (for simple tests)
- iozone
- bonnie++
You may have to create your own tools to generate streaming workloads.
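As a starting point for such a tool, the following is a minimal sketch of a streaming-read workload generator. It is not a prescribed solution: the function names, the 64 KiB chunk size, and the definition of a "late" chunk (a chunk that arrives after the moment playback at the target bitrate would need it, allowing one chunk of startup buffering) are all assumptions made for illustration. Each simulated viewer is a thread that reads the file in chunks and sleeps so that its read rate roughly matches the target bitrate.

```python
import threading
import time


def stream_file(path, bitrate_bps, chunk_size, results, index):
    """Read `path` in chunks, paced to roughly `bitrate_bps` bits per second."""
    interval = chunk_size * 8 / bitrate_bps  # playback time covered by one chunk
    bytes_read = 0
    chunks = 0
    late_chunks = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            bytes_read += len(data)
            chunks += 1
            # Assumed deadline: chunk k must have arrived by start + k * interval.
            deadline = start + chunks * interval
            now = time.monotonic()
            if now > deadline:
                late_chunks += 1  # the viewer would have stalled here
            else:
                time.sleep(deadline - now)  # pace the reader like a player
    results[index] = {
        "bytes": bytes_read,
        "late_chunks": late_chunks,
        "seconds": time.monotonic() - start,
    }


def run_clients(path, n_clients, bitrate_bps, chunk_size=64 * 1024):
    """Start `n_clients` concurrent readers of the same file and collect results."""
    results = [None] * n_clients
    threads = [
        threading.Thread(
            target=stream_file,
            args=(path, bitrate_bps, chunk_size, results, i),
        )
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call such as run_clients(path, 20, 5_000_000) would simulate 20 viewers at 5 Mbit/s each, where both numbers are illustrative. Keep in mind that repeated reads of the same file may be served from the client's page cache rather than from the network share, so a real test should account for caching.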
Assignment
Solve the task in a group of two (or alone). Present your experiences and results orally in the course INF507x on October 14, 8:15. It is mandatory to write a report in the specified format and deliver it by October 13, 8:15, using Devilry. Keep in mind that (a) it is possible to update the report until November, and (b) you will be asked to choose 4 out of 5 reports for evaluation.
It is mandatory to present your group's results on October 14. You do not have to prepare a formal presentation (such as a PowerPoint slide set); however, you must at least show the measurement results that are included in your report and that you discuss in class. The discussions in class are supposed to help you improve your report for final delivery. It is recommended that you have a web page or a PDF document that is web-accessible from an arbitrary computer.
Machines
You will have access to 3 sets of machines. Each set consists of a client (10.0.0.1) and two file servers (10.0.0.2 and 10.0.0.3):
- 10.0.0.2: Exports an NFS share. This is mounted at /data/nfs/export/ on the client.
- 10.0.0.2 and 10.0.0.3: Export two Glusterfs volumes:
  - Glusterfs distributed between the two servers. This is mounted at /data/test-distributed/ on the client.
  - Glusterfs replicated between the two servers. This is mounted at /data/test-striped/ on the client.
You do not have direct access to the servers, only to the clients. Log into oslo.ndlab.net (using ssh), and log into one of the clients:
- inf5072-14.ndlab.net
- inf5072-15.ndlab.net
- inf5072-21.ndlab.net
Reformat and mount the shares using:
$ sudo /opt/local/bin/reset-disks.py
Answer yes to any questions asked by the script. The script makes /data/local available on the local disk, in addition to the network shares described above. You should not use your home directory in your benchmark, since the home directory is an NFS share.
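To compare the local disk with the network shares, a simple sequential-write timing can serve as a first sanity check before moving on to iozone or bonnie++. The sketch below is only an illustration: the function name, the 1 MiB block size, and the temporary file name bench.tmp are assumptions. It writes a file of the requested size into a directory, forces the data out with fsync so that the measurement is not just the speed of the client's write cache, and returns the number of bytes written and the elapsed time.

```python
import os
import time


def timed_write(directory, size_bytes, block_size=1024 * 1024):
    """Write `size_bytes` bytes to a temporary file in `directory` and time it."""
    path = os.path.join(directory, "bench.tmp")  # hypothetical file name
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(block_size, size_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time needed to push data to storage
    elapsed = time.monotonic() - start
    os.remove(path)
    return written, elapsed
```

Running the same call against /data/local, /data/nfs/export/ and the two Glusterfs mount points gives directly comparable numbers, and dividing the byte count by the elapsed time gives throughput in bytes per second.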
Warning!
The network adapters on the test machines are 2x1 Gbps. Both are attached to a PCI bus. The PCI bus cannot handle this amount of traffic, so your test results will be off. We are fixing this very soon. In the meantime, please do not let this stop you from developing the tools you require for the assignment. As a workaround, we have limited the transfer rate of the network adapters to 100 Mbps.
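The 100 Mbps limit puts a hard ceiling on what any test can show, and it helps to know that ceiling before interpreting results. The following sketch does the arithmetic under simplifying assumptions: protocol overhead is ignored, the link is treated as a single 100 Mbps pipe, and the stream bitrates and file size in the comments are illustrative values, not figures from the assignment.

```python
def max_concurrent_streams(link_kbps, stream_kbps):
    """Upper bound on simultaneous streams that fit on the link (no overhead)."""
    if stream_kbps <= 0:
        raise ValueError("stream_kbps must be positive")
    return link_kbps // stream_kbps


def transfer_seconds(size_bytes, link_kbps):
    """Best-case time to move `size_bytes` over the link (no overhead)."""
    return size_bytes * 8 / (link_kbps * 1000)


LINK_KBPS = 100_000  # the 100 Mbps limit described above

# Assumed example bitrates: 5 Mbit/s and 8 Mbit/s video streams.
print(max_concurrent_streams(LINK_KBPS, 5_000))   # 20
print(max_concurrent_streams(LINK_KBPS, 8_000))   # 12

# Assumed example upload: a 700 MB (700,000,000 byte) file.
print(transfer_seconds(700_000_000, LINK_KBPS))   # 56.0
```

If a measurement exceeds these bounds, the data most likely came from a cache rather than over the network.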
Booking
Use http://booking.ndlab.net to book a single client only for the time you require to run your tests. The benchmarking tools should be developed on your own machine if possible.
Report
The written report may be up to 4 pages long in ACM format (see right column). It is expected that such a report includes: a description of the assignment, a description of the testbed, an explanation of the metrics that were chosen to present the measurement results visually, graphs showing the results, and an interpretation of the graphs.
The results must be based on your own tests.
The report is evaluated on writing quality and on the trustworthiness and correctness of the results. The evaluation does not consider whether related work (citations of other papers) is included. It is not necessary to cite existing work in this report.
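One way to make the reported metrics trustworthy is to repeat each measurement several times and report the spread, not only a single value. The helper below is a small sketch of that idea using Python's standard statistics module. The function name, the choice of summary values, and the sample numbers in the usage note are assumptions for illustration, not measured results.

```python
import statistics


def summarize(samples):
    """Summarize repeated measurements of one metric (e.g. throughput in MB/s)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
        "min": min(samples),
        "max": max(samples),
    }
```

For example, summarize([10, 12, 11, 13]) gives a mean and a median of 11.5; the minimum, maximum, or standard deviation can then be drawn as error bars in the graphs.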