Assignments for INF5071 / Autumn 2007

Taken

Compare state machine generators

By: Magdalena Strzalkowska

Many of today's protocols rely on states that can be expressed easily as finite state machines (among them the signalling protocol for IP telephony, SIP). For many protocols, finite state machines are even used for define in their standards how they. Unfortunately none of the mainstream languages have built-in state machine concepts to generate programs from a finite state machine description. People use switches or if-cascades, and they forget cases all the time. But there are tools that generate code from finite state machine description. You should compare several of these and find out how easy it is to use them, whether they support multithreading, whether they can be used with networked (socket) code, and how scalable the generated code is. Example generators: FSMGenerator (Sourceforge), CHSM (Sourceforge), Boost Statechart Library (boost.org), SMC (Sourceforge).

Report accepted.

Overhead evaluation of SELinux

By: Lars Strand

SELinux was originally developed by NSA and surprised everyone when they open sourced it. Today RedHat drives the devlopment and it is installed on all version of RHEL/Fedora (other distributions catching up). SELinux introduces a new security model in Linux, moving from traditional discretionary access control (DAC) to mandatory access control (MAC). Since a lots of users run SELinux without knowing it, it has been the cause of many confusing errors and general weird behavior. But even more important: Since SELinux uses Linux' LSM, it must be some performance overhead. This task should benchmark some of the common services (http, database, file system operation etc.) and measure the overhead.

Report accepted.

Compare traffic of online games

By: Martin Øinæs Myrseth, Torkild Retvedt

Games like Anarchy Online and Mankind have free test accounts. In this assignment we would like you to install them, log the network traffic from busy virtual locations inside the game, and compare the results. Which transport protocols are used? What bandwidth is consumed? Do you experience lag/delay and do you see that in the logs? We provide the Linux machines for tcpdump, you provide the machine that runs the game (unless you can run it in VMWare).

Report accepted.

Compare the overhead of virtual machines

By: Lars Snellingen Bye, Ismar Slomic, Øyvind Nordang

Virtual machines have various uses. They provide protection from dangerous applications, they can allocate a limited set of resources to a program, or run several operating systems at the same time. But this comes at a performance price. Install two VMs (e.g. Xen and VMWare) and compare their raw overhead when they have all computer resources available for themselves for one CPU-bound and one IO-bound benchmark. We provide VMWare for Linux, Xen is free.

Report accepted.

Real-time CPU schedulers

By: Alexander Ottesen, Bjørnar Snoksrud, Kim Bowles Sørhus

Two pre-emptive real-time schedulers are discussed in this lecture: earliest deadline first (EDF) and rate monotonic (RM). These are the classical ones used in all kinds of real-time systems. It's been a long time since they have been compared for their efficiency for streaming. Write these two schedulers in native Linux code, not for inclusion in the kernel, but to execute in Linux's existing real-time scheduling class SCHED_FIFO. Try to run heavy loads like video transcoding with mplayer with guaranteed timeslices in those schedulers and compare how they fare.

Report accepted.

Compare TCP with TCP-friendly rate control

By: Tofik Sahraoui

Network providers keep telling us that UDP isn't used because it is unfair to the main, friendly, cooperative network protocol, TCP. But applications can easily add TCP-friendliness to UDP applications, for example by using TFRC (TCP-friendly rate control). Find an implementation and compare how it fares in competition with various of today's TCP implementations when both try to consume their "fair" rate. Don't worry if you don't find an implementation. TFRC consists only of one formula for the allowed data rate and you can easily write that yourself.

Performance of rendering video in a 3D engine

By: Fredrik Gaarder

Identify at least 2 3D rendering engines that support display of video encoding by MPEG from within the virtual world and compare the performance. Performance should be measured in terms of frame rate, and/or CPU load.

Report accepted.

Compare Linux' alternative disk schedulers

By: Hans Vatne Hansen, Carl Henrik Lunde


Linux has several disk schedulers included in the 2.6 kernel, i.e., Linus elevator, Deadline I/O, Anticipatory I/O and complete fair scheduler (CFQ). In this assignment, the different schedulers should be evaluated where both request latency and throughput must be evaluated. This should be done using known benchmark programs like Bonnie and IOzone. Additionally, a single-process, multi-threaded benchmark should be implemented reading streams from disk (to /dev/null). Here, one should also simulate a time dependent stream and background load and look at deadline violations. Some information about the scheduler(s) can be found in /usr/src/Linux/Documentation/block/as-iosched.txt and /usr/src/linux/Documentation/block/deadline-iosched.txt

Report accepted.

Compare the performance of various streaming servers

By: Tonje Fredrikson

MPlayer can retrieve streaming video from various servers via HTTP and RTSP/RTP protocols. Among them are Helix, Darwin Streaming Server, VLC and Apache. Compare the negotiation overhead for stream initiation, and compare the effects of network problems on the various setups.

Report accepted.

Efficiency of distributed genetic sequence database searches

By: Espen Hannisdal

BLAST and PARALIGN are two software packages for performing sequence similarity searches in large DNA and protein sequence databases. To speed up searches, these programs can utilize multiple cores in SMP machines and/or multiple nodes in a computer cluster. In this project we will study the efficiency of these programs when running on a varying number of cores and/or nodes. In addition, we will study the effect of locating the databases on a local disk or accessing the databases through the network.

Report accepted.

Trace ppLive

By: Sverre Tenn?e

ppLive is a Chinese P2P network for video streaming. You can get access to hundreds of current TV shows, which makes it probably illegal in most places. It is hard to install because it's all in Chinese and it attracts with lots of virusses and other dangerous stuff. It runs only on Windows. If you want to install it, we recommend strongly to isolate it in a virtual machine that shares nothing with the host machine and that is deleted after the testing. But it would be highly interesting to see the traffic of a ppLive stream, captured by tcpdump and analysed with respect to the connections that are made. Dangerous! But interesting. We provide the machines, you handle installing and deleting Windows.

Report accepted.

The effect of redundant data bundling in TCP for thin streams

By: Kristian Evensen

Report accepted.

A Quantative Comparison of Thin Stream Packet Flows in the Transport Protocols TCP and SCTP

By: Paul B. Beskow

Report accepted.


Not taken

CPU schedulers

Schedulers in Linux have been replaced several times during the development of the 2.6 kernel series. Build a benchmark that allows you to compare them to each other. The workload of such a benchmark should stress the computer and include heavy background workloads as well as several interactive programs. For the interactive programs, a program like "expect" might be helpful.

Test OceanStore/Pond

Pond is a Java prototype for a P2P filesystem idea called OceanStore that is available from Sourceforge. It's supposed to be very crash-resistant and scalable. While we don't have the opportunity to test its scalability in the lab, or the opportunity to evaluate its speed because it's written in Java, you should install it on several nodes and evaluate its networking overhead as well as its crash resistance. How quickly does it react to node failure? How does it make sure that enough copies of every block of every files exists? Where are its limits? How much more network bandwidth does it require compared to, say, NFS?

For everyone

This assignment is meant to consume roughly 30 hours in total per person, including make a presentation. It has three milestones: delivery of the project plan 05. October, presentation of the results 2. November, and the oral exam where the assignment is a minor topic. The project plan is not meant to be a formal project plan as taught in software engineering lecture. That alone would consume the entire allocated time. Instead, we expect an informal description that includes the following:
Writing this requires that you have already downloaded and taken a look at the tools that you intend to use!