Background
A recent industry trend in video streaming is dynamic adaptive streaming over HTTP, or DASH. Its great practical advantage is that it works with standard (or, in the case of Microsoft Smooth Streaming, nearly standard) web servers such as Apache or Lighttpd and web caches such as Squid, and that it traverses firewalls well. Current solutions include Microsoft Smooth Streaming, Apple HLS and the Move Networks player; the MPEG DASH standard is just around the corner. The approach works by chopping a video into so-called segments, short independent videos (2-4 seconds long for Microsoft, 10 seconds for Apple), and encoding each of them in several qualities. The player decides how and when to fetch the segments and chooses the quality that suits its current network conditions.
Smooth Streaming was used to broadcast the 2010 Winter Olympics "live" over the Internet, and in this assignment you should check just how "live" a client should try to be when the number of clients is very high. We do not consider the idea of several qualities in this assignment. We don't use real video content, either.
Motivation
The motivation of this assignment is to learn how to evaluate the performance of the real-time media streaming scenario described above. What is the failure behaviour of the system when the number of clients grows? How frequently do clients experience "hiccups" (failing to get the next segment before the playback of the current segment is complete)? Does the scalability of the system change when you change the clients' download speed and/or the delay between a segment becoming available and the start of its download (without communication between the clients)?
Task
Set up a Lighttpd web server as described in a separate section below. Use the generate_streaming_segments.py described in a separate section below to stream data files (video segments) into a directory exposed through the web server. Evaluate how this scenario performs with different client download strategies.
The script on the server generates dummy data as "live video content". Every generated segment is supposed to be a video segment containing enough video for 10 seconds of playback. A new one is generated every 10 seconds. Your clients must download them quickly enough for real-time "playback".
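The real-time requirement above can be turned into a simple hiccup counter. The following is a minimal sketch, not a prescribed metric: the function name `count_hiccups` and the stall model (a late segment pauses playback, and all later deadlines shift by the length of the pause) are assumptions made for illustration. A client that skips late segments instead of stalling would need a different model.

```python
SEGMENT_SECONDS = 10.0  # playback length of one segment, as stated in the task


def count_hiccups(download_done, playback_start, segment_seconds=SEGMENT_SECONDS):
    """Count segments that had not finished downloading when playback needed them.

    download_done  -- completion times (in seconds) of consecutive segments
    playback_start -- time at which playback of the first segment is due
    Assumed model: a late segment stalls playback until it has arrived.
    """
    hiccups = 0
    due = playback_start
    for done in download_done:
        if done > due:
            hiccups += 1
            due = done  # playback resumes only once the segment is complete
        due += segment_seconds  # the next segment is due when this one ends
    return hiccups
```

For example, `count_hiccups([1.0, 11.0, 25.0, 31.0], playback_start=2.0)` reports one hiccup: the third segment is due at 22.0 but only arrives at 25.0.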
You need to create client(s) that can test the different strategies (various delays between noticing availability and starting to download, various download speeds, adding some randomness, ...). To get usable measurements you need to execute clients on several machines at the same time, and preferably have multiple clients on each machine. Note that the clients have to make a request (poll) to the server to detect whether new data is available.
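To see why a delay with some randomness can matter, the sketch below compares how many clients start their download within the same second. It is only an illustration under strong assumptions: all clients are assumed to notice the new segment at the same instant, and the names `start_offsets` and `peak_requests_per_second` are hypothetical helpers, not part of the assignment material.

```python
import random
from collections import Counter


def start_offsets(num_clients, base_delay, jitter, seed=0):
    """Seconds each client waits after noticing a segment before fetching it.

    Every client waits base_delay plus a uniformly random extra wait
    in [0, jitter). The seed makes the experiment repeatable.
    """
    rng = random.Random(seed)
    return [base_delay + rng.uniform(0.0, jitter) for _ in range(num_clients)]


def peak_requests_per_second(offsets):
    """Largest number of downloads that start within the same whole second."""
    buckets = Counter(int(offset) for offset in offsets)
    return max(buckets.values())


# Without jitter all 1000 clients hit the server in the same second;
# with up to 8 seconds of jitter the starts are spread over several seconds.
synchronized = peak_requests_per_second(start_offsets(1000, 1.0, 0.0))
spread_out = peak_requests_per_second(start_offsets(1000, 1.0, 8.0))
```

In a real client the same offset would be passed to a sleep call before the download starts; whether spreading requests actually reduces hiccups is what the measurements should show.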
Buffering and prefetching on the clients are allowed, but consider that your imaginary users want an experience quite close to "real-time" viewing. They are also watching the Olympics all day.
Machines
For this assignment you can use your own machines, or machines at IFI.
Machines at IFI are usually marked with their name. So you should be able to find machines by visiting bachelor and master labs. Bachelor machines can also be found using:
Find name of bachelor labs:
$ ls /hom/peder/opt/termstue/share/maps/
Show machines on a lab:
$ ~termvakt/bin/termstue <ROMNAVN>
Example:
$ ~termvakt/bin/termstue assembler
Note: The bachelor machines run the Idle Job Killer script. This script kills any processes on machines where you are not logged in. One solution is to be logged in and do something on every machine, another is to set a nice value (this is explained in the emails Idle Job Killer sends you when it kills a process).
Note: Be nice to others at IFI if you can. If you use computers that are in the same lab, you avoid putting all that traffic onto IFI's backbone network.
Lighttpd on IFI machines
Download from the Lighttpd website. Install with:
$ mkdir ~/lighttpd/
$ ./configure --prefix ~/lighttpd/
$ make
$ make install
Create a Lighttpd configuration that listens on a port that does not require superuser rights, and that serves files from a directory that generate_streaming_segments.py (below) can write to. We recommend that you download our example config and change the port and directory names to suit your needs. We recommend that you use /tmp/ for data because that is supposed to be on an SSD on most or all of the IFI lab machines. Make sure you create the document-root and errorlog directories before starting the server.
Start Lighttpd with a custom config location in the foreground with the following command:
$ ~/lighttpd/sbin/lighttpd -f /path/to/lighttpd.conf -D
generate_streaming_segments.py
Download the script. Run it with:
$ python generate_streaming_segments.py DIRNAME
where DIRNAME is the directory configured as server.document-root in lighttpd.conf. The script generates a new 1.2 MB file every 10 seconds, and removes files that are more than one minute old. The filename is a timestamp expressed in seconds since the epoch, in UTC. Each file is suffixed with .supervid, which should make it easy to extract filenames from the directory listing returned by Lighttpd.
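Extracting the segment names from a directory listing could look roughly like the sketch below. It rests on two assumptions that you should check against your own server's output: that the listing is HTML with links of the form href="NAME", and that the timestamp part of each name is a plain number (an integer, optionally with a fractional part). The function name `list_segments` is hypothetical.

```python
import re

# Assumed link format: href="<number>.supervid"
SEGMENT_RE = re.compile(r'href="((\d+(?:\.\d+)?)\.supervid)"')


def list_segments(listing_html):
    """Return the segment filenames found in a listing, oldest first."""
    # Map each filename to its numeric timestamp; the dict removes duplicates.
    found = {name: float(stamp) for name, stamp in SEGMENT_RE.findall(listing_html)}
    return sorted(found, key=found.get)
```

The last element of the returned list is the newest segment; a polling client can compare it with the newest name it saw in the previous poll to detect that a new segment has become available.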
Hints
Modern programming languages have built-in libraries for most of the things you need for the client. Python has urllib2 (urllib.request in Python 3) and Java has the java.net package, of which you can find an example in HttpDownloader.download() in this example.
libcurl and wget are also nice options.
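As one possible starting point in Python 3, the sketch below downloads a segment with urllib.request while capping the average download speed. The helper names `read_throttled` and `download_segment` are hypothetical, and the throttling is an approximation: it only paces how fast the client reads from the socket, so short bursts above the cap are still possible. The clock and sleep functions are parameters so that the pacing logic can be checked without a network.

```python
import time
import urllib.request


def read_throttled(stream, max_bytes_per_second, chunk_size=16384,
                   clock=time.monotonic, sleep=time.sleep):
    """Read a stream to its end at an average rate at or below the cap.

    Returns (total_bytes, elapsed_seconds).
    """
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Earliest time at which `total` bytes are allowed under the cap.
        earliest = start + total / max_bytes_per_second
        wait = earliest - clock()
        if wait > 0:
            sleep(wait)
    return total, clock() - start


def download_segment(url, max_bytes_per_second):
    """Fetch one segment over HTTP and report its size and download time."""
    with urllib.request.urlopen(url) as response:
        return read_throttled(response, max_bytes_per_second)
```

A call such as `download_segment(url, 500000)`, with the URL of a segment on your own server, returns the number of bytes received and the time the download took, which is the raw data needed to decide whether the segment arrived before its playback deadline.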
Assignment
Solve the task in a group of two (or alone). Present your experiences and results orally in the course INF507x on November 25, 8:15. It is mandatory to write a report in the specified format and deliver it by November 24, 8:15, using Devilry. Keep in mind that (a) it is possible to update the report until November, and (b) you will be asked to choose 4 out of 5 reports for evaluation.
It is mandatory to present your group's results on November 4. You do not have to prepare a formal presentation (such as a PowerPoint slide set); however, you must at least show the measurement results that are included in your report and that you discuss in class. The discussions in class are supposed to help you improve your report for final delivery. It is recommended that you have a web page or a PDF document that is web-accessible from an arbitrary computer.
Report
The written report may be up to 4 pages long in ACM format (see right column). It is expected that such a report includes: a description of the assignment, a description of the testbed, an explanation of the metrics that were chosen to present the measurement results visually, graphs showing the results, and an interpretation of the graphs.
The results must be based on your own tests.
The report is evaluated on its writing quality and on the trustworthiness and correctness of the results. The evaluation does not consider whether related work (citations of other papers) is included. It is not necessary to cite existing work in this report.
"Keep in mind that (a) it is possible to update the report until November" When is this? Even if it is end of November, this does not allow many days. Could we have an exact date? Preferably in December ;)
"It is mandatory to present your group's results on November 4." This I assume is a simple typo.