Seminar on Distributed Information Systems


Summer Semester 2010



Dr.-Ing. Sebastian Michel
Dr.-Ing. Martin Theobald

Link to the HISLF page of the seminar

First meeting: Wednesday, April 14th, 2010, 16:15, seminar room (Raum 001) in building E1 7


You must attend this first meeting to register for the seminar and to state your preferences among the available topics/papers. If there are more interested participants than available slots, we will select the participants by drawing lots and then assign the papers.

The list of available talks can be found in the schedule, see below.




The first regular meeting with the first talk will take place on May 5th 2010.

Place/time of the Seminar

  • Place: Building E1 7, Room 001 (the new Cluster of Excellence building)
  • Time: Wednesday 16:15, see the schedule for the exact days

Contents of the Seminar

Recent topics on distributed query/data processing in databases and information systems.

Requirements for the Certificate

  • Attend all talks, not just your own. We will keep track of participation! If you are sick, please let us know in advance by sending a short e-mail.
  • Read your papers and other related literature.
  • Contact your tutor at least 3 weeks before your talk and present a brief draft of your intended talk.
  • Prepare a 45-minute talk about your topic that introduces the matter to your fellow students. This is about twice the length of a conference talk, so there should be enough time to present some background information on the topic. Try to pick the most interesting, challenging or futuristic contribution(s) from the paper. You are very welcome to discuss any potential weaknesses or problems of the paper(s) in your talk. If you are unsure about what to present, ask your tutor. Note that, even though the conference slides of some papers are available on the Web, we expect you to prepare your own slides (which may, of course, be inspired by the original slides).
  • You must send your slides to and discuss them with your tutor by the Friday before your talk (4pm) at the latest, otherwise your talk will be cancelled (this is a hard deadline).
  • Both the slides and the presentation itself must be in English. Otherwise, some students would not be able to follow all talks, and following all talks is one of the main purposes of the seminar. After each presentation, there will be a discussion in which all fellow students are encouraged to ask questions. We will keep track of your participation (i.e., whether you ask questions) and, of course, of the presenter's answers.
  • For each talk, a second student will be preselected as an opponent. His or her role is to prepare tough questions to challenge the paper presented in the talk (not the talk itself or the speaker!). To make life a little easier, the preliminary version of the slides will be sent to the opponent on the Friday before the talk. However, as interaction is an important part of science, we expect every participant to take an active part in the discussions.
  • Two weeks after the talk, the presenter and the opponent together have to submit a short (usually not longer than 5 pages) summary of the topic of the talk. The focus of this report should be on pointing out strengths and weaknesses of the approach presented in the paper(s), not just summarizing the paper(s).
  • After your talk, there will be another meeting with your tutor and Martin and/or Sebastian to give feedback on the talk and the report.
  • In summary, your final grade will be influenced by the following components: your oral presentation, your knowledge of your topic (your answers to questions after the presentation), the questions you ask as opponent, your general participation in the seminar, and your two written reports (one in the role of presenter, one in the role of opponent).

Schedule

(minor changes in the order possible)

We checked that any linked papers are available from the MPI-INF network. If you encounter any problems accessing a paper, please contact us.
  • 05.05.2010: Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004. paper
    • Speaker: Christoph Pinkel
    • Opponent: Isha Khosla
    • Tutor: Martin Theobald
    • Slides: pptx   pdf

  • 12.05.2010: Rares Vernica, Michael Carey, Chen Li: Efficient Parallel Set-Similarity Joins Using MapReduce. SIGMOD 2010 paper
    • Speaker: Razvan Belet
    • Opponent: Javeria Iqbal
    • Tutor: Sebastian Michel
    • Slides: ppt

  • 19.05.2010: Kristi Morton, Magdalena Balazinska, Dan Grossman: ParaTimer: A Progress Indicator for MapReduce DAGs. SIGMOD 2010 paper
    • Speaker: Isha Khosla
    • Opponent: Huijing Deng
    • Tutor: Martin Theobald
    • Slides: ppt

  • 26.05.2010: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Alexander Rasin, Avi Silberschatz: HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. PVLDB 2(1): 922-933 (2009) paper
    • Speaker: Frederic Raber
    • Opponent: Anjo Vahldiek
    • Tutor: Martin Theobald
    • Slides: pdf

  • 02.06.2010: Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanam, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, Utkarsh Srivastava: Building a High-Level Dataflow System on top of MapReduce: The Pig Experience. 1414-1425 paper
    • Speaker: Javeria Iqbal
    • Opponent: Andrey Tikhomirov
    • Tutor: Martin Theobald
    • Slides: ppt

  • 09.06.2010: Romulo Goncalves, Martin Kersten: The Data Cyclotron query processing scheme. EDBT 2010 paper
    • Speaker: Huijing Deng
    • Opponent: Luis Galarraga Del Prado
    • Tutor: Sebastian Michel
    • Slides: pdf

  • 16.06.2010: NO MEETING!

  • 23.06.2010: Nikos Ntarmos, Peter Triantafillou, Gerhard Weikum: Statistical structures for Internet-scale data management. VLDB J. 18(6): 1279-1312, (2009) paper
    • Speaker: Fateme Shirazi
    • Opponent: Razvan Belet
    • Tutor: Sebastian Michel
    • Slides: ppt

  • 30.06.2010: Ymir Vigfusson, Adam Silberstein, Brian F. Cooper, Rodrigo Fonseca: Adaptively Parallelizing Distributed Range Queries. VLDB 2009: 682-693 paper
    • Speaker: Anjo Vahldiek
    • Opponent: Christoph Pinkel
    • Tutor: Martin Theobald
    • Slides: pdf

  • 07.07.2010: Jian Li, Amol Deshpande, Samir Khuller: Minimizing Communication Cost in Distributed Multi-query Processing. ICDE 2009 paper
    • Speaker: Luis Galarraga Del Prado
    • Opponent: Fateme Shirazi
    • Tutor: Martin Theobald
    • Slides:

  • 14.07.2010: Zhenjie Zhang, Reynold Cheng, Dimitris Papadias, Anthony K. H. Tung: Minimizing the communication cost for continuous skyline maintenance. SIGMOD 2009 paper
    • Speaker: Andrey Tikhomirov
    • Opponent: Frederic Raber
    • Tutor: Sebastian Michel
    • Slides: