Conducting research in data engineering, data management, and machine learning systems requires the ability to work with the scientific literature in these areas as well as to design, implement, and evaluate prototypes. To develop these skills, the DAMS Lab group (FG Big Data Engineering) at TU Berlin offers a seminar and a programming project on Large-scale Data Engineering as a combined module (12 ECTS), which can be taken by bachelor and master students. Taking both seminar and project is the ideal preparation for a bachelor/master thesis with our group. Alternatively, bachelor students (and only bachelor students) may take the seminar as a separate module (3 ECTS) and the project as a separate module (9 ECTS).
Modules and assigned degree programs
At the beginning of the semester, students attend introductory lectures on reading scientific papers, finding related work, writing high-quality scientific papers, and giving high-quality scientific presentations. Each student selects a topic, reads and understands the assigned paper, searches for related work, and writes a short summary of the paper (6 pages). At the end of the semester, each student gives a slide presentation (15 min talk + 5 min discussion) in front of the group.
This semester, we focus on the umbrella topic of Extensible Data Systems:
To meet the requirements of emerging applications as well as to enable the timely adoption of novel techniques and technologies, making database and machine learning systems extensible has been an active research field for decades. Concepts for extensibility and variability have been proposed at all levels of the system stack from query/program languages, over the optimizer, down to the execution in distributed environments and on heterogeneous hardware and storage. This seminar takes a tour through some of the most important works in this field.
List of topics: Separate list of topics (updated Oct 20)
Submission & deadlines
At the beginning of the semester, students/teams pick a programming project from a provided list, devise an initial design, and then implement a prototype including documentation, tests, and relevant experiments. The project ends with a presentation (15 min) of the obtained results.
The topics of the project are independent of the seminar. We will offer tasks in a wide range of components of data management and machine learning systems. Each individual project will be conducted in the context of one of the two systems developed by our group (and other collaborators) as part of our research:
Thereby, students get the chance to make meaningful contributions to free open-source projects.
The projects can be done either individually or in teams of up to three students (with the expected amount of work proportional to the team size).
List of topics: Separate list of topics (updated Oct 20)
Submission & deadlines