Large-scale Data Engineering (Summer 2026)


Conducting research in the areas of data engineering, data management, and machine learning systems requires the ability to work with scientific literature in these areas as well as to design, implement, and evaluate prototypes. To foster these skills, the DAMS Lab group (FG Big Data Engineering) at TU Berlin offers a seminar and a project on Large-scale Data Engineering as a combined module (12 ECTS), which can be taken by bachelor's and master's students. Taking both seminar and project is the ideal preparation for a bachelor's or master's thesis with our group. Alternatively, bachelor's students (but not master's students) may take the seminar as a separate module (3 ECTS) and the project as a separate module (9 ECTS).

Modules and assigned degree programs
  • Large-scale Data Engineering (module #41086, 12 ECTS): seminar and project
    • Master's and bachelor's programs: M.Sc. Computer Science (Informatik), M.Sc. Computer Engineering, M.Sc. Information Systems Management (Wirtschaftsinformatik), M.Sc. Electrical Engineering (Elektrotechnik), B.Sc. Computer Science (Informatik), B.Sc. Computer Engineering (Technische Informatik), B.Sc. Information Systems Management (Wirtschaftsinformatik)
    • Registration via a poll in the ISIS course. Deadline: Apr 01, 23:59.
    • Notification of admission (admitted or waiting list): Apr 02.
  • Seminar Large-scale Data Engineering (module #41095, 3 ECTS): seminar only
    • Bachelor's programs: B.Sc. Computer Science (Informatik), B.Sc. Computer Engineering (Technische Informatik), B.Sc. Information Systems Management (Wirtschaftsinformatik), B.Sc. Media Technology (Medientechnik)
    • Registration organized centrally by Faculty IV via an ISIS meta course. Deadline: tba
    • Notification of admission: centrally by Faculty IV on tba.
  • Project Large-scale Data Engineering (module #41183, 9 ECTS): project only
    • Bachelor's programs: B.Sc. Computer Science (Informatik), B.Sc. Computer Engineering (Technische Informatik), B.Sc. Information Systems Management (Wirtschaftsinformatik)
    • Registration via a poll in the ISIS course. Deadline: Apr 01, 23:59.
    • Notification of admission (admitted or waiting list): Apr 02.
    • This module cannot be taken as a programming practical (Programmierpraktikum) anymore. Suggested alternative: Programmierpraktikum Datensysteme.

News

See announcements in the ISIS course.

Seminar

Time: Mondays 14:00 – 16:00
Place: MAR 0.008 and Zoom

At the beginning of the semester, students attend introductory lectures on reading scientific papers, finding related work, writing high-quality scientific papers, and giving a high-quality scientific presentation. Each student is assigned an initial paper to read and understand. After that, students search for related work and write a short summary of the assigned paper, including an overview of related work. At the end of the semester, each student gives a slide presentation in front of the group, followed by a discussion of the topic.


Topics

This semester's umbrella topic: Robust and Adaptive Query Processing

Traditionally, database query processing is divided into an optimization phase, which determines an optimal plan for the query, and an execution phase, which executes this plan. During optimization, different logically equivalent plans are enumerated, and the plan with the lowest cost with respect to some cost model is chosen. Cost estimation is largely based on estimates of the cardinalities of intermediate results. Unfortunately, these estimates are often far off, resulting in bad query execution plans that may take orders of magnitude longer to execute than the optimal plan. Moreover, additional unknowns further complicate efficient query processing, e.g., unknown properties of base data and input datasets, access to external data sources, query parameters, and the system utilization at run-time. This semester, we deal with a broad range of research papers that address these challenges through (a) improved creation/management of statistics, (b) robust query optimization, and (c) adaptive query processing.
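
As a minimal illustration of why cardinality misestimates matter, the following Python sketch enumerates all left-deep join orders for three relations and picks the cheapest one under a simple cost model that sums the sizes of all intermediate results (often called C_out). All relation names, cardinalities, and selectivities are made up for illustration, and the cost model and the restriction to left-deep plans are simplifying assumptions, not the approach of any particular system or paper.

```python
from itertools import permutations

# Hypothetical base-table cardinalities (illustrative numbers only).
CARD = {"R": 1_000, "S": 10_000, "T": 100_000}

# Hypothetical join selectivities: what the optimizer believes ...
ESTIMATED = {frozenset(("R", "S")): 1e-4, frozenset(("S", "T")): 1e-5}
# ... versus what actually holds in the data.
TRUE = {frozenset(("R", "S")): 1e-1, frozenset(("S", "T")): 1e-7}


def selectivity(selectivities, a, b):
    """Selectivity of the join predicate between a and b (1.0 = cross product)."""
    return selectivities.get(frozenset((a, b)), 1.0)


def plan_cost(order, selectivities):
    """Sum of the sizes of all join results of a left-deep plan."""
    joined = [order[0]]
    size = CARD[order[0]]
    cost = 0.0
    for rel in order[1:]:
        s = 1.0
        for prev in joined:
            s *= selectivity(selectivities, prev, rel)
        size = size * CARD[rel] * s  # size of the next join result
        cost += size
        joined.append(rel)
    return cost


def best_plan(selectivities):
    """Exhaustively enumerate left-deep join orders and return the cheapest."""
    return min(permutations(CARD), key=lambda o: plan_cost(o, selectivities))


if __name__ == "__main__":
    chosen = best_plan(ESTIMATED)   # plan picked based on estimates
    optimal = best_plan(TRUE)       # plan an oracle would pick
    print("chosen :", chosen, "true cost:", plan_cost(chosen, TRUE))
    print("optimal:", optimal, "true cost:", plan_cost(optimal, TRUE))
```

With these made-up numbers, the optimizer joins R and S first because it underestimates the size of that join, whereas joining S and T first would be roughly two orders of magnitude cheaper under the true selectivities. Improved statistics, robust optimization, and adaptive processing are three different ways to attack this kind of gap.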

List of topics: (tba)

Submission & deadlines
  • Topic selection: After the first introductory lecture via a poll in the ISIS course.
    Deadline: tba.
    Notification of assigned topics: tba.
  • Submission of the summary paper (PDF) by upload in the ISIS course.
    Deadline: tba.
  • Submission of the presentation slides (PDF) by upload in the ISIS course.
    Deadline: The day before the presentation, 23:59.
    Last-minute changes after the submission are permitted.

Preliminary Schedule
Introductory lectures
Slides will be made available prior to the individual lectures.
  • Apr 20: 01 Structure of Scientific Papers [pdf, pptx]
  • Apr 27: 02 Scientific Reading and Writing [pdf, pptx]
  • May 04: 03 Experiments, Reproducibility, and Presentations [pdf, pptx]
Self-organized seminar work
  • tba – Jun 22, room FR 768 and Zoom: Optional consultation hours to discuss any questions
Student presentations
  • Jun 29, 14:00 – 18:00: Final presentations #1
  • Jul 06, 14:00 – 18:00: Final presentations #2

Project

Time: Mondays 16:00 – 18:00
Place: MAR 0.008 and Zoom

At the beginning of the semester, students pick a project from a provided list, devise an initial design, and implement an initial prototype including tests and documentation. The students present the initial prototype in front of the course in the middle of the semester. Furthermore, they conduct extensive experiments to evaluate the quality and properties of their prototype. The results of these experiments guide the further development of a final prototype, which the students present in front of the course at the end of the semester. The projects are accompanied by regular discussion rounds with a project mentor throughout the semester.


Topics

The topics of the project are independent of the seminar. We will offer tasks in a wide range of components of data management and machine learning systems. Each individual project will be conducted in the context of one of the two systems developed by our group (and other collaborators) as part of our research:

  • DAPHNE: An open and extensible system infrastructure for integrated data analysis pipelines (mainly written in C++)
  • Apache SystemDS: An open-source ML system for the end-to-end data science lifecycle (mainly written in Java)

This way, students get the chance to make meaningful contributions to free open-source projects. The projects can be done either individually or in teams of up to three students (with the expected amount of work proportional to the team size).

List of topics (tba)

Submission & deadlines
  • Topic selection: After the kick-off meeting via a poll in the ISIS course. Deadline: tba.
    Notification of assigned topics and teams: tba.
  • Submission of the initial prototype (source code, tests) as a pull request on the GitHub repository of either DAPHNE or SystemDS (or, exceptionally, via email to patrick.damme(æ)tu-berlin.de and the respective project mentor).
    Deadline: tba.
  • Submission of the final prototype (source code, tests, docs, experiments) as a pull request on the GitHub repository of either DAPHNE or SystemDS (or, exceptionally, via email to patrick.damme(æ)tu-berlin.de and the respective project mentor).
    Deadline: tba.
  • Submission of the presentation slides (PDF) by upload in the ISIS course.
    Deadline: The day before the presentation, 23:59.
    Last-minute changes after the submission are permitted.

Preliminary Schedule
Introductory lectures
Slides will be made available prior to the individual lectures.
  • Apr 20: Kick-off Meeting [pdf, pptx]
  • May 04: Recommendation to attend the seminar at 14:00
Self-organized project work
  • tba – Jul 27: Recommended consultations with the project mentor to discuss the initial design, implementation, tests, documentation, experiments, and any questions
Student presentations
  • Jun 22, 16:00 – 18:00: Intermediate presentations
  • Aug 03, 14:00 – 18:00: Final presentations

Organization

People
  • Dr.-Ing. Patrick Damme, patrick.damme(æ)tu-berlin.de
    Lecturer, seminar mentor, project mentor, general contact person
  • Prof. Dr.-Ing. Matthias Boehm; tba
    Project mentors
Language
  • Both seminar and project are held in English, but questions and other communication in German are fine as well.