CS 561

Data Systems Architectures


Class at a glance

Class: Tue/Thu 11:00am-12:15pm (CAS 228)
Instructor: Manos Athanassoulis 

Lab: Fri 2:30-3:20pm (CDS 701)
Teaching Fellow: Teona Bagashvili 

Office: CDS 928
Office Hours: Posted on Piazza

Discussion on Piazza / Grades on Gradescope
TF Office Hours: Posted on Piazza

Announcements

  • Semester starts on - stay tuned for updates.
  • See updates and announcements on Piazza.


Class Milestones - Important Dates

Keep in mind the Official Semester Dates.

  • Feb 1, submit project 0
  • Feb 15, submit project 1
  • Feb 22, submit your project proposal
  • Feb 24, last day to drop (without a "W")
  • - , meet with your assigned mentor (graded)
  • Mar 22, submit your mid-semester progress report
  • - , meet with your assigned mentor (graded)
  • Mar 31, Student-led discussion
  • Apr 2, Student-led discussion
  • Apr 3, last day to drop (with a "W")
  • Apr 7, Student-led discussion
  • Apr 9, Student-led discussion
  • May 3, final submission of project code and report


Class Schedule (tentative)

The schedule below is tentative and may change as the semester progresses.

Class : Introduction to Data Systems and CS561

Readings

Class : Data Systems Architectures Essentials – Part 1

Readings

Class : Data Systems Architectures Essentials – Part 2

Readings

Class : LSM intro and Class Project Overview

Readings

A. Storage Layouts

Class : Row-Stores vs. Column-Stores

Readings

Class : Guest Lecture on SSD Design Elements: Teona Bagashvili

Readings

Class : Log-Structured Merge (LSM) Trees & Compaction

Readings

Class : Deletes on LSM Trees

Readings

Class : Scans in Key-Value Stores

Readings

B. Indexing

Class : Cancelled due to snow day.

Class : Various forms of Indexing

Readings

Class : Sortedness-Aware Indexing

Readings

Class : Adaptive Radix Trees

Readings

Class : Bitmap indexing

Readings

C. Modern Hardware

Class : Modern hardware trends

Readings

Class : ACE Bufferpool

Readings

Class : Guest Lecture on "From Filters to Hash Tables: Rethinking Core Data Structures for Scalable Performance" (Prashant Pandey)

Abstract: Our ability to generate, acquire, and store data has grown exponentially over the past decade, making the scalability of data systems a major challenge. In this talk, I will present my work on addressing this challenge through novel data structures and algorithms. First, I will introduce Monotonic Adaptive Filters, which address long-standing limitations in traditional filters by dynamically adapting to false positives while guaranteeing a maximum false positive rate, regardless of the query distribution. Next, I will discuss our advancements in modern hash tables, including IcebergHT and ZombieHT, which break traditional trade-offs by providing high performance with strong worst-case latency guarantees.

Bio: Pandey is an assistant professor in the Khoury College of Computer Sciences at Northeastern University. He focuses on creating scalable data systems with robust theoretical foundations. His work spans the entire spectrum of this challenge, from exploring the theoretical aspects of data structures to addressing the practical issues of scaling data systems. His work extends to tackling scalability challenges across computational biology, cybersecurity, stream processing, and storage systems. Pandey has received the NSF CAREER Award, NSF Elements Award, and the IEEE-CS Early Career Researchers Award for Excellence in High Performance Computing. Prior to joining Khoury College, he spent a year as a research scientist at VMware Research and held postdoctoral research positions at UC Berkeley and Carnegie Mellon University.

Readings

D. Student Talks

Class : Rethinking The Compaction Policies in LSM-trees
(Student Discussion SD1)

Readings

Class : How to Grow an LSM-tree? Towards Bridging the Gap Between Theory and Practice
(Student Discussion SD2)

Readings

Class : Logical and Physical Optimizations for SQL Query Execution over Large Language Models
(Student Discussion SD3)

Readings

Class : Optimizing LLM Queries in Relational Data Analytics Workloads
(Student Discussion SD4)

Readings

E. ML For Data Systems

Class : ML for Systems and Learned Query Evaluation

Readings

Class : Learned Indexes

Readings

Class : Guest Lecture on "Space Efficient Secondary Learned Indexes" (Anwesha Saha)

Readings

Class : Exam

Click here for the exam guide

Project Presentation

Class : Project Presentations A

Project Presentation - I

Class : Project Presentations B

Project Presentation - II

Project Awards (by popular vote)

Awards