CS 561

Data Systems Architectures


Class at a glance

Class: Mon/Wed 5:00-6:15pm (HAR 306)
Instructors: Tarikul Islam Papon 
Zichen Zhu 

Lab: Fri 5:00-5:50pm (HAR 306)


OH: Mon 9 - 10 AM (@CCDS 925)
OH: Thu 10 - 11 AM (@CCDS 925)

Discussion on Piazza | Grades on Gradescope

Announcements

  • Midway Report due date: March 29th (moved from March 22nd)
  • Project 1 is released. Due date: Feb 16.
  • Project 0 is released. Due date: Feb 2.
  • Register for presentation by Feb 2.
  • No lab on Jan 19! First class on Jan 22.
  • Semester starts on Jan 18 - stay tuned for updates.


Class Milestones - Important Dates

Keep in mind the Official Semester Dates.

  • February 22, last day to drop (without a "W")
  • February 23, submit your project proposal
  • March 29, last day to drop (with a "W")


Class Schedule (tentative)

Here you can find the tentative schedule of the class (which might change as the semester progresses).

Class 1: Introduction to Data Systems and CS561

In this class we will discuss the basics of data systems and the goals and structure of the course.

Readings

Class 2: Data Systems Architectures Essentials – Part 1

In this class we discuss the fundamental components that comprise a database system. We will see the commonalities and differences among the main database system architectures and discuss why several different ones exist.

Readings

Class 3: Data Systems Architectures Essentials – Part 2

In this class we continue discussing data systems architectures and the basics of modern systems, focusing on relational row-stores and column-stores.

Readings

Class 4: Class Project Overview

In this class, students are introduced to the semester project. In the process, we describe LSM-trees in detail and highlight open research problems in data management.

Readings

A. Storage Layouts

Class 5: Log-Structured Merge (LSM) Trees

Readings

Class 6: Row-Stores vs. Column-Stores (student presentation S1)

Concepts: column-stores, row-stores, vertical partitioning, index-only plans, materialized views, tuple reconstruction, late/early materialization, vectorized execution (block iteration), compression (run length encoding), hash joins, index joins, sort-merge joins, invisible joins, star schema

Readings

Class 7: Compaction in LSM Trees

Readings

Class 8: HTAP Systems (student presentation S2)

Concepts: key-value stores, point queries, blind updates, read-modify-write, on-line transaction processing (OLTP), on-line analytical processing (OLAP), locality, immutable file, mutable file, append-only systems, in-place updates

Readings

B. Indexing

Class 9: Introduction to Indexing, Trees & Tries

In this class the instructor will provide the necessary background on indexing. We will describe the most common design principles and decisions of index structures, in preparation for diving into the details of cutting-edge indexing papers.

Readings

Class 10: Guest Lecture on Database Tuning: Andy Huynh

Readings

Class 11: Adaptive Radix Trees (student presentation S3)

Concepts: tree indexing, tries, radix, adaptive radix trees

Readings

Class 12: Guest Lecture on Sortedness-Aware Indexing: Aneesh Raman

Readings

Class 13: Adaptive Indexing & Cracking (student presentation S4)

Concepts: adaptive indexing, cracking, stochastic cracking, hybrid cracking, scan, sort and binary search, adaptive adaptive indexing, radix partitioning, TLB, software managed buffers, non-temporal streaming stores, partitioning fanout, skew, adaptive indexing convergence rate, simulated annealing, uniform/normal/zipfian distribution

Readings

Class 14: Guest Lecture on Table Discovery and Integration in Data Lakes: Aamod Khatiwada

Readings

C. Modern Hardware

Class 15: Modern Hardware Trends

In this class the instructor will discuss modern hardware trends that drive system and index design with respect to storage, memories, and processing.

Readings

Class 16: Data Processing with GPUs (student presentation S5)

Concepts: GPUs

Readings

Class 17: Guest Lecture on Relational Memory: JuHyoung Mun

Class 18: SSD-Aware Data Systems

Readings

D. Query Evaluation

Class 19: Join Optimization

Concepts: query processing, join optimization, instance-optimal algorithms

Readings

Class 20: BMI-based Query Optimization (student presentation S6)

Concepts: query processing, query evaluation, bit manipulation instructions, predicate pushdown

Readings

Class 21: Guest Lecture on Delete-Aware LSMs: Subhadeep Sarkar

E. ML For Data Systems

Class 22: Learned Query Evaluation (student presentation S7)

Readings

Class 23: Learned Indexes (student presentation S8)

Readings

Class 24: Learning Data Layouts (student presentation S9)

Readings

Project Presentation

Class 25: Project Presentations A

Project Presentation - I

Class 26: Project Presentations B

Project Presentation - II

Project Awards (by popular vote)


  • Most Engaging Presentation: “Benchmark Compression With Near Sortedness” by Harshitha Tumkur Kailasa Murthy, Vishwas Bhaktavatsala
  • Project with Highest Technical Depth: “Query-Driven Compaction in LSM-Trees” by Karatsenidis Konstantinos, Shubham Kaushik, Nishil Agrawal
  • Best Overall Project: “Range Deletes in LSM-Trees” by Jingyi Li, Ming-Han Hsieh, Yu-Cheng Huang
  • Honorable Mention: “Exploring the Performance of Data Compression Algorithms with Varying Data Sortedness” by Shivangi and Vani Singhal