CS 561

Data Systems Architectures


Class at a glance

Class: Tu/Th 12:30-1:45pm (MCS B37)
Instructor: Subhadeep Sarkar 

Lab: Fri 9:05-9:55am (CAS 116)
Teaching Fellow: Zichen Zhu 

OH: Tue/Thu 2-3pm
Office: CCDS 939 (@665 Comm. Ave.)

Discussion on Piazza
TF OH: Mon/Wed 4-5pm (CCDS 925D)
Grades on Gradescope

Announcements

  • New Slides for your group presentation are now available.
  • More details about the projects are available in the project page. Proposal due: March 12th.
  • Project 1 is released. Due date: Feb 20.
  • Due date of Project 0 has been extended to Feb 5.
  • Register for presentation by Jan 31.
  • Project 0 is released. Due date: Feb 2.
  • The schedule is populated with the papers to discuss.
  • Updated syllabus is now online (subject to changes).
  • Class starts on Jan 19 - stay tuned for updates.


Class Milestones - Important Dates

Keep in mind the Official Semester Dates.

  • February 1, last day to add (any) class
  • February 23, last day to drop (without a "W")
  • March 31, last day to drop (with a "W")


Class Schedule (tentative)

Here you can find the tentative schedule of the class (which might change as the semester progresses).

Class : Introduction to Data Systems and CS561

In this class we will discuss the basics of data systems and the goals and structure of the course.

Readings

Class : Data Systems Architectures Essentials – Part 1

In this class we discuss the fundamental components that comprise a database system. We will see the commonalities and the differences of the main database system architectures and we will discuss why we have several different ones.

Readings

Class : Data Systems Architectures Essentials – Part 2

In this class we continue discussing data systems architectures and the basics for modern systems focusing on relational row-stores and column-stores.

Readings

Class : Class Project Overview

In this class the students will be introduced to the class semester project. In that process we describe in detail LSM-trees and we highlight open research problems in data management.

Readings

A: Storage Layouts

Class : Log-Structured Merge (LSM) Trees

Readings

Class : Deletes in LSMs

Readings

  • Slides
  • (P) Lethe: A Tunable Delete-Aware LSM Engine, SIGMOD 2020
  • Technical Question : To ensure bounded delete persistent latency, Lethe aggressively performs compactions. How does these eager compactions affect write amplification of the overall system?

B: Modern Storage Engines

Class : Row-Stores vs. Column-Stores

Concepts: column-stores, row-stores, vertical partitioning, index-only plans, materialized views, tuple reconstruction, late/early materialization, block iteration, vectorized execution (block iteration), compression (run length encoding), hash joins, index joins, sort-merge joins, invisible joins, star schema

Readings

Class : Adaptive & Hybrid Layouts (student presentation )

Concepts: on-line transaction processing (OLTP), on-line analytical processing (OLAP), n-ary storage model (NSM), decomposition storage model (DSM), partition attributes across (PAX), flexible storage model (FSM), projectivity, selectivity, concurrency control, multi-version concurrency control (MVCC), two-phase locking (2PL)

Readings

Class : Guest Lecture on Robust LSM Designs: Andy Huynh

Readings

Class : Guest Lecture on Bloom Filters in LSM-Trees: Zichen Zhu

Readings

C. Indexing

Class : Introduction to Indexing, Trees & Tries

In this class the instructor will provide the necessary background to indexing. We will describe the most common design principles and decisions of index structures and provide the background needed for diving into the details of cutting-edge indexing papers.

Readings

Class : Adaptive Radix Trees (student presentation )

Concepts: tree indexing, tries, radix, adaptive radix trees

Readings

Class : Adaptive Indexing & Cracking (student presentation )

Concepts: adaptive indexing, cracking, stochastic cracking, hybrid cracking, scan, sort and binary search, adaptive adaptive indexing, radix partitioning, TLB, software managed buffers, non-temporal streaming stores, partitioning fanout, skew, adaptive indexing convergence rate, simulated annealing, uniform/normal/zipfian distribution

Readings

Class : Guest Lecture on Sortedness-Aware Indexing: Aneesh Raman

Readings

D. Modern Hardware

Class : Modern hardware trends

In this class the instructor will discuss modern hardware trends that drive system and index design with respect to storage, memories, and processing.

Readings

Class : Guest Lecture on Parametric I/O Model: Tarikul Islam Papon

Readings

Class : HTAP Systems (student presentation )

Concepts: key-value stores, point queries, blind updates, read-modify-write, locality, immutable file, mutable file, append-only systems, in-place updates

Readings

Class : SSD-Conscious Designs (student presentation )

Readings

Class : Data Processing with GPUs (student presentation )

Concepts: GPUs

Readings

Class : Guest Lecture on Cosine: Subarna Chatterjee

Readings

E. Serverless Computing

Class : The Serverless Paradigm (student presentation )

Concepts: cloud-based data management, serverless

Readings

Class : The Execution Process In Serverless Computing (student presentation )

Readings

F. ML For Data Systems

Class : Guest Lecture on Learned Index: Ryan Marcus

Readings

Class : Guest Lecture on Relational Memory: JuHyoung Mun

Class : Guest Lecture on Systems for ML: Sanket Purandare

Project Presentation

Class : Project Presentations A

Project Presentation

Class : Project Presentationa B

Project Presentation - II

Project Awards (by popular vote)

Awards

  • Most Engaging Presentation: “Benchmark Compression With Near Sortedness” by Harshitha Tumkur Kailasa Murthy, Vishwas Bhaktavatsala
  • Project with Highest Technical Depth: “Query-Driven Compaction in LSM-Trees” by Karatsenidis Konstantinos, Shubham Kaushik, Nishil Agrawal
  • Best Overall Project: “Range Deletes in LSM-Trees” by Jingyi Li, Ming-Han Hsieh, Yu-Cheng Huang
  • Honorable Mention: “Exploring the Performance of Data Compression Algorithms with Varying Data Sortedness” by Shivangi and Vani Singhal