CS 561

Data Systems Architectures


Class at a glance

Class: Mon/Wed 5:00-6:15pm (HAR 306)
Instructors: Tarikul Islam Papon 
Zichen Zhu 

Lab: Fri 5:00-5:50pm (HAR 306)


OH: Mon 9 - 10 AM (@CCDS 925)
OH: Thu 10 - 11 AM (@CCDS 925)

Discussion on Piazza | Grades on Gradescope

Announcements

  • Midway Report due date: March 29th (moved from March 22nd)
  • Project 1 is released. Due date: Feb 16.
  • Project 0 is released. Due date: Feb 2.
  • Register for presentation by Feb 2.
  • No lab on Jan 19! First class on Jan 22.
  • Semester starts on Jan 18 - stay tuned for updates.


Class Milestones - Important Dates

Keep in mind the Official Semester Dates.

  • February 22, last day to drop (without a "W")
  • February 23, submit your project proposal
  • March 29, last day to drop (with a "W")


Class Schedule (tentative)

Here you can find the tentative schedule of the class (which might change as the semester progresses).

Class : Introduction to Data Systems and CS561

In this class we will discuss the basics of data systems and the goals and structure of the course.

Readings

Class : Data Systems Architectures Essentials – Part 1

In this class we discuss the fundamental components that comprise a database system. We will see the commonalities and differences of the main database system architectures and discuss why several different ones exist.

Readings

Class : Data Systems Architectures Essentials – Part 2

In this class we continue discussing data systems architectures and the basics of modern systems, focusing on relational row-stores and column-stores.

Readings

Class : Class Project Overview

In this class the students will be introduced to the semester project. In the process, we describe LSM-trees in detail and highlight open research problems in data management.

Readings

A. Storage Layouts

Class : Log-Structured Merge (LSM) Trees

Readings

Class : Row-Stores vs. Column-Stores (student presentation)

Concepts: column-stores, row-stores, vertical partitioning, index-only plans, materialized views, tuple reconstruction, late/early materialization, vectorized execution (block iteration), compression (run length encoding), hash joins, index joins, sort-merge joins, invisible joins, star schema

Readings

Class : Compaction in LSM Trees

Readings

Class : HTAP Systems (student presentation)

Concepts: key-value stores, point queries, blind updates, read-modify-write, on-line transaction processing (OLTP), on-line analytical processing (OLAP), locality, immutable file, mutable file, append-only systems, in-place updates

Readings

B. Indexing

Class : Introduction to Indexing, Trees & Tries

In this class the instructor will provide the necessary background on indexing. We will describe the most common design principles and decisions of index structures, laying the groundwork for diving into the details of cutting-edge indexing papers.

Readings

Class : Guest Lecture on Database Tuning: Andy Huynh

Readings

Class : Adaptive Radix Trees (student presentation)

Concepts: tree indexing, tries, radix, adaptive radix trees

Readings

Class : Guest Lecture on Sortedness-Aware Indexing: Aneesh Raman

Readings

Class : Adaptive Indexing & Cracking (student presentation)

Concepts: adaptive indexing, cracking, stochastic cracking, hybrid cracking, scan, sort and binary search, adaptive adaptive indexing, radix partitioning, TLB, software managed buffers, non-temporal streaming stores, partitioning fanout, skew, adaptive indexing convergence rate, simulated annealing, uniform/normal/zipfian distribution

Readings

Class : Guest Lecture on Table Discovery and Integration in Data Lakes: Aamod Khatiwada

Readings

C. Modern Hardware

Class : Modern Hardware Trends

In this class the instructor will discuss modern hardware trends that drive system and index design with respect to storage, memories, and processing.

Readings

Class : Data Processing with GPUs (student presentation)

Concepts: GPUs

Readings

Class : Guest Lecture on Relational Memory: JuHyoung Mun

Class : SSD-Aware Data Systems

Readings

D. Query Evaluation

Class : Join Optimization

Concepts: query processing, join optimization, instance-optimal algorithms

Readings

Class : BMI-based Query Optimization (student presentation)

Concepts: query processing, query evaluation, bit manipulation instructions, predicate pushdown

Readings

Class : Guest Lecture on Delete-Aware LSMs: Subhadeep Sarkar

E. ML For Data Systems

Class : Learned Query Evaluation (student presentation)

Readings

Class : Learned Indexes (student presentation)

Readings

Class : Learning Data Layouts (student presentation)

Readings

Project Presentation

Class : Project Presentations A

Project Presentation - I

Class : Project Presentations B

Project Presentation - II

Project Awards (by popular vote)

Awards

  • Most Engaging Presentation: “Benchmark Compression With Near Sortedness” by Harshitha Tumkur Kailasa Murthy, Vishwas Bhaktavatsala
  • Project with Highest Technical Depth: “Query-Driven Compaction in LSM-Trees” by Karatsenidis Konstantinos, Shubham Kaushik, Nishil Agrawal
  • Best Overall Project: “Range Deletes in LSM-Trees” by Jingyi Li, Ming-Han Hsieh, Yu-Cheng Huang
  • Honorable Mention: “Exploring the Performance of Data Compression Algorithms with Varying Data Sortedness” by Shivangi and Vani Singhal