Class at a Glance
Class: Tue/Thu 2:00-3:15pm, HAR 208
Instructor: Manos Athanassoulis
Manos OH: 3:30pm-4:30pm (after class)
Manos OH Location: MCS 106
Labs: Tue 9:30-10:20am / 11:15am-12:05pm / 12:30-1:20pm, MCS B33 / MCS B29/ KCB 107
TFs: Aneesh Raman ,
Tarikul Islam Papon ,
Subhadeep Sarkar
TFs' OH Details: Check on Piazza
Discussions on: Piazza
/ Notes
/ Gradescope
Announcements
- Nov 10th: WA7 Posted. Due date: Dec 5
- Nov 19th: PA2.2 Posted. Due date: Dec 7
- Nov 10th: WA6 Posted. Due date: Nov 23
- Nov 11th: PA2.1 Posted. Due date: Nov 30
- Nov 10th: WA5.b Posted. Due date: Nov 17
- Oct 12th: PA1.3 Posted. Due date: Nov 05
- Oct 12th: WA5 Posted. Due date: Oct 18
- Oct 5th: WA4 Posted. Due date: Oct 11
- Sep 29th: PA1.2 Posted. Due date: Oct 14
- Sep 28th: WA3 Posted. Due date: Oct 4
- Sep 20th: WA2 Posted. Due date: Sep 27
- Sep 14th: PA1.1 Posted. Due date: Sep 28
- Sep 11th: WA1 Posted. Due date: Sep 19
- Aug 25th: Class starts on 9/2!
Class Schedule
Here you can find the tentative schedule of the class (which might change as the semester progresses). The textbook we use is also referred to as the "cow book" (see more details in the syllabus).
The slides will be uploaded here after each class, but they cannot be used as a replacement of the lectures, and will not be sufficient on their own. Some lectures might deviate from the textbook (in presentation, order, and content). Hence, attendance during the lecturing and during the class discussion is mandatory and an integral part of the class.
Class
IntroductionIn our first class we introduce the concept of database systems, which store data and offer a declarative interface to access the data. We introduce the basic building blocks of database systems that are used to offer the expressive and efficient declarative interface to the data, and we discuss the aspects of everyday life, business operation, and scientific discovery for which database systems play a crucial role.
- Slides
- Textbook, Chapter 1
Class
Database Systems ArchitecturesIn this class we discuss the fundamental components that comprise a database system. We will see the commonalities and the differences of the main database system architectures and we will discuss why we have several different ones. We will go over the key characteristics of relational systems (row-stores and column-stores), and we will introduce different designs like key-value stores and graph stores. Finally, we will introduce the class projects and we will discuss in detail course logistics.
- Slides
- "The Design and Implementation of Modern Column-Oriented Database Systems", Chapters 1, 2, 3
- "Architecture of a Database System",Chapter 1
Class
ER ModelIn this class we discuss the process of conceptually design our database. We will discuss how we can take specific requirements and transform them to a conceptual database schema using the Entity-Relationship Model (ER-Model). We will cover this process with examples.
- Slides
- Textbook, Chapter 2
Class
Relational ModelIn this class we introduce the Relational Model, the most widely used model by vendors, institutions, and organizations to represent and store data. We connect this discussion with ER Model and we show how to build the relational model of an application when starting from the ER Model. This serves as a first get-to-know to SQL focusing on the DDL commands.
- Slides
- Textbook, Chapter 3
Class
Relational AlgebraIn this class we introduce Relational Algebra, a query language used to express the implementation of queries. Relational Algebra is applied directly on relational data and can describe multiple ways of implementing the same "logical" query. We discuss the fundamental operations, their properties and the operations we can define using them (compound operations).
- Slides
- Textbook, Chapter 4.1, 4.2
Class
Functional DependenciesIn this class we introduce redundancy as one of the main problems in a relational schema. We introduce Functional Dependencies (FD) as generalized keys in order to help us identify a bad schema. We discuss how to reason for FD.
- Slides
- Textbook, Chapter 19.1-19.3
Class
Decomposition & Schema NormalizationIn this class we use functional dependencies to identify bad schemata and to propose how to decompose relations to avoid problems from redundancy. We further discuss how to get good decompositions, that is, having lossless-joins and being (functional) dependency preserving. We discuss several normal forms that we can achieve with a varying degree of "how much" we decompose.
- Slides
- Textbook, Chapter 19.4-19.7
Class
SQL IIn this class we first introduce the basic constructs of an SQL query and then we continue thoroughly over several examples for SQL queries, slowly building increasingly complex queries. We discuss the basic SQL query, union-compatible operations, nested queries, aggregate operators, and the GROUP BY and HAVING keywords.
- Slides
- Textbook, Chapter 5.1-5.5
Class
File Organization & Introduction to IndexingIn this class we introduce the main concepts needs to start working with the internals of database systems. We lay the groundwork needed for the memory hierarchy, file organization, page organization, and indexing.
- Slides
- Textbook, Chapter 8 & 9.1, 9.5-9.7
- "Data page layouts for relational databases on deep memory hierarchies", The VLDB Journal, 2002 (Sections 1, 2, 3, 4)
Class
Storage LayerIn this class we dive into the details of the storage hierarchy. We discuss in detail the tradeoffs between different levels of the hierarchy. We provide details for the internals of hard disks and flash disks. We further discuss the specifics of buffer management and, specifically, of buffer replacement policies.
- Slides
- Textbook, Chapter 9.1-9.4
- "On Multidimensional Data and Modern Disks", FAST 2005 (Sections 1, 2, 3, 4)
- "Design Tradeoffs for SSD Performance", USENIX ATC 2008 (Sections 1, 2, 3)
Class
Indexing with B+ TreesIn this class we dive into the details of indexing. We discuss in detail the internals of the most popular tree index in database management systems, the B+ Tree. We describe the search algorithm, the insert algorithm, and the delete algorithm. We further discuss aspects of key compression and bulk loading, two important performance optimizations.
- Slides
- Textbook, Chapter 10.1, 10.3-10.8
Class
Hash-Based IndexingIn this class we discuss the different approaches for hash-based indexing. We first introduce static hashing, which has the problem of long chains of overflow pages. Then we discuss two different ways to address this problem with dynamic hashing: extendible hashing which used a directory and has no chains, and linear hashing which uses multiple different hash functions and allows overflows pages which are split (and re-hashed frequently).
- Slides
- Textbook, Chapter 11
Midterm
You can bring with you one pages (one sheet one side) of any notes you want. No more material will be available. No laptops, tablets or phones are allowed. Please come 2-3 minutes prior to the top of the hour to start sharp at 2pm.
Class
External SortingIn this class we discuss the problem of sorting in the context of database systems. Sorting is a virtually ubiquitous operation in data management, and frequently we have to sort data that do not fit in memory. To that end external sorting algorithms are developed (that minimize number of disk accesses as opposed to number of comparisons). We discuss different sorting paradigms including external sorting and sorting with B+ Trees.
- Slides
- Textbook, Chapter 13
Class
Log-Structured Merge TreesIn this class we introduce an alternative indexing and storage organization named Log-Structured Merge Trees (LSM-Trees). We discuss in detail the internals of LSM-Trees. We describe the ingestion and the search routines, as well as, how to update and delete.
- Slides
- "Optimal Bloom Filters and Adaptive Merging for LSM-Trees", ACM TODS 2018 (Sections 1, 2, 3)
Class
Query Processing with Relational OperatorsIn this class we discuss the implementation of relational operators. We start by discussing the implementations of selections and projections. And then we will continue discussing the implementations of joins: nested loop joins, sort-merge joins, and hash joins.
- Slides (for Classes 17-19)
- "More on the Halloween Problem"
- Textbook, Chapter 12 & 14.1-14.3
Class
Joins I: Nested-Loop & Sort-Merge JoinsIn this class we continue the discussion about the implementation of relational operators. In particular we discuss Nested Loop Joins and Sort-Merge Joins.
- Notes
- Textbook, Chapter 14.4.1-14.4.2
Class
Joins II: Hash Joins & Remaining Relational OperatorsIn this class we continue the discussion about the implementation of relational operators. In particular we discuss Hash Joins, General Joins, Union/Intersection, and Aggregates.
- Notes
- Textbook, Chapter 14.4.3-14.4.4, 14.6-14.7
Class
Query OptimizationIn this class we put together all the knowledge about the SQL operators evaluation costs in order to understand how to choose how to implement a whole SQL query. We discuss the basic properties needed for query rewriting, pruning the decision search space, and the interesting orders.
- Slides
- Textbook, Chapter 15
Class
Overview of Transaction ManagementIn this class we present an overview of the transactional part of a database system.
- Slides
- Textbook, Chapter 16
Class
Concurrency ControlIn this class we discuss in detail how Concurrency Control can achieve Consistency and Isolation. We discuss two-phase locking (2PL), serializability, recoverability, and deadlocks.
- Slides (for Classes 22-23)
- Textbook, Chapter 17.1-17.5
Class
Concurrency Control (cont'd)Continued from previous class.
- Textbook, Chapter 17.1-17.5
Class
RecoveryIn this class we discuss in detail how the system can achieve Atomicity and Durability, and also ensure crash recovery. We cover in detail the Write-Ahead Logging (WAL) protocol.
- Slides
- Textbook, Chapter 18
Class
Recovery (contd.)Continued from previous class.
- Textbook, Chapter 18
Class
NoSQL Systems and Topics in DatabasesIn the last class of the semester we will discuss about NoSQL systems and active research directions in data management, and current opportunities and needs in the data management industry.
Class
ReviewThis is a review class. We will go over open questions on previous subjects and also discuss subtle details on exercises and written assignments.