CAS CS 460 - Introduction to Database Systems @ BU

Class : Introduction

In our first class we introduce the concept of database systems, which store data and offer a declarative interface to access it. We introduce the basic building blocks that database systems use to offer an expressive and efficient declarative interface to the data, and we discuss the aspects of everyday life, business operation, and scientific discovery in which database systems play a crucial role.

Readings

Class : Database Systems Architectures

In this class we discuss the fundamental components that comprise a database system. We will see the commonalities and the differences of the main database system architectures, and we will discuss why several different ones exist. We will go over the key characteristics of relational systems (row-stores and column-stores), and we will introduce different designs like key-value stores and graph stores. Finally, we will introduce the class projects and discuss course logistics in detail.

Readings

Class : ER Model

In this class we discuss the process of conceptually designing a database. We will discuss how we can take specific requirements and transform them into a conceptual database schema using the Entity-Relationship Model (ER Model). We will cover this process with examples.

Readings

Class : Relational Model

In this class we introduce the Relational Model, the model most widely used by vendors, institutions, and organizations to represent and store data. We connect this discussion with the ER Model and show how to build the relational model of an application when starting from its ER Model. This also serves as a first introduction to SQL, focusing on the DDL commands.

Readings

Class : Relational Algebra

In this class we introduce Relational Algebra, a query language used to express the implementation of queries. Relational Algebra is applied directly on relational data and can describe multiple ways of implementing the same "logical" query. We discuss the fundamental operations, their properties, and the compound operations we can define using them.

Readings

Class : Functional Dependencies

In this class we introduce redundancy as one of the main problems in a relational schema. We introduce Functional Dependencies (FDs) as generalized keys that help us identify a bad schema. We discuss how to reason about FDs.
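As an illustration of reasoning about FDs, the following is a minimal sketch of the standard attribute-closure computation. The function name `attribute_closure`, the representation of each FD as a pair of Python sets, and the schema R(A, B, C, D) are assumptions made for this example and are not taken from the course material.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the left-hand side,
            # the FD lets us add the right-hand side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Hypothetical schema R(A, B, C, D) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(attribute_closure({"A"}, fds)))       # ['A', 'B', 'C']
print(sorted(attribute_closure({"A", "D"}, fds)))  # ['A', 'B', 'C', 'D']
```

In this example the closure of {A, D} covers every attribute of R, so {A, D} is a superkey, while A alone is not.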

Readings

Class : Decomposition & Schema Normalization

In this class we use functional dependencies to identify bad schemata and to propose how to decompose relations to avoid the problems caused by redundancy. We further discuss how to get good decompositions, that is, decompositions that have lossless joins and preserve (functional) dependencies. We discuss several normal forms, which differ in how far we decompose.

Readings

Class : SQL I

In this class we first introduce the basic constructs of an SQL query and then work thoroughly through several example queries, slowly building up to increasingly complex ones. We discuss the basic SQL query, union-compatible operations, nested queries, aggregate operators, and the GROUP BY and HAVING keywords.
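To give a flavor of GROUP BY and HAVING, here is a small sketch that runs one aggregate query against an in-memory SQLite database through Python's standard `sqlite3` module. The `enrolled` table and its rows are made up for this example, and the course itself may use a different database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolled (student TEXT, course TEXT, grade REAL);
    INSERT INTO enrolled VALUES
        ('ann', 'cs460', 3.7), ('bob', 'cs460', 3.0),
        ('ann', 'cs112', 4.0), ('cat', 'cs112', 2.0), ('bob', 'cs112', 3.0);
""")

# Per-course enrollment count and average grade,
# keeping only courses with at least three enrolled students.
rows = conn.execute("""
    SELECT course, COUNT(*) AS n, AVG(grade) AS avg_grade
    FROM enrolled
    GROUP BY course
    HAVING COUNT(*) >= 3
    ORDER BY course
""").fetchall()
print(rows)  # [('cs112', 3, 3.0)]
```

The WHERE clause filters individual rows before grouping, whereas HAVING filters whole groups after aggregation, which is why the condition on COUNT(*) has to go in HAVING.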

Readings

Class : SQL II

Continued.

Readings

Class : File Organization & Introduction to Indexing

In this class we introduce the main concepts needed to start working with the internals of database systems. We lay the groundwork for the memory hierarchy, file organization, page organization, and indexing.

Readings

Class : Storage Layer

In this class we dive into the details of the storage hierarchy. We discuss in detail the tradeoffs between different levels of the hierarchy. We provide details for the internals of hard disks and flash disks. We further discuss the specifics of buffer management and, specifically, of buffer replacement policies.
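As one example of a buffer replacement policy, the sketch below implements least-recently-used (LRU) eviction. The class name `LRUBufferPool` and the `read_page` callback are hypothetical names introduced for this example, and the sketch deliberately ignores pinning and dirty pages, which a real buffer manager has to handle.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy buffer pool that evicts the least recently used page."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # callable: page_id -> page contents
        self.frames = OrderedDict()  # page_id -> page, oldest access first
        self.misses = 0

    def get(self, page_id):
        if page_id in self.frames:
            # Hit: mark the page as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Miss: make room if needed, then fetch the page from "disk".
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the LRU page
        page = self.read_page(page_id)
        self.frames[page_id] = page
        return page


pool = LRUBufferPool(capacity=2, read_page=lambda pid: f"page-{pid}")
for pid in [1, 2, 1, 3, 2]:
    pool.get(pid)
print(pool.misses, list(pool.frames))  # 4 [3, 2]
```

With two frames, the access pattern 1, 2, 1, 3, 2 incurs four misses: the request for page 3 evicts page 2, which is then needed again immediately.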

Readings

Class : Indexing with B+ Trees

In this class we dive into the details of indexing. We discuss in detail the internals of the most popular tree index in database management systems, the B+ Tree. We describe the search algorithm, the insert algorithm, and the delete algorithm. We further discuss aspects of key compression and bulk loading, two important performance optimizations.

Readings

Class : External Sorting

In this class we discuss the problem of sorting in the context of database systems. Sorting is a virtually ubiquitous operation in data management, and we frequently have to sort data that does not fit in memory. External sorting algorithms were developed for this purpose; they minimize the number of disk accesses rather than the number of comparisons. We discuss different sorting paradigms, including external sorting and sorting with B+ Trees.
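The two phases of external sorting, run generation followed by merging, can be sketched as below. The function name `external_sort` and the `run_size` parameter (standing in for the available memory) are assumptions made for this example. The sketch merges all runs in a single pass, whereas a real system limits the merge fan-in to the number of available buffer pages.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Yield ints in sorted order, buffering at most `run_size` of them at a time."""
    run_paths = []
    run = []

    def flush():
        # Phase 1: sort one memory-sized run and write it to its own file.
        run.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)
        run.clear()

    for v in values:
        run.append(v)
        if len(run) == run_size:
            flush()
    if run:
        flush()

    # Phase 2: k-way merge of the sorted runs, reading them sequentially.
    files = [open(p) for p in run_paths]
    try:
        yield from heapq.merge(*[(int(line) for line in f) for f in files])
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)))
# [1, 2, 3, 5, 7, 8, 9]
```

Each run is written once and read once, strictly sequentially, which is the access pattern that external sorting tries to achieve on disk.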

Readings

Class : Log-structured Merge Trees

In this class we introduce an alternative indexing and storage organization named Log-Structured Merge Trees (LSM-Trees). We discuss in detail the internals of LSM-Trees. We describe the ingestion and search routines, as well as how to update and delete.

Readings

Review

This is a review class. We will go over open questions in previous subjects and also discuss subtle details on exercises, mostly on the ones that have not been part of the Homework assignments.

Readings

Midterm 1

You can bring with you two pages of any notes you want. No more material will be available. No laptops, tablets or phones are allowed.

Class : Hash-based Indexing

In this class we discuss the different approaches to hash-based indexing. We first introduce static hashing, which suffers from long chains of overflow pages. Then we discuss two different ways to address this problem with dynamic hashing: extendible hashing, which uses a directory and has no chains, and linear hashing, which uses multiple different hash functions and allows overflow pages, which are frequently split and re-hashed.

Readings

Class : Query Processing with Relational Operators

In this class we discuss the implementation of relational operators. We start with the implementations of selections and projections, and then continue with the implementations of joins: nested loop joins, sort-merge joins, and hash joins.

Readings

Class : Joins I: Nested-Loop Joins and Sort-Merge Joins

In this class we continue the discussion about the implementation of relational operators. In particular we discuss Nested Loop Joins and Sort-Merge Joins.

Readings

Class : Joins II: Hash Joins & the remaining relational operators

In this class we continue the discussion about the implementation of relational operators. In particular we discuss Hash Joins, General Joins, Union/Intersection, and Aggregates.
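The build and probe phases of a hash join can be sketched as follows. This is a simplified in-memory version that assumes the build side fits in memory, with rows represented as Python dictionaries. The function name `hash_join` and the sample `students` and `enrolled` relations are hypothetical, and the partitioning step that disk-based hash joins need is omitted.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows using an in-memory hash table."""
    # Build phase: hash the (ideally smaller) relation on its join key.
    table = defaultdict(list)
    for r in build_rows:
        table[r[build_key]].append(r)
    # Probe phase: look up each row of the other relation in the hash table.
    for s in probe_rows:
        for r in table.get(s[probe_key], []):
            yield {**r, **s}


students = [{"sid": 1, "name": "ann"}, {"sid": 2, "name": "bob"}]
enrolled = [
    {"sid": 1, "course": "cs460"},
    {"sid": 1, "course": "cs112"},
    {"sid": 3, "course": "cs460"},
]
for row in hash_join(students, enrolled, "sid", "sid"):
    print(row)
# {'sid': 1, 'name': 'ann', 'course': 'cs460'}
# {'sid': 1, 'name': 'ann', 'course': 'cs112'}
```

Each relation is scanned once, in contrast to a simple nested loop join, which rescans the inner relation for every outer row.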

Readings

Class : Query Optimization

In this class we put together what we know about the evaluation costs of SQL operators in order to understand how to choose an implementation for a whole SQL query. We discuss the basic properties needed for query rewriting, pruning of the search space, and interesting orders.

Readings

Class : Guest Lecture by Stella Pantela

In this lecture we talk about Vertica, one of the leading Big Data Analytics DBMSs, and how Vertica handles metadata, which is data about the data that a database stores, including table names, projection information, encoding, min and max values, and much more. How does the database store, maintain, and modify that metadata in a safe and efficient way? What does metadata handling have to do with the Add Column operation?
Bio: Stella is a senior distributed systems software engineer at Vertica, chair of the Vertica DataGals, and a member of the Vertica patent committee. At Vertica, Stella builds powerful infrastructure used by big data analytics companies including Uber, Etsy, Wayfair, and Philips. Her focus is on improving the distributed systems protocols for more efficient maintenance and transfer of metadata in both cloud and enterprise environments. Stella’s work has been featured at SIGMOD and North East Database Day. Before Vertica, Stella completed her B.A. with Honors in Computer Science at Harvard University in 2015, where she also conducted research in the area of column stores with Professor Stratos Idreos. In her free time she maintains a blog on distributed systems and technology at stylianipantela.com.

Class : Overview of Transaction Management

In this class we present an overview of the transactional part of a database system.

Readings

Class : Concurrency Control

In this class we discuss in detail how Concurrency Control can achieve Consistency and Isolation. We discuss two-phase locking (2PL), serializability, recoverability, and deadlocks.

Readings

Class : Concurrency Control (cont.)

Class : Recovery

In this class we discuss in detail how the system can achieve Atomicity and Durability, and also ensure crash recovery. We cover in detail the Write-Ahead Logging (WAL) Protocol.

Readings

Class : NoSQL Systems and Topics in Databases

In the last class of the semester we will discuss NoSQL systems, active research directions in data management, and current opportunities and needs in the data management industry.

Readings

Class : Midterm 2

You can bring with you two pages (in one sheet) of any notes you want. No more material will be available. No laptops, tablets or phones are allowed.

