From ff3e38643333fa27199ff6235847e6494bc790ae Mon Sep 17 00:00:00 2001 From: ION606 Date: Wed, 29 Jan 2025 18:00:11 -0500 Subject: [PATCH] added database systems notes --- Database Systems/Notes.md | 4153 +++++++++++++++++++++++++++++++++++++ 1 file changed, 4153 insertions(+) create mode 100644 Database Systems/Notes.md diff --git a/Database Systems/Notes.md b/Database Systems/Notes.md new file mode 100644 index 0000000..86c204d --- /dev/null +++ b/Database Systems/Notes.md @@ -0,0 +1,4153 @@ +## A note: +my server crashed towards the end of the year, so most of the images are gone. I apologize for the inconvenience. + +::: info +[Course Home](https://www.cs.rpi.edu/academics/courses/fall24/csci4380/) + +[Course Notes](https://www.cs.rpi.edu/academics/courses/fall24/csci4380/fall2024/course_notes/intro.html#course-notes) + +[Course Recordings](https://mediasite.mms.rpi.edu/mediasite/Channel/c625d7df57dc414d829c7b7b587f5c265f) + +::: + +### Terminology + +- Database Management Systems (DBMS) - a software tool for storing/managing large amounts of data +- Database Server - a specific installation of a DBMS +- Database - a collection of data (often in a DBMS) organized for a specific application (also see [Database Section](https://cloud.ion606.com/apps/files/files#h-databases)) +- Database Application - a software product that uses DBMSs to store one or more databases for a specific purpose +- Database Schema + - what types of data are valid to store + - fixed model + - Hard/expensive to change once implemented + - Does NOT contain the data itself + - **attributes are just the column names** +- Database Instance - + - the actual data that satisfies the rules of the database schema + - changing facts, what is true about the data at the moment +- Relational Data Model - the most popular way to describe data schema +- Data Model + - the type of data that can be stored + - rules about the data (Database Schema) + - design so that you hopefully never have to make 
changes, because making changes later on is difficult +- Transaction - a program that changes data or a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) +- ACID - see [ACID Section](https://cloud.ion606.com/apps/files/files/34361?dir=/School/Senior%20Year/Datbase%20Systems&openfile=true#h-a-c-i-d) +- Relational Data Model - see [Relational Data Model](https://cloud.ion606.com/apps/files/files/34361?dir=/School/Senior%20Year/Datbase%20Systems&openfile=true#h-relational-data-model) section +- Key - a set of one or more attributes that determines all other attributes of a relation; a relation can have multiple keys +- minimal key - a key from which no attribute can be removed without losing the ability to determine all other attributes +- super key - any superset of a minimal key +- Data: actual information/facts satisfying the data model +- Tuple: A set of attributes and a value for each attribute +- Relational Database: + - A set of relations: + - A relation: A class of objects we want to store information about + - Relation instances contain sets of tuples, each tuple is an object of this class +- Database + - Database Schema + Database Instance + Application Logic + - Relational Data Model + - set of relations +- BCNF - see [BCNF](https://cloud.ion606.com/apps/files/files/34361?dir=/School/Senior%20Year/Datbase%20Systems&openfile=true#h-boyce-codd-normal-form-bcnf) section +- Entity-Relationship Models - See [ER](https://cloud.ion606.com/apps/files/files/34361?dir=/School/Senior%20Year/Datbase%20Systems&openfile=true#h-entity-relationship-models-er) section + +### What Makes a DBMS + +1. data model +2. store massive amounts of data +3. query language - allow access (read/write/update) to stored data easily +4. durability - data is safe even after something like a power outage +5. 
concurrent access - multiple users can read/write the same data without compromising integrity + +### DBMS Components + +- Storage Manager + - index or file manager +- Database Language Tools + - DML - Data query or manipulation language compiler + - DDL - Data definition language +- Query Execution Engine + - Buffer Manager +- Transaction Manager + - Logging and Recovery + - Concurrency Control +- Database Admin + - responsible for designing the data model +- Database Programmer + - responsible for writing application software that stores the database +- Systems Admin + - responsible for installation and tuning the DBMS system + +### A C I D + +a set of properties of database transactions intended to guarantee data validity despite errors, power failures, etc. + +**__ACID stands for:__** + +- Atomicity - transactions must be completed fully or leave no effect on the database +- Consistency - DBMS must not allow programmers to violate consistency rules for a database schema +- Isolation - multiple transactions executed at the same time should result in the same thing as executing them one at a time +- Durability - once a transaction completes, DBMS must record ALL its results and make sure they're not lost + +::: info +Example: A transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction + +::: + +### Databases + +- given by data schema/model (rules regarding data) and the database instance (the data) +- more here later..... 
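
The bank-transfer example from the ACID section can be sketched with Python's built-in `sqlite3` module. This is only an illustration of atomicity under stated assumptions: the `accounts` table, the account names, and the `transfer` helper are hypothetical and not part of the course material.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failing debit shows the credit being undone.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The second call credits `bob` and then fails on the debit, so the rollback leaves both balances as they were after the first transfer: the transaction either completes fully or has no effect.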
+ +### Data Model + +- Logical Data Model + - Relations and attributes + - Constraints (what is valid data and what is not) + - relation, tuple, attribute +- Physical Data Model + - Where to store the data + - which file systems (distributed, replicated) + - How to store the data + - which indices to create + - table, row, column +- Application Logic + - Built on top of database queries + - declarative: write once and optimize on top of the logical data model + +## Relational Data Model + +**Definitions** + +- Relations (or tables) - store information +- Attribute (or column) - a property of a specific object represented by a relation +- Domain - a set of valid input + - Simple domains are integers/strings + - Complex Domains: + - can be defined with restrictions over these domains + - example: an 8-digit integer that starts with 6 +- Schema - the names/domains of/for the attributes + +**Structure** + +- A relation contains a set of tuples +- A valid relation instance is made of tuples containing: + - values for all attributes in the relation schema that are drawn from the domain with that attribute + +**Logical vs Physical Names** + +- Logical + - the mathematical definition of the relational data model + - based on a set of semantics +- Physical + - the storage/implementation of the data model + - the implementation might not be identical to the logical model + +#### **Example Relations and Representations** + +__TABLE__ + +__LOGICAL RERESENTATION (TUPLES)__ + +```SQL +Hero('Black Panther', 'T''Challa') +Hero('Flash', 'Barry Allen') +Hero('Jessica Jones', 'Jessica Jones') +``` + +__LOGICAL REPRESENTATION (SET)__ + +```set +Hero = { <'Black Panther':Alias, 'T''Challa':Name>, + <'Flash':Alias, 'Barry Allen':Name>, + <'Jessica Jones':Alias, 'Jessica Jones':Name> } +``` + +### Rules of Relational Data Models + +- domain attributes MUST be simple + - integer + - float + - decimal + - string + - boolean + - date + - time + - timestamp + - restrictions of these (9-digit 
integer) + - restrictions are called the first normal form (1NF) + - attributes are indivisible pieces of info (not lists or sets for example) + - relations are flat pieces of information + +### + +Relational Data Model + +Each attribute comes with a domain: set of valid values: integer, boolean, string, date/time + +A relation is a set of tuples, tuples are a set of attributes + +#### Example 1: + +Relations: Books + +Tuple: A single book Attributes: isbn(string), title(string), author(string), price(money), publisher(string), + +The order of things in attributes doesn’t matter, cause sets don’t have order + +Books(‘14236-7788’, ‘War and Peace’, ‘Leo Tolstoy’, 24.99, ‘Pearson') + +Books(‘1453456-999’, ‘Crime and Punishment’, ‘Dostoyevsky’, 4.99, ‘Pearson') + +| isbn | title | author | price | publisher | +|-------------|----------------------|-------------|-------|-----------| +| 14236-7788 | War and Peace | Leo Tolstoy | 24.99 | Pearson | +| 1453456-999 | Crime and Punishment | Dostoyevsky | 4.99 | Pearson | + +**A minimal key would be \[title, author\]** + +#### Example 2: + +R1 = { t1, t2, t3 } + +R2 = { t1, t2, t3, t4 } + +R1 and R2 are the same relation A key would be \[title, author\] + +### Projection + +syntax: `project_{property}{set}` + +### Selection + +![Selection.png](.attachments.34361/image.png) + +### Cartesian Product + +R x S = { t such that t has all attributes in R and all the attributes in S, such that there is a tuple r in R and a tuple s in S where t is equal to r for attributes in R and to s for attributes in S + +R(A, B) = {, } + +S(C, D) = {, , } + +T = R x S + +T(A, B, C, D) = {, , , , , } + +### Theta Join + +R Join\_{C} S = { all tuples in R x S that satisfy join condition C } + +A join condition C is a condition that refers to comparisons between attributes of R and attributes of S + + + +### Operators + +**% - Like** `.*` **in regex** + +- This could be like `%EGG` will find `(.*)EGG` +- This could also be like `EGG%` will find 
`EGG(.*)` + +**<> - NEQ** + +- WHY IN THE EVER-LOVING **FUCK** IS THIS THE NEQ OPERATOR + +**join\_{attr=attr1}** + +- equijoin on one pair of columns (like a natural join, but the matching columns are named explicitly) +- takes one col and maps it to the other +- tuples without a matching value are dropped + +**Example:** + +`set1 join_{attr_in_set_1=attr_in_set_2} set2` + +**Example:** + +``` +R(Num) +find the largest Num in R +the largest Num in R is the only Num which is not smaller than another Num2 in R +to find the Nums which are smaller than another Num2 we join R to a copy of itself +R2(Num2) = R +Join = R join{Num +``` + +A functional dependency (FD) is a database constraint that describes how one set of attributes determines another in a [database management system (DBMS)](https://www.geeksforgeeks.org/introduction-of-dbms-database-management-system-set-1/). Functional dependencies help maintain the quality of data in the database. It usually exists between the primary key and non-prime attributes in the table. + +**Example:** **X -> Y** + +In this case, the left side of the arrow is the determinant and the right side of the arrow is the dependent. Typically X is a key and Y is a non-prime attribute of the table. It states that the values of X uniquely determine the values of Y. + +***AKA each value on the left side of the arrow is associated with exactly one thing on the right side of the arrow*** + +#### Functional Dependency Keys + +A key is a minimal set of attributes whose closure contains every attribute of the relation, i.e. it functionally determines all other attributes + +**Example:** + +You are given the following set F2 of functional dependencies for relation R(A,B,C,D,E,F): +F2 = {AB -> CD, D->E, CA->B} + +The keys would be ABF and ACF + +### **Inference Rules** + +FD stands for functional dependency. Inference rules derive new FDs that logically follow from a given set of FDs. 
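
The keys in the F2 example above can be checked mechanically by computing attribute closures. The sketch below is a brute-force illustration, not an efficient algorithm; the function names `closure` and `candidate_keys` are made up for this example, and each FD is written as a pair of attribute strings.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Find all minimal keys by checking subsets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a known key: a superkey, not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

# F2 = {AB -> CD, D -> E, CA -> B} over R(A, B, C, D, E, F)
F2 = [("AB", "CD"), ("D", "E"), ("CA", "B")]
keys = candidate_keys("ABCDEF", F2)
print(sorted("".join(sorted(k)) for k in keys))  # ['ABF', 'ACF']
```

A and F never appear on a right-hand side, so no FD can produce them and every key must contain both; adding either B or C is then enough to reach all six attributes.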
+ +**There are 6 inference rules:** + +- **Reflexive Rule:** if B is a subset of A then A logically determines B. Formally, if **B ⊆ A** then **A → B**. + - Example: Take the Address (A) of a house, which is made up of parts such as House no, Street no, City, etc. Each of these is a subset of A. Thus, Address (A) → House no. (B). +- **Augmentation Rule:** If A logically determines B, then adding the same extra attributes to both sides keeps the dependency valid (compare [**Partial dependency**](https://www.geeksforgeeks.org/differentiate-between-partial-dependency-and-fully-functional-dependency/)). + - Example: If **A → B**, then for any extra attribute C, **AC → BC** also holds. +- **Transitive rule:** if A determines B and B determines C, then it can be said that A indirectly determines C. + - Example: If **A → B** and **B → C** then **A → C**. +- **Union Rule:** If A determines B and A determines C, then A determines BC. + - Example: If **A → B** and **A → C** then **A → BC.** +- **Decomposition Rule:** The reverse of the Union rule above. If A determines BC then it can be decomposed into A → B and A → C. + - Example: If **A → BC** then **A → B** and **A → C.** +- **Pseudo Transitive Rule:** If A determines B and BC determines D then AC determines D. + - Example: If **A → B** and **BC → D** then **AC → D**. + +### Prime Attribute + +Given a relation R and a set F of fds, X is a superkey if X+ is all attributes in R (in other words: X -> all attributes of R is in F+). An attribute is prime if it appears in at least one key. + +### Basis + +A set of functional dependencies forms a basis if there is only one attribute on the right-hand side of each functional dependency + +### Minimal Basis: + +A set of functional dependencies F is a minimal basis if it is a basis and we cannot remove any fd, or any attribute from a left-hand side, without changing the meaning (closure) + +##### Algorithm for Converting a set F to a minimal basis + +1. convert F to a basis form by using the splitting rule +2. 
Remove all trivial dependencies +3. Suppose X --> Y is in F, create F' by removing X --> Y + 1. If X+ is the same in F and F' then X --> Y can be removed + 2. AKA if we attempt to remove the functional dependency and the closure is the same, then the FD was not important, as it can just be derived again from the remaining FDs + +``` + + +COPY THIS EXAMPLE LATER (jesus christ) +``` + +### BOYCE-CODD NORMAL FORM (BCNF) + +Given a relation R and a set of fds F, R is in BCNF iff for all fds in F of the form X -> Y one of the following is true: + +1. X is a superkey of R, or +2. X -> Y is trivial. + +3NF relaxes this with a third option: X -> Y is also allowed when Y is a prime attribute. + +If a relation is in BCNF, then it is also in 3NF + +NOTE\*: To formally find all keys, you must go through all subsets. Remember to get rid of superkeys once you find a minimal key + +For example: + +``` +given 2 keys: AB, BC +which give you +AB+ = (A, B, C, D) +BC+ = (A, B, C, D) + +the keys would be +AB+ = (A, B, C, D) +BC+ = (A, B, C, D) +BD+ = (B, D) (not a key) + +Superkeys: AB, ABC, ABD, ABCD, BC, BCD +Prime Attributes: A, B, C + +BCNF: +AB --> C (OK because AB is a superkey) +AB --> D (OK because AB is a superkey) +C --> A (NOT OK because C is not a superkey and C --> A is not trivial) + +3NF: +AB --> C (OK because AB is a superkey) +AB --> D (OK because AB is a superkey) +C --> A (OK ONLY IN 3NF NOT BCNF because A is a prime attr) +A --> A (OK because trivial) +ABD --> C (OK because ABD is a superkey) + +is in 3NF +``` + +Prime attributes: appear in at least one key + +### Equivalency: + +Two sets of functional dependencies F1 and F2 over the same relation R are equivalent if F1+ = F2+. For example: + +F1 = { A->C } + +F2 = { A -> C, A -> A } + +F1+ = F2+ + +These are equivalent because ignoring trivial dependencies (A -> A) they are the same + +#### Decomposition + +A decomposition of R into R1, R2, ...., Rn is valid if R1, R2, etc make up all of the attributes of R and is given by + +R1 = project\_{attributes of R1} (R) + +R2 = project\_{attributes of R2} (R) + +. . . . 
+ +Rn = project\_{attributes of Rn} (R) + +a good decomposition is: + +- lossless (required property, all decompositions should be lossless) + - a decomp is lossless IF AND ONLY IF we are guaranteed that for every possible instance of R, R = R1 \* R2 \* .... \* Rn (the natural join of the projections) +- dependency preserving (desired property) + +### Multi-valued dependency + +Represented by "->>". Means that a value on the left-hand side can be associated with a set of values on the right-hand side, independently of the remaining attributes. + +A multi-valued dependency of the form A1 ... An ->> B1 ... Bm means that for all pairs of tuples t1 and t2 that agree on A (everything on the left), we can find a tuple v in R such that: + +- v agrees with t1 and t2 on A's +- v agrees with t1 on B's +- v agrees with t2 on the remaining attributes (not A's or B's) + +Ex in class: + +rin ->> hobby + +rin ->> phone_number + +For a given rin, there can be multiple values for a hobby and/or phone_number. + +#### Inference rules for MVDs + +![image (6).png](.attachments.34361/image%20%286%29.png) + +Every FD is an MVD; this rule is called FD promotion. The converse does not hold: not every MVD is an FD. + +Complementation rule: If A1 ... An ->> B1 ... Bm is true and C1 ... Ck are all attributes in R that are not As or Bs then A1 ... An ->> C1 ... Ck is also true. + +#### 4NF: + +A relation is in fourth normal form iff whenever A1... An ->> B1 ... Bm is a non-trivial MVD, then A1...An is a superkey. The notions of keys and superkeys depend on f.d.s only; adding MVDs does not change the definition of "key". To decompose a relation into fourth normal form, use an algorithm similar to the BCNF decomposition algorithm, using MVDs. Relations in 4NF ⊆ Relations in BCNF ⊆ Relations in 3NF. + +# COPY EXAMPLE HERE + +### Hw 1 notes from kuzmin: + +min-max functions do not exist + +cannot sort, select the best thing to use? 
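
Since min/max functions are not available in relational algebra, one common workaround is the self-join idea from the Operators section: join R with a copy of itself, collect every value that is smaller than some other value, and subtract those from R. The sketch below models relations as Python sets of tuples; the relation `R` and its values are made up for illustration.

```python
# R(Num) modeled as a set of 1-tuples; the values are made up for illustration.
R = {(3,), (7,), (5,)}

# R2(Num2) = R, then a theta join on Num < Num2
joined = {(num, num2) for (num,) in R for (num2,) in R if num < num2}

# project_{Num}(joined): every Num that is smaller than some other Num2
not_max = {(num,) for (num, _) in joined}

# Set difference leaves only the Num that is not smaller than any other
largest = R - not_max
print(largest)  # {(7,)}
```

Flipping the comparison to `num > num2` gives the smallest value the same way.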
+ +RelaX: (recommended tool for checking your answer) + +![NEED TO FIX THIS IMAGE IM WRONG](.attachments.34361/image%20%283%29.png) + +![image (4).png](.attachments.34361/image%20%284%29.png) + +## Normalization + +Database structure such that any table can NOT express redundant info (no 2 birthdays per customer for example) + +#### Normal Forms + +Sets of data safety assessments/safety guarantees + +###### **First Normal Form** + +**Violating 1NF** + +- using row order to convey information (row order is not maintained in a database) +- mixing data types +- repeating groups + - re-adding data to each row + - like an inventory where you add items again and again to each row like \[shield, shield, shield\] + +__Rules__ + +1. Using row order to convey information is not permitted +2. mixing data types within the same column is not permitted +3. having a table without a primary key is not permitted +4. repeating groups are not permitted + +**Solution**: + +1. Add primary key +2. structure the table to avoid redundancies + 1. keep the count of every item in a player inventory instead of storing duplicates + +###### Second Normal Form + +**Definition: each non-key attribute must depend on the ENTIRE primary key** + +deletion anomaly: deleting one fact unintentionally removes another, unrelated fact + +update anomaly: a fact is stored in several rows, so changing only some of them leaves the data inconsistent + +insertion anomaly: a fact cannot be stored until some other, unrelated fact exists + +###### Third Normal Form + +Definition: + +1. A non-key attribute may NEVER depend on another non-key attribute +2. 
Put another way, every non-key attribute in the table should depend on the key, the whole key, and nothing but the key (lmao) + +Transitive Dependency: An attribute is dependent on an attribute that is dependent on another attribute + +###### Fourth Normal Form + +Definition: The only multivalued dependencies in a table MUST be dependencies on the key + +Multi-value dependency: + +- expressed using a double arrow (->>) + +## Entity-Relationship (ER) Models + +- Method for designing databases +- Helps give high-level view of the whole database, while normalization is more geared toward optimizing individual relations +- Help modularize database design +- ER models are object-oriented, not relational + +#### ER Data Models + +- **ER Data models design a whole database using entities and relationships** +- Converting ER diagrams to a relational model: + - 1\. Convert each entity into a new relation R. Map the entity's key to the key of relation R. Map all other attributes to attributes of relation R. + - 2\. Convert relationships based on cardinality + - One-to-one/one-to-many: For the entity E1 that is related to at most one instance of the other entity E2, add E2's key as an attribute of E1's relation. + - Many-to-many: Create a new relation R: Include in R the keys of all joining entities. The key of R must include the keys of all entities that have an N participation. + - Lossy decomposition: representing a ternary relationship as three binary relationships does not, in general, give exactly the same result +- foundational approach for database design +- focus on representing entities, their attributes, and the relationships between them +- ensure a clear and modular database structure +- play an important role in providing a high-level perspective before the database is normalized or transformed into a relational model. +- **Purpose**: ER models are used for designing databases and offer a high-level, object-oriented view of the data structure. 
+- **Normalization vs ER Models**: While normalization focuses on optimizing individual relationships, ER models help simplify the database by modularizing it into entities. +- **Modularization**: Entities represent major components, and relationships link these entities to one another. +- **Commonality**: ER modeling is widely used but is not the only database design method. + +**Key Points**: + +- Focus on entities and relationships. +- Modular design helps make normalization easier. + +--- + +### **ER Data Models** + +- **Entities and Relationships**: The core of ER modeling is to define entities (objects or classes) and relationships (connections) between them. +- **Relational Model Mapping**: Once the ER model is complete, it can be mapped to a relational data model. For example, after defining entities such as "Student" and "Faculty," they can be converted to relational tables. + +--- + +### **Entity Classes and Attributes** + +- **Entities**: An entity represents a class of objects, and each entity has attributes that describe its characteristics. + - **Attributes**: Should be simple values (no sets or multi-valued attributes). + - **Key Attributes**: An entity must have a key attribute (or a combination of attributes) to ensure uniqueness. + +**Example**: + +- **Faculty**: `{id, name}` + - The key is `id`. +- **Students**: `{id, name}` + - The key is `id`. + +**Notation**: + +- Entities are represented with boxes, attributes with ellipses, and key attributes are underlined. + +--- + +### **Relationships** + +- **Linking Entities**: Relationships connect entities to one another. They represent how entities interact, such as "Students take Classes" or "Faculty work in Departments." +- **Participation Constraints**: These specify how many instances of an entity participate in the relationship. Participation can be one-to-one, one-to-many, or many-to-many. + +**Example**: + +- One-to-many relationship: Each department has many faculty members. 
+- Many-to-many relationship: Students can take multiple classes, and each class can have many students. + +### **Keys in Relationships**: + +- Relationships do not generally have keys, although some conventions might allow it. + +--- + +### **Recursive Relationships** + +- Sometimes, an entity can be linked to itself through a relationship. + - **Example**: A faculty can mentor other faculty members, establishing a "mentor-mentee" relationship within the same entity. + +![image (13).png](.attachments.34361/image%20%2813%29.png) + +--- + +### **Relationship Attributes** + +- Relationships can have attributes, but these attributes should pertain to the relationship itself, not the connected entities. + - **Example**: A "grade" could be an attribute of the relationship between a student and a class they are enrolled in. + +![image (14).png](.attachments.34361/image%20%2814%29.png) + +--- + +### **Key Considerations in ER Models** + +#### Referential Integrity + +- Arrows represent the constraint that there is at most one entity of a type in the relationship. + - **Example**: Each department has exactly one chair, and a department cannot exist without a chair. + +![image (15).png](.attachments.34361/image%20%2815%29.png) + +#### Ternary Relationships + +- Involves three entities but should be used carefully. Many ternary relationships can be decomposed into binary relationships. + - **Example**: A faculty advising multiple students on different majors might seem ternary, but binary relationships between faculty and students or faculty and majors may suffice. + +![image (16).png](.attachments.34361/image%20%2816%29.png) + +--- + +### **Weak Entities** + +- A weak entity is dependent on a strong entity and cannot be uniquely identified without it. + - **Example**: Dependents of employees. The dependent name is unique only in the context of the employee. 
+- The key for a weak entity is not guaranteed to be unique in the database on its own +- Think of the weak entity as a special subclass of some other entities + +--- + +### **Design Rules** + +- **Entity Must Have a Key**: Each entity must have a unique key that defines its identity. +- **Avoid Redundancy**: Do not repeat data unnecessarily; make separate entities when needed. +- **Minimize Complexity**: Avoid ternary or higher relationships if binary ones suffice. + +--- + +### **Converting ER to Relational Model** + +1. **Entities**: Mapped to tables, with their attributes becoming columns. +2. **Relationships**: + - One-to-many relationships add the key of the "one" side as a foreign key in the table of the "many" side. + - Many-to-many relationships are ALWAYS represented by an additional table. +3. **Weak Entities**: Mapped to a table that also includes the key of the supporting strong entity; the supporting relationship needs no separate table. + +**Example**: + +- **Employees**: `{Id, firstname, lastname, ...}` +- **Departments**: `{DeptId, DeptName, ...}` +- **Employee-Department Relationship**: Employees work in one department (one-to-many relationship), so `DeptId` is added to Employees as a foreign key + +--- + +### **Types of Relationships** + +#### One-to-Many + +- Represented with arrows from one entity to another. + - **Example**: Faculty to Department (each faculty belongs to one department, but each department can have many faculty) + +![image (7).png](.attachments.34361/image%20%287%29.png) + +![image (8).png](.attachments.34361/image%20%288%29.png) + +#### One-to-One + +- Both sides of the relationship have a "one" constraint. + - **Example**: Each department has one chair, and each faculty can be chair of one department. + +![image (9).png](.attachments.34361/image%20%289%29.png) + +![image (10).png](.attachments.34361/image%20%2810%29.png) + +#### Many-to-Many + +- The most common type of relationship, where multiple instances of both entities can interact. + - **Example**: Students and Classes (students enroll in many classes, and classes have many students). 
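
The Students and Classes example can be mapped to tables roughly as follows. This is a minimal sketch using Python's built-in `sqlite3`; the table names, column names, and sample rows are hypothetical, and `grade` stands in for an attribute that belongs to the relationship rather than to either entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classes  (id INTEGER PRIMARY KEY, title TEXT);
    -- The relationship gets its own table; its key combines both entity keys.
    CREATE TABLE enrolled (
        student_id INTEGER REFERENCES students(id),
        class_id   INTEGER REFERENCES classes(id),
        grade      TEXT,  -- attribute of the relationship itself
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO classes  VALUES (10, 'Databases'), (20, 'Algorithms');
    INSERT INTO enrolled VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, 'A');
""")

# Who takes what: join through the relationship table.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM students s
    JOIN enrolled e ON e.student_id = s.id
    JOIN classes  c ON c.id = e.class_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
```

Because both sides have "many" participation, neither entity table can hold the other's key in a single column, which is why the extra `enrolled` table is needed.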
+ +![image (11).png](.attachments.34361/image%20%2811%29.png) + +![image (12).png](.attachments.34361/image%20%2812%29.png) + +--- + +## **Subclasses** + +### **Subclasses in ER Models** + +The **subclasses** section in Entity-Relationship (ER) models discusses how entities that share common attributes can be structured in a hierarchical manner. Subclasses are used when there is a need to represent entities that are specialized versions of a more general entity class, allowing inheritance of attributes and keys. + +--- + +### **Key Concepts in Subclasses** + +#### **Generalization and Specialization** + +- **Generalization**: When multiple entities share common attributes, they can be generalized into a parent (superclass) entity. The individual entities (subclasses) inherit the attributes and key of the superclass. +- **Specialization**: Subclasses represent specialized entities that have additional attributes not shared with other subclasses or the parent. + +#### **Type Hierarchy** + +- In the subclass hierarchy, entities are organized in a **type hierarchy**, where each subclass inherits attributes from the parent entity class. + - The key and attributes of the parent entity (superclass) are passed down to the subclasses. + +--- + +### **Example of Subclass Structure** + +- **Superclass**: `People` + - Attributes: `person_id, name` +- **Subclasses**: + 1. **Students** (inherits from `People`) + - Attributes: `person_id, name, class` + 2. **Staff** (inherits from `People`) + - Attributes: `person_id, name, salary` + +In this example, both `Students` and `Staff` inherit the `person_id` and `name` attributes from the `People` entity, but they also have their own specific attributes such as `class` (for students) and `salary` (for staff). + +--- + +### **Disjoint and Overlapping Subclasses** + +#### **Disjoint Subclasses** + +- **Disjoint** subclasses mean that an entity can belong to only one subclass at a time. 
+ - **Example**: A person can either be a student or a staff member, but not both. + +#### **Overlapping Subclasses** + +- **Overlapping** subclasses mean that an entity can belong to multiple subclasses at once. + - **Example**: A person could be both a student and a staff member, such as a teaching assistant who is also enrolled in classes. + +--- + +### **Covering and Partial Subclasses** + +#### **Covering Subclasses** + +- In **covering** subclasses, all instances of the superclass must belong to at least one subclass. + - **Example**: All people in the `People` entity must either be a student or staff. No person can exist that is not part of one of these two subclasses. + +#### **Partial Subclasses** + +- In **partial** subclasses, some instances of the superclass may not belong to any subclass. + - **Example**: There could be people in the `People` entity who are neither students nor staff, representing individuals outside the scope of these two subclasses. + +--- + +### **Mapping Subclasses to a Relational Model** + +There are three basic ways to map a subclass hierarchy to a relational model: + +#### **1. Storing Only Unique Information in Each Relation** + +- In this method, only the attributes unique to each subclass are stored in the subclass tables, while the common attributes are stored in the superclass table. + +**Example**: + +```sql +People(person_id, name) -- Superclass +Students(person_id, class) -- Subclass +Staff(person_id, salary) -- Subclass +``` + +- **Advantages**: Easy to find all people (common superclass table). +- **Disadvantages**: Joins are required to retrieve full information about a student or staff, leading to slower queries. + +#### **2. Map Each Entity to a Separate Relation** + +- Each subclass and the superclass are stored in separate tables, with repeated attributes included in each table. 
+ +**Example**: + +```sql +People(person_id, name) -- Superclass +Students(person_id, name, class) -- Subclass +Staff(person_id, name, salary) -- Subclass +``` + +- **Advantages**: Faster queries when retrieving information about a specific subclass. +- **Disadvantages**: Requires unions when querying for all people, as the data is spread across multiple tables. + +#### **3. Combine All Information in a Single Relation** + +- All data, including subclass-specific attributes, are stored in a single table, with some columns left `NULL` when they don't apply to an instance. + +**Example**: + +```sql +People(person_id, name, class, salary, is_student, is_staff) +``` + +- **Advantages**: Simplified data model, fast queries. +- **Disadvantages**: There may be many null values (e.g., `class` for staff members or `salary` for students), and the model may become harder to manage and query. + +--- + +### **Choosing a Mapping Strategy** + +The choice of mapping strategy depends on factors like the class hierarchy's **disjoint** or **overlapping** nature, and whether it is **covering** or **partial**. For example: + +- If the subclasses are disjoint and covering, storing all the information in a single table may be efficient. +- If the subclasses are overlapping and partial, mapping each subclass to a separate table might be the better option. + +--- + +### **Summary of Subclasses in ER Models** + +- Subclasses allow for more detailed data modeling when entities share common attributes but also have their own specialized characteristics. +- The decision on how to map subclasses to a relational model should consider factors like performance, query complexity, and data integrity. + +This structure helps ensure that the database accurately models real-world entities and relationships while optimizing for performance and maintainability. + +--- + +# SQL + +- SQL is an industry standard language for relational databases. 
+- Almost all database management systems implement SQL the same way, with a few caveats: + - Core of the SQL standard is the same across all databases + - Advanced features may vary from database to database + - It is highly advisable to write queries that are portable from system to system: no bells and whistles unless it really gets you some strong performance gains. +- We will try to distinguish between core and special features as much as possible. +- A logical/declarative query language +- Express what you want, not how to get it +- Each SQL expression can be translated to multiple equivalent relational algebra expressions +- SQL is tuple based, each statement refers to individual tuples in relations +- SQL has bag semantics +- Recall RDBMS implementations of relations as tables do not require tables to always have a key, hence allowing the possibility of duplicate tuples. + + Same is true for SQL, an SQL expression may return duplicate tuples, unless they are removed explicitly. +- SQL is case insensitive (though strings are case sensitive of course) +- Syntax: + - All statements must end with a semi-colon! + - Strings are single-quoted. + +### Components + +- Query language: + + ``` + SELECT ... FROM ... WHERE ... + ``` + + allows you to write queries to find what is stored in databases. +- DML: data manipulation language + + ``` + INSERT + UPDATE + DELETE + ``` + + allows you to change the contents of the existing tables +- DDL: data definition language + + ``` + CREATE DATABASE + CREATE TABLE + ALTER TABLE + DROP TABLE + ``` + + allows you to define database objects: schema, tables, indices, etc. + +### Control Flow + +1. From: read the relations listed in the FROM clause +2. Where: check for each tuple if it passes the where clause +3. Select: + 1. for tuples that pass the where clause + 2. 
construct the output by the projection of attributes in select + +## Syntax + +#### General + +```SQL +SELECT + baker +FROM + bakers +WHERE + hometown = 'London' + and age < 30; +``` + +this is equivalent to + +`project_{ baker}(select_{ hometown == 'London' and age < 30 }(Bakers))` + +This will have duplicates however, so we use... + +#### Duplicate Removal + +```SQL +SELECT DISTINCT + baker +FROM + bakers +WHERE + hometown = 'London' + and age < 30; +``` + +#### SELECT + +- You can rename attributes returned +- You can use expressions over the attributes +- You can return constants +- Optionally, you can remove duplicates using distinct (only one DISTINCT clause in a single query) + +```SQL +SELECT + LEFT(fullname, strpos(fullname, ' ')) as firstname, + UPPER(substring(fullname from strpos(fullname, ' ')+1)) as lastname, + 'baker' as position, + occupation || ' from: ' || hometown as label + FROM + bakers ; + +-- position is a new column with a fixed value, constant 'baker' +-- firstname is a substring of a column +-- label is a concatenation of two strings +-- functions can be combined in complex expressions +``` + +#### WHERE + +- WHERE statement is equivalent to the selection in relational algebra. +- It contains a Boolean expression over individual tuples +- For each tuple produced by the FROM statement, we check whether the WHERE statement is true. 
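A minimal sketch of the FROM, WHERE, SELECT evaluation order and of bag semantics described above. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption made so the example runs without a server), and the `bakers` rows are made-up sample data, not the course database.

```python
import sqlite3

# In-memory SQLite database standing in for the course's PostgreSQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT, hometown TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO bakers VALUES (?, ?, ?)",
    [
        ("Ruby", "London", 29),  # passes the WHERE clause
        ("Ruby", "London", 25),  # passes too, and duplicates 'Ruby' after projection
        ("Dan", "London", 36),   # fails age < 30
        ("Kim", "Leeds", 27),    # fails hometown = 'London'
    ],
)

# Bag semantics: the projection keeps one output tuple per qualifying input tuple.
bag_rows = conn.execute(
    "SELECT baker FROM bakers WHERE hometown = 'London' AND age < 30"
).fetchall()

# DISTINCT removes the duplicates explicitly.
set_rows = conn.execute(
    "SELECT DISTINCT baker FROM bakers WHERE hometown = 'London' AND age < 30"
).fetchall()

print(bag_rows)
print(set_rows)
conn.close()
```

Both queries filter the same two tuples; only the `DISTINCT` version collapses them into one.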
+ +#### FROM + +running `SELECT * FROM bakers, technicals ;` will create a **cartesian product** from the two tables + +if we want to do a **join** we MUST include a join condition + +``` +SELECT * +FROM bakers b, technicals t +WHERE b.baker = t.baker; +``` + +- The variables b and t are aliases for the table names, especially needed if the two tables have attributes with the same name +- `SELECT attributes FROM R1,R2,.., Rn WHERE Conditions` is equivalent to `project_{attributes}(select_{Conditions}(R1 x R2 x ... x Rn))` + + ![image (18).png](.attachments.34361/image%20%2818%29.png) + +### Regular Expressions using LIKE + +You can compare a string against a pattern (simple wildcards, not full regular expressions), but you **must** use `LIKE` (not `=`) + +- % stands for 0 or more characters +- \_ stands for exactly 1 character + +```SQL +days LIKE '%R%' -- true for any string containing R +days LIKE '_R' -- true for two-character strings ending in R +days = 'R' -- true only for the exact string R +days = '%R%' -- plain equality: true only for the literal string %R% +``` + +*Note: you can change the escape char using the* `ESCAPE` *keyword* + +```SQL +like '%x%bc' ESCAPE 'x' +-- is the same as +like '%\%bc' +``` + +#### Special Characters + +- Strings are delimited by single quote + - **__Escape single quote by repeating it__**: + + ``` + SELECT + 'professor''s cat' ; + ``` +- Any special character needs to be escaped. The general escape character is `\`. + + ``` + select name || E'\n' || email from students ; + ``` + + Returns values that have a newline in them. + +#### NULL + +- any comparison involving a NULL value returns UNKNOWN +- WHERE statement will only return tuples that evaluate to True. Any tuples with UNKNOWN values are eliminated. 
+- Boolean expressions containing UNKNOWN follow three-valued logic: `UNKNOWN OR TRUE` is TRUE, `UNKNOWN AND FALSE` is FALSE, and every other combination with UNKNOWN stays UNKNOWN + +```SQL +NULL = 5 -- evaluates to UNKNOWN +NULL > 5 -- evaluates to UNKNOWN +NULL LIKE '%' -- evaluates to UNKNOWN + +NULL = 5 OR 4>5 -- EVALUATES TO UNKNOWN +NULL = 5 AND 4>5 -- EVALUATES TO FALSE +``` + +- To check whether a value is NULL or not, no ordinary comparison will work. 
+ - you MUST use the `IS NULL` or `IS NOT NULL` keywords + +```SQL +select * from abc where val is NULL ; -- returns 1 tuple +select * from abc where val is NULL or val like '%'; -- returns all tuples +``` + +#### Complex expressions + +- SQL has many functions for different data types +- Any expression involving these functions is allowed +- Some example functions: + - String operations: `||, upper, lower, position, substring, trim` + - Numerical operations: `+,-,*,/,%,^,!` + - Mathematical operations: `abs, ceil, floor, log, mod, round, sqrt` + - Utilities: `random, now` + +### Date-based data types + +Data types: + +- Date (year, month, day) +- Time of day +- Timestamp (date and time combined) +- Interval (a time duration) + +complex example: + +```SQL +date '2016-01-28' + 2 = date '2016-01-30' --default assumption of day +date '2016-01-28' + interval '2 day' = timestamp '2016-01-30 00:00:00' +date '2016-01-28' + interval '3 hours' = timestamp '2016-01-28 03:00:00' +timestamp '2016-01-28 03:00:00' + interval '10 hours' = timestamp '2016-01-28 13:00:00' +time '12:00:00' + interval '8 hours' = time '20:00:00' +date '2016-05-19' - date '2016-01-28' = 112 -- integer number of days +``` + +*Note: PostgreSQL functions allow complex operations over date/time* + +```SQL +extract(field from timestamp) --day, month, year, hour, + --minute, seconds, dow +select extract(year from now()); +date_part +----------- +2016 +(1 row) +``` + +#### **Examples:** + +```SQL +-- Convert between data types: +to_char(timestamp, text) +to_date(text, text) +to_date('02 29 2016', 'MM DD YYYY') + +-- check whether two time intervals overlap with each other +select (date '2016-03-01', date '2016-03-31') overlaps + (date '2016-02-25', date '2016-03-04'); +-- returns True + +select (date '2016-03-01', date '2016-03-31') overlaps +(date '2016-02-25', date '2016-02-29'); +-- returns False + +-- Find requirements that have been enforced for at least 1 year +select * from requires where cast(now() 
as date) - enforcedsince > 365; + +course_id | prereq_id | isenforced | enforcedsince +-----------+-----------+------------+--------------- + 5 | 1 | t | 2011-01-01 +``` + +#### Set and Bag Operations + +**SET operations** + +- UNION +- INTERSECT +- EXCEPT + +**BAG operations** + +- UNION ALL +- INTERSECT ALL +- EXCEPT ALL + +```SQL +(SELECT ... FROM ... WHERE ...) +UNION +(SELECT ... FROM ... WHERE ...) +``` + +*Note: Same as in relational algebra, the queries should be union-compatible* + +**EXAMPLE** + +Table `a1` with id values: `1,2,2,2,3,3` Table `a2` with id values: `2,3,3` + +```sql +-- set operation, returns 1,2,3 +select * from a1 union select * from a2 ; + +-- returns 2,3 +select * from a1 intersect select * from a2 ; + +-- returns 1 +select * from a1 except select * from a2 ; + +-- returns 1,2,2,2,2,3,3,3,3 -bag union +select * from a1 union all select * from a2 ; + +-- returns 2,3,3 -bag intersection +select * from a1 intersect all select * from a2 ; +``` + +**EXAMPLE 2** + +```SQL +-- Return full name of all bakers who won star baker but never won a technical challenge +SELECT b.fullname +FROM bakers b, results r +WHERE b.baker = r.baker and r.result = 'star baker' +EXCEPT +SELECT b.fullname +FROM bakers b, technicals t +WHERE b.baker = t.baker and t.rank = 1; +``` + +### AGGREGATES + +Similar to the aggregates in bag relational algebra, you can find the aggregate for a specific column or combination of columns + +- Commonly used aggregates are: `min`, `max`, `avg`, `sum`, `count`, `stddev` +- An aggregate returns a single tuple (unless accompanied by other clauses like GROUP BY or FILTER) + +```SQL +-- Find total number of times ‘Kim-Joy’ won star baker. +SELECT count(*) as num_wins +FROM results +WHERE baker = 'Kim-Joy' and result = 'star baker'; +``` + +**Note:** + +- `count(*)` counts the total number of tuples. +- `count(attribute)` counts the total number of values for a given attribute, disregarding the NULL values. 
+- `count(DISTINCT attribute)` counts the total number of distinct values for a given attribute, disregarding the NULL values. + +#### GROUP BY + +Instead of computing the aggregates for the whole query, it is possible to compute them for each group. + +- Group by one or more attributes by finding tuples that have the same values for the grouping attributes +- For each group, produce a single tuple containing grouping attributes and any aggregates over the group. +- To return an attribute from a relation, you MUST include it in the grouping attributes. + +**Example** + +Find the total number of star baker wins for each baker. Return the full name and hometown of each baker. + +```SQL +SELECT b.baker, b.fullname, b.hometown, count(*) as numwins +FROM bakers b, results r +WHERE b.baker = r.baker and r.result = 'star baker' +GROUP BY b.baker, b.fullname, b.hometown; +``` + +#### GROUP BY - HAVING + +- Group by statement can be followed by an optional HAVING clause. +- In the HAVING clause, you can write conditions over aggregates of the groups to eliminate groups. +- All other conditions should be put in the WHERE clause to reduce the size of the relation to be grouped + +**Example** + +Find all bakers who have used ‘chocolate’ or ‘ginger’ in the showstopper challenge in at least two different episodes and won star baker at least twice. Return their fullname + +```SQL +SELECT b.baker, b.fullname +FROM bakers b, showstoppers ss, results r +WHERE + b.baker = ss.baker + and b.baker = r.baker + and r.result = 'star baker' + and (lower(ss.make) like '%ginger%' or lower(ss.make) like '%chocolate%') +GROUP BY + b.baker, b.fullname +HAVING + count(DISTINCT ss.episodeid) >= 2 + and count(DISTINCT r.episodeid) >= 2; +``` + +#### ORDER BY + +- You can order the tuples returned by the query with respect to one or more attributes. + +```sql +-- Return the episodes, ordered by 7-day viewers (descending) and id (ascending). 
+SELECT * FROM episodes +ORDER BY viewers7day desc, id asc; +``` + +#### LIMIT + +- You can limit the number of tuples returned +- is the **last clause of the query** +- makes the most sense when combined with an order by + +```sql +-- Find the top 3 bakers in terms of number of wins. Return their name +SELECT b.baker, b.fullname, count(*) as numwins +FROM bakers b , results r +WHERE + b.baker = r.baker + and r.result = 'star baker' +GROUP BY b.baker, b.fullname +ORDER BY numwins desc +LIMIT 3; +``` + +--- + +# Lecture Notes: Advanced SQL Query Techniques + +#### Generated by ChatGPT 4-o from my insane rambling notes because I'm sick and can't be fucked + +--- + +## Introduction + +In this lecture, we'll explore advanced SQL query techniques using a practical example involving a database schema and specific querying requirements. We'll cover topics such as regular expressions in SQL, data type conversions, handling `NULL` values, debugging SQL queries, and ensuring compatibility across different SQL dialects. + +--- + +## Comprehensive SQL Concepts and Definitions for Future Assignments + +--- + +## Table of Contents + + 1. Understanding the Database Schema + 2. Basic SQL Statements + 3. Data Filtering Techniques + 4. Joining Tables + 5. Working with NULL Values + 6. Data Type Conversions and Casting + 7. Functions and Expressions + 8. Regular Expressions in SQL + 9. Extracting Numbers from Strings +10. Aggregate Functions and Grouping Data +11. Subqueries and Common Table Expressions (CTEs) +12. Sorting and Limiting Results +13. SQL Dialects and Compatibility +14. Error Handling and Debugging +15. Best Practices +16. Security Considerations +17. Conclusion +18. Appendices + - Execution Order of SQL Statements + - Common SQL Functions + - Additional Resources + +--- + +## 1. Understanding the Database Schema + +Before writing SQL queries, it's crucial to understand the database schema: + +- **Tables**: Structures that store data in rows and columns. 
+- **Columns**: Attributes or fields in a table. +- **Relationships**: How tables are related (e.g., primary keys, foreign keys). + +### Example Schema: + +- **bakers**: Stores baker information. + - Columns: `baker`, `fullname`, `age`, `occupation`, `hometown`. +- **episodes**: Contains episode details. + - Columns: `id`, `title`, `firstaired`, `viewers7day`, `signature`, `technical`, `showstopper`. +- **signatures**, **showstoppers**, **technicals**, **results**: Store challenge-specific data. + +--- + +## 2. Basic SQL Statements + +- `SELECT`: Retrieves data from a database. + - *Syntax*: `SELECT column1, column2 FROM table_name;` +- `FROM`: Specifies the table to query. +- `WHERE`: Filters records based on conditions. + - *Syntax*: `WHERE condition;` +- `ORDER BY`: Sorts the result set. + - *Syntax*: `ORDER BY column1 ASC|DESC;` + +--- + +## 3. Data Filtering Techniques + +### Pattern Matching: + +- `LIKE`: Case-sensitive pattern matching. + - *Syntax*: `WHERE column LIKE 'pattern%';` +- `ILIKE`: Case-insensitive pattern matching (PostgreSQL). + - *Syntax*: `WHERE column ILIKE 'pattern%';` + +### Using Wildcards: + +- `%`: Represents zero or more characters. +- `_`: Represents a single character. + +### Comparison Operators: + +- `=`, `!=`, `>`, `<`, `>=`, `<=` + +### Range and List Checks: + +- `BETWEEN`: Checks if a value is within a range. + - *Syntax*: `WHERE column BETWEEN value1 AND value2;` +- `IN`: Checks if a value matches any value in a list. + - *Syntax*: `WHERE column IN (value1, value2, ...);` + +--- + +## 4. Joining Tables + +- `JOIN`: Combines rows from two or more tables based on related columns. + +### Types of Joins: + +- `INNER JOIN`: Returns records with matching values in both tables. + - *Syntax*: `FROM table1 INNER JOIN table2 ON table1.column = table2.column;` +- `LEFT JOIN`: Returns all records from the left table and matched records from the right table. 
+- `RIGHT JOIN`: Returns all records from the right table and matched records from the left table. +- `FULL OUTER JOIN`: Returns all records from both tables, paired up where the join condition matches and padded with NULLs where it does not. + +### Self-Join: + +- A table joined with itself. +- Useful for comparing rows within the same table. +- Requires table aliases. +- *Syntax*: `FROM table_name t1 JOIN table_name t2 ON t1.column = t2.column;` + +--- + +## 5. Working with NULL Values + +- `NULL` represents missing or unknown data. +- `IS NULL` and `IS NOT NULL`: Check for NULL values. + +### Handling NULLs: + +- `COALESCE()`: Returns the first non-NULL value in a list. + - *Syntax*: `COALESCE(value1, value2, ...)` + +### Example: + +```sql +SELECT COALESCE(middle_name, 'N/A') AS middle_name FROM persons; +``` + +--- + +## 6. Data Type Conversions and Casting + +- Ensures data types are compatible for operations. + +### Casting: + +- `CAST()`: Converts a value to a specified data type. + - *Syntax*: `CAST(expression AS data_type)` +- `::` Operator (PostgreSQL): Alternative casting syntax. + - *Syntax*: `expression::data_type` + +### Example: + +```sql +SELECT CAST('123' AS integer) AS number; +SELECT '123'::integer AS number; +``` + +--- + +## 7. Functions and Expressions + +### Mathematical Functions: + +- `ABS()`: Absolute value. +- `ROUND()`: Rounds a number to a specified number of decimal places. + - *Syntax*: `ROUND(number, decimals)` +- `CEILING()`/`FLOOR()`: Rounds up or down to the nearest integer. + +### String Functions: + +- `UPPER()`/`LOWER()`: Converts string case. +- `TRIM()`: Removes leading and trailing whitespace. +- `SUBSTRING()`: Extracts a substring. + - *Syntax*: `SUBSTRING(string FROM pattern)` + +### Date Functions: + +- `CURRENT_DATE`, `CURRENT_TIMESTAMP` +- `DATEADD()`, `DATEDIFF()` (not available in PostgreSQL, which uses `+`/`-` on dates and `interval` values instead) + +--- + +## 8. Regular Expressions in SQL + +- Allows complex pattern matching. 
+ +### Syntax: + +- **PostgreSQL**: + - `~`: Case-sensitive match. + - `~*`: Case-insensitive match. +- **MySQL**: + - `REGEXP`: Pattern matching operator. 
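A rough way to experiment with regex matching in SQL without a PostgreSQL or MySQL server. This is a sketch under assumptions: it uses Python's built-in `sqlite3` module, where `X REGEXP Y` only works after registering a user function named `regexp` (SQLite calls it as `regexp(Y, X)`). The helper `regexp`, the table contents, and the use of Python's `\b` word boundary in place of PostgreSQL's `\y` are all illustrative choices, not part of the course material.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's REGEXP operator: 'value REGEXP pattern' calls regexp(pattern, value)."""
    if value is None:
        return None  # NULL input gives NULL (UNKNOWN), so the row is filtered out
    # Case-insensitive search, roughly mirroring PostgreSQL's ~* operator.
    return int(re.search(pattern, value, re.IGNORECASE) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE showstoppers (baker TEXT, make TEXT)")
conn.executemany(
    "INSERT INTO showstoppers VALUES (?, ?)",
    [
        ("a", "Chocolate Cake Tower"),  # 'Cake' is a whole word
        ("b", "Cupcakes"),              # 'cake' is only part of a word
        ("c", "Pancake stack"),         # 'cake' is only part of a word
        ("d", None),                    # NULL never matches
    ],
)

# Python's \b plays the role of PostgreSQL's \y word boundary here.
rows = conn.execute(
    "SELECT baker FROM showstoppers WHERE make REGEXP ? ORDER BY baker",
    (r"\bcake\b",),
).fetchall()
print(rows)
conn.close()
```

Only the row where `cake` stands alone as a word survives the filter; the NULL row drops out because its condition is UNKNOWN rather than true.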
+ +### Example: + +```sql +-- Find rows where 'make' contains 'cake' as a whole word +SELECT * FROM showstoppers WHERE make ~* '\ycake\y'; +``` + +### Regex Components: + +- `^`: Start of string. +- `$`: End of string. +- `.`: Any single character. +- `*`: Zero or more occurrences. +- `+`: One or more occurrences. +- `[]`: Character class. +- `\d`: Digit. +- `\w`: Word character. +- `\s`: Whitespace. +- `\y`: Word boundary (PostgreSQL). + +--- + +## 9. Extracting Numbers from Strings + +- Useful for comparing numerical values embedded in text. + +### Using `SUBSTRING()` and Regular Expressions: + +```sql +-- Extract leading numbers from a string +SUBSTRING(column FROM '^\d+') +``` + +### Casting Extracted Strings: + +```sql +-- Convert extracted numbers to integer +CAST(SUBSTRING(column FROM '^\d+') AS integer) +``` + +### Example: + +```sql +SELECT + CAST(SUBSTRING(signature FROM '^\d+') AS integer) AS signature_number +FROM episodes; +``` + +--- + +## 10. Aggregate Functions and Grouping Data + +### Aggregate Functions: + +- `COUNT()`: Number of rows. +- `SUM()`: Total sum. +- `AVG()`: Average value. +- `MIN()`/`MAX()`: Minimum or maximum value. + +### Grouping Data: + +- `GROUP BY`: Groups rows sharing values. + - *Syntax*: `GROUP BY column1, column2` +- `HAVING`: Filters groups based on aggregate conditions. + - *Syntax*: `HAVING condition` + +### Conditional Aggregation: + +- `COUNT(*) FILTER (WHERE condition)`: Counts rows meeting a condition. + +### Example: + +```sql +SELECT + department, + COUNT(*) AS total_employees, + COUNT(*) FILTER (WHERE salary > 50000) AS high_earners +FROM employees +GROUP BY department; +``` + +--- + +## 11. Subqueries and Common Table Expressions (CTEs) + +### Subqueries: + +- Nested queries within a main query. +- *Syntax*: `SELECT ... FROM (SELECT ...) AS sub;` + +### Common Table Expressions (CTEs): + +- Temporary result set that can be referenced within the main query. +- *Syntax*: + +```sql +WITH cte_name AS ( + SELECT ... 
+) +SELECT ... +FROM cte_name; +``` + +### Example: + +```sql +WITH high_viewers AS ( + SELECT id, viewers7day FROM episodes WHERE viewers7day > 10 +) +SELECT * FROM high_viewers; +``` + +--- + +## 12. Sorting and Limiting Results + +### Ordering: + +- `ORDER BY`: Sorts results. + - *Syntax*: `ORDER BY column1 ASC|DESC, column2;` + +### Limiting: + +- `LIMIT`: Limits the number of rows returned. + - *Syntax*: `LIMIT number;` +- `FETCH FIRST`: Alternative to LIMIT. + - *Syntax*: `FETCH FIRST number ROWS ONLY;` + +### Example: + +```sql +SELECT * FROM episodes ORDER BY viewers7day DESC LIMIT 5; +``` + +--- + +## 13. SQL Dialects and Compatibility + +### Differences Across Databases: + +- **PostgreSQL**: + - Uses `ILIKE` for case-insensitive LIKE. + - Supports `~` and `~*` for regex. + - Allows `::` casting. +- **MySQL**: + - Uses `REGEXP` for regex. + - Does not support `ILIKE`; use `LOWER()` with `LIKE`. + +### Ensuring Compatibility: + +- **Check Documentation**: Refer to specific database manuals. +- **Avoid Proprietary Features**: Use standard SQL when possible. +- **Test Queries**: Validate in the target database environment. + +--- + +## 14. Error Handling and Debugging + +### Common Errors: + +- **Syntax Errors**: Misspelled commands, missing commas. +- **Type Mismatch**: Incompatible data types. +- **Undefined Functions**: Using functions not available in the SQL dialect. + +### Debugging Steps: + +1. **Read Error Messages Carefully**: They often indicate the issue. +2. **Check Syntax**: Ensure correct command usage. +3. **Validate Data Types**: Use casting if necessary. +4. **Simplify the Query**: Break it down to identify the problematic part. +5. **Use Comments**: Comment out sections to isolate errors. + +### Example Error and Resolution: + +```sql +-- Error: function round(double precision, integer) does not exist +-- Solution: Cast the number to numeric +SELECT ROUND(viewers7day::numeric, 2) FROM episodes; +``` + +--- + +## 15. 
Best Practices + +- **Use Aliases for Clarity**: + - Shorten table/column names. + - *Example*: `SELECT e.name FROM employees AS e;` +- **Filter Early**: + - Apply `WHERE` clauses before `GROUP BY` or `JOIN` to reduce data size. +- **Optimize Joins**: + - Ensure proper indexing on join columns. + - Use appropriate join types. +- **Handle NULLs Appropriately**: + - Be aware of NULL behavior in comparisons and functions. +- **Comment Your Code**: + - Use `--` for single-line and `/* ... */` for multi-line comments. +- **Consistent Formatting**: + - Write SQL keywords in uppercase. + - Use indentation for readability. + +--- + +## 16. Security Considerations + +- **Prevent SQL Injection**: + - Use parameterized queries. + - Avoid concatenating user input into SQL statements. +- **Limit Permissions**: + - Grant only necessary privileges to users. +- **Validate Input**: + - Sanitize user inputs. + - Use input validation to enforce data integrity. + +--- + +## 17. Conclusion + +Understanding these SQL concepts equips you to handle various data querying and manipulation tasks effectively. By mastering pattern matching, data type conversions, error handling, and other advanced techniques, you can write efficient and robust SQL queries for future assignments. + +--- + +## 18. Appendices + +### Execution Order of SQL Statements + +1. **FROM**: Data source specification. +2. **JOIN**: Combining tables. +3. **WHERE**: Row-level filtering. +4. **GROUP BY**: Grouping rows. +5. **HAVING**: Group-level filtering. +6. **SELECT**: Column selection. +7. **ORDER BY**: Sorting results. +8. **LIMIT**: Limiting output. 
+ +--- + +### Common SQL Functions + +- **String Functions**: `CONCAT()`, `LENGTH()`, `REPLACE()` +- **Date Functions**: `NOW()`, `DATE_PART()`, `AGE()` +- **Numeric Functions**: `POWER()`, `MOD()`, `SQRT()` +- **Conversion Functions**: `TO_CHAR()`, `TO_DATE()` + +--- + +### Additional Resources + +- **PostgreSQL Documentation**: [postgresql.org/docs](https://www.postgresql.org/docs/) +- **MySQL Documentation**: [dev.mysql.com/doc](https://dev.mysql.com/doc/) +- **Regular Expressions Reference**: [regular-expressions.info](https://www.regular-expressions.info/) +- **SQL Tutorial**: [w3schools.com/sql](https://www.w3schools.com/sql/) + +--- + +## Practice Examples + +### Example 1: Using Regular Expressions + +```sql +-- Find episodes where the signature starts with two digits and a space +SELECT id, title, signature +FROM episodes +WHERE signature ~ '^\d{2} .+'; +``` + +### Example 2: Extracting Numbers and Comparing + +```sql +-- Select episodes where the signature number is greater than the technical number +SELECT id, title +FROM episodes +WHERE + CAST(SUBSTRING(signature FROM '^\d+') AS integer) > + CAST(SUBSTRING(technical FROM '^\d+') AS integer); +``` + +### Example 3: Handling NULLs with COALESCE + +```sql +-- Replace NULL hometowns with 'Unknown' +SELECT fullname, COALESCE(hometown, 'Unknown') AS hometown +FROM bakers; +``` + +### Example 4: Rounding and Division + +```sql +-- Calculate normalized viewers and round to two decimal places +SELECT + id, + ROUND((viewers7day / 100)::numeric, 2) AS viewers_normalized +FROM episodes; +``` + +--- + +## Final SQL Query Example with Explanation + +### Question + +Return the maximum absolute difference in `viewers7day` value between two consecutive episodes. Name the returned attribute `maxviewerdiff`. 
+ +### SQL Query + +```sql +SELECT MAX(ABS(e1.viewers7day - e2.viewers7day)) AS maxviewerdiff +FROM episodes e1 +JOIN episodes e2 ON e1.id = e2.id - 1; +``` + +### Explanation + +- **Objective**: Find the largest absolute difference in `viewers7day` between consecutive episodes. +- **Approach**: + - **Self-Join**: Join the `episodes` table to itself to compare consecutive episodes. + - `e1` represents the current episode. + - `e2` represents the next episode. + - **Join Condition**: `e1.id = e2.id - 1` ensures pairing of consecutive episodes. + - **Calculate Difference**: `ABS(e1.viewers7day - e2.viewers7day)` computes the absolute difference in viewers. + - **Aggregate Function**: `MAX()` retrieves the maximum difference. +- **Result**: The query returns a single value `maxviewerdiff`, representing the largest viewer drop or increase between two consecutive episodes. + +--- + +## Closing Thoughts + +By integrating these advanced topics into your SQL knowledge base, you enhance your ability to write complex queries, troubleshoot issues, and ensure your code is both efficient and secure. Practice regularly with different scenarios to solidify these concepts. + +--- + +## Procedural Programming + +To enable the use of SQL for costly queries, while making it possible to write code/procedures on top of it, databases support a number of options. + +- Server-side +- Client-side + +#### Server-side + +- Languages make it possible to define procedures, functions, and triggers +- These programs are compiled and stored in the database server +- They can also be called by queries + +#### Client-side + +- Languages allow programs to integrate querying of the database with a procedural language +- Coding in a host language with db hooks (C, C++, Java, Python, etc.) using the data structures of these languages +- Coding in frameworks with their own data models (Java, Python, etc) with similar db hooks as in above. 
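A small sketch of the client-side style described above: a host language (Python here) opens a connection, passes variables into queries, loops over the result tuples with its own data structures, and relies on transactions that roll back when an exception is raised. It assumes Python's built-in `sqlite3` module in place of a PostgreSQL driver, and the `results` rows are invented sample data.

```python
import sqlite3

# Opening (and later closing) the connection is the client program's job.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (baker TEXT, episodeid INTEGER, result TEXT)")

# 'with conn' commits the transaction on success and rolls it back on an exception.
with conn:
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [
            ("Rahul", 1, "star baker"),
            ("Kim-Joy", 2, "star baker"),
            ("Rahul", 3, "star baker"),
        ],
    )

# Input value from a host variable via a parameterized query (no string concatenation).
wanted = "star baker"
wins = {}
for baker, episodeid in conn.execute(
    "SELECT baker, episodeid FROM results WHERE result = ? ORDER BY episodeid",
    (wanted,),
):
    # Loop over the result tuples and collect them in a host-language dictionary.
    wins.setdefault(baker, []).append(episodeid)

# Raising an exception inside the transaction undoes the insert.
try:
    with conn:
        conn.execute("INSERT INTO results VALUES ('Dan', 4, 'star baker')")
        raise ValueError("simulated failure before commit")
except ValueError:
    pass

total = conn.execute("SELECT count(*) FROM results").fetchone()[0]
print(wins, total)
conn.close()
```

The row count stays at three because the failed transaction was rolled back, which is the behavior the notes attribute to raising exceptions.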
+ +### Programming in SQL + +**All programming paradigms support:** + +- methods to execute queries/update statements +- executing any SQL statement, catching the outcome, and interpreting the errors if any +- input values from variables into queries and outputting the values from queries into variables +- loop over query results (if multiple tuples) +- raise exceptions, which results in rollbacks of transactions +- store and reuse queries in the shape of cursors +- starting and committing transactions + +**Client-side programs also support:** + +- opening/closing connections +- allocating/releasing database resources for queries + +**Server-side language examples (Generally database-specific):** + +- pl/pgsql: a generic procedural language for PostgreSQL +- pl/python: a procedural language that is an extension of Python + +**Client-side language examples:** + +- libpq: a C library for PostgreSQL which uses library calls specific to PostgreSQL +- OCCI: Oracle library for C++ +- ECPG: embedded SQL in C, based on the embedded programming standard with a PostgreSQL-specific pre-compiler and the standard C compiler + +**Frameworks** +based on specific design principles for developing database backed applications +examples: + +- Object-relational-mapping used by Rails, Hibernate, Django, WebObjects, .NET (different frameworks have different models) +- Note that the frameworks can be built on top of other languages (such as Java + JDBC) + +### pl/pgsql + +- supports the same data types as the database +- programs and functions can be compiled and used directly by the db server +- main pl/pgsql block is in this form: + +```pl/pgsql +[ <