Introduction to Relational Databases and SQL

Emmanuel BRUNO

emmanuel.bruno@univ-tln.fr

Université de Toulon

LIS UMR CNRS 7020

2024-11-01

Overview

Topics:
- What is a Database?
- Introduction to Relational Databases
- Key Concepts: Tables, Rows, and Columns
- Advantages of Relational Databases
- Introduction to SQL
- Various RDBMS Following the SQL Standard
- Relational Algebra and SQL
- Basic SQL Queries: SELECT, INSERT, UPDATE, DELETE
- SQL Data Types in PostgreSQL
- Primary Key and Integrity Constraints
- Foreign Keys and Cascading Actions
- Joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN
- Using GROUP BY in SQL
- Practical Examples in E-commerce

What is a Database?

Definition:
- A database is an organized collection of data, generally stored and accessed electronically from a computer system.
Purpose:
- Databases are used to store, manage, and retrieve data efficiently.
Key Features:
- Data Integrity: Ensures accuracy and consistency of data.
- Data Security: Protects data from unauthorized access.
- Scalability: Handles increasing amounts of data and users.
- Backup and Recovery: Provides mechanisms to restore data in case of failure.
Examples:
- Customer Information: Storing customer names, addresses, and contact details.
- Product Details: Managing product names, descriptions, prices, and stock levels.
- Order History: Tracking customer orders, statuses, and transaction details.

What is a Relational Database?

Relational Model:
- Concept: Proposed by E.F. Codd in 1970, the relational model organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row.
- Relations vs. Tables:
  - Relation: A theoretical concept representing an entity type, defined by a set of attributes.
  - Table: The physical implementation of a relation in a database, consisting of rows (tuples) and columns.
- Attributes vs. Columns:
  - Attribute: A property or characteristic of an entity type in the relational model.
  - Column: The physical representation of an attribute in a table, defining the data type and constraints for the values in that column.
Relational Database:
- A type of database that stores data in tables.
- tuples (rows) are identified by a primary key.
- Tables can be related to each other through keys.

Key Concepts: Tables, Rows, and Columns

Table:
- A table is a collection of related data entries.
- Example: Customers, Products, Orders
Row:
- A row represents a single record in a table.
- Example: A single customer’s information
Column:
- A column represents a specific attribute of a record.
- Example: Customer Name, Email, Phone Number

Example Table: customers

customer_id	name	email	phone_number	country
1	John Doe	john@example.com	123-456-7890	USA
2	Jane Smith	jane@example.com	987-654-3210	USA
3	Pierre Dupont	pierre@example.fr	555-987-6543	France

Example Table: orders

order_id	order_date	customer_id
101	2023-01-15	1
102	2023-01-16	2
103	2023-01-17	1

Example Table: Customers

CustomerID	Name	Email	Phone Number	Country
1	John Doe	john@example.com	123-456-7890	USA
2	Jane Smith	jane@example.com	987-654-3210	USA
3	Pierre Dupont	pierre@example.fr	555-987-6543	France

Advantages of Relational Databases

Organized Structure:
- Data is stored in a structured format using rows and columns.
Ease of Querying:
- Simple to query and retrieve data using SQL.
Support for Complex Queries:
- Capable of handling complex queries and transactions.
Data Integrity:
- Ensures accuracy and consistency of data through constraints and normalization.
Scalability:
- Can handle large amounts of data and concurrent users efficiently.
Security:
- Provides robust security features to protect data, including access controls and encryption.

Introduction to SQL

SQL (Structured Query Language):
- A standard language for accessing and manipulating relational databases.
- Developed in the 1970s by IBM and standardized by ANSI and ISO.
- The latest version is SQL:2016, which includes enhancements for JSON support, polymorphic table functions, and more.
Purpose:
- Used to perform tasks such as querying data, updating records, and managing database structures.
- Supports Data Definition Language (DDL) for defining database schemas.
- Supports Data Manipulation Language (DML) for inserting, updating, deleting, and retrieving data.
- Supports Data Control Language (DCL) for controlling access to data.
- Supports Transaction Control Language (TCL) for managing transactions.
Key Features:
- Declarative Language: Allows users to specify what data to retrieve rather than how to retrieve it.
- Portability: SQL can be used across different database systems with minimal changes.
- Extensibility: Supports extensions and procedural languages like PL/SQL and T-SQL.

Various RDBMS Following the SQL Standard

Relational Database Management Systems (RDBMS):
- Systems used to manage relational databases.
- They follow the SQL (Structured Query Language) standard for database operations, but there can be differences or extensions in how each RDBMS implements the standard.

RDBMS	Open Source	Key Features	Ideal Use Cases
PostgreSQL	Yes	Robustness, extensibility, advanced data types	Complex applications, data warehousing
MySQL	Yes	Speed, reliability, ease of use	Web applications, e-commerce
SQLite	Yes	Lightweight, serverless, self-contained	Embedded applications, small projects
Microsoft SQL Server	No	Integration with Microsoft products, enterprise features	Enterprise applications, business intelligence
Oracle Database	No	Scalability, advanced features	Large enterprises, mission-critical applications

Relational Algebra and SQL

Relational Algebra:
- A procedural query language that works on the relational model.
- Provides a set of operations to manipulate relations (tables).
- Fundamental operations include:
  - Selection (σ): Selects rows that satisfy a given predicate.
  - Projection (π): Selects specific columns from a table.
  - Union (∪): Combines the results of two queries.
  - Set Difference (−): Returns rows from one query that are not in another.
  - Cartesian Product (×): Combines all rows from two tables.
  - Join (⨝): Combines rows from two tables based on a related column.

SQL (Structured Query Language):
- A declarative query language used to interact with relational databases.
- SQL operations are based on relational algebra but are more user-friendly.
- Key SQL operations include:
  - SELECT: Retrieves data from one or more tables.
  - INSERT: Adds new records to a table.
  - UPDATE: Modifies existing records in a table.
  - DELETE: Removes records from a table.
  - JOIN: Combines rows from two or more tables based on a related column.

SELECT

Used to retrieve data from a database.
Based on relational algebra operations such as selection (σ) and projection (π).
```
SELECT customer_id, name, email FROM customers
WHERE country = 'USA';
```
Relational Algebra Equivalent:
- Projection (π): Selecting specific columns (CustomerID, Name, Email).
- Selection (σ): Filtering rows based on conditions (Country = ‘USA’).
Result:

customer_id name email

1 John Doe john@example.com

2 Jane Smith jane@example.com
Note: The WHERE clause is optional and is used to filter the rows based on specified conditions.

customer_id	name	email
1	John Doe	john@example.com
2	Jane Smith	jane@example.com

INSERT

Used to add new records to a table.

Example:

INSERT INTO customers (customer_id, name, email, phone_number, country)
VALUES (4, 'Alice Johnson', 'alice@example.com', '555-123-4567', 'Canada');

Result:
- A new record is added to the customers table with the specified values.

UPDATE

Used to modify existing records in a table.

Example:

UPDATE customers
SET email = 'john.doe@example.com'
WHERE customer_id = 1;

DELETE

Used to remove records from a table.

Example:

DELETE FROM customers
WHERE customer_id = 2;

SQL Data Types in PostgreSQL

SQL Data Types:
- Data types define the kind of data that can be stored in a table column.
- Choosing the correct data type is crucial for data integrity and performance.
Numeric Types
- INTEGER: Whole numbers.
- DECIMAL: Fixed-point numbers.
Character Types
- VARCHAR: Variable-length character strings.
- TEXT: Variable-length character strings with no length limit.
Date/Time Types
- DATE: Calendar date (year, month, day).
- TIMESTAMP: Date and time (without time zone).
Boolean Type
- BOOLEAN: Logical Boolean (true/false).
Array Type
- ARRAY: Array of values.

Example Table with Various Data Types

CREATE TABLE example_table (
  product_id SERIAL PRIMARY KEY,     -- Numeric Type: INTEGER (auto-incremented)
  price DECIMAL(10, 2),              -- Numeric Type: DECIMAL
  name VARCHAR(100),                 -- Character Type: VARCHAR
  description TEXT,                  -- Character Type: TEXT
  order_date DATE,                   -- Date/Time Type: DATE
  event_time TIMESTAMP,              -- Date/Time Type: TIMESTAMP
  is_active BOOLEAN,                 -- Boolean Type: BOOLEAN
  product_ids INTEGER[]              -- Array Type: INTEGER[]
);

Primary Key

In relational algebra, a primary key is a unique identifier for a tuple (row) in a relation (table). It ensures that no two tuples have the same value for the primary key attribute(s), thereby maintaining the integrity and uniqueness of the data within the relation.

Key characteristics of a primary key:

Uniqueness: Each value of the primary key must be unique across the relation.
Non-nullability: Primary key attributes cannot contain NULL values.
Minimality: The primary key must be minimal, meaning no subset of the primary key attributes can uniquely identify a tuple.
Composite Key: A primary key can be composed of multiple attributes, known as a composite key.

Declaring a Table with a Primary Key

CREATE TABLE orders (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id)
);

Declaring a table with an Auto-generated Primary Key

A primary key can be auto-generated by the database system.
In PostgreSQL, you can use the SERIAL data type to auto-generate unique IDs for the primary key.

The SERIAL type automatically increments the ID for each new record.

CREATE TABLE customers (
  customer_id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  phone_number VARCHAR(15),
  country VARCHAR(50)
);

Integrity Constraints

Integrity constraints are rules that ensure the accuracy and consistency of data in a database.
Types of integrity constraints:
- Primary Key Constraint: Ensures that each record has a unique identifier.
- Foreign Key Constraint: Ensures that a value in one table matches a value in another table, creating a relationship between the tables.
- Unique Constraint: Ensures that all values in a column are unique.
- Not Null Constraint: Ensures that a column cannot have NULL values.
- Check Constraint: Ensures that all values in a column satisfy a specific condition.
```
CREATE TABLE Employees (
  EmployeeID SERIAL PRIMARY KEY,
  Name VARCHAR(100) NOT NULL,
  Email VARCHAR(100) UNIQUE,
  Salary DECIMAL CHECK (Salary > 0)
);
```

ALTER and DROP

ALTER:
- Used to modify an existing database object, such as a table.
- Example:
```
ALTER TABLE products
ADD COLUMN stock INT;
```

DROP:
- Used to delete an existing database object, such as a table.
- Example:
```
DROP TABLE products;
```

Foreign Keys

A foreign key is a field (or collection of fields) in one table that uniquely identifies a row in another table.
It creates a relationship between two tables, ensuring referential integrity.
Key characteristics:
- Referential Integrity: Ensures that a value in the foreign key column(s) must match a value in the primary key column(s) of the referenced table.

    erDiagram
        CUSTOMERS {
            int customer_id PK
            varchar name
            varchar email
            varchar phone_number
        }
        ORDERS {
            int order_id PK
            date order_date
            int customer_id FK
        }
        CUSTOMERS ||--o{ ORDERS : has

    CREATE TABLE orders (
      order_id SERIAL PRIMARY KEY,
      order_date DATE,
      customer_id INT,
      FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );

Cascading Actions

Cascading actions can be defined to automatically update or delete related rows when a referenced row is updated or deleted.
Common cascading actions:
- ON DELETE CASCADE: Automatically deletes related rows in the child table when a row in the parent table is deleted.
- ON UPDATE CASCADE: Automatically updates related rows in the child table when a row in the parent table is updated.
```
CREATE TABLE Orders (
  OrderID SERIAL PRIMARY KEY,
  OrderDate DATE,
  CustomerID INT,
  FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
    ON DELETE CASCADE
    ON UPDATE CASCADE
);
```

Joins

JOIN:
- Used to combine rows from two or more tables based on a related column.
- Types of Joins:
  - INNER JOIN: Returns records that have matching values in both tables.
  - LEFT JOIN: Returns all records from the left table, and the matched records from the right table.
  - RIGHT JOIN: Returns all records from the right table, and the matched records from the left table.
  - FULL JOIN: Returns all records when there is a match in either left or right table.

Example Tables

CustomerID	Name	Email	Phone Number	Country
1	John Doe	john@example.com	123-456-7890	USA
2	Jane Smith	jane@example.com	987-654-3210	USA
3	Pierre Dupont	pierre@example.fr	555-987-6543	France

order_id	order_date	customer_id
101	2023-01-15	1
102	2023-01-16	2
103	2023-01-17	1

-- Create Customers Table
CREATE TABLE customers (
  customer_id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  phone_number VARCHAR(20),
  country VARCHAR(50)
);

-- Insert Data into Customers Table
INSERT INTO customers (name, email, phone_number, country) VALUES
('John Doe', 'john@example.com', '123-456-7890', 'USA'),
('Jane Smith', 'jane@example.com', '987-654-3210', 'USA'),
('Pierre Dupont', 'pierre@example.fr', '555-987-6543', 'France');

-- Create Orders Table
CREATE TABLE orders (
  order_id SERIAL PRIMARY KEY,
  order_date DATE,
  customer_id INT,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE
);

-- Insert Data into Orders Table
INSERT INTO orders (order_date, customer_id) VALUES
('2023-01-15', 1),
('2023-01-16', 2),
('2023-01-17', 1);

Example ER Diagram

erDiagram
    CUSTOMERS ||--o{ ORDERS : has
    CUSTOMERS {
        int customer_id PK
        varchar name
        varchar email
        varchar phone_number
        varchar country
    }
    ORDERS {
        int order_id PK
        date order_date
        int customer_id FK
    }

Example of an INNER JOIN

INNER JOIN:
- Returns records that have matching values in both tables.

SQL Query:

SELECT customers.customer_id, customers.name, orders.order_date
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;

Result:

customer_id name order_date

1 John Doe 2023-01-15

2 Jane Smith 2023-01-16

1 John Doe 2023-01-17

customer_id	name	order_date
1	John Doe	2023-01-15
2	Jane Smith	2023-01-16
1	John Doe	2023-01-17

Example of a LEFT JOIN

LEFT JOIN:
- Returns all records from the left table, and the matched records from the right table.

SQL Query:

SELECT customers.customer_id, customers.name, orders.order_date
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;

Result:

customer_id name order_date

1 John Doe 2023-01-15

2 Jane Smith 2023-01-16

1 John Doe 2023-01-17

3 Pierre Dupont NULL

customer_id	name	order_date
1	John Doe	2023-01-15
2	Jane Smith	2023-01-16
1	John Doe	2023-01-17
3	Pierre Dupont	NULL

Example of a RIGHT JOIN

RIGHT JOIN:
- Returns all records from the right table, and the matched records from the left table.

SQL Query:

SELECT customers.customer_id, customers.name, orders.order_date
FROM customers
RIGHT JOIN orders ON customers.customer_id = orders.customer_id;

Result:

customer_id name order_date

1 John Doe 2023-01-15

2 Jane Smith 2023-01-16

1 John Doe 2023-01-17

customer_id	name	order_date
1	John Doe	2023-01-15
2	Jane Smith	2023-01-16
1	John Doe	2023-01-17

Example of a FULL JOIN

FULL JOIN:
- Returns all records when there is a match in either left or right table.

SQL Query:

SELECT customers.customer_id, customers.name, orders.order_date
FROM customers
FULL JOIN orders ON customers.customer_id = orders.customer_id;

Result:

customer_id name order_date

1 John Doe 2023-01-15

2 Jane Smith 2023-01-16

1 John Doe 2023-01-17

3 Pierre Dupont NULL

customer_id	name	order_date
1	John Doe	2023-01-15
2	Jane Smith	2023-01-16
1	John Doe	2023-01-17
3	Pierre Dupont	NULL

Using GROUP BY in SQL

GROUP BY Clause:
- The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows.
- It is often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of rows.

Group By Syntax

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;

Example: Grouping and Counting Orders by Customer

Scenario:
- You want to find out how many orders each customer has placed.

SQL Query:

SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id;

Result:

customer_id order_count

2 1

1 2

customer_id	order_count
2	1
1	2

Example: Calculating Total Sales by Product

Scenario:
- You want to calculate the total sales amount for each product.

SQL Query:

SELECT product_id, SUM(quantity * price) AS total_sales
FROM order_details
GROUP BY product_id;

Result:

product_id total_sales

101 4999.95

102 2999.94

103 1999.96

product_id	total_sales
101	4999.95
102	2999.94
103	1999.96

Practical Examples in E-commerce

Example Tables

Customers Table

customer_id	name	email	phone_number	country
1	John Doe	john@example.com	123-456-7890	USA
2	Jane Smith	jane@example.com	987-654-3210	USA
3	Pierre Dupont	pierre@example.fr	555-987-6543	France

Products Table

product_id	product_name	price
101	Laptop	999.99
102	Smartphone	499.99

Orders Table

order_id	customer_id	product_id	quantity	status
1001	1	101	2	Pending
1002	2	102	1	Shipped

Categories Table

category_id	category_name
1	Electronics
2	Furniture

ProductCategories Table

product_id	category_id
101	1
102	1

Diagram

erDiagram
    CUSTOMERS {
        int customer_id PK
        varchar name
        varchar email
        varchar phone_number
        varchar country
    }
    PRODUCTS {
        int product_id PK
        varchar product_name
        decimal price
    }
    ORDERS {
        int order_id PK
        int customer_id FK
        int product_id FK
        int quantity
        varchar status
    }
    CATEGORIES {
        int category_id PK
        varchar category_name
    }
    PRODUCTCATEGORIES {
        int product_id FK
        int category_id FK
    }
    CUSTOMERS ||--o{ ORDERS : places
    PRODUCTS ||--o{ ORDERS : contains
    PRODUCTS ||--o{ PRODUCTCATEGORIES : belongs_to
    CATEGORIES ||--o{ PRODUCTCATEGORIES : includes

DDL Queries to Create and Populate the Database

-- Create Customers Table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    phone_number VARCHAR(15),
    country VARCHAR(50)
);

-- Create Products Table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10, 2)
);

-- Create Orders Table
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    product_id INT REFERENCES products(product_id),
    quantity INT,
    status VARCHAR(50)
);

-- Create Categories Table
CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100)
);

-- Create ProductCategories Table
CREATE TABLE product_categories (
    product_id INT REFERENCES products(product_id),
    category_id INT REFERENCES categories(category_id),
    PRIMARY KEY (product_id, category_id)
);

-- Insert Data into Customers Table
INSERT INTO customers (customer_id,name, email, phone_number, country) VALUES
(1,'John Doe', 'john@example.com', '123-456-7890', 'USA'),
(2,'Jane Smith', 'jane@example.com', '987-654-3210', 'USA'),
(3,'Pierre Dupont', 'pierre@example.fr', '555-987-6543', 'France');

-- Insert Data into Products Table
INSERT INTO products (product_id,product_name, price) VALUES
(101,'Laptop', 999.99),
(102,'Smartphone', 499.99);

-- Insert Data into Orders Table
INSERT INTO orders (customer_id, product_id, quantity, status) VALUES
(1, 101, 2, 'Pending'),
(2, 102, 1, 'Shipped');

-- Insert Data into Categories Table
INSERT INTO categories (category_name) VALUES
('Electronics'),
('Furniture');

-- Insert Data into ProductCategories Table
INSERT INTO product_categories (product_id, category_id) VALUES
(101, 1),
(102, 1);

Useful Queries

Querying Product Information:
- Example:
```
SELECT product_id, product_name, price FROM products;
```
- Result:
  
  product_id product_name price
  
  101 Laptop 999.99
  
  102 Smartphone 499.99

Adding a New Order:

Example:

INSERT INTO orders (customer_id, product_id, quantity, status)
VALUES (1, 101, 2, 'Pending');

Updating Order Status:

Example:

UPDATE orders
SET status = 'Shipped'
WHERE order_id = 1;

Deleting a Customer Record:

Example:

DELETE FROM customers
WHERE customer_id = 3;

Querying Products by Category:

Example:

SELECT p.product_id, p.product_name, c.category_name
FROM products p
JOIN product_categories pc ON p.product_id = pc.product_id
JOIN categories c ON pc.category_id = c.category_id;

Result:

product_id product_name category_name

101 Laptop Electronics

102 Smartphone Electronics

product_id	product_name	category_name
101	Laptop	Electronics
102	Smartphone	Electronics

Summary

Key Points:
- Databases are essential for storing and managing data.
- Relational databases organize data into tables, rows, and columns.
- SQL is the standard language for interacting with relational databases.
- Basic SQL queries include SELECT, INSERT, UPDATE, and DELETE.
- IDs, joins, and foreign keys are crucial for establishing relationships between tables.
- DDL commands are used to define and manage database structures.
- PostgreSQL and MySQL are popular relational database management systems.

Final Thoughts

Quick Bootstrap:
- This lecture is a quick bootstrap to get you started with relational databases and SQL.
- It is not a full course, so further study and practice are recommended to gain a deeper understanding.
Importance of Databases:
- Understanding databases and SQL is crucial for managing data in e-commerce.
- Databases ensure data integrity, security, and efficient retrieval, which are essential for any data-driven application.
- Keep practicing and exploring to become proficient in SQL and database management.