Joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN
Using GROUP BY in SQL
Practical Examples in E-commerce
What is a Database?
Definition:
A database is an organized collection of data, generally stored and accessed electronically from a computer system.
Purpose:
Databases are used to store, manage, and retrieve data efficiently.
Key Features:
Data Integrity: Ensures accuracy and consistency of data.
Data Security: Protects data from unauthorized access.
Scalability: Handles increasing amounts of data and users.
Backup and Recovery: Provides mechanisms to restore data in case of failure.
Examples:
Customer Information: Storing customer names, addresses, and contact details.
Product Details: Managing product names, descriptions, prices, and stock levels.
Order History: Tracking customer orders, statuses, and transaction details.
What is a Relational Database?
Relational Model:
Concept: Proposed by E.F. Codd in 1970, the relational model organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row.
Relations vs. Tables:
Relation: A theoretical concept representing an entity type, defined by a set of attributes.
Table: The physical implementation of a relation in a database, consisting of rows (tuples) and columns.
Attributes vs. Columns:
Attribute: A property or characteristic of an entity type in the relational model.
Column: The physical representation of an attribute in a table, defining the data type and constraints for the values in that column.
Relational Database:
A type of database that stores data in tables.
tuples (rows) are identified by a primary key.
Tables can be related to each other through keys.
Key Concepts: Tables, Rows, and Columns
Table:
A table is a collection of related data entries.
Example: Customers, Products, Orders
Row:
A row represents a single record in a table.
Example: A single customer’s information
Column:
A column represents a specific attribute of a record.
Example: Customer Name, Email, Phone Number
Example Table: customers
customer_id
name
email
phone_number
country
1
John Doe
john@example.com
123-456-7890
USA
2
Jane Smith
jane@example.com
987-654-3210
USA
3
Pierre Dupont
pierre@example.fr
555-987-6543
France
Example Table: orders
order_id
order_date
customer_id
101
2023-01-15
1
102
2023-01-16
2
103
2023-01-17
1
Example Table: Customers
CustomerID
Name
Email
Phone Number
Country
1
John Doe
john@example.com
123-456-7890
USA
2
Jane Smith
jane@example.com
987-654-3210
USA
3
Pierre Dupont
pierre@example.fr
555-987-6543
France
Advantages of Relational Databases
Organized Structure:
Data is stored in a structured format using rows and columns.
Ease of Querying:
Simple to query and retrieve data using SQL.
Support for Complex Queries:
Capable of handling complex queries and transactions.
Data Integrity:
Ensures accuracy and consistency of data through constraints and normalization.
Scalability:
Can handle large amounts of data and concurrent users efficiently.
Security:
Provides robust security features to protect data, including access controls and encryption.
Introduction to SQL
SQL (Structured Query Language):
A standard language for accessing and manipulating relational databases.
Developed in the 1970s by IBM and standardized by ANSI and ISO.
The latest version is SQL:2016, which includes enhancements for JSON support, polymorphic table functions, and more.
Purpose:
Used to perform tasks such as querying data, updating records, and managing database structures.
Supports Data Definition Language (DDL) for defining database schemas.
Supports Data Manipulation Language (DML) for inserting, updating, deleting, and retrieving data.
Supports Data Control Language (DCL) for controlling access to data.
Supports Transaction Control Language (TCL) for managing transactions.
Key Features:
Declarative Language: Allows users to specify what data to retrieve rather than how to retrieve it.
Portability: SQL can be used across different database systems with minimal changes.
Extensibility: Supports extensions and procedural languages like PL/SQL and T-SQL.
Various RDBMS Following the SQL Standard
Relational Database Management Systems (RDBMS):
Systems used to manage relational databases.
They follow the SQL (Structured Query Language) standard for database operations, but there can be differences or extensions in how each RDBMS implements the standard.
RDBMS
Open Source
Key Features
Ideal Use Cases
PostgreSQL
Yes
Robustness, extensibility, advanced data types
Complex applications, data warehousing
MySQL
Yes
Speed, reliability, ease of use
Web applications, e-commerce
SQLite
Yes
Lightweight, serverless, self-contained
Embedded applications, small projects
Microsoft SQL Server
No
Integration with Microsoft products, enterprise features
Enterprise applications, business intelligence
Oracle Database
No
Scalability, advanced features
Large enterprises, mission-critical applications
Relational Algebra and SQL
Relational Algebra:
A procedural query language that works on the relational model.
Provides a set of operations to manipulate relations (tables).
Fundamental operations include:
Selection (σ): Selects rows that satisfy a given predicate.
Projection (π): Selects specific columns from a table.
Union (∪): Combines the results of two queries.
Set Difference (−): Returns rows from one query that are not in another.
Cartesian Product (×): Combines all rows from two tables.
Join (⨝): Combines rows from two tables based on a related column.
SQL (Structured Query Language):
A declarative query language used to interact with relational databases.
SQL operations are based on relational algebra but are more user-friendly.
Key SQL operations include:
SELECT: Retrieves data from one or more tables.
INSERT: Adds new records to a table.
UPDATE: Modifies existing records in a table.
DELETE: Removes records from a table.
JOIN: Combines rows from two or more tables based on a related column.
SELECT
Used to retrieve data from a database.
Based on relational algebra operations such as selection (σ) and projection (π).
SELECT customer_id, name, email FROM customersWHERE country ='USA';
Relational Algebra Equivalent:
Projection (π): Selecting specific columns (CustomerID, Name, Email).
Selection (σ): Filtering rows based on conditions (Country = ‘USA’).
Result:
customer_id
name
email
1
John Doe
john@example.com
2
Jane Smith
jane@example.com
Note: The WHERE clause is optional and is used to filter the rows based on specified conditions.
Data types define the kind of data that can be stored in a table column.
Choosing the correct data type is crucial for data integrity and performance.
Numeric Types
INTEGER: Whole numbers.
DECIMAL: Fixed-point numbers.
Character Types
VARCHAR: Variable-length character strings.
TEXT: Variable-length character strings with no length limit.
Date/Time Types
DATE: Calendar date (year, month, day).
TIMESTAMP: Date and time (without time zone).
Boolean Type
BOOLEAN: Logical Boolean (true/false).
Array Type
ARRAY: Array of values.
Example Table with Various Data Types
CREATETABLE example_table ( product_id SERIAL PRIMARYKEY, -- Numeric Type: INTEGER (auto-incremented) price DECIMAL(10, 2), -- Numeric Type: DECIMAL name VARCHAR(100), -- Character Type: VARCHAR description TEXT, -- Character Type: TEXT order_date DATE, -- Date/Time Type: DATE event_time TIMESTAMP, -- Date/Time Type: TIMESTAMP is_active BOOLEAN, -- Boolean Type: BOOLEAN product_ids INTEGER[] -- Array Type: INTEGER[]);
Primary Key
In relational algebra, a primary key is a unique identifier for a tuple (row) in a relation (table). It ensures that no two tuples have the same value for the primary key attribute(s), thereby maintaining the integrity and uniqueness of the data within the relation.
Key characteristics of a primary key:
Uniqueness: Each value of the primary key must be unique across the relation.
Declaring a table with an Auto-generated Primary Key
A primary key can be auto-generated by the database system.
In PostgreSQL, you can use the SERIAL data type to auto-generate unique IDs for the primary key.
The SERIAL type automatically increments the ID for each new record.
CREATETABLE customers ( customer_id SERIAL PRIMARYKEY, name VARCHAR(100), email VARCHAR(100), phone_number VARCHAR(15), country VARCHAR(50));
Integrity Constraints
Integrity constraints are rules that ensure the accuracy and consistency of data in a database.
Types of integrity constraints:
Primary Key Constraint: Ensures that each record has a unique identifier.
Foreign Key Constraint: Ensures that a value in one table matches a value in another table, creating a relationship between the tables.
Unique Constraint: Ensures that all values in a column are unique.
Not Null Constraint: Ensures that a column cannot have NULL values.
Check Constraint: Ensures that all values in a column satisfy a specific condition.
CREATETABLE Employees ( EmployeeID SERIAL PRIMARYKEY, Name VARCHAR(100) NOTNULL, Email VARCHAR(100) UNIQUE, Salary DECIMALCHECK (Salary >0));
ALTER and DROP
ALTER:
Used to modify an existing database object, such as a table.
Example:
ALTERTABLE productsADDCOLUMN stock INT;
DROP:
Used to delete an existing database object, such as a table.
Example:
DROPTABLE products;
Foreign Keys
A foreign key is a field (or collection of fields) in one table that uniquely identifies a row in another table.
It creates a relationship between two tables, ensuring referential integrity.
Key characteristics:
Referential Integrity: Ensures that a value in the foreign key column(s) must match a value in the primary key column(s) of the referenced table.
erDiagram
CUSTOMERS {
int customer_id PK
varchar name
varchar email
varchar phone_number
}
ORDERS {
int order_id PK
date order_date
int customer_id FK
}
CUSTOMERS ||--o{ ORDERS : has
Used to combine rows from two or more tables based on a related column.
Types of Joins:
INNER JOIN: Returns records that have matching values in both tables.
LEFT JOIN: Returns all records from the left table, and the matched records from the right table.
RIGHT JOIN: Returns all records from the right table, and the matched records from the left table.
FULL JOIN: Returns all records when there is a match in either left or right table.
Example Tables
CustomerID
Name
Email
Phone Number
Country
1
John Doe
john@example.com
123-456-7890
USA
2
Jane Smith
jane@example.com
987-654-3210
USA
3
Pierre Dupont
pierre@example.fr
555-987-6543
France
order_id
order_date
customer_id
101
2023-01-15
1
102
2023-01-16
2
103
2023-01-17
1
-- Create Customers TableCREATETABLE customers ( customer_id SERIAL PRIMARYKEY, name VARCHAR(100), email VARCHAR(100), phone_number VARCHAR(20), country VARCHAR(50));-- Insert Data into Customers TableINSERTINTO customers (name, email, phone_number, country) VALUES('John Doe', 'john@example.com', '123-456-7890', 'USA'),('Jane Smith', 'jane@example.com', '987-654-3210', 'USA'),('Pierre Dupont', 'pierre@example.fr', '555-987-6543', 'France');-- Create Orders TableCREATETABLE orders ( order_id SERIAL PRIMARYKEY, order_date DATE, customer_id INT,FOREIGNKEY (customer_id) REFERENCES customers(customer_id)ONDELETECASCADEONUPDATECASCADE);-- Insert Data into Orders TableINSERTINTO orders (order_date, customer_id) VALUES('2023-01-15', 1),('2023-01-16', 2),('2023-01-17', 1);
Example ER Diagram
erDiagram
CUSTOMERS ||--o{ ORDERS : has
CUSTOMERS {
int customer_id PK
varchar name
varchar email
varchar phone_number
varchar country
}
ORDERS {
int order_id PK
date order_date
int customer_id FK
}
Example of an INNER JOIN
INNER JOIN:
Returns records that have matching values in both tables.
SQL Query:
SELECT customers.customer_id, customers.name, orders.order_dateFROM customersINNERJOIN orders ON customers.customer_id = orders.customer_id;
Result:
customer_id
name
order_date
1
John Doe
2023-01-15
2
Jane Smith
2023-01-16
1
John Doe
2023-01-17
Example of a LEFT JOIN
LEFT JOIN:
Returns all records from the left table, and the matched records from the right table.
SQL Query:
SELECT customers.customer_id, customers.name, orders.order_dateFROM customersLEFTJOIN orders ON customers.customer_id = orders.customer_id;
Result:
customer_id
name
order_date
1
John Doe
2023-01-15
2
Jane Smith
2023-01-16
1
John Doe
2023-01-17
3
Pierre Dupont
NULL
Example of a RIGHT JOIN
RIGHT JOIN:
Returns all records from the right table, and the matched records from the left table.
SQL Query:
SELECT customers.customer_id, customers.name, orders.order_dateFROM customersRIGHTJOIN orders ON customers.customer_id = orders.customer_id;
Result:
customer_id
name
order_date
1
John Doe
2023-01-15
2
Jane Smith
2023-01-16
1
John Doe
2023-01-17
Example of a FULL JOIN
FULL JOIN:
Returns all records when there is a match in either left or right table.
SQL Query:
SELECT customers.customer_id, customers.name, orders.order_dateFROM customersFULLJOIN orders ON customers.customer_id = orders.customer_id;
Result:
customer_id
name
order_date
1
John Doe
2023-01-15
2
Jane Smith
2023-01-16
1
John Doe
2023-01-17
3
Pierre Dupont
NULL
Using GROUP BY in SQL
GROUP BY Clause:
The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows.
It is often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of rows.
You want to find out how many orders each customer has placed.
SQL Query:
SELECT customer_id, COUNT(order_id) AS order_countFROM ordersGROUPBY customer_id;
Result:
customer_id
order_count
2
1
1
2
Example: Calculating Total Sales by Product
Scenario:
You want to calculate the total sales amount for each product.
SQL Query:
SELECT product_id, SUM(quantity * price) AS total_salesFROM order_detailsGROUPBY product_id;
Result:
product_id
total_sales
101
4999.95
102
2999.94
103
1999.96
Practical Examples in E-commerce
Example Tables
Customers Table
customer_id
name
email
phone_number
country
1
John Doe
john@example.com
123-456-7890
USA
2
Jane Smith
jane@example.com
987-654-3210
USA
3
Pierre Dupont
pierre@example.fr
555-987-6543
France
Products Table
product_id
product_name
price
101
Laptop
999.99
102
Smartphone
499.99
Orders Table
order_id
customer_id
product_id
quantity
status
1001
1
101
2
Pending
1002
2
102
1
Shipped
Categories Table
category_id
category_name
1
Electronics
2
Furniture
ProductCategories Table
product_id
category_id
101
1
102
1
Diagram
erDiagram CUSTOMERS { int customer_id PK varchar name varchar email varchar phone_number varchar country } PRODUCTS { int product_id PK varchar product_name decimal price } ORDERS { int order_id PK int customer_id FK int product_id FK int quantity varchar status } CATEGORIES { int category_id PK varchar category_name } PRODUCTCATEGORIES { int product_id FK int category_id FK } CUSTOMERS ||--o{ ORDERS : places PRODUCTS ||--o{ ORDERS : contains PRODUCTS ||--o{ PRODUCTCATEGORIES : belongs_to CATEGORIES ||--o{ PRODUCTCATEGORIES : includes
DDL Queries to Create and Populate the Database
-- Create Customers TableCREATETABLE customers ( customer_id INTPRIMARYKEY, name VARCHAR(100), email VARCHAR(100), phone_number VARCHAR(15), country VARCHAR(50));-- Create Products TableCREATETABLE products ( product_id INTPRIMARYKEY, product_name VARCHAR(100), price DECIMAL(10, 2));-- Create Orders TableCREATETABLE orders ( order_id SERIAL PRIMARYKEY, customer_id INTREFERENCES customers(customer_id), product_id INTREFERENCES products(product_id), quantity INT, status VARCHAR(50));-- Create Categories TableCREATETABLE categories ( category_id SERIAL PRIMARYKEY, category_name VARCHAR(100));-- Create ProductCategories TableCREATETABLE product_categories ( product_id INTREFERENCES products(product_id), category_id INTREFERENCES categories(category_id),PRIMARYKEY (product_id, category_id));-- Insert Data into Customers TableINSERTINTO customers (customer_id,name, email, phone_number, country) VALUES(1,'John Doe', 'john@example.com', '123-456-7890', 'USA'),(2,'Jane Smith', 'jane@example.com', '987-654-3210', 'USA'),(3,'Pierre Dupont', 'pierre@example.fr', '555-987-6543', 'France');-- Insert Data into Products TableINSERTINTO products (product_id,product_name, price) VALUES(101,'Laptop', 999.99),(102,'Smartphone', 499.99);-- Insert Data into Orders TableINSERTINTO orders (customer_id, product_id, quantity, status) VALUES(1, 101, 2, 'Pending'),(2, 102, 1, 'Shipped');-- Insert Data into Categories TableINSERTINTO categories (category_name) VALUES('Electronics'),('Furniture');-- Insert Data into ProductCategories TableINSERTINTO product_categories (product_id, category_id) VALUES(101, 1),(102, 1);
Useful Queries
Querying Product Information:
Example:
SELECT product_id, product_name, price FROM products;
UPDATE ordersSET status ='Shipped'WHERE order_id =1;
Deleting a Customer Record:
Example:
DELETEFROM customersWHERE customer_id =3;
Querying Products by Category:
Example:
SELECT p.product_id, p.product_name, c.category_nameFROM products pJOIN product_categories pc ON p.product_id = pc.product_idJOIN categories c ON pc.category_id = c.category_id;
Result:
product_id
product_name
category_name
101
Laptop
Electronics
102
Smartphone
Electronics
Summary
Key Points:
Databases are essential for storing and managing data.
Relational databases organize data into tables, rows, and columns.
SQL is the standard language for interacting with relational databases.
Basic SQL queries include SELECT, INSERT, UPDATE, and DELETE.
IDs, joins, and foreign keys are crucial for establishing relationships between tables.
DDL commands are used to define and manage database structures.
PostgreSQL and MySQL are popular relational database management systems.
Final Thoughts
Quick Bootstrap:
This lecture is a quick bootstrap to get you started with relational databases and SQL.
It is not a full course, so further study and practice are recommended to gain a deeper understanding.
Importance of Databases:
Understanding databases and SQL is crucial for managing data in e-commerce.
Databases ensure data integrity, security, and efficient retrieval, which are essential for any data-driven application.
Keep practicing and exploring to become proficient in SQL and database management.