Database Fundamentals
A comprehensive guide to database concepts, SQL, and design principles essential for software engineering roles.
Table of Contents
- Introduction to Databases
- DBMS Concepts
- Relational Database Design
- SQL Basics
- Advanced SQL
- Normalization
- Transactions & ACID
- Indexing
- Database Design
- Common Interview Questions
Introduction to Databases
What is a Database?
A database is an organized collection of structured data stored electronically, designed for efficient retrieval, management, and updating.
Types of Databases
1. Relational Databases (SQL)
- Data stored in tables with rows and columns
- Relationships between tables
- Examples: MySQL, PostgreSQL, Oracle, SQL Server
2. NoSQL Databases
- Flexible schema
- Types: Document (MongoDB), Key-Value (Redis), Column (Cassandra), Graph (Neo4j)
3. In-Memory Databases
- Data stored in RAM for faster access
- Examples: Redis, Memcached
Why Databases?
β
Data Persistence - Data survives program termination
β
Concurrent Access - Multiple users can access simultaneously
β
Data Integrity - Constraints ensure data validity
β
Security - Access control and authentication
β
Efficient Querying - Fast data retrieval with indexing
β
Backup & Recovery - Data protection mechanisms
DBMS Concepts
What is a DBMS?
Database Management System - Software for creating, managing, and interacting with databases.
DBMS Architecture
Three-Level Architecture:
βββββββββββββββββββββββββββββββ
β External Level β β User Views
β (View 1, View 2, ...) β
βββββββββββββββββββββββββββββββ€
β Conceptual Level β β Logical Structure
β (Tables, Relationships) β
βββββββββββββββββββββββββββββββ€
β Internal Level β β Physical Storage
β (Files, Indexes) β
βββββββββββββββββββββββββββββββ
Data Independence
Logical Data Independence:
- Ability to change conceptual schema without affecting external schemas
- Example: Adding new column doesnβt affect existing views
Physical Data Independence:
- Ability to change internal schema without affecting conceptual schema
- Example: Changing storage structure doesnβt affect queries
Database Models
1. Hierarchical Model
- Tree structure
- Parent-child relationships
- Example: File systems
2. Network Model
- Graph structure
- Many-to-many relationships
3. Relational Model (Most Common)
- Data in tables (relations)
- Relationships via keys
4. Object-Oriented Model
- Data as objects
- Combines OOP with database features
Relational Database Design
Key Concepts
Table (Relation):
- Collection of related data in rows and columns
- Example:
Studentstable
Row (Tuple/Record):
- Single data entry
- Example: One studentβs information
Column (Attribute/Field):
- Data field representing a property
- Example:
student_name,age
Domain:
- Set of allowed values for an attribute
- Example: Age domain: 0-150
Keys
1. Primary Key (PK)
- Uniquely identifies each row
- Cannot be NULL
- Only one per table
CREATE TABLE Students (
student_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
2. Foreign Key (FK)
- References primary key of another table
- Establishes relationships
- Can be NULL
CREATE TABLE Enrollments (
enrollment_id INT PRIMARY KEY,
student_id INT,
course_id INT,
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
3. Composite Key
- Primary key made of multiple columns
CREATE TABLE CourseInstructor (
course_id INT,
instructor_id INT,
PRIMARY KEY (course_id, instructor_id)
);
4. Candidate Key
- Columns that could serve as primary key
- Example:
emailandstudent_idboth unique
5. Super Key
- Set of attributes that uniquely identifies rows
- Example:
{student_id, name}is a super key
6. Alternate Key
- Candidate keys not chosen as primary key
Relationships
1. One-to-One (1:1)
Student ββββββββ Passport
1 1
CREATE TABLE Students (
student_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE Passports (
passport_id INT PRIMARY KEY,
student_id INT UNIQUE,
passport_number VARCHAR(20),
FOREIGN KEY (student_id) REFERENCES Students(student_id)
);
2. One-to-Many (1:N)
Department ββββββββ Employee
1 N
CREATE TABLE Departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100)
);
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES Departments(dept_id)
);
3. Many-to-Many (M:N)
Student ββββββββ Course
M N
Requires junction/bridge table:
CREATE TABLE Students (
student_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE Courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100)
);
-- Junction table
CREATE TABLE Enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
SQL Basics
Data Definition Language (DDL)
CREATE TABLE:
CREATE TABLE Employees (
emp_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
salary DECIMAL(10, 2),
hire_date DATE,
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES Departments(dept_id)
);
ALTER TABLE:
-- Add column
ALTER TABLE Employees ADD COLUMN phone VARCHAR(15);
-- Modify column
ALTER TABLE Employees MODIFY COLUMN salary DECIMAL(12, 2);
-- Drop column
ALTER TABLE Employees DROP COLUMN phone;
-- Add constraint
ALTER TABLE Employees ADD CONSTRAINT chk_salary CHECK (salary > 0);
DROP TABLE:
DROP TABLE Employees;
TRUNCATE TABLE:
TRUNCATE TABLE Employees; -- Deletes all rows, keeps structure
Data Manipulation Language (DML)
INSERT:
-- Single row
INSERT INTO Employees (first_name, last_name, email, salary, hire_date)
VALUES ('John', 'Doe', 'john@example.com', 50000.00, '2024-01-15');
-- Multiple rows
INSERT INTO Employees (first_name, last_name, email, salary)
VALUES
('Jane', 'Smith', 'jane@example.com', 55000.00),
('Bob', 'Johnson', 'bob@example.com', 48000.00);
SELECT:
-- All columns
SELECT * FROM Employees;
-- Specific columns
SELECT first_name, last_name, salary FROM Employees;
-- With conditions
SELECT * FROM Employees WHERE salary > 50000;
-- With sorting
SELECT * FROM Employees ORDER BY salary DESC;
-- With limit
SELECT * FROM Employees LIMIT 10;
UPDATE:
UPDATE Employees
SET salary = 55000.00, dept_id = 2
WHERE emp_id = 1;
-- Update with calculation
UPDATE Employees
SET salary = salary * 1.10
WHERE dept_id = 1;
DELETE:
DELETE FROM Employees WHERE emp_id = 1;
-- Delete all rows (use with caution!)
DELETE FROM Employees;
WHERE Clause Operators
-- Comparison operators
SELECT * FROM Employees WHERE salary >= 50000;
SELECT * FROM Employees WHERE dept_id = 1;
SELECT * FROM Employees WHERE first_name != 'John';
-- BETWEEN
SELECT * FROM Employees WHERE salary BETWEEN 40000 AND 60000;
-- IN
SELECT * FROM Employees WHERE dept_id IN (1, 2, 3);
-- LIKE (pattern matching)
SELECT * FROM Employees WHERE first_name LIKE 'J%'; -- Starts with J
SELECT * FROM Employees WHERE email LIKE '%@gmail.com'; -- Ends with
SELECT * FROM Employees WHERE last_name LIKE '_o%'; -- Second char is 'o'
-- IS NULL / IS NOT NULL
SELECT * FROM Employees WHERE dept_id IS NULL;
SELECT * FROM Employees WHERE email IS NOT NULL;
-- Logical operators
SELECT * FROM Employees
WHERE salary > 50000 AND dept_id = 1;
SELECT * FROM Employees
WHERE dept_id = 1 OR dept_id = 2;
SELECT * FROM Employees
WHERE NOT (salary < 40000);
Advanced SQL
Aggregate Functions
-- COUNT
SELECT COUNT(*) FROM Employees;
SELECT COUNT(DISTINCT dept_id) FROM Employees;
-- SUM
SELECT SUM(salary) FROM Employees;
-- AVG
SELECT AVG(salary) FROM Employees;
-- MIN / MAX
SELECT MIN(salary), MAX(salary) FROM Employees;
GROUP BY & HAVING
-- Group by department
SELECT dept_id, COUNT(*) as emp_count, AVG(salary) as avg_salary
FROM Employees
GROUP BY dept_id;
-- HAVING (filter groups)
SELECT dept_id, AVG(salary) as avg_salary
FROM Employees
GROUP BY dept_id
HAVING AVG(salary) > 50000;
-- Multiple grouping
SELECT dept_id, YEAR(hire_date) as year, COUNT(*) as count
FROM Employees
GROUP BY dept_id, YEAR(hire_date)
ORDER BY dept_id, year;
Joins
INNER JOIN:
SELECT e.first_name, e.last_name, d.dept_name
FROM Employees e
INNER JOIN Departments d ON e.dept_id = d.dept_id;
LEFT JOIN (LEFT OUTER JOIN):
-- All employees, even without department
SELECT e.first_name, e.last_name, d.dept_name
FROM Employees e
LEFT JOIN Departments d ON e.dept_id = d.dept_id;
RIGHT JOIN (RIGHT OUTER JOIN):
-- All departments, even without employees
SELECT e.first_name, d.dept_name
FROM Employees e
RIGHT JOIN Departments d ON e.dept_id = d.dept_id;
FULL OUTER JOIN:
-- All employees and all departments
SELECT e.first_name, d.dept_name
FROM Employees e
FULL OUTER JOIN Departments d ON e.dept_id = d.dept_id;
CROSS JOIN:
-- Cartesian product
SELECT e.first_name, d.dept_name
FROM Employees e
CROSS JOIN Departments d;
SELF JOIN:
-- Find employees with same salary
SELECT e1.first_name, e2.first_name, e1.salary
FROM Employees e1
JOIN Employees e2 ON e1.salary = e2.salary AND e1.emp_id != e2.emp_id;
Subqueries
Single-row subquery:
SELECT * FROM Employees
WHERE salary > (SELECT AVG(salary) FROM Employees);
Multi-row subquery:
SELECT * FROM Employees
WHERE dept_id IN (
SELECT dept_id FROM Departments WHERE location = 'New York'
);
Correlated subquery:
SELECT e1.first_name, e1.salary
FROM Employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM Employees e2
WHERE e2.dept_id = e1.dept_id
);
EXISTS:
SELECT d.dept_name
FROM Departments d
WHERE EXISTS (
SELECT 1 FROM Employees e WHERE e.dept_id = d.dept_id
);
String Functions
-- CONCAT
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM Employees;
-- UPPER / LOWER
SELECT UPPER(first_name), LOWER(last_name) FROM Employees;
-- LENGTH
SELECT first_name, LENGTH(first_name) as name_length FROM Employees;
-- SUBSTRING
SELECT SUBSTRING(email, 1, 5) FROM Employees;
-- TRIM
SELECT TRIM(' hello '); -- 'hello'
-- REPLACE
SELECT REPLACE(email, '@gmail.com', '@company.com') FROM Employees;
Date Functions
-- Current date/time
SELECT NOW(), CURDATE(), CURTIME();
-- Extract parts
SELECT YEAR(hire_date), MONTH(hire_date), DAY(hire_date) FROM Employees;
-- Date arithmetic
SELECT hire_date, DATE_ADD(hire_date, INTERVAL 1 YEAR) FROM Employees;
SELECT DATEDIFF(NOW(), hire_date) as days_employed FROM Employees;
-- Formatting
SELECT DATE_FORMAT(hire_date, '%M %d, %Y') FROM Employees;
Window Functions
-- ROW_NUMBER
SELECT first_name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM Employees;
-- RANK (with gaps for ties)
SELECT first_name, salary,
RANK() OVER (ORDER BY salary DESC) as rank
FROM Employees;
-- DENSE_RANK (no gaps)
SELECT first_name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM Employees;
-- Partition by
SELECT first_name, dept_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) as dept_rank
FROM Employees;
Normalization
Process of organizing data to reduce redundancy and improve integrity.
Normal Forms
1NF (First Normal Form):
- Atomic values (no multi-valued attributes)
- Each row is unique
β Violates 1NF:
| student_id | name | courses |
|------------|-------|------------------|
| 1 | John | Math, Physics |
β Follows 1NF:
| student_id | name | course |
|------------|-------|----------|
| 1 | John | Math |
| 1 | John | Physics |
2NF (Second Normal Form):
- Must be in 1NF
- No partial dependencies (non-key attributes depend on entire primary key)
β Violates 2NF:
| student_id | course_id | student_name | course_name |
|------------|-----------|--------------|-------------|
| 1 | 101 | John | Math |
(student_name depends only on student_id, not the entire composite key)
β Follows 2NF:
Students:
| student_id | student_name |
|------------|--------------|
| 1 | John |
Courses:
| course_id | course_name |
|-----------|-------------|
| 101 | Math |
Enrollments:
| student_id | course_id |
|------------|-----------|
| 1 | 101 |
3NF (Third Normal Form):
- Must be in 2NF
- No transitive dependencies (non-key attributes depend only on primary key)
β Violates 3NF:
| emp_id | name | dept_id | dept_name |
|--------|-------|---------|-----------|
| 1 | John | 10 | Sales |
(dept_name depends on dept_id, not directly on emp_id)
β Follows 3NF:
Employees:
| emp_id | name | dept_id |
|--------|-------|---------|
| 1 | John | 10 |
Departments:
| dept_id | dept_name |
|---------|-----------|
| 10 | Sales |
BCNF (Boyce-Codd Normal Form):
- Stronger version of 3NF
- For every functional dependency X β Y, X must be a super key
4NF (Fourth Normal Form):
- Must be in BCNF
- No multi-valued dependencies
Benefits of Normalization
β
Reduces data redundancy
β
Improves data integrity
β
Easier maintenance
β
Better query performance (for updates)
Denormalization
Intentionally adding redundancy for performance:
- Faster read queries (fewer joins)
- Used in data warehouses, reporting systems
Transactions & ACID
Transaction
A transaction is a logical unit of work containing one or more SQL operations.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT; -- or ROLLBACK; if error
ACID Properties
A - Atomicity:
- All operations succeed or all fail
- No partial transactions
BEGIN TRANSACTION;
-- Transfer money
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If any fails, ROLLBACK entire transaction
COMMIT;
C - Consistency:
- Database remains in valid state
- Constraints are enforced
- Example: Balance cannot be negative
I - Isolation:
- Concurrent transactions donβt interfere
- Intermediate states not visible to others
Isolation Levels:
- READ UNCOMMITTED - Dirty reads possible
- READ COMMITTED - No dirty reads
- REPEATABLE READ - Same data on re-read
- SERIALIZABLE - Highest isolation
D - Durability:
- Committed changes persist
- Survive system crashes
Transaction Commands
-- Start transaction
START TRANSACTION;
BEGIN;
-- Save changes
COMMIT;
-- Undo changes
ROLLBACK;
-- Savepoint
SAVEPOINT sp1;
ROLLBACK TO sp1;
Indexing
What is an Index?
Data structure that improves query performance by providing quick lookups.
Analogy: Like a book index - quickly find pages without reading entire book.
Types of Indexes
1. Primary Index:
- Automatically created on primary key
- Unique, clustered
2. Secondary Index:
- Created on non-primary key columns
- Non-clustered
3. Unique Index:
CREATE UNIQUE INDEX idx_email ON Employees(email);
4. Composite Index:
CREATE INDEX idx_name ON Employees(first_name, last_name);
5. Full-Text Index:
CREATE FULLTEXT INDEX idx_description ON Products(description);
Creating Indexes
-- Single column
CREATE INDEX idx_salary ON Employees(salary);
-- Multiple columns
CREATE INDEX idx_dept_salary ON Employees(dept_id, salary);
-- Unique index
CREATE UNIQUE INDEX idx_email ON Employees(email);
-- Drop index
DROP INDEX idx_salary ON Employees;
When to Use Indexes
β Use indexes when:
- Column frequently used in WHERE clause
- Column used in JOIN conditions
- Column used in ORDER BY
- Large tables with many rows
- Read-heavy workloads
β Avoid indexes when:
- Small tables
- Frequently updated columns
- Low cardinality columns (few distinct values)
- Write-heavy workloads
Index Performance
Benefits:
- Faster SELECT queries
- Faster JOIN operations
- Faster sorting
Costs:
- Slower INSERT/UPDATE/DELETE
- Requires storage space
- Maintenance overhead
π¨ Database Design
Design Process
- Requirements Analysis
- Understand data needs
- Identify entities and attributes
- Conceptual Design (ER Diagram)
- Create Entity-Relationship diagram
- Define relationships
- Logical Design
- Convert ER to tables
- Apply normalization
- Physical Design
- Choose data types
- Create indexes
- Optimize for performance
ER Diagram Components
Entities: Objects with attributes
βββββββββββββββ
β Student β
βββββββββββββββ€
β student_id β
β name β
β email β
βββββββββββββββ
Relationships: Associations between entities
Student ββ[enrolls in]ββ Course
Attributes: Properties of entities
- Simple: Cannot be divided (age)
- Composite: Can be divided (full_name β first_name + last_name)
- Derived: Calculated (age from date_of_birth)
- Multi-valued: Multiple values (phone_numbers)
Example: University Database
Entities:
- Students (student_id, name, email, dob)
- Courses (course_id, course_name, credits)
- Instructors (instructor_id, name, department)
- Enrollments (enrollment_id, student_id, course_id, grade)
Relationships:
- Students enroll in Courses (M:N)
- Instructors teach Courses (1:N)
Tables:
CREATE TABLE Students (
student_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100) UNIQUE,
date_of_birth DATE
);
CREATE TABLE Courses (
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_code VARCHAR(10) UNIQUE,
course_name VARCHAR(100),
credits INT,
instructor_id INT,
FOREIGN KEY (instructor_id) REFERENCES Instructors(instructor_id)
);
CREATE TABLE Instructors (
instructor_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
department VARCHAR(50),
email VARCHAR(100) UNIQUE
);
CREATE TABLE Enrollments (
enrollment_id INT PRIMARY KEY AUTO_INCREMENT,
student_id INT,
course_id INT,
enrollment_date DATE,
grade VARCHAR(2),
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id),
UNIQUE(student_id, course_id)
);
β Common Interview Questions
Theory Questions
1. What is a database?
- Organized collection of structured data
- Stored electronically for efficient access and management
2. What is DBMS?
- Database Management System
- Software to create, manage, and query databases
3. Difference between SQL and NoSQL?
- SQL: Structured, fixed schema, ACID, relational
- NoSQL: Flexible schema, eventual consistency, various models
4. What is normalization?
- Process to organize data and reduce redundancy
- Improves data integrity
- Normal forms: 1NF, 2NF, 3NF, BCNF
5. What are ACID properties?
- Atomicity: All or nothing
- Consistency: Valid state
- Isolation: Concurrent transactions donβt interfere
- Durability: Changes persist
6. Primary key vs Foreign key?
- Primary: Uniquely identifies row, cannot be NULL
- Foreign: References primary key of another table, establishes relationship
7. What is an index?
- Data structure for faster query performance
- Trade-off: Faster reads, slower writes
8. Types of joins?
- INNER: Matching rows from both tables
- LEFT: All from left + matching from right
- RIGHT: All from right + matching from left
- FULL OUTER: All rows from both tables
- CROSS: Cartesian product
9. What is a transaction?
- Logical unit of work
- All operations succeed or all fail
- Ensures data consistency
10. Clustered vs Non-clustered index?
- Clustered: Physical order matches index order, one per table
- Non-clustered: Separate structure with pointers, multiple per table
SQL Query Questions
1. Find second highest salary:
SELECT MAX(salary)
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees);
-- OR using LIMIT
SELECT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- OR using DENSE_RANK
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rank
FROM Employees
) temp
WHERE rank = 2;
2. Find duplicate records:
SELECT email, COUNT(*)
FROM Users
GROUP BY email
HAVING COUNT(*) > 1;
3. Delete duplicate rows keeping one:
DELETE FROM Users
WHERE id NOT IN (
SELECT MIN(id)
FROM Users
GROUP BY email
);
4. Find employees earning more than their manager:
SELECT e1.name
FROM Employees e1
JOIN Employees e2 ON e1.manager_id = e2.emp_id
WHERE e1.salary > e2.salary;
5. Find departments with no employees:
SELECT d.dept_name
FROM Departments d
LEFT JOIN Employees e ON d.dept_id = e.dept_id
WHERE e.emp_id IS NULL;
6. Nth highest salary using subquery:
SELECT salary
FROM Employees e1
WHERE (N-1) = (
SELECT COUNT(DISTINCT salary)
FROM Employees e2
WHERE e2.salary > e1.salary
);
7. Running total:
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) as running_total
FROM Sales;
8. Pivot table:
SELECT
student_name,
SUM(CASE WHEN subject = 'Math' THEN score ELSE 0 END) as Math,
SUM(CASE WHEN subject = 'English' THEN score ELSE 0 END) as English
FROM Scores
GROUP BY student_name;
π Key Takeaways
β
Understand relational model - Tables, keys, relationships
β
Master SQL basics - SELECT, INSERT, UPDATE, DELETE, JOIN
β
Know normalization - Reduce redundancy, improve integrity
β
Understand ACID - Critical for transaction management
β
Learn indexing - When and how to use indexes
β
Practice SQL queries - Write queries for common scenarios
β
Design databases - ER diagrams to table structures
π Study Tips
- Practice SQL queries - Use online platforms (SQLZoo, LeetCode)
- Draw ER diagrams - Visualize database structure
- Normalize tables - Practice converting to 3NF
- Write complex queries - Joins, subqueries, aggregations
- Understand trade-offs - Normalization vs performance
- Learn optimization - Indexes, query execution plans
Master these concepts and youβll be well-prepared for database interviews! π