Fundamentals of Databases
Table of Contents
1 Conceptual Data Models and Entity Relationship Modelling
Databases
Databases
are a collection of organised information so the information can be easily accessed, processed and managed.A database management system (DBMS)
is a software package designed to define, manipulate, retrieve and manage data in a database.- A database organise data using
tables
. - Each table holds one type of
entity
.An entity
is one type of object, such as person, book, DVDs etc.- Each entity has
attributes
to describe the entity, such as person will have firstName,lastname, DoB,etc and a book will have title, author, publishDate etc. Those attributes in a database table are also calledfields
orcolumns
. A database table
organise data inrecords
. Records are also calledrows
.- One record/row contains attributes/fields/columns related to one type of entity. This can be written in the standard
relational notation
as following:
entityName(attrubute1, attribute2, attribute3,…)
Examples: person(firstName, lastName, DoB)
book(title, author, publishDate)
- Each entity has a unique entity identifier. This identofier is called a
primary key
in a database. In the relational notation, the primary key is noted using underlining:
person( personID , firstName, lastName, DoB)
book( bookID , title, author, publishDate)
Data Models
Data models
define how data is connected to each other and how they are processed and stored inside the system.- A data model is designed to allow easy data input, process and store to meet specific application needs.
Take a look at the following problem:
Data Modelling Example 1: Film rental system
Goodfilm is an organisation which rents copies of films to its members on DVD. The copy is sent to the member by post and returned by the member after use. Some films are very popular so there may be several copies of some films and each individual copy has an id-number. Each copy was purchased on a particular date. Goodfilm need to keep track of which copy of a film is loaned to which member. Each film has a film id-number, a title, a genre, (for instance, action, romance, horror) and a year of release, for instance 2013. Each member of Goodfilm has an id-number, name and address. Design a database system for the above situation.
- To design the data model, first, you need to identify the different
entities
that need to be organised for processing:- film
- copy of each film
- member
- rental
- After step 1, you need to identify the attributes related to one type of entity. For example, the entity film will include name, genre and releaseYear relating to it.
- After done step 2 for all entities, you need to identify the
relationships
among all entities so that the application can easily process the data in the resulting database. An example of such relationship will be, whichmember
has rented whichcopy
of afilm
on what date.
Relational Database
A relational database
is a database that has been designed and structured so that the relationships among different entities are identified by shared attributes among different entities.
Relationships
are normally defined by using primary, foreign or/and composite keys.
primary key
is a unique identifier for a record.foreign key
is a primary key from a different table used to define the relationship.composite key
is a combination of two or more attributes in a table that can be used to uniquely identify each record in the table.- Using the standard notation, the above Goodfilm example entities can be modelled as below:
film(
copy(
member(
rental(
The primary keys are underlined
, the foreign keys are overlined
, and the composite key is both underlined
(not in this example).
Different types of relationships and entity relationship diagram
A relationship between any two entities may be in one of the three:
one-to-one
: one entity can relate to the other entity exactly once. For example, thecopy
entity can only be related in therental
entity once. In other words, only one unique copy can be rented out by one member, that is, exactly one unique copyID exists in the rental table. This relationshio is describe usingentity relationship diagram
by a sinlge line between the entities:
one-to-many
: one entity can relate to the other entity more than once. For example, one film can have multiple copies in the copy entity. In other words, one filmID can appear multiple times in the copy table. This relationshio in entity relationship diagram:many-to-many
: one entity can appear multiple times in the other entity and vice versa. For example, a customer can buy multiple types of products and one type of product can be purchased by multiple customers. This relationshion in entity relationship diagram:- When model a many-to-many relationship, an extra table is needed in order to link the customer and the product entities.
- When model a many-to-many relationship, an extra table is needed in order to link the customer and the product entities.
- This can be expressed in standard notation as:
customer(
product(
order(
Exercises
Exercise 1:
- In a certain college, students are able to attend evening courses which are taught by tutors. A student can attend any number of courses, and each course is taught by a single tutor, although a tutor may teach more than one course.
- Design a data model using the standard notation.
- Using an example, explain what is meant by a primary key in a database.
- Using an example, explain what is meant by a foreign key in a database.
- Construct an entity relationship diagram to illustrate this situation.
Exercise 2:
- MidWales Carriers employs a number of drivers for its vehicles. Each driver is based at just one depot (for instance Aberystwyth or Newtown) and each depot has a full address, telephone number and a supervisor. Each driver has an ID number, name and home address, and is based at just one depot. Each vehicle has a registration number,make and model, and is also based at just one depot. Each supervisor (who may be thesupervisor for more than one depot) has an ID number and a home address.
- design a data model for the above situation.
- Construct an entity-relationship diagram to illustrate the above situation
2 Database normalisation
Database normalisation
is the process to make a database model more efficient.
The purpose of normalisation
is to have a database structure in the third normal form
.
To understand this, lets start with the first, second, then the third normal form.
First Normal Form(1NF)
A table is said to be in first normal form(1NF)
if:
- it has no repeating attribute/field or groups of attributes, and
- it contains only atomic values - not composite or multivalued
- Take a look at the table below. for each record, it has three attributes, name, courseEnrolled and courseFinishDate. It is not in first normal form since it contains repeating groups of attributes for the courseEnrolled and the courseFinishDate has composite values or it is multi-valued.
name courseEnrolled courseEnrolled2 courseFinishDate John Doe Java Python 12/12/2015, 1/8/2013 Mary Jane Perl Java 29/4/2015, 21/5/2016 - To make the above table in first normal form, we need to make courseEnrolled and courseFinishDate contain only one/atomic value. To do this, we could expand the data into more rows. The combination of name and courseEnrolled is unique for each record. The database is now in
1ST normal form
.
name | courseEnrolled | courseFinishDate | |
---|---|---|---|
John Doe | Java | 12/12/2015 | |
John Doe | Python | 1/8/2013 | |
Mary Jane | Perl | 29/4/2015 | |
Mary Jane | Java | 21/5/2016 |
Second Normal Form(2NF)
A table is said in second normal form
if:
- It is in 1ST normal form, and
- all
non-key attributes
are fully functionally dependent on the primary key. Functionally dependent
means if one attribute change requires other attribute change.- Each table must only contain one type of entity.
- This partially depdent on the primary key only occurs if there is a composite key.
Take a look at the following table:
pupilID | courseID | courseLocation | finishDate |
---|---|---|---|
1 | 2 | Bourne | |
1 | 3 | Peterborough |
- The above table has a
composite key
, that is, two attributes pupilID and courseID combined to uniquely identify a record. - The attribute courseLocation is not fully dependent on the composite key (also the primary key). It patially depends on the courseID.
- To make the above table in the second normal form, the courseLocation attribute should be in the course table:
courseID courseName courseLocation 1 Java Bourne 2 Perl Stamford 3 Python Peterborough
Third Normal Form(3NF)
A table is said to be in third normal form
if:
- It is in the second normal form, AND
- there is no non-key dependencies, or no atrributes depdent on other attrbutes.
All attributes are dependent on the key, the whole key, and nothing but the key.
- The Goodfilm is the example of tables in third normal form.
- To better illustrate this, lets exam a table that is not in 3RD normal form but is in 2ND normal form.
courseID | courseName | courseLocation | teacherID | teacherName |
---|---|---|---|---|
1 | Java | Bourne | 22 | John Smith |
2 | Perl | Stamford | 12 | Jane Doe |
3 | Python | Peterborough | 33 | Marry Jane |
- The attribute teacherName is dependent on the teacherID and vice versa.
- To make the above table in 3RD normal form, we need to get rid of one of the two co-dependent attrubutes.
Purpose of Database Normalisation
- To reduce data redundency
- This means to avoid the same data being stored as multiple attributes in one table or cross multiple tables. This wastes storage space and easily result on data inconsistency.
- For example, store customer's name on customer table as well as in orders table.
- To maintain data integrity
- This means to maintain data accuracy and consistency. When data being updated, there is no danger that that piece of data has been updated in one place but not in other places. For example, when updating the bodyMass of person, the isOverWeight attribute may not be updated.
- Smaller tables, therefore, faster index, fast sorting, searching, updating.
- No accidental deleting a record if that record has a relation with other entities that are still serving business needs. For example, deleting a member while the member is still having a rented movie not returned. A database will prevent this from happening.
3 Database Query Language- SQL
SQL Basics
- SQL stands for
structured query language
- It is for accessing and manipulating databases
- It is a standard language (ANSI) and all vendors implement
- Its commonly used statements include
- SELECT
- UPDATE
- INSERT
- DELETE
- WHERE
- CREATE
DDL Basics
- DDL stands for data definition language
- DDL is a standard for for commands that define the different structures in a database.
- DDL statements create, modify, and remove database objects such as tables, indexes, and users.
- Common DDL statements are CREATE, ALTER, and DROP.
For the next several SQL related contents, you will need to use an existing database created for you which has two tables to begin with: member and film as for the previous Goodfilm example.
For practice SQL, there are many free online websites. Here are two of them:
Retrieve data from a database
- To selects all attributes for all records from the table
member
. The * is a wild card which means all attributes.
SELECT * FROM member;
- To select only specific attributes for each record.
SELECT memberFirstName, memberLastName FROM member;
- To select only specific attributes that meets certain condition. The following statement will return member's lastname whose first name is 'Jack'.
SELECT memberLastName FROM member WHERE memberFirstName='Jack';
- To sort your search results by a specific attributes, for example, sort by memberLastName from the previous query:
SELECT memberLastName FROM member WHERE memberFirstName='Jack' ORDER BY memberLastName;
- By default the sorting order is ascending, but you can specify otherwise:
SELECT memberLastName FROM member WHERE memberFirstName='Jack' ORDER BY memberLastName DESC;
- Order your search results by more than one attributes:
SELECT memberLastName, memberJoinDate FROM member WHERE memberFirstName='Jack' ORDER BY memberLastName DESC, memberJoinDate ASC;
Update data in a database
- To update an existing record's attribute(s) that meeting a specified condition:
UPDATE member SET memberLastName='Cai', FirstName='X' WHERE memberID=1; It is vital to have the WHERE clause, otherwise you will update all records in the member table!
Insert data into a database
- To insert a new record into an existing table:
INSERT INTO member VALUES(NULL, 'Xiaohui', 'Ellis',NULL); or INSERT INTO member (memberFirstName, memberLastName) VALUES ('Jack','Schnauzer'); Both statements work. The second statememnt included the attribute names and their corresponding values. Since the memberID is automatically generated by the database, you leave it by using the key word "NULL".
Delete data from a database
- To delete records from a table that meet certain condition(s):
DELETE FROM member WHERE memberJoinDate <'1970';
- To delete all records from a table if you must:
DELETE FROM member;
Create a database
- To create a brand new database:
CREATE DATABASE goodfilm;
Alter a database table
- A database table can be modified to suit ne needs
- To update a field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member MODIFY COLUMN memberAddress varchar(200);
- To add a new field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member ADD memberMobile varchar(16);
- To remove a field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member DROP memberMobile;
Create a database table
- A database table is defined in its simplest form, by a table name, attribute names, and data types for each attribute. Other properties of a table, such as primary key, foreign key, index, attribute uniqueness, field can or cannot have NULL values, security and access rights can also be defined.
- To create a brand new database table (for MySQL, other databases may differ):
CREATE TABLE member ( memberID int(11) NOT NULL AUTO_INCREMENT, memberFirstName varchar(20) NOT NULL, memberLastName varchar(20) NOT NULL, memberAddress varchar(120), PRIMARY KEY (memberID) ); The AUTO_INCREMENT property is to specify how the primary key memberID is to be generated. In this case, it is automatically generated and increased by the DBMS.
CREATE TABLE copy ( copyID int(11) NOT NULL AUTO_INCREMENT, filmID int(11), purchaseDate date NOT NULL, PRIMARY KEY (copyID), FOREIGN KEY (filmID) REFERENCES film(filmID) );
4 More Complex SQL
Using the Goodfilm example:
film(
copy(
member(
rental(
Here are some of the likely business needs from the database:
- Find out all movie copies for a given movie title.
- Find out the number of movie copies for a given movie title.
- Find out the names of members who have checked out certain movie titles.
- Find out the names of members who have checked out certain movies more than 7 days ago.
To fillfil the above queries, a simple query from one table will not be enough since multiple entities/tables are involved.
Query records from two related tables
- To find out all movie copies for a given movie title, we need to query the table copy and film for a given filmName. Those two tables are linked by a foreign key, filmID in the copy table. The filmID is the copy table and the filmID in the film table should be the same for a given filmName. This kind of query is called
inner join
.
SELECT * FROM film INNER JOIN copy ON film.fileID = copy.filmID WHERE film.filmName='The Matrix';
- An alternative to JOIN:
SELECT * FROM film, copy WHERE film.fileID = copy.filmID AND film.filmName='The Matrix';
- To find out the number of movie copies for a given movie title. To do this, we need to use a SQL function called count.
SELECT COUNT(*) FROM film INNER JOIN copy ON film.fileID = copy.filmID WHERE film.filmName='The Matrix';
Query records from four related tables
- To find out the names of members who have checked out certain movie titles, we need to query three tables. This is because, the member table has memberName, the film table has filmName and the rental table has the information on which memberID checked out which copyID.
- Uisng INNER JOIN
SELECT member.memberFirstName,film.filmName FROM film inner join copy on copy.filmID=film.filmID inner join rental on copy.copyID=rental.copyID inner join member on member.memberID=rental.memberID WHERE film.filmName='Alien';
- NOT using INNER JOIN
SELECT member.memberFirstName,film.filmName FROM film, copy, rental, member WHERE copy.filmID=film.filmID AND copy.copyID=rental.copyID AND member.memberID=rental.memberID WHERE film.filmName='Alien';
Exercises
- Go to this link to work on the sequence of exercises
5 Client Server Database
Client- Server model of operation on database
Many enterprise database based applications use the client-server model of operation, that is, a centralised database server dealing with many requests from different clients on different PC. An example of this will be a school's student information management system (such as SIMS)- different teachers use a client software on their PC to send requests to the server, and the server responds to each client. Any updates to the server from one client will be available to all clients.
- The advantages of using a client-server database model:
- Avoid data inconsistency since only one copy of the database is accessed rather than maintaining multiple copies on individual client.
- All updates will be immediately available to all clients.
- Avoid expensive cost of resources required to have multiple copies of database. Those cost can be hardware, software, and administration staff.
- Access rights and security can be centrally managed.
- Backup and recovery can be centrally managed.
- The disadvantages of using a client-server database model:
- when clients update the same record at the same time(simultaneously),some updates will be lost. Considering the following:
- Both client A and B trying to update the same record of member1 at the same time. So they both check out a copy of the record at the same time into their own PC's memory.
- Client A makes changes to member 1's address on its local memory
- Client B makes changes to member 1's name on its local memory
- Client A saves the change to the server and changes the address
- Client B saves the change to the server and changes the name
- The server now has member 1's name changed but not address!
- To prevent the above situation of update loss and data inconsistency, a DBMS uses a
record locking
mechanism - whenever a record is being updated, it is being locked until the transaction is completed or cancelled.
- when clients update the same record at the same time(simultaneously),some updates will be lost. Considering the following:
- The above situation is illustrated in the diagram below:
6 Concurrency Control with Record Locking
Record Locking
Record locking
is a mechnism to prevent simulteneous updating to a database record to prevent inconsistent results.- If there is no record locking, multiple users read the same original record, then try to update their own changes to the database. All the other changes will be overwritten except the last update made. Data integrity is lost.
A lock
can be placed on a single record when retrieved for editing or updating. Anyone attempting to retrieve the same record for editing is denied write access because of the lock. Once the record is saved or edits are canceled, the lock is released.- The purpose of
record locking
is to:- prevent updates being lost
- preserve data integrity and consistency
Dead Lock
Record locking can prevent updates being lost and data inconsistency. But considering the following situation:
- Client A transferring money from customer 1's balance to customer 2' balance.
- Customer 1's record is now locked.
- Client B transferring money from customer 2's balance to customer 1's balance.
- Customer 2's record is now locked.
- A deadlock occurs since bother clients now waiting for a locked record.
- A deadlock happens when no update is made and all update requests stuck in a waiting state.
- All activity comes to a halt and remains at a standstill forever unless the DBMS detects the deadlock and aborts one of the transactions.
Dealing with deadlocking using serialisation
Serialisation
is a mechanism to ensure there is no transactions overlap in time. A transaction cannot start until the previous one has finished. It is normally implemented using one of the following techniques:
timestamp ordering
- A timestamp is given at the start of each transaction. For two transactions updating the same record, the transaction with the earlier timestamp will be applied first.
- To avoid updates being lost, each object in the database has a
read timestamp
and awrite timestamp
. Those timestamps are updated whenever an object is read or written. - When an updated is about to be made, the DBMS checks the read timestamp against the start of the transaction. If the read or write timestamp is newer than the start of the transaction, reject.
commitment ordering
- In this technique, transactions are ordered by their dependencies on each other as well as the time they started.
- If a transaction has dependecies on another, it is blocked until the dependent-upon transactions have finished/commited.