Fundamentals of Databases

Table of Contents

1 Conceptual Data Models and Entity Relationship Modelling

Databases

  • Databases are a collection of organised information so the information can be easily accessed, processed and managed.
  • A database management system (DBMS) is a software package designed to define, manipulate, retrieve and manage data in a database.
  • A database organise data using tables.
  • Each table holds one type of entity.
    • An entity is one type of object, such as person, book, DVDs etc.
    • Each entity has attributes to describe the entity, such as person will have firstName,lastname, DoB,etc and a book will have title, author, publishDate etc. Those attributes in a database table are also called fields or columns.
    • A database table organise data in records. Records are also called rows.
    • One record/row contains attributes/fields/columns related to one type of entity. This can be written in the standard notation as following:

entityName(attrubute1, attribute2, attribute3,…)

Examples: person(firstName, lastName, DoB)

book(title, author, publishDate)

  • Each entity has a unique entity identifier. This identofier is called a primary key in a database. In the standard notation, the primary key is noted using underlining:

person( personID , firstName, lastName, DoB)

book( bookID , title, author, publishDate)

Data Models

  • Data models define how data is connected to each other and how they are processed and stored inside the system.
  • A data model is designed to allow easy data input, process and store to meet specific application needs.

Take a look at the following problem:

Data Modelling Example 1: File rental system

Goodfilm is an organisation which rents copies of films to its members
on DVD. The copy is sent to the member by post and returned by the 
member after use.
Some films are very popular so there may be several copies of some 
films and each individual copy has an id-number. Each copy was 
purchased on a particular date.
Goodfilm need to keep track of which copy of a film is loaned to
which member.
Each film has a film id-number, a title, a genre, (for instance,
action, romance, horror) and a year of release, for instance 2013.
Each member of Goodfilm has an id-number, name and address.
Design a database system for the above situation.
  1. To design the data model, first, you need to identify the different entities that need to be organised for processing:
    • film
    • copy of each film
    • member
    • rental
  2. After step 1, you need to identify the attributes related to one type of entity. For example, the entity film will include name, genre and releaseYear relating to it.
  3. After done step 2 for all entities, you need to identify the relationships among all entities so that the application can easily process the data in the resulting database. An example of such relationship will be, which member has rented which copy of a film on what date.

Relational Database

A relational database is a database that has been designed and structured so that the relationships among different entities are identified by shared attributes among different entities.

Relationships are normally defined by using primary, foreign or/and composite keys.

  • primary key is a unique identifier for a record.
  • foreign key is a primary key from a different table used to define the relationship.
  • composite key is a combination of two or more attributes in a table that can be used to uniquely identify each record in the table.
  • Using the standard notation, the above Goodfilm example entities can be modelled as below:

film(filmID, filmName, genre, releaseYear)

copy(copyID, filmID, purchaseDate)

member(memberID, memberName, memberAddress)

rental(rentalID,memberID,copyID, rentalDate)

The primary keys are underlined, the foreign keys are overlined, and the composite key is both underlined (not in this example).

Different types of relationships and entity relationship diagram

A relationship between any two entities may be in one of the three:

  • one-to-one: one entity can relate to the other entity exactly once. For example, the copy entity can only be related in the rental entity once. In other words, only one unique copy can be rented out by one member, that is, exactly one unique copyID exists in the rental table. This relationshio is describe using entity relationship diagram by a sinlge line between the entities:

relational-diagram1.png

  • one-to-many: one entity can relate to the other entity more than once. For example, one film can have multiple copies in the copy entity. In other words, one filmID can appear multiple times in the copy table. This relationshio in entity relationship diagram: relational-diagram2.png
  • many-to-many: one entity can appear multiple times in the other entity and vice versa. For example, a customer can buy multiple types of products and one type of product can be purchased by multiple customers. This relationshion in entity relationship diagram: relational-diagram3.png
    • When model a many-to-many relationship, an extra table is needed in order to link the customer and the product entities. relational-diagram4.png
  • This can be expressed in standard notation as:

customer(customerID, customerName, address, phoneNumber)

product(productID, productName, manufacture, model, price)

order(orderID, customerID, productID, orderDate)

Exercises

Exercise 1:

  • In a certain college, students are able to attend evening courses which are taught by tutors. A student can attend any number of courses, and each course is taught by a single tutor, although a tutor may teach more than one course.
    1. Design a data model using the standard notation.
    2. Using an example, explain what is meant by a primary key in a database.
    3. Using an example, explain what is meant by a foreign key in a database.
    4. Construct an entity relationship diagram to illustrate this situation.

    Exercise 2:

  • MidWales Carriers employs a number of drivers for its vehicles. Each driver is based at just one depot (for instance Aberystwyth or Newtown) and each depot has a full address, telephone number and a supervisor. Each driver has an ID number, name and home address, and is based at just one depot. Each vehicle has a registration number,make and model, and is also based at just one depot. Each supervisor (who may be thesupervisor for more than one depot) has an ID number and a home address.
    1. design a data model for the above situation.
    2. Construct an entity-relationship diagram to illustrate the above situation

2 Database normalisation

Database normalisation is the process to make a database model more efficient. The purpose of normalisation is to have a database structure in the third normal form.

To understand this, lets start with the first, second, then the third normal form.

First Normal Form(1NF)

A table is said to be in first normal form(1NF) if:

  • it has no repeating attribute/field or groups of attributes, and
  • it contains only atomic values.
  1. Take a look at the table below. for each record, it has three attributes, name, courseEnrolled and courseFinishDate. It is not in first normal form since it contains repeating groups of attributes for the two attributes courseEnrolled and courseFinishDate.
    name courseEnrolled courseFinishDate
    John Doe Java, Python 12/12/2015, 1/8/2013
    Mary Jane Perl, Java 29/4/2015, 21/5/2016
  2. To make the above table in first normal form, we need to make courseEnrolled and courseFinishDate contain only one/atomic value. To do this, another table, courses will be needed which should be a different entity anyway:

    course( courseID, courseName)

    courseID courseName
    1 Java
    2 Perl
    3 Python
  3. Then the pupils table can be changed to be in first normal form:
    pupilID name
    1 John Doe
    2 Mary Jane
  4. A third table will be needed to link the pupil and the course table together.

    enrollment(courseID,pupilID,courseLocation)

Second Normal Form(2NF)

A table is said in second normal form if:

  • It is in first normal form, and
  • all non-key attributes are fully functional dependent on the primary key.
  • This partially depdent on the primary key only occurs if there is a composite key.

Take a look at the following table:

pupilID courseID courseLocation finishDate
1 2 Bourne  
1 3 Peterborough  
  1. The above table has a composite key, that is, two attributes pupilID and courseID combined to uniquely identify a record.
  2. The attribute courseLocation is not fully dependent on the composite key (also the primary key). It patially depends on the courseID.
  3. To make the above table in the second normal form, the courseLocation attribute should be in the course table:
    courseID courseName courseLocation
    1 Java Bourne
    2 Perl Stamford
    3 Python Peterborough

Third Normal Form(3NF)

A table is said to be in third normal form if:

  • It is in the second normal form, AND
  • there is no non-key dependencies, or no atrributes depdent on other attrbutes.

All attributes are dependent on the key, the whole key, and nothing but the key.

  • The Goodfilm is the example of tables in third normal form.

To better illustrate this, lets exam a table that is not in third normal form.

Purpose of Database Normalisation

  1. To reduce data redundency
    • This means to avoid the same data being stored as multiple attributes in one table or cross multiple tables. This wastes storage space and easily result on data inconsistency.
    • For example, store customer's name on customer table as well as in orders table.
  2. To maintain data integrity
    • This means to maintain data accuracy and consistency. When data being updated, there is no danger that that piece of data has been updated in one place but not in other places. For example, when updating the bodyMass of person, the isOverWeight attribute may not be updated.
  3. Smaller tables, therefore, faster index, fast sorting, searching, updating.
  4. No accidental deleting a record if that record has a relation with other entities that are still serving business needs. For example, deleting a member while the member is still having a rented movie not returned. A database will prevent this from happening.

3 Database Query Language- SQL

SQL Basics

  • SQL stands for structured query language
  • It is for accessing and manipulating databases
  • It is a standard language (ANSI) and all vendors implement
  • Its commonly used statements include
    • SELECT
    • UPDATE
    • INSERT
    • DELETE
    • WHERE
    • CREATE

DDL Basics

  • DDL stands for data definition language
  • DDL is a standard for for commands that define the different structures in a database.
  • DDL statements create, modify, and remove database objects such as tables, indexes, and users.
  • Common DDL statements are CREATE, ALTER, and DROP.

For the next several SQL related contents, you will need to use an existing database created for you which has two tables to begin with: member and film as for the previous Goodfilm example.

For practice SQL, there are many free online websites. Here are two of them:

Retrieve data from a database

  • To selects all attributes for all records from the table member. The * is a wild card which means all attributes.
SELECT * FROM member;
  • To select only specific attributes for each record.
SELECT memberFirstName, memberLastName FROM member;
  • To select only specific attributes that meets certain condition. The following statement will return member's lastname whose first name is 'Jack'.
SELECT memberLastName FROM member WHERE memberFirstName='Jack';
  • To sort your search results by a specific attributes, for example, sort by memberLastName from the previous query:
SELECT memberLastName FROM member WHERE memberFirstName='Jack' ORDER BY memberLastName;
  • By default the sorting order is ascending, but you can specify otherwise:
SELECT memberLastName FROM member WHERE memberFirstName='Jack' ORDER BY memberLastName DESC;
  • Order your search results by more than one attributes:
SELECT memberLastName, memberJoinDate FROM member WHERE memberFirstName='Jack' 
ORDER BY memberLastName DESC, memberJoinDate ASC;

Update data in a database

  • To update an existing record's attribute(s) that meeting a specified condition:
  UPDATE member SET memberLastName='Cai', FirstName='X' WHERE memberID=1;

It is vital to have the WHERE clause, otherwise you will update all
records in the member table!

Insert data into a database

  • To insert a new record into an existing table:
  INSERT INTO member VALUES(NULL, 'Xiaohui', 'Ellis',NULL); or

  INSERT INTO member (memberFirstName, memberLastName) VALUES ('Jack','Schnauzer');

Both statements work. The second statememnt included the attribute names
and their corresponding values.
Since the memberID is automatically generated by the database, 
you leave it by using the key word "NULL".

Delete data from a database

  • To delete records from a table that meet certain condition(s):
DELETE FROM member WHERE memberJoinDate <'1970';
  • To delete all records from a table if you must:
DELETE FROM member;

Create a database

  • To create a brand new database:
CREATE DATABASE goodfilm;

Alter a database table

  • A database table can be modified to suit ne needs
  • To update a field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member MODIFY COLUMN memberAddress varchar(200);
  • To add a new field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member ADD memberMobile varchar(16);
  • To remove a field in an existing database table (for MySQL, other databases may differ):
ALTER TABLE member DROP memberMobile;

Create a database table

  • A database table is defined in its simplest form, by a table name, attribute names, and data types for each attribute. Other properties of a table, such as primary key, foreign key, index, attribute uniqueness, field can or cannot have NULL values, security and access rights can also be defined.
  • To create a brand new database table (for MySQL, other databases may differ):
CREATE TABLE member (
  memberID int(11) NOT NULL AUTO_INCREMENT,
  memberFirstName varchar(20) NOT NULL,
  memberLastName varchar(20) NOT NULL, 
  memberAddress varchar(120),
  PRIMARY KEY (memberID)
  );

The AUTO_INCREMENT property is to specify how the primary key
memberID is to be generated. In this case, it is automatically 
generated and increased by the DBMS.
CREATE TABLE copy (
copyID int(11) NOT NULL AUTO_INCREMENT,
filmID int(11),
purchaseDate date NOT NULL,
PRIMARY KEY (copyID),
FOREIGN KEY (filmID) REFERENCES film(filmID) 
);

4 More Complex SQL

Using the Goodfilm example:

film(filmID, filmName, genre, releaseYear)

copy(copyID, filmID, purchaseDate)

member(memberID, memberName, memberAddress)

rental(rentalID,memberID,copyID, rentalDate)

Here are some of the likely business needs from the database:

  1. Find out all movie copies for a given movie title.
  2. Find out the number of movie copies for a given movie title.
  3. Find out the names of members who have checked out certain movie titles.
  4. Find out the names of members who have checked out certain movies more than 7 days ago.

To fillfil the above queries, a simple query from one table will not be enough since multiple entities/tables are involved.

Query records from two related tables

  • To find out all movie copies for a given movie title, we need to query the table copy and film for a given filmName. Those two tables are linked by a foreign key, filmID in the copy table. The filmID is the copy table and the filmID in the film table should be the same for a given filmName. This kind of query is called inner join.

innerjoin.gif

SELECT * FROM film INNER JOIN copy ON film.fileID = copy.filmID
WHERE film.filmName='The Matrix';
  • An alternative to JOIN:
SELECT * FROM film, copy WHERE film.fileID = copy.filmID AND
film.filmName='The Matrix';
  • To find out the number of movie copies for a given movie title. To do this, we need to use a SQL function called count.
      SELECT COUNT(*) FROM film INNER JOIN copy ON
    film.fileID = copy.filmID WHERE film.filmName='The Matrix';
    

Query records from four related tables

  • To find out the names of members who have checked out certain movie titles, we need to query three tables. This is because, the member table has memberName, the film table has filmName and the rental table has the information on which memberID checked out which copyID.
  • Uisng INNER JOIN
    SELECT member.memberFirstName,film.filmName FROM film
    inner join copy on copy.filmID=film.filmID 
    inner join rental on copy.copyID=rental.copyID 
    inner join member on member.memberID=rental.memberID
    WHERE film.filmName='Alien';
    
  • NOT using INNER JOIN
    SELECT member.memberFirstName,film.filmName FROM film, copy, rental, member
    WHERE copy.filmID=film.filmID 
    AND copy.copyID=rental.copyID 
    AND member.memberID=rental.memberID
    WHERE film.filmName='Alien';
    

5 Client Server Database

Client- Server model of operation on database

Many enterprise database based applications use the client-server model of operation, that is, a centralised database server dealing with many requests from different clients on different PC. An example of this will be a school's student information management system (such as SIMS)- different teachers use a client software on their PC to send requests to the server, and the server responds to each client. Any updates to the server from one client will be available to all clients.

  • The advantages of using a client-server database model:
    • Avoid data inconsistency since only one copy of the database is accessed rather than maintaining multiple copies on individual client.
    • All updates will be immediately available to all clients.
    • Avoid expensive cost of resources required to have multiple copies of database. Those cost can be hardware, software, and administration staff.
    • Access rights and security can be centrally managed.
    • Backup and recovery can be centrally managed.
  • The disadvantages of using a client-server database model:
    • when clients update the same record at the same time(simultaneously),some updates will be lost. Considering the following:
      1. Both client A and B trying to update the same record of member1 at the same time. So they both check out a copy of the record at the same time into their own PC's memory.
      2. Client A makes changes to member 1's address on its local memory
      3. Client B makes changes to member 1's name on its local memory
      4. Client A saves the change to the server and changes the address
      5. Client B saves the change to the server and changes the name
      6. The server now has member 1's name changed but not address!
    • To prevent the above situation of update loss and data inconsistency, a DBMS uses a record locking mechanism - whenever a record is being updated, it is being locked until the transaction is completed or cancelled.

6 Concurrency Control with Record Locking

Record locking can prevent updates being lost and data inconsistency. But considering the following situation:

  1. Client A transferring money from customer 1's balance to customer 2' balance.
  2. Customer 1's record is now locked.
  3. Client B transferring money from customer 2's balance to customer 1's balance.
  4. Customer 2's record is now locked.
  5. A deadlock occurs since bother clients now waiting for a locked record.
    • A deadlock happens when no update is made and all update requests stuck in a waiting state.
    • All activity comes to a halt and remains at a standstill forever unless the DBMS detects the deadlock and aborts one of the transactions.

Dealing with deadlocking using serialisation

Serialisation is a mechanism to ensure there is no transactions overlap in time. A transaction cannot start until the previous one has finished. It is normally implemented using one of the following techniques:

  • timestamp ordering
    • A timestamp is given at the start of each transaction. For two transactions updating the same record, the transaction with the earlier timestamp will be applied first.
    • To avoid updates being lost, each object in the database has a read timestamp and a write timestamp. Those timestamps are updated whenever an object is read or written.
    • When an updated is about to be made, the DBMS checks the read timestamp against the start of the transaction. If the read or write timestamp is newer than the start of the transaction, reject.
  • commitment ordering
    • In this technique, transactions are ordered by their dependencies on each other as well as the time they started.
    • If a transaction has dependecies on another, it is blocked until the dependent-upon transactions have finished/commited.