Relational Model

We'll have one kind of thing, a table!

Relation schema is going to tell us what the table looks like in terms of columns that it has. A relation is going to have the data that's in the table, like the rows. Relation instance is like a particular "snapshot" of a variable. We'll have multiple tables in a database.

Relation schemas and relation instances

I've got a bunch of attributes $A_1$ up to $A_n$ and each attribute $A_i$ has a domain $D_i$ . The schema is just this collection of attributes and their associated domains. It's a a tuple $A_1$ to $A_n$ and we use capital letters as a convention. So capital R below is a relation shcema.

R=\left(A_{1}, A_{2}, \ldots, A_{n}\right)

A particular instance or a relation on this schema which we will often denote with a lowercase letter. A relation instance is a subset of the Cartesian product of the domains of the attributes. So a relation instance $r$ defined over schema $R$ is denoted by $r(R)$ .

r(R) \subseteq D_{1} \times \ldots \times D_{n}

The current values of a relation are specified by a table.
A element t of relation r is called a tuple and is represented by a row in a table.

Example Relation Instance

We could call this a relation instance. We'd often just call it a relation or a table.

\text { Saw }=(userI D, \text { title, year, numStars })

userID

title

year

numStars

12345

Little Women

2019

12345

Little Women

1994

54321

Little Women

2019

54321

Finding Dory

2016

76543

Little Women

2019

Attributes

In the ER model, we had various different kinds of attributes. We had composite attributes, multivalued attributes, derived attributes, and atomic attributes. In the relational model, we only have atomic attributes.

When we translate from ER to relational, how are we going to represent these more complex kinds of attributes?

The set of allowed values for each attribute is called the domain of the attribute. It might for example be 5 digit numbers. It might be strings of length at most 50, etc.
The special value null is a member of every domain. Null is used either to represent that a case where this particular row just does not have a value for that attribute at all, or the case where the value is "unknown".
The null value causes complications in the definition of many operations. It's better to design our schema in such a way that we're less likely to encounter nulls. We won't generally be able to completely avoid them, but we'll we'll try to design things that we won't have too many of them.

Super keys, Candidate keys, and Primary keys

Let's say R is a collection of attributes for a particular relation schema, $R=\left(A_{1}, \ldots, A_{n}\right)$ and K is some subset of that: $\mathrm{K} \subseteq \mathrm{R}$

K is a super key of R if values for K are sufficient to identify a unique tuple of each possible relation r(R). In other words, if we don't have any two rows that share the same value for all of those attributes, that's going to be a super key.
Super key K is a candidate key if K is minimal. If we have more than one candidate key, we'll just pick one and designate that as the primary key.
One of the candidate keys is selected to be the primary key.

Deciding which attribute sets are super keys, cadidate keys depends on some "agreement" about what possible values could occur.

For example, when disignating a primary key constraint, the possible relations on a schema looks like this:

name

DOB

12345

Sally Li

01-01-2000

12345

Bob Smith

01-02-2005

\subseteq \text { Domain }_{ID} \times \text { Domain }_{\text {name }} \times \text { Domain }_{\text {DOB }}

It's a relation on these domains, but not a legal relation on the schema Person(ID, name, DOB) because it has a duplicate primary key.

Database Schemas and Database Instances

We'll have a bunch of tables, each with their own relation schema, and altogether they comprise the database schema. So database schema is a bunch of relation schemas.

Person(ID, firstName, lastName, dateOfBirth)
Saw(ID, title, year, numStars)
Movie(title, year, MPAA)

Referential Integrity Constraints

A referential integrity constraint is a way of saying that a value in a row of one relation must also appear as a value of some attribute in a row of another relation.

Referencing relation
Referenced relation

Common special case is that the referenced key is the primary key of the referenced relation. In this case it's called a Foreign Key Constraint.

Foreign Key constraint example

For example, Saw is a referencing relation and Person is a referenced relation. In this case, any value of ID in the Saw table must also be a value of ID in the Person table.

Person(ID, firstName, lastName, dateOfBirth): referenced schema
Saw(ID, title, year, numStars): referencing schema

Saw.ID should represent ID of a person who's in the Person relation.

Schema Diagrams

In ER diagram we were representing Saw with a diamond because it was a relationship set between people and movies. Here we're representing it as just a table, which we're representing with these rectangles.
We're going to underline the primary keys as we did with entity set.
We'll also have a way to say what foreign key constraints are or referential integrity constraints more generally by putting an arrow from the referencing relationship set to the referenced relationship set. In other words, the head of the arrow pointing toward Person, it's assumed that it's referencing the primary key of the referenced relation.

Last updated 2 years ago