Entity Relationship Model

Entity Set

A set of objects of the same type
Eample of types of objects:
Person: {p1, p2, p3, ...}
Movie: {m1, m2, m3, ...}

Each element is going to have certain features called attributes. One part of modeling data for a particular purpose is determining which attributes are relevant. For each of entity sets, what attributes we want to represent and grouping all of them together is called the schema for the entity set.

A table below is a schema for the person entity set, essentially a type definition for what a person is. The way that we're characterizing our entities like the name and the date of birth of a person, those are called attributes.

Person

name

DOB

Relationship Set

Definition

Paring between different entity sets or an entity set itself

A relationship set (on entity sets $E_1,...,E_n$ ) is a subset of the Cartesian Product, in other words, it's a set of ordered n-tuples.

R \subseteq E_1 \times E_{2} \times \ldots \times E_{n}

We will mainly consider binary relationship sets (n = 2)

R \subseteq E_{1} \times E_{2}

Cartesian Product:

Relationship set is going to be a particular subset of the Cartesian product. The entire Cartesian product is another relationship set as well.

$\text { Reminder } \mathrm{A} \times \mathrm{~B}=\{(\mathrm{a}, \mathrm{b}) \mid \mathrm{a} \text { belongs to } \mathrm{A} \text { and } \mathrm{b} \text { belongs to } \mathrm{B}\}$

Example of Relationship Set: Saw

Attributes of Relationship Sets

Some data about the relationship between a pair of entities

Those attributes below are not about Sally, and they're not attributes about little women. They're attributes about the relationship between Sally and little women.

Example of Relationship set with Attribute

ER Diagram Notation

We're going to use these rectangular boxes like this to represent our entity sets like the ones for person and movie, and we'll use diamonds to represent the relationship set.

ID, fname, lname, and DOB are attributes of Person
title, year, MPAA, and director are attributes of Movie
Saw is a relationship set on Person and Movie
Person and Movie are participating entity sets for Saw
num_stars is attribute of Saw

We have these solid lines that are highlighted in yellow here between Person and Saw and Movie and Saw to indicate that person and movie are the participating entity sets. We have a dotted line from Saw to num_stars to indicate that num_stars is an attribute of Saw.

Design Issues

What should be an entity set? What should be an attribute? What are the participants in the relationship set?

Sets don't have duplicate elements

Suppose that we have ER diagram and that Sally saw "little women 2019" two different times and one time she gave it five stars and another time she gave it three stars. We cannot have this situation with this data model because we can't have duplicates.

The fact that they have different num_stars isn't relevant because our relationship set, Saw as a set of pairs of Person and and Movie, is not a set because we have two pairs that are the same.

If we want to allow a Person to see a movie multiple times with different numbers of stars, we need to chane the design:

One way we could do that is we could have a different relationship set Saw2 which is a ternary relationship set. So this now has three entity sets, Person, Movie, and we make num_stars into an entity set called Stars instead of being an attribute. Now our elements of Saw2 are triples that consist of a person's name, a movie and a number of stars.

\left(\text{Sally Li, LW-2019, 5)} \in \operatorname{Saw}2\right.\\\left(\text{Sally Li, LW-2019, 3)} \in \operatorname{Saw}2\right.

Now we would not be able to represent here the idea that Sally saw that movie twice and gave it the same rating two different times. So if we wanted to be able to model that then, maybe we want to have an additional attribute or an additional entity set that's representing the date on which a person saw a movie. Actually, in that case we might want to have our entity sets that participate in the ternary or three-way relationship set be the date, the person, and the movie. We have then num_stars be an attribute of when they saw it on that date.

Now, what if they saw it twice the same day? Well, then we might need to do something else. So that's the kind of issue that you have to deal with when you're coming up with the data model.

Distinguishing different kinds of attributes

Four kinds of Attributes in ER model

1. Atomic - simple, "indivisible" values

Atomic attributes could not be broken down further or we don't want to break down further in any logically meaningful way.

For example, an ID, which is just a number. On the other hand, if the ID has some structure to it with further meaning like somebody's ID incorporates into it what state they were born in, or what country they live in, then it shouldn't be atomic if we ever want to be able to kind of pull it apart.

2. Composite - has components

We use indentation in the ER model to indicate composite attributes.

For example, instead of having somebody just have a name, Sally Li, we might decide that we sometimes want to be able to pull out her last name by itself or her first name by itself. So we want the name to be comprised of these parts. SQL and the relational model don't allow composites. And when we translate from the ER model into the relationship model, we're going to have to deal with this disparity between the two modeling techniques.

3. Multi-values - sets of values

We put curly braces to indicate multi-values in ER model.

Suppose that we want to allow a person to have multiple different e-mail addresses. The relational model does not allow multivalued attributes, and that's something else that we'll need to deal with.

4. Derived - computed from other data

By putting these parentheses after value, which is suggestive of it being a function.

We might want to know a person's age, but we might not want to store the age explicitly because it constantly changes. If we know the date of birth and the current date, we can compute sombody's age.

Why even bother putting age into the model? The reason is that it's good to put it there as a kind of a marker that this is something important. About a person. Even though we're not storing it directly.

Note: Richer variety of types here than in Relational Model, makes this more expressive, but leads to some small complications when converting ER to Relational.

Identifying elements of Entity Sets

Super key

A super key of an entity set is a set of attributes such that no two entities (of any reasonable instantiation of the entity set) share the same values for all of those attributes.

Suppose that we have each person has a Name (first name and the last name) and a Date of Birth, and Height. Suppose that we know that we're never going to have two people with the same last name and the same date of birth.

(last name, DOB) is a super key
(first name, last name, DOB) is a super key
(height, last name, DOB) is a super key

Some additional attributes beyond the ones that I'm using to identify the person is a super key.

Candidate key

A candidate key of an entity set is a minimal super key. In other words, a candidate key is a special kind of super key that doesn't have any "extra attributes".

(last name, DOB) is a candidate key
(first name, last name, DOB) is not a candidate key
(height, last name, DOB) is not a candidate key

Primary key

In ER diagram, we will underline the all the attributes (composite primary key) that comprise the primary key.

For each entity set we choose a candidate key we'll use to identify elements and designate it as the primary key. In other words, from among all of my candidate keys, we are going to choose one to use as this primary key.

For example, assume that no two people in any data set we'll ever consider will have

The same first name and the same last name
The same last name and the same DOB

In that case, we could use the first name and the last name as a candidate key as well, and now we would choose between those two as the primary key.

What if there isn't a candidate key?

What if we can't make assumption like no two people in any data set we'll ever consider will have

The same first name and the same last name
The same last name and the same DOB

But, I'd like to still have a primary key. One possibility is to add another attribute whose sole purpose is to uniquely identify entities. For example, I could add a Social Security number or add a student ID, that's guaranteed to uniquely identify them.

But the downside of that is that I'm going to need some extra space for that and need some centralized mechanism then to make sure that everybody has a unique ID.

Primary key of Relationship Set

The primary key of a relationship set is the collection of primary keys of the participating entity sets.

Since relationship set, Saw is set, I can't have multiple pairings between the same person and movie pair. Therefore, there's only going to be one line connecting p2 to m1.

Relationship sets between an entity set and itself

Let's say I want to talk about movies being sequels of other movies. I could have a sequel of relationships as two participating entity sets that are two copies of movies.

I have one copy of movie, but I have two different lines connecting it to the relationship sets and it's often useful to label those lines saying that one line is a sequel of and one is a sequel to. Those are called rolls.

Cardinality Constraints

Many to Many

In general, our relationship sets are what are known as many to many relationship sets, meaning that our person in movie example, a person may watch many movies and a movie may be watched by many person.

There are some relationship sets where I might want to call out that a person can only be associated with one element of another entity set, and so that gives rise to what are called cardinality constraints.

The star means no limit or Infinity in this context. This would be if I have a many to many relationship set between E1 and E2, then every entity in E1 is associated with at least zero elements of E2, and there's no upper bound on how many they're associated with, and same goes for E1 for E2.

Many to One

Suppose instead of just who saw which movie, I want to keep track of each person's favorite movie and each person can have at most one favorite movie. Each a movie may the favorite of many people, but each person has just one favorite movie, or at most one favorite movie.

This situation where a person has at most one favorite movie, we're going to denote with an arrow pointing toward movie.

Another notation for this is that the edge between person and movie is to put the minimum and maximum number of movies that a person could be have as their favorite. Lower and upper bounds on this edge. 0 is a minimum number of favorite movies a Person has and 1 is a maximum number of favorite movies a Person has.

One to Many

Suppose that we want to model the director of a movie and that we specify that each movie has just one director.

Participation Constraints

The question of whether every element from a particular set is associated with somebody in the other set or not?

Partial pariticipation

The default is that there might be elements of an entity set that aren't associated with anything.
An element x of E1 may be paired with no elements of E2 in R

Total participation

From one side of the relationship sets means that everybody in that entity set is associated with something.
For every element x of E1 there is at least one y in E2, in other words, everybody in E1 is paired with somebody in E2, such that (x, y) belongs to R
Denoted by a double line from E1 to R in the ER diagram of by 1.. on the edge from E1 to R

Example

Suppose we were in a classroom and we have a relationship set Sitting In between Student and Seat.

Every student has a seat is indicated by marked in yellow. Here the double line from Student to Sitting in.
Each student has at most one seat is indicated with the arrow pointing towards Seat.
Yellow double line from Student to Seat and the green arrow pointing towards Seat together, that says that every student has exactly one seat.
We hopefully don't have two people sitting in the same seat, so we would also have an arrow going from Seat to Student marked in orange. Saying that for every seat there's at most one student. But we don't have total participation because we might have seats that are empty. So we don't have a double line from seat to student.

Important and common special case

Many-to-one and total from entity set E1 to entity set E2. In that case it's a function from E1 to E2. When we convert to the relational model we'll be able to optimize a little bit.

R: E 1 \longrightarrow E_{2}

Last updated 2 years ago