Dense Index vs. Sparse Index
Dense Index Files
In dense index, index record appears for every search-key value in the file.
Example: index on ID attribute of instructor relation
Suppose we're indexing them by their IDs. Our index is going to have IDs and pointers, whereas the data file is going to have the IDs as well as all the other attributes - the last the name, the department, the salaries, etc.

Suppose that only two of these records fit on each block. We would have block one having our first two records, Block 2 has Mozart and Einstein, Block 3 has El Said and Gold, etc. But our index entries are smaller, so we can fit more of them onto a block, so a single block is going to be the frist six IDs for example.
If I was looking for 83821, instead of searching through from Block 1 to Block 6, I'll have two index blocks to look at which will take me to the block where that person Brant has their record.
However, using an index has its trade-offs. While it can speed up data retrieval, it might also introduce additional seeks. For example, you might first seek to the index to find out where the data is and then seek again to the actual data location. In some cases, especially when data is read sequentially, it might be faster to read the data without using an index.
So one thing that will help is, if I could have sparsity instead, that's going to make my index a little bit smaller.
Example: multiple records that have the same value for the attribute

If multiple records have the same value for an indexed attribute, then you can achieve savings in the size of the index. Instead of having a separate index entry for each record, you can have a single index entry pointing to the first occurrence of a particular value, and then scan sequentially from there to find all records with that value. This reduces the size of the index compared to the size of the data file.
Notice that this is a "dense primary index". A primary index is an index on a set of fields that includes the primary key. Since the data is sorted by the department name, and the index is also on the department name, it's a dense primary index.
In summary, the main point is about the efficiency of dense indexes, especially when multiple records share the same value for an indexed attribute. In such cases, the index can be much smaller than the data file, leading to savings in storage and potentially faster query performance.
Spare Index Files
Spare index contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key valye < K
Search file sequentially starting at the record to which the index record points
Example
Suppose that I'm searching for 22222. There isn't an entry for 22222. It's greater than equal to 10101 and less than 32343. I'll follow the record ID pointer that goes with the largest index entry that's less than or equal to the ID that I'm searching for - 10101.
Within the data block, records are sorted. So if I just start looking sequentially from 10101, I will eventually get to what I'm looking for, 22222. The block that I fetched to search for 22222 would be one block: from 10101 to 22222

Once the next block (from 32343 to 76543) is in memory, I look through it quickly because it's in memory, but I don't care about what sorting algorithms to use like a binary search or a sequential search since once it's in memory, it's very fast compared to the I/O.
Last updated