Database index knowledge points summary

Database index knowledge points summary

First Look Index

The concept of index

An index is an auxiliary storage structure defined on the basis of a storage table that helps to quickly locate required records without checking all records. It consists of a series of index items stored on the disk, and each index item consists of two parts. That is, the index field and the row pointer.

Index fields

It is formed by concatenating the values ​​in some columns of a table, usually one column. An index typically stores every value of the indexed field.

Row pointer

Points to the location on disk where the record in the table containing the indexed field value is stored.

The file that stores index items is called the index file, and the storage table is called the main file.

Index file organization

(In contrast, the main file organization includes heap files, sorted files, hash files, cluster files, etc.)

Sorted index file: Organize and store in a certain order according to the index field values

Hash index file: Stores data in a hash bucket using a hash function based on the index field value.

The role of indexes

Different index files are created for different attributes or attribute combinations in a table. The index field value can be the value of any attribute or a combination of attribute values ​​in the table.

The index file is much smaller than the main file. By searching a small index file (which can be completely loaded into memory), you can quickly locate the relevant records in the very large main file and then read them in a targeted manner.

When there is an index, update operations must update both the index file and the main file simultaneously. Maintain its data consistency.

Index creation and maintenance in SQL language

Basics

After defining a table, if a primary key is defined, the system automatically generates a primary index;

Indexes can be defined or revoked by the user;

When an index is created, whether it is a primary index or a user-defined index, the DBMS will automatically maintain all indexes;

When a table is deleted, all indexes defined on the table are automatically dropped.

Create and drop indexes

CREATE INDEX idxSname ON Student(Sname);
DROP INDEX idxSname;

Dense and sparse indexes

Dense Index

For each record in the master file (each index field value formed), there is an index entry corresponding to it, indicating the location of the record. Such an index is called a dense index. (dense index)

Sparse Index

For some records in the main file (forming the index field values), there are index items corresponding to them. Such indexes are called non-dense indexes or sparse indexes.

How sparse indexes locate records

To locate the record whose index field value is K, you need

First, find the index item corresponding to the largest index field value that is smaller than K; start searching the table sequentially from the record corresponding to the index item.

Requirements for using sparse indexes : The main file must be stored in the order of the corresponding index field attributes.

Compared to dense indexes : takes up less space, requires less maintenance, but is slower

Balance : Index entries do not point to record pointers, but to the storage blocks where the records are located. That is, there is one index entry per storage block, not one index entry per record - primary index

How dense indexes locate records

1. Dense indexes of candidate key attributes can be matched one by one

2. For dense indexes of non-candidate key attributes, the main file is sorted by index field value, and an index item is created for each non-repeated first index field value. The same index field value is searched nearby;

3. For dense indexes of non-candidate key attributes, the primary file is not sorted by index field values. The index fields in the index items are not required to be unique and can appear repeatedly to point to the corresponding index field values ​​in the primary file.

4. For dense indexes of non-candidate key attributes, the main file is not sorted according to the index field value. If the index field in the index is required to be unique, an intermediate layer, the pointer bucket, can be introduced; the pointer bucket is the third case.

Primary Index

Primary Index Concept

Usually there is an index entry for each storage block. The total number of index entries is the same as the number of storage blocks occupied by the storage table. The first record of each storage block in the storage table is also called an anchor record, or block anchor for short.

The index field value of the primary index is the index field value of the block anchor, and the pointer points to the storage block where it is located.

The primary index is an ordered file sorted by the index field value. It is usually established on the primary key-based sort field of the ordered main file, that is, the index field of the primary index corresponds to the sort code (primary key) of the main file.

The primary index is a sparse index.

Auxiliary index

Auxiliary index definition

An auxiliary storage structure defined on any one or more non-sorted fields of the primary file; usually there is an index entry for different values ​​of a non-sorted field, the index field is the different values ​​of the field, and the pointer points to the block containing the record or the record itself;

When the non-sorted field is an index field, if the field value is not unique, a structure similar to a linked list is used to save the positions of all records of the field value;

Auxiliary indexes are dense indexes, and the retrieval speed is sometimes quite high

The difference and connection between primary index and auxiliary index

A primary file has only one primary index, but can have multiple secondary indexes;

The primary index is usually built on the primary key or sort code, while the secondary index is built on the non-sort field.

The primary index can be used to reorganize the primary file data, but the secondary index cannot change the primary file data;

The primary index is a sparse index and the secondary index is a dense index.

Clustered and non-clustered indexes

Clustered index

This means that adjacent records in the index are also stored adjacently in the main file;

Nonclustered index

This means that adjacent records in the index are not necessarily stored adjacently in the main file.

Notice:

If a sort field in the primary file is not the primary key, the value of each record in this field is not unique. In this case, the field is called a clustered field. A clustered index is usually defined on a clustered field. A clustered index usually has an index item for each different value in the clustered field (the total number of index items is the same as the number of different values ​​in the clustered field in the primary file). The index field is the different values ​​of the clustered field. Since records with the same clustered field value may be stored in several blocks, the pointer of the index item points to the first block. A primary file can have only one clustered index file, but can have multiple non-clustered index files. The primary index is usually a clustered index (but the total number of index entries is not necessarily the same as the number of distinct values ​​on the clustered field in the primary file; it is the same as the number of primary file storage blocks); the secondary index is usually a nonclustered index. A primary index/clustered index is an index that determines where records are stored, whereas a non-clustered index can only be used for queries to indicate where records are stored.

Inverted Index

The inverted index is a specific storage form that implements the "word-document matrix". Through the inverted index, you can quickly obtain a list of documents containing a word based on the word. The inverted index mainly consists of two parts: "word dictionary" and "inverted file".

Lexicon: The usual index unit of a search engine is a word. A lexicon is a set of strings consisting of all the words that have appeared in a document collection. Each index entry in the lexicon records some information about the word itself and a pointer to an "inverted list".

Posting List: The posting list records the document list of all documents where a certain word appears and the position information of the word in the document. Each record is called a posting item. According to the inverted list, we can know which documents contain a certain word.

Inverted File: The inverted list of all words is often stored sequentially in a file on the disk. This file is called an inverted file. An inverted file is a physical file that stores the inverted index.

Multi-level indexing

When there are many index items, you can create another index on the index, which is called a multi-level index.

Common multi-level indexes: B-tree/B+ tree indexes

Multi-attribute index

The index field is an index formed by combining multiple attribute values ​​of the table.

Hash index

Indexes organized using hashing techniques

Grid Index

Use multiple index fields for cross-joint positioning and retrieval

B+Tree Index

definition

A multi-level index that organizes index items in a tree data structure

Since a storage block can store multiple index items, each index item consists of two parts: a pointer and an index field. Ki represents the index field value, and Pi represents a pointer, which points to an index block or a data block or a record in a data block.

A block can usually store n-1 index items and 1 pointer.

B+ Tree Features

  • Automatically maintains tree hierarchy to fit the main file size
  • The pointer utilization of each index block is between 50% and 100%.

The index field value x is pointed to by Pi when Ki-1<=x<Ki, and is pointed to by Pi+1 when Ki<=x<Ki+1.

What do leaf nodes and leaf node pointers point to respectively?

Non-leaf node pointers point to index blocks, and leaf node pointers point to data blocks or data records of the main file.

The last pointer of the leaf node points to the next data block

The number of index pointers actually used by an index block is d, which satisfies (except the root node)

n/2<=d<=n

At least 2 pointers of the root node are used

B+ tree storage conventions

Index field values ​​appear repeatedly in leaf nodes and non-leaf nodes

Pointers to the main file appear only in leaf nodes

All leaf nodes can cover the index of all key values

Index field values ​​are arranged in order in leaf nodes

The set of leaf nodes alone is the complete index of the main file.

This is the end of this article about the summary of database index knowledge points. For more relevant database index content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • A summary of the knowledge points of database indexing. Everything you need to know is here.
  • Detailed introduction to the creation and use of indexes in Oracle database
  • MySQL database optimization: index implementation principle and usage analysis
  • Mysql database advanced usage of views, transactions, indexes, self-connections, user management example analysis
  • How to customize the order in which Django models create database indexes

<<:  Class in front-end JavaScript

>>:  Analysis of basic usage of ul and li

Recommend

Solution to forgetting the root password of MySQL 5.7 and 8.0 database

Note: To crack the root password in MySQL5.7, you...

MySQL 8 new features: Invisible Indexes

background Indexes are a double-edged sword. Whil...

Detailed examples of variable and function promotion in JavaScript

js execution Lexical analysis phase: includes thr...

Detailed explanation of Bind mounts for Docker data storage

Before reading this article, I hope you have a pr...

CSS3 realizes the glowing border effect

Operation effect: html <!-- This element is no...

MySQL efficient query left join and group by (plus index)

mysql efficient query MySQL sacrifices group by t...

How to check the hard disk size and mount the hard disk in Linux

There are two types of hard disks in Linux: mount...

MySQL implementation of lastInfdexOf function example

Sometimes MySQL needs to use a function similar t...

Detailed explanation of pure SQL statement method based on JPQL

JPQL stands for Java Persistence Query Language. ...

Problems and solutions when installing MySQL8.0.13 on Win10 system

Operating system: Window10 MySQL version: 8.0.13-...

Mysql join table and id auto-increment example analysis

How to write join If you use left join, is the ta...

Create a virtual environment using venv in python3 in Ubuntu

1. Virtual environment follows the project, creat...

How to implement interception of URI in nginx location

illustrate: Root and alias in location The root d...