Detailed explanation of JavaScript implementation of hash table

Detailed explanation of JavaScript implementation of hash table

1. Hash table principle

Hash table is a very important data structure. Almost all programming languages ​​have direct or indirect application of this data structure. It is usually implemented based on arrays. At that time, it had more advantages than arrays:

  • It provides very fast insert-delete-find operations.
  • The hash table is faster than numbers, and you can find the desired element almost instantly.
  • Hash tables are much easier to encode than numbers.

However, hash tables also have some disadvantages compared to arrays:

  • The array in the hash table is unordered, so the elements cannot be traversed in a fixed way (such as from small to large).
  • Normally, duplicate keys are not allowed in a hash table, and the same key cannot be placed in a hash table to store different elements.

So, what exactly is a hash table?

Its structure is an array, but the magic lies in a transformation of the subscript value, which we can call a hash function, through which we can get the HashCode.

Next, let's look at an example:

Use a data structure to store word information, for example, there are 50,000 words, and after finding the word, each word has its own specific application and so on. How should we proceed?

Perhaps we could try converting the letters into appropriate subscripts. But how can we convert a character into an array subscript value? Is there a way to convert words into array subscript values? If the word is converted into an array subscript, if we want to find the information of a certain word in the future, we can access the desired element in one step according to the subscript value. So how do you convert a string into a subscript value?

In fact, there are many computer coding schemes that use numbers to replace the characters of words, which is character encoding. Of course, we can design our own coding system, such as a is 1, b is 2, c is 3, and so on. But now that we have a coding system, how do we convert a word into a number?

Here we have two options:

Solution 1: Adding numbers

A simple way to convert a word is to sum the encoding of each character of the word

For example, the word cats is converted into a number: 3+1+20+19=43, then 43 is stored in the array as the subscript of the word cats.

But there is an obvious problem with this solution: many words may end up with the subscript 43.

We know that an index value position in an array can only store one data. If later data is stored, it will inevitably cause data overwriting. Therefore, it is obviously unreasonable to store so many words in one index.

Solution 2: Multiplication of Powers

In fact, the numbers we usually use greater than 10 can be expressed as unique by multiplying powers, for example: 6543 = 6 *10³+5 *10²+4 *10 + 4

Our words can also be represented using this scheme, such as cats = 3 * 27³+1 * 27² + 20 * 27+17 = 60337

The number obtained in this way can basically guarantee its uniqueness and will not be repeated with other words.

But there is a problem. If a word is zzzzzzzzzz, then the number obtained exceeds 70000000000000. The array may not be able to represent such a large subscript value. Even if such a large array can be created, there are actually many invalid words and it is meaningless.

Summary of the two solutions:

The first solution (adding the numbers together) produces too few array indices, and the second solution (multiplying and summing by powers of 27) produces too many array indices.

So now we need a compression method to compress the huge integer range obtained in the power multiplication scheme system into an acceptable array range. A simple way is to use the remainder operator, which is used to get the remainder after a number is divided by another number.

Implementation of remainder operation: (numbers between 0-199)

Suppose we compress numbers from 0-199 (large) to numbers from 0-9 (small)

The result of the subscript is index = large % small

When a number is divisible by 10, the remainder must be between 0 and 9

For example, 16%10 = 6, 23%10 = 3

There will still be repetitions, but the number of repetitions is obviously smaller. For example, if you take 5 numbers from 0-199 and put them in an array of length 10, there will be repetitions, but the probability of repetition is very small.

2. The concept of hash table

Now that we understand the principles of hashing, we can look at a few concepts:

Hashing: The process of converting large numbers into subscripts within the array range is called hashing.

Hash function: For example, converting a word into a large number, and putting the code for hashing the large number into a function, this function is the hash function.

Hash table: Finally, the data is inserted into this array, and the encapsulation of the entire structure can be called a hash table.

3. Hashing Conflict Issue

As mentioned before, hashed subscript values ​​may still be repeated, which will lead to conflicts. For example, if we select 5 numbers from 0 to 199 and put them in a cell of length 10, and if we randomly select 33, 45, 27, 86, and 92, then their final positions will be 3-5-7-6-2, with no conflict. But what if there is also a 86 and a 66? This is when conflict occurs. How to solve this problem?

There are two common solutions:

1. Chain address method

The chain address method is a common solution to conflict resolution (also known as the zipper method).

As shown in the following figure:

An array with a memory of 10 is created. Now, some numbers need to be stored in the array. These numbers may be repeated after hashing. The method of linking numbers with the same subscript value through a linked list or array is called the chain address method. When we want to find a value, we can first find the corresponding linked list or array according to its subscript and then search inside it.

From the picture, we can see that the way the chain address method resolves conflicts is that each array unit no longer stores a single data but a chain, and the commonly used structure of this chain is an array or a chain. So which method should be used in specific applications? In fact, both methods are acceptable and are similar in efficiency. Of course, in some implementations, the newly inserted data will be placed at the front of the array or linked list. In this case, it is best to use a linked list. Because inserting data at the first position in the array requires all other items to be moved back, while linked lists do not have this problem.

2. Open Address Method

The open address method basically works by looking for blank cells to add duplicate data. As shown in the following figure:

If we have a number 32 and now want to insert it into the array, our solution is:

  • The newly inserted 32 should have been inserted at position 52, but that position already contained data.
  • It can be found that there is no content at positions 3, 5, and 9.
  • At this time, you can find the corresponding blank position to put this data

But there are three different ways to explore this location:

1. Linear Exploration

That is, linear search for blank cells

Insert 32

  • The hashed index is 2, but when inserting, it is found that there is already 52 at that position.
  • At this point, start from index+1 and look for a suitable location to place 32
  • The first empty position detected is where the value is inserted.

Query 32

  • First, we hash the result to get index = 2, and compare the position result of 2 with the query value to see if they are the same. If they are the same, we return it directly.
  • If they are not the same, search linearly, starting from index position + 1 and looking for the same one as 32.

It should be noted that if the value at position 32 has not been inserted before, it is not necessary to query the entire hash table to determine whether the value exists. Instead, if an empty position is found, it stops. Because it is impossible to skip empty positions before 32 to go to other positions.

Linear detection also has a problem: if the previously inserted data is inserted continuously, the newly inserted data will require a long detection distance.

2. Secondary Exploration

Quadratic exploration is optimized based on linear exploration.

We can regard linear detection as a detection with a step size of 1, for example, starting from the subscript value x, and detecting from x+1, x+2, and x+3 in sequence.

The second detection optimizes the step size, for example, starting from the subscript value x, detect x+1², x+2², and x+3² in sequence.

However, there are still problems with secondary detection: for example, if we insert 32-112-82-42-52 consecutively, then the step lengths are the same when they are accumulated one by one. In this case, it will cause a clustering of different step lengths, which will still affect efficiency. How to solve this problem? Let’s look at rehashing.

3. Rehashing

The re-hashing method is to hash the keyword again with another hash function, and use the hashing result as the step size. For the specified keyword, the step size remains unchanged during the entire detection, and different keywords use different step sizes.

The second hashing needs to have the following characteristics:

Different from the first hash function

The output cannot be 0 (otherwise, there will be no step length, each interjection will be in the same place, and the algorithm will enter an infinite loop)

And computer experts have designed a hash function that works very well.

stepSize = constant - (key % constant)

Where constant is a prime number that is smaller than the capacity of the array, and key is the value obtained by the first hashing.

For example: stepSize = 5-(key%5), which meets the requirement and the result cannot be 0.

4. Implementation of Hash Function

The main advantage of hash tables is their speed. One way to increase the speed is to have as few multiplications and divisions as possible in the hash function. A well-designed hash function needs to have the following advantages:

(1) Fast calculation

(2) Uniform distribution

Let's implement it specifically:

First of all, the main operation of the hash function we implemented is: converting characters into relatively large numbers and compressing large numbers into the array range. As shown below:

 function hashFunc(str,size){
            //Define hashCode variable var hashCode = 0;
            //According to Horner's algorithm, calculate the value of hashCode //First convert the string into a digital code for(var i =0;i<str.length;i++){
                hashCode = 37*hashCode + str.charCodeAt(i)
            }
            //Remainder operation var index = hashCode % size;
            return index;
        }

Code test:

 console.log( hashFunc('abc',7));
       console.log( hashFunc('cba',7));
       console.log( hashFunc('nba',7));
       console.log( hashFunc('rgt',7));

The test results are:

It can be found that the subscript values ​​corresponding to the strings we obtained are distributed very evenly.

5. Encapsulating Hash Tables

Here I will use the chain address method to implement the hash table:

Three properties are defined:

  • storage: as an array, storing related elements
  • count: records the amount of data currently stored
  • -limit: How much data can be stored in the marker array

The implemented hash table (based on storage array) corresponds to an array (bucket) for each index, and the bucket stores (key and value). The final hash table data format is: [[[k,v],[k,v],[[k,v],[k,v]],[k,v]]

As shown in the following figure:

The code is as follows:

function HashTable(){
     // Define properties this.storage = [];
     this.count = 0;
     this.limit = 8;
 }

Add the hash function we encapsulated earlier through the prototype:

function HashTable(){
   // Define properties this.storage = [];
    this.count = 0;
    this.limit = 8;

 HashTable.prototype.hashFunc = function(str,size){
      //Define hashCode variable var hashCode = 0;
       //First convert the string into a digital code for(var i =0;i<str.length;i++){
           hashCode = 37*hashCode + str.charCodeAt(i)
       }
       //Remainder operation var index = hashCode % size;
       return index;
   }
}

6. Hash table operations

1. Insert & modify operations

The insertion and modification operations of the hash table are the same function, because when the user passes in a <key, value>, if the key does not exist, then it is an insertion operation, and if the key already exists, then it corresponds to a modification operation.

The specific implementation idea is: first get the corresponding hashCode according to the passed key, that is, the index of the array, then take out another array (bucket) from the Index position of the hash table, and check whether the bucket in the previous step is empty. If it is empty, it means that nothing has been placed at this position before, so a new array [] is created; then check whether the value corresponding to the key has been placed before. If it has been placed, then it is a replacement operation instead of inserting new data. If it is not a modification operation, then insert new data and add 1 to the data item.

The implementation code is:

 //Insert and modify operations HashTable.prototype.put = function(key,value){
            //Get the corresponding index according to the key
            var index = this.hashFunc(str,this.limit);
            //According to the index to get the corresponding bucket
            var bucket = this.storage[index];
            //If the value is empty, assign an array to bucket if(bucket === null){
                bucket = [];
                this.storage[index] = bucket;
            }
            //Judge whether to modify data for(let i =0;i<bucket.length;i++){
                var tuple = bucket[i];
                if (tuple[0] === key) {
                    tuple[1] = value;
                    return;
                }
            }
            //Perform add operation bucket.push([key,value]);
            this.count += 1;
        }

Test code:

 ht.put('a',12)
 ht.put('b',67)
 ht.put('c',88)
 ht.put('d',66)
 console.log('ht',ht);

The print result is:

Test success

2. Get operation

First, get the corresponding index according to the key, and then get the corresponding bucket according to the corresponding index; determine whether the bucket is empty. If it is empty, return null. Otherwise, linearly search whether the key in the bucket is equal to the passed key. If they are equal, directly return the corresponding value. Otherwise, directly return null.

The implementation code is:

HashTable.prototype.get = function(key){
    //Get the corresponding index according to the key
    var index = this.hashFunc(key,this.limit);
    //Get the corresponding bucket according to the index
    var bucket = this.storage[index];
    //Judge whether it is empty if (bucket == null) {
        return null;
    }
    //Linear search for(let i = 0;i<bucket.length;i++){
        var tuple = bucket[i];
        if (tuple[0] === key) {
            return tuple[1];
        }
    }
    return null;
}

Test code: For example, return the value of the element with key d

 console.log("ht.get('d'):"+ht.get('d'));

3. Delete operation

The method is similar to the method of the get operation. First, get the corresponding index according to the key, and then get the corresponding bucket according to the corresponding index; determine whether the bucket is empty. If it is empty, return null. Otherwise, linearly search whether the key in the bucket is equal to the passed key. If they are equal, delete them. Otherwise, return null directly.

 HashTable.prototype.remove = function(key){
             //Get the corresponding index according to the key
            var index = this.hashFunc(key,this.limit);
             //Get the corresponding bucket according to the index
            var bucket = this.storage[index];
            //Judge whether it is empty if (bucket == null) {
                return null;
            }
            //Linear search and delete via splice() for(let i =0;i<bucket.length;i++){
                var tuple = bucket[i];
                if (tuple[0] === key) {
                    bucket.splice(i,1);
                    this.count -= 1;
                    return tuole[1];
                }
            }
            return null;
        }

Test code: delete the element with key b

 console.log("ht.remove('b'):"+ht.remove('b'));

4. Determine whether the hash table is empty

HashTable.prototype.isEmpty = function(){
            return this.count === 0;
        }

Code test:

 console.log("Is it empty: "+ht.isEmpty());

result:

5. Get the number of elements in the hash table

HashTable.prototype.size = function(){
            return this.count;
        }

Code test:

console.log("Number of hash table elements:"+ht.size());

result:

7. Hash table expansion

1. Hash table expansion ideas

In the encapsulation of the above code, we put all the data items in an array of length 5. Because we use the chain address method, the ratio of the current amount of data in the hash table to the total length can be greater than 1, and this hash table can insert new data without limit. However, as the amount of data increases, the bucket corresponding to each index will become longer and longer, which will also reduce efficiency. Therefore, it is necessary to expand the array under appropriate circumstances. So how should we expand capacity? Capacity expansion can simply be done by doubling the capacity, but in this case, all data items must be modified at the same time (recalling the hash function and obtaining different locations). Under what circumstances can capacity be expanded? It is more common that the hash table can be expanded when the ratio of the current amount of data to the total length is greater than 0.75.

2. Hash table expansion implementation

Implementation idea: First, save the old array content, then reset all attributes, traverse the saved old array content, and determine whether the bucket is empty. If it is empty, skip it. Otherwise, take out the data and reinsert it. The implementation code is:

HashTable.prototype.resize = function(newLimit){
            //Save the contents of the old array var oldStorge = this.storage;
            //Reset all properties this.storage = [];
            this.count = 0;
            this.limit = newLimit;
            //Traverse the contents of the old array for(var i =0;i<oldStorge.length;i++){
                //Take out the corresponding bucket
                var bucket = oldStorge[i];
                //Judge whether backet is empty if (bucket == null) {
                    continue;
                }
                //Take out the data and reinsert it for(var j =0;j<bucket.length;j++){
                    var tuple = bucket[j];
                    this.put(tuple[0],tuple[1]);
                }
            }
        }

After encapsulation is completed, each time a data item is added, a judgment is made as to whether to expand the capacity. If necessary, expansion is performed. The code is:

if(this.count > this.limit*0.75){
   this.resize(this.limit*2);
     }

So, there is a corresponding expansion capacity and a corresponding reduction capacity. When we delete data items, if the remaining data items are small, we can reduce the capacity. The code is as follows:

if(this.limit > 5 && this.count < this.limit/2){
    this.resize(Math.floor(this.limit/2))
     }

8. Complete code 

  function HashTable(){
   // Define properties this.storage = [];
       this.count = 0;
       this.limit = 5;

       HashTable.prototype.hashFunc = function(str,size){
       //Define hashCode variable var hashCode = 0;
       //According to Horner's algorithm, calculate the value of hashCode //First convert the string into a digital code for(var i =0;i<str.length;i++){
           hashCode = 37*hashCode + str.charCodeAt(i)
       }
       //Remainder operation var index = hashCode % size;
       return index;
   }
   //Insert and modify operations HashTable.prototype.put = function(key,value){
       //Get the corresponding index according to the key
       var index = this.hashFunc(key,this.limit);
       //According to the index to get the corresponding bucket
       var bucket = this.storage[index];
       //If the value is empty, assign an array to bucket if(bucket == null){
           bucket = [];
           this.storage[index] = bucket;
       }
       //Judge whether to modify data for(let i =0;i<bucket.length;i++){
           var tuple = bucket[i];
           if (tuple[0] === key) {
               tuple[1] = value;
               return;
           }
       }
       //Perform add operation bucket.push([key,value]);
       this.count += 1;
       //Expand capacity if(this.count > this.limit*0.75){
           this.resize(this.limit*2);
       }
   }

   //Get operation HashTable.prototype.get = function(key){
       //Get the corresponding index according to the key
       var index = this.hashFunc(key,this.limit);
       //Get the corresponding bucket according to the index
       var bucket = this.storage[index];
       //Judge whether it is empty if (bucket == null) {
           return null;
       }
       //Linear search for(let i = 0;i<bucket.length;i++){
           var tuple = bucket[i];
           if (tuple[0] === key) {
               return tuple[1];
           }
       }
       return null;
   }
   //deletion operation HashTable.prototype.remove = function(key){
        //Get the corresponding index according to the key
       var index = this.hashFunc(key,this.limit);
        //Get the corresponding bucket according to the index
       var bucket = this.storage[index];
       //Judge whether it is empty if (bucket == null) {
           return null;
       }
       //Linear search and delete via splice() for(let i =0;i<bucket.length;i++){
           var tuple = bucket[i];
           if (tuple[0] === key) {
               bucket.splice(i,1);
               this.count -= 1;
               return tuple[1];

               //Reduce capacity if (this.limit > 5 && this.count < this.limit/2) {
                   this.resize(Math.floor(this.limit/2))
               }
           }
       }
       return null;
   }
   //Expand HashTable.prototype.resize = function(newLimit){
       //Save the contents of the old array var oldStorge = this.storage;
       //Reset all properties this.storage = [];
       this.count = 0;
       this.limit = newLimit;
       //Traverse the contents of the old array for(var i =0;i<oldStorge.length;i++){
           //Take out the corresponding bucket
           var bucket = oldStorge[i];
           //Judge whether backet is empty if (bucket == null) {
               continue;
           }
           //Take out the data and reinsert it for(var j =0;j<bucket.length;j++){
               var tuple = bucket[j];
               this.put(tuple[0],tuple[1]);
           }
       }
   }
   HashTable.prototype.isEmpty = function(){
       return this.count === 0;
   }
   HashTable.prototype.size = function(){
       return this.count;
   }
}
 

This is the end of this article about the detailed explanation of JavaScript hash table implementation. For more relevant JavaScript hash table content, please search 123WORDPRESS.COM’s previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • JS simulation implementation of hash table and detailed application
  • Example analysis of implementing HashTable in js
  • Example of dictionary and hash table structure for key-value correspondence in JavaScript
  • Summary of several uses of hash tables in js
  • Simple implementation of javascript hash table

<<:  25 Ways and Tips to Increase Web Page Loading Speed

>>:  How to implement distributed transactions in MySQL XA

Recommend

Detailed explanation of the solution to the nginx panic problem

Regarding the nginx panic problem, we first need ...

MySQL DATE_ADD and ADDDATE functions add a specified time interval to a date

MySQL DATE_ADD(date,INTERVAL expr type) and ADDDA...

Analysis of Mysql transaction characteristics and level principles

1. What is a transaction? A database transaction ...

Summary of common Mysql DDL operations

Library Management Create a library create databa...

VSCode+CMake+Clang+GCC environment construction tutorial under win10

I plan to use C/C++ to implement basic data struc...

Web design experience: Make the navigation system thin

<br />When discussing with my friends, I men...

Provides helpful suggestions for improving website design

<br />Scientifically Design Your Website: 23...

Two-hour introductory Docker tutorial

Table of contents 1.0 Introduction 2.0 Docker Ins...

Summary of web designers' experience and skills in learning web design

As the company's influence grows and its prod...

How to install Oracle_11g using Docker

Install Oracle_11g with Docker 1. Pull the oracle...

Vue code highlighting plug-in comprehensive comparison and evaluation

Table of contents Comprehensive comparison From t...

Html makes a simple and beautiful login page

Let’s take a look first. HTML source code: XML/HT...

How to implement on-demand import and global import in element-plus

Table of contents Import on demand: Global Import...

CSS element hiding principle and display:none and visibility:hidden

1. CSS element hiding <br />In CSS, there ar...