1. Hash table principleHash table is a very important data structure. Almost all programming languages have direct or indirect application of this data structure. It is usually implemented based on arrays. At that time, it had more advantages than arrays:
However, hash tables also have some disadvantages compared to arrays:
So, what exactly is a hash table? Its structure is an array, but the magic lies in a transformation of the subscript value, which we can call a hash function, through which we can get the HashCode. Next, let's look at an example: Use a data structure to store word information, for example, there are 50,000 words, and after finding the word, each word has its own specific application and so on. How should we proceed? Perhaps we could try converting the letters into appropriate subscripts. But how can we convert a character into an array subscript value? Is there a way to convert words into array subscript values? If the word is converted into an array subscript, if we want to find the information of a certain word in the future, we can access the desired element in one step according to the subscript value. So how do you convert a string into a subscript value? In fact, there are many computer coding schemes that use numbers to replace the characters of words, which is character encoding. Of course, we can design our own coding system, such as a is 1, b is 2, c is 3, and so on. But now that we have a coding system, how do we convert a word into a number? Here we have two options: Solution 1: Adding numbers A simple way to convert a word is to sum the encoding of each character of the word For example, the word cats is converted into a number: 3+1+20+19=43, then 43 is stored in the array as the subscript of the word cats. But there is an obvious problem with this solution: many words may end up with the subscript 43. We know that an index value position in an array can only store one data. If later data is stored, it will inevitably cause data overwriting. Therefore, it is obviously unreasonable to store so many words in one index. Solution 2: Multiplication of Powers In fact, the numbers we usually use greater than 10 can be expressed as unique by multiplying powers, for example: 6543 = 6 *10³+5 *10²+4 *10 + 4 Our words can also be represented using this scheme, such as cats = 3 * 27³+1 * 27² + 20 * 27+17 = 60337 The number obtained in this way can basically guarantee its uniqueness and will not be repeated with other words. But there is a problem. If a word is zzzzzzzzzz, then the number obtained exceeds 70000000000000. The array may not be able to represent such a large subscript value. Even if such a large array can be created, there are actually many invalid words and it is meaningless. Summary of the two solutions: The first solution (adding the numbers together) produces too few array indices, and the second solution (multiplying and summing by powers of 27) produces too many array indices. So now we need a compression method to compress the huge integer range obtained in the power multiplication scheme system into an acceptable array range. A simple way is to use the remainder operator, which is used to get the remainder after a number is divided by another number. Implementation of remainder operation: (numbers between 0-199) Suppose we compress numbers from 0-199 (large) to numbers from 0-9 (small) The result of the subscript is index = large % small When a number is divisible by 10, the remainder must be between 0 and 9 For example, 16%10 = 6, 23%10 = 3 There will still be repetitions, but the number of repetitions is obviously smaller. For example, if you take 5 numbers from 0-199 and put them in an array of length 10, there will be repetitions, but the probability of repetition is very small. 2. The concept of hash tableNow that we understand the principles of hashing, we can look at a few concepts: Hashing: The process of converting large numbers into subscripts within the array range is called hashing. Hash function: For example, converting a word into a large number, and putting the code for hashing the large number into a function, this function is the hash function. Hash table: Finally, the data is inserted into this array, and the encapsulation of the entire structure can be called a hash table. 3. Hashing Conflict IssueAs mentioned before, hashed subscript values may still be repeated, which will lead to conflicts. For example, if we select 5 numbers from 0 to 199 and put them in a cell of length 10, and if we randomly select 33, 45, 27, 86, and 92, then their final positions will be 3-5-7-6-2, with no conflict. But what if there is also a 86 and a 66? This is when conflict occurs. How to solve this problem? There are two common solutions: 1. Chain address methodThe chain address method is a common solution to conflict resolution (also known as the zipper method). As shown in the following figure: An array with a memory of 10 is created. Now, some numbers need to be stored in the array. These numbers may be repeated after hashing. The method of linking numbers with the same subscript value through a linked list or array is called the chain address method. When we want to find a value, we can first find the corresponding linked list or array according to its subscript and then search inside it. From the picture, we can see that the way the chain address method resolves conflicts is that each array unit no longer stores a single data but a chain, and the commonly used structure of this chain is an array or a chain. So which method should be used in specific applications? In fact, both methods are acceptable and are similar in efficiency. Of course, in some implementations, the newly inserted data will be placed at the front of the array or linked list. In this case, it is best to use a linked list. Because inserting data at the first position in the array requires all other items to be moved back, while linked lists do not have this problem. 2. Open Address MethodThe open address method basically works by looking for blank cells to add duplicate data. As shown in the following figure: If we have a number 32 and now want to insert it into the array, our solution is:
But there are three different ways to explore this location: 1. Linear Exploration That is, linear search for blank cells Insert 32
Query 32
It should be noted that if the value at position 32 has not been inserted before, it is not necessary to query the entire hash table to determine whether the value exists. Instead, if an empty position is found, it stops. Because it is impossible to skip empty positions before 32 to go to other positions. Linear detection also has a problem: if the previously inserted data is inserted continuously, the newly inserted data will require a long detection distance. 2. Secondary Exploration Quadratic exploration is optimized based on linear exploration. We can regard linear detection as a detection with a step size of 1, for example, starting from the subscript value x, and detecting from x+1, x+2, and x+3 in sequence. The second detection optimizes the step size, for example, starting from the subscript value x, detect x+1², x+2², and x+3² in sequence. However, there are still problems with secondary detection: for example, if we insert 32-112-82-42-52 consecutively, then the step lengths are the same when they are accumulated one by one. In this case, it will cause a clustering of different step lengths, which will still affect efficiency. How to solve this problem? Let’s look at rehashing. 3. Rehashing The re-hashing method is to hash the keyword again with another hash function, and use the hashing result as the step size. For the specified keyword, the step size remains unchanged during the entire detection, and different keywords use different step sizes. The second hashing needs to have the following characteristics: Different from the first hash function The output cannot be 0 (otherwise, there will be no step length, each interjection will be in the same place, and the algorithm will enter an infinite loop) And computer experts have designed a hash function that works very well. Where constant is a prime number that is smaller than the capacity of the array, and key is the value obtained by the first hashing. For example: stepSize = 5-(key%5), which meets the requirement and the result cannot be 0. 4. Implementation of Hash FunctionThe main advantage of hash tables is their speed. One way to increase the speed is to have as few multiplications and divisions as possible in the hash function. A well-designed hash function needs to have the following advantages: (1) Fast calculation (2) Uniform distribution Let's implement it specifically: First of all, the main operation of the hash function we implemented is: converting characters into relatively large numbers and compressing large numbers into the array range. As shown below: function hashFunc(str,size){ //Define hashCode variable var hashCode = 0; //According to Horner's algorithm, calculate the value of hashCode //First convert the string into a digital code for(var i =0;i<str.length;i++){ hashCode = 37*hashCode + str.charCodeAt(i) } //Remainder operation var index = hashCode % size; return index; } Code test: console.log( hashFunc('abc',7)); console.log( hashFunc('cba',7)); console.log( hashFunc('nba',7)); console.log( hashFunc('rgt',7)); The test results are: It can be found that the subscript values corresponding to the strings we obtained are distributed very evenly. 5. Encapsulating Hash TablesHere I will use the chain address method to implement the hash table: Three properties are defined:
The implemented hash table (based on storage array) corresponds to an array (bucket) for each index, and the bucket stores (key and value). The final hash table data format is: [[[k,v],[k,v],[[k,v],[k,v]],[k,v]] As shown in the following figure: The code is as follows: function HashTable(){ // Define properties this.storage = []; this.count = 0; this.limit = 8; } Add the hash function we encapsulated earlier through the prototype: function HashTable(){ // Define properties this.storage = []; this.count = 0; this.limit = 8; HashTable.prototype.hashFunc = function(str,size){ //Define hashCode variable var hashCode = 0; //First convert the string into a digital code for(var i =0;i<str.length;i++){ hashCode = 37*hashCode + str.charCodeAt(i) } //Remainder operation var index = hashCode % size; return index; } } 6. Hash table operations1. Insert & modify operationsThe insertion and modification operations of the hash table are the same function, because when the user passes in a <key, value>, if the key does not exist, then it is an insertion operation, and if the key already exists, then it corresponds to a modification operation. The specific implementation idea is: first get the corresponding hashCode according to the passed key, that is, the index of the array, then take out another array (bucket) from the Index position of the hash table, and check whether the bucket in the previous step is empty. If it is empty, it means that nothing has been placed at this position before, so a new array [] is created; then check whether the value corresponding to the key has been placed before. If it has been placed, then it is a replacement operation instead of inserting new data. If it is not a modification operation, then insert new data and add 1 to the data item. The implementation code is: //Insert and modify operations HashTable.prototype.put = function(key,value){ //Get the corresponding index according to the key var index = this.hashFunc(str,this.limit); //According to the index to get the corresponding bucket var bucket = this.storage[index]; //If the value is empty, assign an array to bucket if(bucket === null){ bucket = []; this.storage[index] = bucket; } //Judge whether to modify data for(let i =0;i<bucket.length;i++){ var tuple = bucket[i]; if (tuple[0] === key) { tuple[1] = value; return; } } //Perform add operation bucket.push([key,value]); this.count += 1; } Test code: ht.put('a',12) ht.put('b',67) ht.put('c',88) ht.put('d',66) console.log('ht',ht); The print result is: Test success 2. Get operationFirst, get the corresponding index according to the key, and then get the corresponding bucket according to the corresponding index; determine whether the bucket is empty. If it is empty, return null. Otherwise, linearly search whether the key in the bucket is equal to the passed key. If they are equal, directly return the corresponding value. Otherwise, directly return null. The implementation code is: HashTable.prototype.get = function(key){ //Get the corresponding index according to the key var index = this.hashFunc(key,this.limit); //Get the corresponding bucket according to the index var bucket = this.storage[index]; //Judge whether it is empty if (bucket == null) { return null; } //Linear search for(let i = 0;i<bucket.length;i++){ var tuple = bucket[i]; if (tuple[0] === key) { return tuple[1]; } } return null; } Test code: For example, return the value of the element with key d console.log("ht.get('d'):"+ht.get('d')); 3. Delete operationThe method is similar to the method of the get operation. First, get the corresponding index according to the key, and then get the corresponding bucket according to the corresponding index; determine whether the bucket is empty. If it is empty, return null. Otherwise, linearly search whether the key in the bucket is equal to the passed key. If they are equal, delete them. Otherwise, return null directly. HashTable.prototype.remove = function(key){ //Get the corresponding index according to the key var index = this.hashFunc(key,this.limit); //Get the corresponding bucket according to the index var bucket = this.storage[index]; //Judge whether it is empty if (bucket == null) { return null; } //Linear search and delete via splice() for(let i =0;i<bucket.length;i++){ var tuple = bucket[i]; if (tuple[0] === key) { bucket.splice(i,1); this.count -= 1; return tuole[1]; } } return null; } Test code: delete the element with key b console.log("ht.remove('b'):"+ht.remove('b')); 4. Determine whether the hash table is emptyHashTable.prototype.isEmpty = function(){ return this.count === 0; } Code test: console.log("Is it empty: "+ht.isEmpty()); result: 5. Get the number of elements in the hash tableHashTable.prototype.size = function(){ return this.count; } Code test: console.log("Number of hash table elements:"+ht.size()); result: 7. Hash table expansion1. Hash table expansion ideasIn the encapsulation of the above code, we put all the data items in an array of length 5. Because we use the chain address method, the ratio of the current amount of data in the hash table to the total length can be greater than 1, and this hash table can insert new data without limit. However, as the amount of data increases, the bucket corresponding to each index will become longer and longer, which will also reduce efficiency. Therefore, it is necessary to expand the array under appropriate circumstances. So how should we expand capacity? Capacity expansion can simply be done by doubling the capacity, but in this case, all data items must be modified at the same time (recalling the hash function and obtaining different locations). Under what circumstances can capacity be expanded? It is more common that the hash table can be expanded when the ratio of the current amount of data to the total length is greater than 0.75. 2. Hash table expansion implementationImplementation idea: First, save the old array content, then reset all attributes, traverse the saved old array content, and determine whether the bucket is empty. If it is empty, skip it. Otherwise, take out the data and reinsert it. The implementation code is: HashTable.prototype.resize = function(newLimit){ //Save the contents of the old array var oldStorge = this.storage; //Reset all properties this.storage = []; this.count = 0; this.limit = newLimit; //Traverse the contents of the old array for(var i =0;i<oldStorge.length;i++){ //Take out the corresponding bucket var bucket = oldStorge[i]; //Judge whether backet is empty if (bucket == null) { continue; } //Take out the data and reinsert it for(var j =0;j<bucket.length;j++){ var tuple = bucket[j]; this.put(tuple[0],tuple[1]); } } } After encapsulation is completed, each time a data item is added, a judgment is made as to whether to expand the capacity. If necessary, expansion is performed. The code is: if(this.count > this.limit*0.75){ this.resize(this.limit*2); } So, there is a corresponding expansion capacity and a corresponding reduction capacity. When we delete data items, if the remaining data items are small, we can reduce the capacity. The code is as follows: if(this.limit > 5 && this.count < this.limit/2){ this.resize(Math.floor(this.limit/2)) } 8. Complete code
function HashTable(){
// Define properties this.storage = [];
this.count = 0;
this.limit = 5;
HashTable.prototype.hashFunc = function(str,size){
//Define hashCode variable var hashCode = 0;
//According to Horner's algorithm, calculate the value of hashCode //First convert the string into a digital code for(var i =0;i<str.length;i++){
hashCode = 37*hashCode + str.charCodeAt(i)
}
//Remainder operation var index = hashCode % size;
return index;
}
//Insert and modify operations HashTable.prototype.put = function(key,value){
//Get the corresponding index according to the key
var index = this.hashFunc(key,this.limit);
//According to the index to get the corresponding bucket
var bucket = this.storage[index];
//If the value is empty, assign an array to bucket if(bucket == null){
bucket = [];
this.storage[index] = bucket;
}
//Judge whether to modify data for(let i =0;i<bucket.length;i++){
var tuple = bucket[i];
if (tuple[0] === key) {
tuple[1] = value;
return;
}
}
//Perform add operation bucket.push([key,value]);
this.count += 1;
//Expand capacity if(this.count > this.limit*0.75){
this.resize(this.limit*2);
}
}
//Get operation HashTable.prototype.get = function(key){
//Get the corresponding index according to the key
var index = this.hashFunc(key,this.limit);
//Get the corresponding bucket according to the index
var bucket = this.storage[index];
//Judge whether it is empty if (bucket == null) {
return null;
}
//Linear search for(let i = 0;i<bucket.length;i++){
var tuple = bucket[i];
if (tuple[0] === key) {
return tuple[1];
}
}
return null;
}
//deletion operation HashTable.prototype.remove = function(key){
//Get the corresponding index according to the key
var index = this.hashFunc(key,this.limit);
//Get the corresponding bucket according to the index
var bucket = this.storage[index];
//Judge whether it is empty if (bucket == null) {
return null;
}
//Linear search and delete via splice() for(let i =0;i<bucket.length;i++){
var tuple = bucket[i];
if (tuple[0] === key) {
bucket.splice(i,1);
this.count -= 1;
return tuple[1];
//Reduce capacity if (this.limit > 5 && this.count < this.limit/2) {
this.resize(Math.floor(this.limit/2))
}
}
}
return null;
}
//Expand HashTable.prototype.resize = function(newLimit){
//Save the contents of the old array var oldStorge = this.storage;
//Reset all properties this.storage = [];
this.count = 0;
this.limit = newLimit;
//Traverse the contents of the old array for(var i =0;i<oldStorge.length;i++){
//Take out the corresponding bucket
var bucket = oldStorge[i];
//Judge whether backet is empty if (bucket == null) {
continue;
}
//Take out the data and reinsert it for(var j =0;j<bucket.length;j++){
var tuple = bucket[j];
this.put(tuple[0],tuple[1]);
}
}
}
HashTable.prototype.isEmpty = function(){
return this.count === 0;
}
HashTable.prototype.size = function(){
return this.count;
}
}
This is the end of this article about the detailed explanation of JavaScript hash table implementation. For more relevant JavaScript hash table content, please search 123WORDPRESS.COM’s previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: 25 Ways and Tips to Increase Web Page Loading Speed
>>: How to implement distributed transactions in MySQL XA
Regarding the nginx panic problem, we first need ...
MySQL DATE_ADD(date,INTERVAL expr type) and ADDDA...
1. What is a transaction? A database transaction ...
Library Management Create a library create databa...
I plan to use C/C++ to implement basic data struc...
<br />When discussing with my friends, I men...
<br />Scientifically Design Your Website: 23...
Table of contents 1.0 Introduction 2.0 Docker Ins...
As the company's influence grows and its prod...
Install Oracle_11g with Docker 1. Pull the oracle...
Take MySQL 5.7.19 installation as an example, fir...
Table of contents Comprehensive comparison From t...
Let’s take a look first. HTML source code: XML/HT...
Table of contents Import on demand: Global Import...
1. CSS element hiding <br />In CSS, there ar...