This article will show you how JavaScript garbage collection works

1. Overview

As the software industry keeps developing, performance optimization has become an unavoidable topic. So what kind of behavior counts as performance optimization?

Essentially, any behavior that can improve operational efficiency and reduce operational overhead can be considered an optimization operation.

This means that there are many areas worth optimizing in software development, especially in front-end development, where opportunities for performance optimization are everywhere. For example, the network used when requesting resources, the way data is transmitted, and the framework used during development can all be optimized.

This chapter explores optimization of the JavaScript language itself, from understanding how memory space is used to garbage collection, so that you can write efficient JavaScript code.

2. Memory Management

With the continuous development of hardware in recent years, and with high-level programming languages shipping built-in GC mechanisms, developers can usually build features without paying special attention to memory usage. So why bring up memory management again? Let's use a very simple piece of code to explain.

First, define an ordinary function fn, create an array in the function body, and then assign a value to the array. Note that a relatively large number is deliberately chosen as the index when assigning. The purpose is to make the function request as much memory as possible when it is called.

function fn() {
    arrlist = [];
    arrlist[100000] = 'this is a lg';
}

fn()

There is no syntax problem when this function executes. However, when memory is monitored with a performance monitoring tool, memory usage keeps rising and never drops, which indicates a memory leak. Note that arrlist is assigned without a declaration keyword, so in non-strict code it becomes a global variable and the array stays reachable after fn returns. If you do not understand the memory management mechanism well enough, you will write code with memory problems that are hard to detect.

When there is too much code like this, the program may produce unexpected bugs, so it is well worth mastering memory management. Let's look at what memory management is.

Taken literally, memory is made up of readable and writable units and represents a space that can be operated on. The word management emphasizes that someone actively requests, uses, and releases this space. Even when this is done through APIs, the developer is still the one in control. Memory management therefore means that developers actively request memory space, use it, and release it. The process has three steps: request, use, and release.

Back to JavaScript: like other languages, JavaScript goes through the same three steps, but ECMAScript does not provide an API for them. So unlike C or C++, JavaScript does not let developers call an API to manage memory space directly.

Even so, we can still demonstrate the life cycle of a piece of memory with a JavaScript script. The process is simple: first request space, then use it, and finally release it.

Because JavaScript provides no such API directly, the JavaScript engine automatically allocates space when it encounters a variable definition statement. Here we first define a variable obj and point it to an empty object. Using the space is simply a read or write operation, such as writing a name property with the value yd into the object. Finally, the space can be released. Again, JavaScript has no release API, so an indirect method is used, such as setting the variable to null.

let obj = {}

obj.name = 'yd'

obj = null

At this point we have walked through a complete memory management process in JavaScript. Afterwards, you can observe the memory trend in a performance monitoring tool.
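One rough way to watch this life cycle without a browser's developer tools is to print heap usage at each step. The sketch below assumes a Node.js environment, where process.memoryUsage() is available. The helper name heapUsedMB and the variable bigObj are made up for illustration, and the numbers vary from run to run because the engine decides when collection actually happens.

```javascript
// Assumes Node.js: process.memoryUsage() reports heap statistics in bytes.
function heapUsedMB() {
    return process.memoryUsage().heapUsed / 1024 / 1024;
}

const before = heapUsedMB();

// 1. Request space: the engine allocates when the object is created.
let bigObj = { data: new Array(1000000).fill('yd') };

// 2. Use the space: read and write it.
bigObj.name = 'yd';
const during = heapUsedMB();

// 3. Release the space: drop the only reference so it becomes collectable.
bigObj = null;
const after = heapUsedMB();

console.log({ before, during, after });
```

Setting bigObj to null does not free the memory on the spot. It only makes the object unreachable, so the value of after may or may not have dropped by the time it is printed.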

3. Garbage Collection

First of all, what kind of content is considered garbage in JavaScript? The concept of garbage also appears in the GC algorithms discussed later, and it means exactly the same thing there, so let's explain it once here.

Memory management in JavaScript is automatic. Every time an object, array or function is created, the corresponding memory space is allocated automatically. When later code runs and some objects can no longer be found through any reference relationship, those objects are regarded as garbage. Alternatively, the objects may still exist, but because of inappropriate syntax or structural mistakes in the code there is no longer any way to find them. Such objects are also garbage.

Once garbage has been found, the JavaScript engine steps in and reclaims the space it occupies. This process is called garbage collection. Two small concepts are used here: the first is the reference, and the second is access from the root. Both come up frequently in the GC discussion that follows.

Another term worth mentioning is the reachable object. A reachable object in JavaScript is simply an object that can be accessed, whether through a specific reference or through the scope chain of the current context. As long as it can be found, it is considered reachable. There is one restriction, however: an object counts as reachable only if it can be found starting from the root. So what is the root? In JavaScript, the global variable object, that is, the global execution context, can be considered the root.

To sum it up simply, garbage collection in JavaScript is actually finding the garbage and then letting the JavaScript execution engine release and recycle the space.

References and reachable objects are used here. Next, we will try to use code to see how references and reachable objects are reflected in JavaScript.

First, define a variable. So that its value can be changed later, use the let keyword to define obj and point it to an object. For convenience, give the object a name, xiaoming.

let obj = {name: 'xiaoming'}

After this line runs, the object's space is referenced by the variable obj, so a reference has appeared. In the global execution context, obj can be found from the root, which means obj is reachable, and therefore the space of the xiaoming object is reachable too.

Then define another variable, for example let ali equal obj. The xiaoming object's space now has one more reference. The reference count has changed here, a concept that the reference counting algorithm uses later.

let obj = {name: 'xiaoming'}

let ali = obj

Let's do one more thing: take obj and reassign it to null. Now think about it. The xiaoming object's space had two references. Once the null assignment runs, the reference from obj to that space is cut off. Is the xiaoming object still reachable? Of course it is, because ali still references the same object space.
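The null assignment described above is not shown in the snippet, so here is a minimal sketch of the whole sequence. It uses only the names from the text.

```javascript
let obj = { name: 'xiaoming' };
let ali = obj;      // second reference to the same object

obj = null;         // cut the reference from obj

console.log(obj);       // null
console.log(ali.name);  // xiaoming
```

The object itself is untouched by the assignment. Only one of the two paths to it has been removed.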

That is the main explanation of references, and along the way we have also seen a reachable object.

Next, let's take another example to illustrate reachability in JavaScript.

Because this example also prepares for the mark-sweep algorithm discussed later, it is a little more complicated.

First, define a function named objGroup with two parameters, obj1 and obj2. Let obj1 point to obj2 through a next property, and let obj2 point to obj1 through a prev property. Then return an object whose o1 property refers to obj1 and whose o2 property refers to obj2. Finally, call the function from outside with two objects as arguments and store the result in a variable obj.

function objGroup(obj1, obj2) {
    obj1.next = obj2;
    obj2.prev = obj1;
    return { o1: obj1, o2: obj2 }; // expose both objects to the caller
}

let obj = objGroup({name: 'obj1'}, {name: 'obj2'});

console.log(obj);

Run it and you will find that you have obtained an object. The object contains obj1 and obj2, and obj1 and obj2 point to each other through an attribute.

{
    o1: {name: 'obj1', next: {name: 'obj2', prev: [Circular]}},
    o2: {name: 'obj2', prev: {name: 'obj1', next: [Circular]}}
}

Analyzing the code, we can first find a reachable object obj from the global root. It points, through the function call, to a memory space containing o1 and o2 as shown above. The o1 and o2 properties in turn point to the obj1 space and the obj2 space, and obj1 and obj2 reference each other through next and prev. So every object that appears in the code can be found from the root, however long the path. Let's continue the analysis.

Suppose the delete statement is used to remove the o1 reference in obj and the prev reference from obj2 to obj1. Then there is no longer any way to find the obj1 object space, so it is considered garbage, and eventually the JavaScript engine will find it and reclaim it.

This is a bit complicated to explain. Put simply, code creates reference relationships between objects. Starting from the root and following those relationships, you eventually find certain objects. If the paths to an object are destroyed, there is no way to find it again, so it is regarded as garbage and the garbage collection mechanism can reclaim it.
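The effect of the two delete statements can be sketched with a small reachability check. The helper isReachable is hypothetical and only imitates the idea of walking references from a root. The extra variable first exists purely so the sketch can ask about obj1 afterwards, and in a real program it would itself keep obj1 alive.

```javascript
function objGroup(obj1, obj2) {
    obj1.next = obj2;
    obj2.prev = obj1;
    return { o1: obj1, o2: obj2 };
}

// Hypothetical helper: walks property references from `root` and reports
// whether `target` can be found. It only imitates what a collector checks.
function isReachable(root, target, seen = new Set()) {
    if (root === target) return true;
    if (root === null || typeof root !== 'object' || seen.has(root)) return false;
    seen.add(root);
    return Object.values(root).some((child) => isReachable(child, target, seen));
}

const obj = objGroup({ name: 'obj1' }, { name: 'obj2' });
const first = obj.o1; // kept only so the sketch can ask about it afterwards

console.log(isReachable(obj, first)); // true

delete obj.o1;       // remove the path obj -> obj1
delete obj.o2.prev;  // remove the path obj2 -> obj1

console.log(isReachable(obj, first)); // false
```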

4. Introduction to GC Algorithm

GC can be understood as short for the garbage collection mechanism. When GC runs, it finds garbage objects in memory, releases their space, and reclaims it so that subsequent code can keep using that memory. As for what counts as garbage in GC, here are two simple criteria.

The first is from the perspective of program requirements: if some data is no longer needed in the context after it has been used, it can be treated as garbage.

For example, name in the following code is no longer needed after the function call is completed. Therefore, from the demand perspective, it should be recycled as garbage. As to whether it has been recycled or not, we will not discuss it now.

function func() {
    name = 'yd';
    return `${name} is a coder`
}

func()

The second is whether the variable can still be referenced while the program is running. For example, the code below still puts a name inside the function, but this time a declaration keyword is added. With the keyword, once the function call ends, name can no longer be accessed from outside. Since it can no longer be found, it can be considered garbage.

function func() {
    const name = 'yd';
    return `${name} is a coder`
}

func()

After talking about GC, let's talk about GC algorithms. We already know that GC is a mechanism whose garbage collector does the actual work, and the essence of that work is finding garbage, releasing its space, and reclaiming it. Each of these actions (finding, releasing, and reclaiming space) can be done in different ways. A GC algorithm can be understood as the set of rules the garbage collector follows while working, much like a mathematical formula.

Common GC algorithms include the following. Reference counting uses a number to determine whether an object is garbage. Mark-sweep marks active objects while GC is working and treats unmarked objects as garbage. Mark-compact is very similar to mark-sweep, except that it does something different during the subsequent reclamation. Generational collection is the mechanism used in V8.

5. Reference counting algorithm

The core idea of the reference counting algorithm is to maintain, through an internal reference counter, the number of references to each object, and to decide whether the object is garbage by checking whether that number is 0. When the value is 0, GC starts working and reclaims the object's space.

The existence of reference counters may result in differences in execution efficiency between reference counting and other GC algorithms.

A change in the reference value means that whenever the reference relationships of an object change, the reference counter updates the count for that object. For example, if there is an object space in the code and a variable points to it, the count is incremented by 1. If another object also points to it, the count is incremented by 1 again. When a reference is removed, the count is decremented by 1. When the count reaches 0, GC immediately reclaims the object's space.

Let's use simple code to illustrate how reference relationships change. First, define a few simple user variables as ordinary objects, then define an array variable that stores the age property values of those objects. Next, define a function whose body assigns to two variables, num1 and num2. Note that there is no const here. Finally, call the function at the outer level.

const user1 = {age: 11};
const user2 = {age: 22};
const user3 = {age: 33};

const nameList = [user1.age, user2.age, user3.age,];

function fn() {
    num1 = 1;
    num2 = 2;
}

fn();

First, from a global perspective, user1, user2, user3 and nameList can all be found directly in the global scope. In addition, because num1 and num2 are assigned in fn without a declaration keyword, they are attached to the window object as well (in a browser, in non-strict code). At this point the reference counts of these variables are certainly not 0.

Next, declare num1 and num2 with a keyword inside the function, which means they only exist within the function's scope. Once the function call completes, num1 and num2 can no longer be found from the global scope, and their reference counts return to 0. As soon as the count is 0, GC starts working and reclaims num1 and num2 as garbage. In other words, after the function has executed, its internal memory space is reclaimed.

const user1 = {age: 11};
const user2 = {age: 22};
const user3 = {age: 33};

const nameList = [user1.age, user2.age, user3.age,];

function fn() {
    const num1 = 1;
    const num2 = 2;
}

fn();

Now look at the others: user1, user2, user3 and nameList. Since they are all declared globally, the object spaces they point to are still referenced after the script has run. (Strictly speaking, nameList stores the numeric age values rather than references to the three objects, but the global variables user1, user2, and user3 keep the objects referenced.) Their reference counts are therefore not 0, and they are not reclaimed as garbage. This is the basic principle behind the reference counting algorithm. In short, it relies on whether an object's reference count is 0 to decide whether the object is garbage.

1. Advantages and disadvantages of reference counting

The advantages of the reference counting algorithm can be summarized into two parts.

The first is that reference counting reclaims garbage as soon as it is found, because it decides whether an object is garbage by whether its reference count is 0. If so, the object can be released immediately.

The second is that the reference counting algorithm can minimize program pauses. An application inevitably consumes memory while it runs, and the memory of the execution platform has an upper limit, so memory can fill up. Because the reference counting algorithm keeps watching for objects whose reference count is 0, it can find and release those object spaces right away when memory is about to fill up. This helps keep memory from filling up, which is what reducing program pauses means here.

Reference counting also has two disadvantages.

The first is that the reference counting algorithm has no way to reclaim the space of objects with circular references. The following code snippet demonstrates what a circular reference object is.

Define an ordinary function fn and define two variables inside its body, the objects obj1 and obj2. Give obj1 a name property that points to obj2, and give obj2 a name property that points to obj1. At the end of the function, return an ordinary string. This has no practical significance and is just for testing. Then call the function at the outermost level.

function fn() {
    const obj1 = {};
    const obj2 = {};

    obj1.name = obj2;
    obj2.name = obj1;

    return 'yd is a coder';
}

fn();

The analysis is the same as before. After the function has executed, the space inside it should be reclaimed. For example, obj1 and obj2 are no longer pointed to from the global scope, so their reference counts should be 0 at this time.

But there is a problem. When GC wants to delete obj1, it finds that obj2 has a property pointing to obj1. In other words, although obj1 and obj2 can no longer be found in the global scope, they still reference each other within the function's scope. As a result, their reference counts are not 0, and GC cannot reclaim the two spaces. This wastes memory, and it is what is meant by a circular reference between objects. It is a problem inherent to the reference counting algorithm.

The second problem is that the reference counting algorithm costs more time, because it has to maintain the count as it changes. It must constantly monitor whether the reference count of each object needs to be updated, and updating a count takes time. The more objects in memory that need updating, the longer this takes. Compared with other GC algorithms, the time overhead of reference counting is therefore greater.
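The immediate reclamation and the circular-reference problem can both be seen in a toy model. This is only a sketch: real engines do not expose reference counts, and the names heap, alloc, addRef, release and setField are invented for the illustration.

```javascript
// Toy model: each "object" carries an explicit counter. Real engines do not
// expose anything like this; the names here are made up for the sketch.
const heap = new Set();

function alloc(name) {
    const cell = { name, refCount: 0, fields: {} };
    heap.add(cell);
    return cell;
}

function addRef(cell) {
    cell.refCount += 1;
}

function release(cell) {
    cell.refCount -= 1;
    if (cell.refCount === 0) {
        // Reclaim immediately, then release everything this cell pointed to.
        heap.delete(cell);
        Object.values(cell.fields).forEach(release);
    }
}

function setField(owner, key, value) {
    owner.fields[key] = value;
    addRef(value);
}

// A lone object: one local reference, dropped when the function "returns".
const single = alloc('single');
addRef(single);
release(single);
console.log(heap.has(single)); // false, reclaimed at once

// Two objects that point at each other, as in fn above.
const obj1 = alloc('obj1');
const obj2 = alloc('obj2');
addRef(obj1); // local variable obj1
addRef(obj2); // local variable obj2
setField(obj1, 'name', obj2);
setField(obj2, 'name', obj1);

release(obj1); // function returns: local variables go away
release(obj2);

console.log(obj1.refCount, obj2.refCount); // 1 1
console.log(heap.size); // 2, the cycle is never reclaimed
```

Every addRef and release call is also a small piece of bookkeeping work, which is the time overhead described above.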

6. Mark-and-Sweep Algorithm

Compared with reference counting, the principle of the mark-and-sweep algorithm is simpler, and it also solves some of the problems described above. It is used extensively in V8.

The core idea of the mark-and-sweep algorithm is to split garbage collection into two stages. The first stage traverses all objects and marks the active ones. Active objects are the same as the reachable objects mentioned earlier. The second stage traverses all objects again and clears the unmarked ones. Note that the marks set in the first stage are also erased in the second stage so that GC can work normally next time. In this way, garbage space is reclaimed through two traversals and finally handed to a free list for maintenance, from which subsequent program code can request space.

This is the basic principle of the mark-and-sweep algorithm. It consists of two operations: marking and then sweeping. Here is an example.

First, declare three reachable objects A, B, and C globally. After finding these three reachable objects, the collector notices that some of them have child references. This is where the mark-sweep algorithm is powerful: if an object has children, or even children of children, it keeps searching for reachable objects recursively. For example, D and E are child references of A and C respectively, so they are also marked as reachable.

There are also two variables, a1 and b1, in the local scope of a function. After the function has executed, its local scope is released, so a1 and b1 cannot be found from the global root. The GC mechanism therefore treats them as garbage and does not mark them, and they are eventually reclaimed when GC runs.

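const A = {};

function fn1() {
    const D = { value: 1 };
    A.D = D; // D becomes a child reference of A
}

fn1();

const B = {};

const C = {};

function fn2() {
    const E = { value: 2 };
    C.E = E; // E becomes a child reference of C
}

fn2();

function fn3() {
    // a1 and b1 are local only, so nothing reaches them after fn3 returns
    const a1 = 3;
    const b1 = 4;
}

fn3();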

These are the marking phase and the sweeping phase of mark-and-sweep, and what each phase does. To summarize, there are two steps. In the first stage, all reachable objects are found, searching recursively when references are nested, and each reachable object is marked. In the second stage, the collector reclaims the objects that were not marked and clears the marks made in the first stage. This completes one garbage collection. Note that the reclaimed space is placed on a free list so that later code can request space from it directly.

1. Advantages and disadvantages of the mark-sweep algorithm

Compared with reference counting, mark-sweep has one major advantage: it can reclaim objects with circular references. When writing code, you may define globally reachable objects such as A, B, and C. There may also be function-local scopes, for example a function that defines a1 and b1 and has them reference each other.

const A = {};

const B = {};

const C = {};

function fn() {
    const a1 = {};
    const b1 = {};
    a1.value = b1;
    b1.value = a1;
}

fn();

After the function call ends, the space inside it should be released. Once a function call ends, the variables in its local scope lose their link to the global scope. At this point a1 and b1 cannot be accessed from the global root and are unreachable objects. Unreachable objects are not marked during the marking phase and are released directly in the second phase.

This is what mark-sweep can do. With reference counting, a1 and b1 also become inaccessible from the global scope once the function call ends, but because the criterion is whether the reference count is 0, their spaces can never be released. This is the biggest advantage of the mark-and-sweep algorithm relative to reference counting.

The mark-and-sweep algorithm also has disadvantages. Consider a simulated memory layout: searching from the root, there is a reachable object A in the middle, with areas B and C to its left and right that cannot be found from the root. During the sweep phase, the space of B and C is reclaimed directly. The released space is added to the free list, and later code can request an address from the free list. This is where a problem arises.
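The two phases, and the way they handle the a1 and b1 cycle, can be sketched over a toy heap. Everything here is an assumption made for illustration: the heap array, the refs and marked fields, and the function names are not how V8 stores objects.

```javascript
// Toy heap: every allocated object is registered so the sweep phase can
// visit all of them. `marked` plays the role of the mark bit.
const heap = [];
const freeList = [];

function alloc(name) {
    const obj = { name, refs: [], marked: false };
    heap.push(obj);
    return obj;
}

// Phase 1: recursively mark everything reachable from the roots.
function mark(obj) {
    if (obj.marked) return;
    obj.marked = true;
    obj.refs.forEach(mark);
}

// Phase 2: reclaim unmarked objects and clear the marks on survivors.
function sweep() {
    for (let i = heap.length - 1; i >= 0; i--) {
        if (heap[i].marked) {
            heap[i].marked = false;
        } else {
            freeList.push(heap[i].name);
            heap.splice(i, 1);
        }
    }
}

function collect(roots) {
    roots.forEach(mark);
    sweep();
}

const A = alloc('A'), B = alloc('B'), C = alloc('C');
const D = alloc('D'), E = alloc('E');
A.refs.push(D);
C.refs.push(E);

// a1 and b1 reference each other but are not reachable from any root.
const a1 = alloc('a1'), b1 = alloc('b1');
a1.refs.push(b1);
b1.refs.push(a1);

collect([A, B, C]);

console.log(heap.map((o) => o.name)); // [ 'A', 'B', 'C', 'D', 'E' ]
console.log(freeList);                // [ 'b1', 'a1' ]
```

Because the decision is based on reachability from the roots rather than on a count, the mutual references between a1 and b1 make no difference.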

function fn() {
    const B = 'two';
}
fn();

const A = 'four characters';

function fn2() {
    const C = 'a';
}
fn2();

For the sake of illustration, assume every space consists of two parts: a header that stores metadata such as the size and address of the space, and a part that stores the data itself, called the field. Assume the B object occupies 2 words and the C object occupies 1 word. After reclamation it looks as if 3 words have been released in total, but they are separated by the A object. So after the release they are still scattered, that is, their addresses are not contiguous.

This matters. Suppose the space you want to request next is exactly 1.5 words. If you take the space released by B, it is too large, with 0.5 words left over. If you take the space released by C, it is too small, because it has only 1 word. This is the biggest problem with the mark-and-sweep algorithm: space fragmentation.

Space fragmentation means that the addresses of the reclaimed garbage objects are not contiguous. Because of this, the freed spaces end up scattered across memory. When new space is requested later, a freed block can be used directly only if it happens to match the requested size. If it is too large or too small, it is not suitable.

These are the advantages and disadvantages of the mark-and-sweep algorithm. Put simply, the advantage is that it can reclaim objects with circular references, and the disadvantage is that it causes space fragmentation and cannot make the most of the available space.
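The fragmentation problem can be made concrete with a toy free list that holds the 2-word block from B and the 1-word block from C. The first-fit strategy and the 2.5-word request are assumptions added for the sketch. The text itself only discusses the 1.5-word request.

```javascript
// Free blocks left behind after sweeping B (2 words) and C (1 word).
// They are separated by the live object A, so they cannot be merged.
const freeList = [
    { name: 'B', size: 2 },
    { name: 'C', size: 1 },
];

// First-fit allocation: take the first block that is large enough and
// leave any remainder on the list as a smaller block.
function allocate(size) {
    const index = freeList.findIndex((block) => block.size >= size);
    if (index === -1) return null; // no single block is big enough
    const block = freeList[index];
    block.size -= size;
    if (block.size === 0) freeList.splice(index, 1);
    return block.name;
}

const totalFree = () => freeList.reduce((sum, block) => sum + block.size, 0);

console.log(totalFree());   // 3
console.log(allocate(2.5)); // null, although 3 words are free in total
console.log(allocate(1.5)); // B, which leaves a 0.5-word fragment behind
console.log(freeList);      // [ { name: 'B', size: 0.5 }, { name: 'C', size: 1 } ]
```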

7. Mark-Compact Algorithm

The mark-compact algorithm is also used frequently in V8. Let's take a look at how it works.

Mark-compact can be seen as an enhanced version of mark-sweep. The first stage is exactly the same: all objects are traversed and the reachable, active objects are marked. In the second stage, mark-sweep directly reclaims the space of unmarked garbage objects, whereas mark-compact first performs a compaction step, moving objects so that their addresses are contiguous, and only then clears.

Suppose that before collection there are many active objects, inactive objects, and some free space. During marking, all active objects are marked, and then compaction follows. Compaction is simply a change of position: the active objects are moved first so that their addresses are contiguous. Then the whole range to the right of the active objects is reclaimed at once, which has obvious benefits compared with mark-sweep.

Because memory is no longer littered with small scattered spaces, the reclaimed space is essentially contiguous, so the freed space can be used to the fullest afterwards. This process is the mark-compact algorithm, which works together with mark-sweep to carry out the frequent GC operations in the V8 engine.
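Compaction can be sketched on a toy heap modelled as a row of slots. The array layout, the live set standing in for the result of marking, and the function name markCompact are all assumptions for illustration.

```javascript
// Toy heap as a row of slots: a string is an object, null is free space.
// `live` holds the names that the marking phase found reachable.
function markCompact(heap, live) {
    let next = 0; // next slot a surviving object will be moved to
    for (let i = 0; i < heap.length; i++) {
        if (heap[i] !== null && live.has(heap[i])) {
            heap[next] = heap[i]; // slide the live object to the left
            next += 1;
        }
    }
    // Everything from `next` onward is now reclaimed as one contiguous range.
    heap.fill(null, next);
    return next;
}

const heap = ['B', 'B', 'A', 'A', 'A', 'A', 'C', null];
const firstFree = markCompact(heap, new Set(['A']));

console.log(heap);      // [ 'A', 'A', 'A', 'A', null, null, null, null ]
console.log(firstFree); // 4
```

After compaction the free space is a single run of four slots, so a request that would not fit in either the B gap or the C gap on its own can now be served.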

8. Execution timing

First, reference counting can reclaim garbage objects promptly. As soon as the count is 0, GC immediately finds the space and releases it. Because of this, reference counting can minimize program pauses: whenever space is about to fill up, the garbage collector releases memory so that some space is always available.

Mark-sweep cannot reclaim garbage objects immediately, and the program actually stops running while the sweep takes place. Even if garbage is found in the first stage, it is not reclaimed until the sweep in the second stage.

Mark-compact likewise cannot reclaim garbage objects immediately.

9. V8 Engine

The V8 engine is currently the most mainstream JavaScript engine. The Chrome browser and the Node.js platform both use it to execute JavaScript code, and V8 is a large part of why JavaScript runs efficiently on these two platforms. Besides its excellent memory management, V8 is fast because it uses just-in-time compilation.

Previously, many JavaScript engines needed to convert source code into bytecode before execution, whereas V8 was designed to translate source code into machine code that can be executed directly, so execution is very fast. (Newer versions of V8 also include a bytecode interpreter alongside the JIT compilers.)

Another major feature of V8 is that its memory has an upper limit: no more than 1.5 GB on a 64-bit operating system and no more than 800 MB on a 32-bit operating system.

Why does V8 adopt this approach? The reasons can basically be explained from two aspects.

First, V8 was built for browsers, so this amount of memory is sufficient. In addition, the garbage collection mechanism implemented in V8 makes such a limit reasonable. According to official tests, when garbage memory reaches 1.5 GB, V8 needs only about 50 ms for a garbage collection with incremental marking, but about 1 s with non-incremental marking. From a user-experience point of view, 1 s is already a very long time, so 1.5 GB is used as the boundary.

1. Garbage Collection Strategy

A program uses a lot of data while running, and that data can be divided into primitive data and object data. Primitive data is managed by the language itself, so the collection discussed here mainly concerns object data living in the heap, and the process is inseparable from memory operations.

V8 adopts generational collection: memory is divided by certain rules into two categories, the new generation storage area and the old generation storage area. After this division, the most efficient GC algorithm for each generation is used to reclaim its objects, which means V8 uses several GC algorithms.

First, the generational collection algorithm is needed, since memory has to be divided into generations. Next comes the space copying algorithm. Mark-sweep and mark-compact are used as well. Finally, incremental marking is used to improve efficiency.

2. Recycling new generation objects

The first thing to do is to explain the memory allocation within V8. Because it is based on the idea of ​​generational garbage collection, the memory space is divided into two parts inside V8. It can be understood that a storage area is divided into two areas on the left and right. The space on the left is used to store new generation objects, and the space on the right is used to store old generation objects. The new generation object space has certain settings. The size is 32M in a 64-bit operating system and 16M in a 32-bit operating system.

The new generation of objects actually refers to those with a shorter survival time. For example, there is a local scope in the current code, and the variables in the scope will be recycled after the execution is completed. There is also a variable in other places, such as the global scope, and the global variable will definitely not be recycled until the program exits. So relatively speaking, the new generation refers to those variable objects with a shorter survival time.

The algorithms used for recycling new generation objects are mainly copy algorithm and mark-compact algorithm. First, the small space on the left is divided into two parts, called From and To, and the sizes of these two parts are equal. The From space is called the used state, and the To space is called the idle state. With these two spaces, when the code is executed, if space needs to be applied, all variable objects will be allocated to the From space first. That is to say, To is idle during this process. Once the From space is used to a certain extent, the GC operation will be triggered. At this time, mark and sort will be used to mark the From space, find the active objects, and then use sorting operations to make their positions continuous so that fragmented space will not be generated later.

After these operations, the active objects are copied to the To space, which means that every active object in From now has a backup and From can be reclaimed. Reclaiming is simple: the From space is released completely. This completes the collection of new generation objects.

To sum up, the storage area of the new generation is divided into two equal-sized spaces named From and To. From is the one in use, and all newly declared objects are placed in it. When GC is triggered, all active objects are found, compacted, and copied to the To space. After the copy is completed, From and To exchange roles (that is, they swap names): the original To becomes From, and the original From becomes To. This completes the release and reclamation of space.
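The From/To exchange described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: objects are plain records with a `refs` array, the two spaces are ordinary arrays, and the names `createNewSpace`, `allocate`, and `scavenge` are hypothetical, not V8 APIs.

```javascript
// Toy model of a From/To collector. Not V8's real memory layout.
function createNewSpace() {
    return { from: [], to: [] };
}

function allocate(space, name) {
    const obj = { name, refs: [] };
    space.from.push(obj); // new objects always land in From
    return obj;
}

function scavenge(space, roots) {
    const copied = new Map(); // original -> copy, so shared objects are copied once

    function copy(obj) {
        if (copied.has(obj)) return copied.get(obj);
        const clone = { name: obj.name, refs: [] };
        copied.set(obj, clone);
        space.to.push(clone);            // survivors sit next to each other in To
        clone.refs = obj.refs.map(copy); // follow references to other live objects
        return clone;
    }

    const newRoots = roots.map(copy);
    space.from = space.to; // swap roles: To becomes the new From
    space.to = [];         // the old From is released in one step
    return newRoots;
}

const space = createNewSpace();
const a = allocate(space, 'a');
const b = allocate(space, 'b');
allocate(space, 'garbage'); // never referenced from a root
a.refs.push(b);

const [liveA] = scavenge(space, [a]);
console.log(space.from.map(o => o.name)); // [ 'a', 'b' ]
```

Only the objects reachable from the root survive, and they end up contiguous in the space that is now called From.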

Some details of this process deserve attention. During copying, an object in the new generation may turn out to belong in the old generation instead. When that happens, an operation called promotion takes place: the object is moved from the new generation to the old generation for storage.

There are generally two criteria for triggering promotion. The first is that an object in the new generation is still alive after a round of GC; it can then be copied to the old generation storage area. The second is that the usage of the To space exceeds 25% during the current copy; in that case the active objects need to be moved to the old generation for storage.

Why 25%? After a collection, the From space and the To space are exchanged: the previous To becomes From, and the previous From becomes To. If To were, say, 80% full of active objects at that point, it would become the in-use space with almost no room left for new objects. In short, if the usage of the To space exceeds a certain limit, there may not be enough space for new objects once it becomes the in-use space, which is why the limit exists.
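The two promotion criteria can be written down as a small predicate. This is a sketch of the rule as described in this article, not V8's actual code; the function name `shouldPromote` and its parameters are assumptions made for illustration.

```javascript
const TO_SPACE_LIMIT = 0.25; // the 25% threshold described above

// survivedBefore: the object already survived an earlier new-generation GC.
// toUsed / toCapacity: space currently used in To and its total size (same unit).
function shouldPromote(survivedBefore, toUsed, toCapacity) {
    if (survivedBefore) return true;
    return toUsed / toCapacity > TO_SPACE_LIMIT;
}

console.log(shouldPromote(true, 0, 16));  // true: survived a previous round
console.log(shouldPromote(false, 3, 16)); // false: To is 18.75% full
console.log(shouldPromote(false, 5, 16)); // true: To is 31.25% full
```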

To sum up, the memory is divided into two parts, and one part stores new generation objects, which can be thought of as objects with a relatively short survival time. The mark-compact algorithm marks and compacts the active objects in the From space, and they are then copied to the To space. Finally, the two spaces swap states, and the space release is complete.

3. Recycling old generation objects

The old generation objects are stored on the right side of the memory space. V8 also limits its size: 1.4 GB on a 64-bit operating system and 700 MB on a 32-bit operating system.

Old generation objects are objects with a longer survival time, such as the variables stored on the global object mentioned earlier; variables held in closures may also survive for a long time. The three main algorithms used for old generation garbage collection are mark-sweep, mark-compact and incremental marking.

The mark-sweep algorithm is mainly used to release and reclaim garbage space. It finds all active objects in the old generation storage area, marks them, and then directly releases the space occupied by garbage data. This obviously causes some space fragmentation. Despite this problem, V8 still mainly uses mark-sweep under the hood, because its speed advantage clearly outweighs the cost of fragmentation.

Under what circumstances is the mark-compact algorithm used? When content in the new generation needs to be moved to the old generation and the old generation storage area does not have enough space to hold the promoted objects. In this case mark-compact is triggered to reclaim the fragmented space left by earlier collections, so that the program has more space to use. Finally, incremental marking is used to improve the efficiency of collection.
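The difference between mark-sweep and mark-compact is easiest to see on a toy heap. The sketch below assumes the heap is an array of slots where `null` means free, and that every object carries a `refs` array; the helper names are hypothetical and the model ignores object sizes.

```javascript
// Marking phase shared by both algorithms: collect everything reachable from the roots.
function mark(roots) {
    const marked = new Set();
    const stack = [...roots];
    while (stack.length > 0) {
        const obj = stack.pop();
        if (marked.has(obj)) continue;
        marked.add(obj);
        stack.push(...obj.refs);
    }
    return marked;
}

// Mark-sweep: free unmarked slots in place. Fast, but leaves holes.
function markSweep(heap, roots) {
    const marked = mark(roots);
    return heap.map(slot => (slot !== null && marked.has(slot) ? slot : null));
}

// Mark-compact: slide the survivors together so the free space is contiguous.
function markCompact(heap, roots) {
    const marked = mark(roots);
    const live = heap.filter(slot => slot !== null && marked.has(slot));
    const free = new Array(heap.length - live.length).fill(null);
    return live.concat(free);
}

const a = { name: 'a', refs: [] };
const b = { name: 'b', refs: [] };
const c = { name: 'c', refs: [] };
const d = { name: 'd', refs: [] };
a.refs.push(c); // only a and c are reachable from the root

const heap = [a, b, c, d];
const names = h => h.map(slot => (slot ? slot.name : null));

console.log(names(markSweep(heap, [a])));   // [ 'a', null, 'c', null ]
console.log(names(markCompact(heap, [a]))); // [ 'a', 'c', null, null ]
```

After mark-sweep the two free slots are separated by a live object, which is the fragmentation mentioned above. After mark-compact they form one contiguous block.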

Here we compare the garbage collection of the new and old generations.

Garbage collection in the new generation trades space for time: because it uses a copy algorithm, there is always an idle space inside it. However, since the new generation storage area is itself very small, each half is smaller still, so the wasted space is negligible compared with the time it saves.

Why not use the same split-in-two approach for old generation objects? First, the old generation storage space is relatively large; if it were split in two, hundreds of megabytes would be wasted, which is too extravagant. Second, the old generation stores far more object data, so copying it would take a very long time. Therefore the copy algorithm is not suitable for old generation garbage collection.

As for the incremental marking algorithm mentioned earlier, how does it optimize garbage collection? The work is divided into two parts: one is program execution and the other is garbage collection.

When garbage collection is working, it blocks the execution of the current JavaScript program. In other words there is a gap: for example, the program stops after executing for a while so that garbage collection can run. Incremental marking simply means splitting the whole garbage collection operation into multiple small steps and completing it in batches, instead of completing it in one go as before.

The main benefit is that garbage collection and program execution alternate, which spreads the time cost more reasonably. Previously the program could not run at all until a complete garbage collection had finished.

Let's take a simple example to illustrate the implementation principle of incremental marking.

When the program first runs, no garbage collection is needed. Once garbage collection is triggered, traversal and marking are performed no matter which algorithm is used. Here the target is the old generation storage area, so it has to be traversed, and objects are marked during the traversal. As mentioned before, marking does not have to be done in one go, because some objects are directly reachable and others only indirectly reachable. The first step finds the reachable objects on the first layer. The GC can then stop and let the program execute for a while. After that, the GC performs the second step of marking: for example, if those objects have children that are also reachable, they are marked next. After each round of marking, the GC stops and program execution continues, so marking and program execution alternate.

After the final marking step, the garbage is actually collected. During this phase the program stops and does not continue until the collection is finished. Although the program seems to pause many times, even the largest collection in V8, when memory reaches 1.5 GB, takes no more than 1 s with non-incremental marking, so the interruptions here are acceptable. Moreover, incremental marking splits what used to be one long pause into much smaller segments, making the user experience smoother.
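The alternation between marking and program execution can be sketched as a marker that processes only a few objects per step. This is a simplified illustration: the name `createIncrementalMarker` and the `budget` parameter are hypothetical, and the sketch assumes the object graph does not change between steps, which a real engine cannot assume.

```javascript
// Builds a marker that processes at most `budget` objects per step.
function createIncrementalMarker(roots) {
    const marked = new Set();
    const worklist = [...roots];

    return {
        marked,
        // Returns true once there is nothing left to mark.
        step(budget) {
            let processed = 0;
            while (worklist.length > 0 && processed < budget) {
                const obj = worklist.pop();
                if (marked.has(obj)) continue;
                marked.add(obj);
                worklist.push(...obj.refs);
                processed++;
            }
            return worklist.length === 0;
        },
    };
}

// A chain root -> n1 -> n2 -> n3, plus one unreachable object.
const n3 = { name: 'n3', refs: [] };
const n2 = { name: 'n2', refs: [n3] };
const n1 = { name: 'n1', refs: [n2] };
const root = { name: 'root', refs: [n1] };
const unreachable = { name: 'unreachable', refs: [] };

const marker = createIncrementalMarker([root]);
const log = [];
let done = false;
while (!done) {
    done = marker.step(2);          // a short slice of marking work
    log.push('mark');
    if (!done) log.push('program'); // the program runs between slices
}

console.log(log);                            // [ 'mark', 'program', 'mark' ]
console.log(marker.marked.size);             // 4
console.log(marker.marked.has(unreachable)); // false
```

Each call to `step` is one short pause. The total marking work is the same as before, but it is spread over several slices with program execution in between.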

4. V8 Garbage Collection Summary

The V8 engine is the current mainstream JavaScript execution engine. V8 places an upper limit on its internal memory. The first reason is that V8 was designed for browsers, and this amount of memory is sufficient for web applications. The second reason is its internal garbage collection mechanism: if the limit were larger, a collection could take long enough for users to notice, so an upper limit is set.

V8 adopts the idea of generational collection, dividing memory into the new generation and the old generation. The two differ in size and in the kind of data they store. The new generation has 32 MB of space on a 64-bit operating system and 16 MB on a 32-bit system.

V8 uses different GC algorithms for different generations of objects. Specifically, the new generation uses the copy algorithm and the mark-compact algorithm, and the old generation mainly uses the mark-sweep, mark-compact and incremental marking algorithms.

10. Introduction to Performance Tools

The purpose of GC is to keep memory space in a virtuous cycle while the program runs. The basis of that virtuous cycle is that developers allocate memory sensibly when writing code. However, ECMAScript does not give programmers an API for operating on memory space, so it is hard to know whether usage is reasonable, because everything is done automatically by GC.

To determine whether the memory usage of the whole process is reasonable, you need a way to watch the changes in memory at all times. The Performance tool gives developers more ways to monitor the memory space while the program is running.

By using Performance, you can monitor the changes in memory during program execution in real time. In this way, when there is a problem with the program's memory, you can directly find a way to locate the problematic code block. Let's take a look at the basic steps for using the Performance tool.

First open the browser and type the URL into the address bar, but do not load it yet, because the initial rendering should be recorded as well. Next, open the Developer Tools panel (F12) and select Performance. Turn on recording, and then load the target URL. Perform some operations on the page and stop recording after a while.

You then get a report in which memory-related information can be analyzed. After recording, many charts are displayed, and the amount of information can look complicated. Here the focus is on memory, and there is a Memory option; if it is not checked by default, check it. A blue line then appears on the page, showing how memory changes during the whole process, and the timeline shows when a problem occurs so that it can be examined specifically. For example, if the line both rises and falls, there is no problem.
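Alongside the Performance panel, heap usage can also be sampled from script for a rough look at the same trend. The sketch below relies on `performance.memory`, which is a non-standard property available in Chrome, and falls back to `process.memoryUsage()` under Node.js. The helper names `readHeapUsed` and `sampleHeap` are hypothetical, and the numbers are coarse, so treat this as a complement to the panel rather than a replacement.

```javascript
// Returns the used JS heap size in bytes, or null if no source is available.
function readHeapUsed() {
    // Chrome exposes a non-standard `performance.memory` object.
    if (typeof performance !== 'undefined' && performance.memory) {
        return performance.memory.usedJSHeapSize;
    }
    // Node.js fallback.
    if (typeof process !== 'undefined' && typeof process.memoryUsage === 'function') {
        return process.memoryUsage().heapUsed;
    }
    return null;
}

// Collects `count` samples, one every `intervalMs` milliseconds.
function sampleHeap(count, intervalMs) {
    return new Promise(resolve => {
        const samples = [];
        const timer = setInterval(() => {
            samples.push(readHeapUsed());
            if (samples.length >= count) {
                clearInterval(timer);
                resolve(samples);
            }
        }, intervalMs);
    });
}

// Take three samples, 100 ms apart, and print them.
sampleHeap(3, 100).then(samples => console.log(samples));
```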

1. Manifestation of memory problems

When there is a problem with the program's memory, what specific form will it take?

First, if the interface loads with delay or pauses frequently while the network environment is confirmed to be normal, there is generally a memory problem related to frequent garbage collection. That is, some code causes memory to spike instantly, and that code needs to be located.

Second, if the interface performs persistently poorly, that is, it never feels particularly smooth in use, the usual cause is memory bloat. Memory bloat means that, in order to achieve the best speed, the interface requests a certain amount of memory space, but that amount far exceeds what the current device can provide. The result is a continuously poor experience, again assuming that the network environment is normal.

Finally, if an interface becomes less and less smooth the longer it is used, the cause is a memory leak. In this case there is no problem at the beginning, but because of some of the code, the available memory shrinks as time goes by. This is a memory leak, and the interface performs worse and worse as usage time increases.

These are the ways memory problems show up while an application runs. Each manifestation can be combined with Performance to analyze memory and locate the problematic code. After the code is fixed, the application runs more smoothly.

2. Several ways to monitor memory

Memory problems can generally be summarized into three types: memory leaks, memory bloat, and frequent garbage collection. What criteria define each of them?

A memory leak is actually a continuous increase in memory usage, which is easy to judge. Currently, there are many ways to obtain memory trend charts during application execution. If you find that the memory continues to increase without any drop in the entire process, this means that there is a memory leak in the program code. At this time, you should locate the corresponding module in the code.
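The "keeps rising without any drop" criterion can be expressed as a check over a series of memory samples. This is a rough heuristic sketch, not a definitive detector; the function name `looksLikeLeak` and the `tolerance` parameter are assumptions, and the sample values are made up for illustration.

```javascript
// Returns true when the samples rise overall and never drop noticeably.
// `tolerance` is the largest drop (same unit as the samples) treated as noise.
function looksLikeLeak(samples, tolerance = 0) {
    if (samples.length < 2) return false;
    for (let i = 1; i < samples.length; i++) {
        if (samples[i] < samples[i - 1] - tolerance) return false;
    }
    return samples[samples.length - 1] > samples[0];
}

console.log(looksLikeLeak([10, 12, 15, 19, 24])); // true: only ever grows
console.log(looksLikeLeak([10, 18, 11, 19, 12])); // false: memory is reclaimed
```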

Memory bloat is harder to pin down. It refers to an application that needs a large amount of memory space to achieve its best effect. The current device's hardware may not support that, so performance varies from device to device. To determine whether the program or the device is at fault, do more testing: pick devices that are popular with users and run the application on them. If every device shows poor performance throughout, the problem lies in the program itself, not the device. In that case, go back to the code and locate where the memory problem occurs.

The specific ways to monitor memory changes mainly rely on tools provided by the browser.

The browser's Task Manager displays the memory of the current application numerically as it runs. The Timeline chart presents the trend of all memory over the application's execution against time, which makes judgments easy. In addition, the browser's heap snapshot feature can find out whether the interface objects include detached DOM nodes, since a detached DOM node is a kind of memory leak.

As for how to judge whether there is frequent garbage collection in the interface, it is necessary to use different tools to obtain the current memory trend chart, and then analyze it over a period of time to make a judgment.

3. Task Manager monitors memory

During the execution of a web application, if you want to observe the memory changes inside it, there are many ways to do it. Here is a simple demo to demonstrate that you can use the task manager that comes with the browser to monitor the memory changes when the script is running.

Place an element in the interface, add a click event, and create an array of very long length when the event is triggered. This will result in consumption of memory space.

<body>
    <button id="btn">add</button>
    <script>
        const oBtn = document.getElementById('btn');
        oBtn.onclick = function() {
            let arrList = new Array(1000000)
        }
    </script>
</body>

When the page is ready, open it in the browser. Open the menu in the upper right corner, choose More tools, and open the Task Manager.

At this time, you can locate the currently executing script in the Task Manager. By default, there is no JavaScript memory column. If necessary, you can right-click to find the JavaScript memory and display it. The two columns of most interest here are the Memory and JavaScript Memory columns.

The Memory column represents native memory. The current interface contains many DOM nodes, and this column shows the memory they occupy. If this value keeps increasing, DOM elements are constantly being created in the interface.

JavaScript memory refers to the JavaScript heap. In this column, we need to pay attention to the value in the parentheses, which indicates the memory size being used by all reachable objects in the interface. If this value keeps increasing, it means that either new objects are being created in the current interface or existing objects are growing.

Taking this interface as an example, you can find that the value in the parentheses has always been a stable number and has not changed, which means that there is no memory growth on the current page. At this time, you can trigger the click event (click the button) again, click a few more times, and after completion, you will find that the value in the parentheses has become larger.

In this way the browser's Task Manager can monitor overall memory changes while the script is running. If the value in parentheses in the JavaScript memory column keeps increasing, there is a memory problem. The tool can only detect the problem, however; it cannot locate it.
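Whether the number in parentheses keeps growing depends on whether the allocated objects stay reachable. The sketch below contrasts the two cases; the function names and the `retained` array are hypothetical, and the button wiring assumes the same `btn` element as the demo page.

```javascript
const retained = []; // long-lived array that keeps its contents reachable

// Temporary: the array becomes unreachable as soon as the function returns,
// so the GC is free to reclaim it.
function allocateTemporary() {
    const arr = new Array(1000000).fill(0);
    return arr.length;
}

// Retained: every allocation stays reachable through `retained`,
// so the live heap grows with each call.
function allocateRetained() {
    retained.push(new Array(1000000).fill(0));
    return retained.length;
}

// In a browser, wire the retained variant to the demo button.
if (typeof document !== 'undefined') {
    const btn = document.getElementById('btn');
    if (btn) btn.onclick = allocateRetained;
}
```

With the retained variant, each click should push the JavaScript memory value higher and keep it there, which is the pattern to look for in the Task Manager.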

4. TimeLine Recording Content

The browser's built-in Task Manager can monitor memory changes during script execution, but it mostly serves to determine whether the current script has a memory problem at all. For finding out which script the problem is related to, the Task Manager is not that useful.

Here we introduce a method of recording memory changes through a timeline to demonstrate how to more accurately locate which piece of code the memory problem is related to, or at what time point it occurred.

First, place a DOM node, add a click event, create a large number of DOM nodes in the event to simulate memory consumption, and then use an array and other methods to form a very long string to simulate a large amount of memory consumption.

<body>
    <button id="btn">add</button>
    <script>
        const oBtn = document.getElementById('btn');

        const arrList = [];

        function test () {
            for (let i = 0; i < 100000; i++) {
                document.body.appendChild(document.createElement('p'))
            }
            arrList.push(new Array(1000000).join('x'))
        }
        oBtn.onclick = test;
    </script>
</body>

First open the browser's developer tools and select the Performance panel. It is not recording by default. Click the record button to start recording, click the add button several times, wait a few seconds, and then click the stop button. A chart is then generated. The densely packed information may be confusing, so focus only on the information you need.

If Memory is not checked, memory changes are not shown, so check Memory first. The memory trend curves then appear on the page, with a legend explaining the colors: blue is the JavaScript heap, red is the current documents, green is DOM nodes, brown is listeners, and purple is GPU memory.

For easier observation, keep only the JavaScript heap and uncheck the others to hide them. The chart then shows the trend of the JavaScript heap over the execution of the script. The timeline in the first row records, in milliseconds, how the interface changes from blank, through the end of rendering, to the final stopped state, and you can click into it to see the interface at a given moment. If only memory matters, just look at the memory curve.

When the page is first opened, memory stays stable for a long time without much consumption, because add has not been clicked yet. Then at a certain point memory suddenly goes up and stays stable for a while: clicking add makes memory spike instantly, and since nothing else happens afterwards, it levels off.

After the stable period, memory drops again. As mentioned before, the browser has a garbage collection mechanism: while the script runs steadily, GC may start working at some point, find that some objects are inactive, and reclaim them, which produces the drop. After the drop there are some small fluctuations, which are normal activity overhead. Later, several consecutive clicks make memory usage climb again, and it decreases once more when no further operation occurs.

From this memory trend chart, we can conclude that the script's memory is healthy: it rises and falls over the whole process. A rise is memory being allocated, and a fall is GC reclaiming memory normally once it is no longer used.

If instead the memory trend goes up in a straight line, memory is growing without being reclaimed, which most likely means a memory leak. The timeline above helps locate the problem: drag along it to view memory consumption at each point in time and the corresponding changes on the interface, which shows which part of the page the memory problem belongs to.

Therefore, it is more useful than the task manager. It can not only check whether there is a problem with the current memory, but also help locate when the problem occurred. Then, with the help of the current interface display, you can know what kind of operation caused the problem, so that you can indirectly go back to the code to locate the problematic code block.

5. Heap snapshot to find detached DOM

Here is a brief explanation of how the heap snapshot feature works. It finds the JavaScript heap and takes a photo of it. With that photo, all the information in the heap can be inspected, which is what makes monitoring possible. Heap snapshots are especially useful for searching for detached DOM nodes.

Many elements seen on the interface are DOM nodes, and these nodes should exist in a live DOM tree. A node that has left the tree takes one of two forms: a garbage object or a detached DOM node. If a node is detached from the DOM tree and the JavaScript code holds no reference to it, it is garbage. If a node is detached from the DOM tree but the JavaScript code still references it, it is a detached DOM node. A detached DOM node is invisible in the interface, but it occupies space in memory.

This situation is a memory leak, which can be found through the heap snapshot function. As long as you can find it, you can go back to the code and clear it to free up some memory, and the script will become faster when executed.
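The distinction between a live node, a garbage node, and a detached node comes down to two questions, which the following sketch spells out. The function `classifyNode` and its boolean parameters are hypothetical and exist only to restate the definition; they are not a browser API.

```javascript
// inTree: the node is attached to the live DOM tree.
// referenced: some JavaScript variable still points at the node.
function classifyNode(inTree, referenced) {
    if (inTree) return 'live';
    return referenced ? 'detached' : 'garbage';
}

console.log(classifyNode(true, false));  // 'live'
console.log(classifyNode(false, false)); // 'garbage': the GC can reclaim it
console.log(classifyNode(false, true));  // 'detached': invisible but still in memory
```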

Put a button in the HTML and add a click event. When the button is clicked, JavaScript statements simulate the memory change by creating DOM nodes. To produce more than one type of detached DOM node, a ul wraps several li nodes. The function first creates a ul node, then uses a loop to create multiple li nodes and append them to the ul. The nodes are never placed on the page. So that the code holds a reference to this DOM, the variable tmpEle points to the ul.

<body>
    <button id="btn">add</button>
    <script>
        const oBtn = document.getElementById('btn');

        var tmpEle;

        function fn () {
            var ul = document.createElement('ul');
            for (var i = 0; i < 10; i++) {
                var li = document.createElement('li');
                ul.appendChild(li);
            }
            tmpEle = ul;
        }

        oBtn.addEventListener('click', fn);

    </script>
</body>

In short, the ul and li nodes are created but never placed on the page; they are only referenced through a JavaScript variable. These are detached DOM nodes.

Open the browser's developer tools and select the Memory panel, where the heap snapshot option is available. Two tests are performed here. The first takes a snapshot directly, without clicking the button. The snapshot lists the objects that currently exist. Type deta (the beginning of Detached) into the filter box, and nothing is found.

Return to the page and click the button. Then take another snapshot (click Profiles on the left to bring back the snapshot screen) and filter for deta again.

This time the search finds results in snapshot 2. These are the DOM nodes created in the code: they were never added to the interface, but they do exist in the heap, which wastes space. To fix this, clear the references to the DOM nodes in the code once they are no longer needed.

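function fn () {
    var ul = document.createElement('ul');
    for (var i = 0; i < 10; i++) {
        var li = document.createElement('li');
        ul.appendChild(li);
    }
    tmpEle = ul;
    // Clear the reference once the nodes are no longer needed. Setting only
    // the local variable ul to null would leave tmpEle pointing at the list.
    tmpEle = null;
}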

To summarize briefly, the browser's heap snapshot feature takes a picture of the current heap, and the snapshot can then be searched for detached DOM nodes.

Detached DOM nodes are not shown on the page, but they do exist in memory, so they waste memory. The task is to locate where those detached nodes come from in the code and then clear the references to them.

6. Determine whether there is frequent GC

This section discusses how to determine whether garbage collection happens frequently while a web application runs. The application is paused while the GC works, so frequent GC is unfriendly to web applications: the page appears frozen and users feel lag.

At this time, we need to find a way to determine whether there is frequent garbage collection during the execution of the current application.

There are two methods. The first is to judge by the trend of the timeline chart, monitoring the memory trend in the Performance panel. If the blue line rises and falls frequently, garbage collection is being performed frequently. When this happens, locate the corresponding point in time, see which operations caused it, and then handle it in the code.

The Task Manager is simpler to judge with, because it only shows a changing number. Normally, once the interface has finished rendering and no other operations occur, both the DOM node memory and the JavaScript memory stay constant or change very little. With frequent GC, the value jumps up and drops back right away, so seeing that pattern also means the code triggers frequent garbage collection.

The apparent impact of frequent garbage collection operations is that users feel that the application is very slow when in use. From an internal perspective, the current code contains improper memory operations that cause the GC to work continuously to reclaim and release the corresponding space.
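The "rises and falls frequently" pattern can be approximated by counting sharp drops in a series of memory samples. This is a heuristic sketch only: the function names, the `minDrop` and `maxDrops` thresholds, and the sample values are all assumptions chosen for illustration.

```javascript
// Counts sharp drops between consecutive samples; each one suggests a GC run.
// `minDrop` filters out small fluctuations.
function countCollections(samples, minDrop) {
    let drops = 0;
    for (let i = 1; i < samples.length; i++) {
        if (samples[i - 1] - samples[i] >= minDrop) drops++;
    }
    return drops;
}

// Flags a sample window that contains more than `maxDrops` sharp drops.
function hasFrequentGC(samples, minDrop, maxDrops) {
    return countCollections(samples, minDrop) > maxDrops;
}

const sawtooth = [10, 40, 12, 42, 11, 41, 13, 43, 12]; // spikes and immediate drops
const steady = [10, 11, 10, 12, 11, 12, 11, 12, 11];   // normal small fluctuations

console.log(countCollections(sawtooth, 20)); // 4
console.log(hasFrequentGC(sawtooth, 20, 2)); // true
console.log(hasFrequentGC(steady, 20, 2));   // false
```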
