Vue3 AST parser-source code analysis

In the previous article, Vue3 compilation process - source code analysis, we started from the entry file packages/vue/src/index.ts and followed the compilation process of a Vue object. There we mentioned that the baseCompile function generates an AST (abstract syntax tree) during execution. This is undoubtedly a critical step. Only once the AST exists can we traverse its nodes to perform transform operations, such as processing directives like v-if and v-for, or analyzing nodes so that those meeting the conditions can be statically hoisted. All of this relies on the AST generated beforehand. So today we will look at AST parsing and see how Vue parses templates.

1. Generate AST abstract syntax tree

First, let's review how the baseCompile function creates the ast and how the ast is used afterwards:

export function baseCompile(
  template: string | RootNode,
  options: CompilerOptions = {}
): CodegenResult {

  /* Ignore previous logic*/

  const ast = isString(template) ? baseParse(template, options) : template

  transform(
    ast,
    {/* Ignore parameters */}
  )

  return generate(
    ast,
    extend({}, options, {
      prefixIdentifiers
    })
  )
}

I have omitted the logic we don't need to focus on, so the function body is now easy to follow:

  • Generate the ast object
  • Pass the ast object to the transform function, which transforms the AST nodes
  • Pass the ast object to the generate function and return the compiled result

Here we focus on how the ast is generated. Generation goes through a ternary operator: if the template parameter passed in is a string, baseParse is called to parse the template string; otherwise template is used directly as the ast object. What does baseParse do to generate the ast? Let's look at the source code.
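To make this dispatch concrete, here is a toy pipeline. It is only a sketch with obvious simplifications, not Vue's code: ToyRoot, toyParse, toyTransform, toyGenerate and toyCompile are hypothetical names, and the "parser" merely splits on whitespace. What it shares with baseCompile is the control flow: a string is parsed first, an object that is already an AST is passed through unchanged, and the same object is then transformed in place and handed to the generator.

```typescript
// Toy stand-ins for RootNode and baseParse/transform/generate.
// Only the control flow mirrors baseCompile; the bodies are placeholders.
interface ToyRoot {
  type: 'ROOT'
  children: string[]
  transformed: boolean
}

// Toy "parser": splits the template on whitespace into child tokens
function toyParse(template: string): ToyRoot {
  return { type: 'ROOT', children: template.split(/\s+/).filter(Boolean), transformed: false }
}

// Toy "transform": mutates the AST in place, as Vue's transform does
function toyTransform(ast: ToyRoot): void {
  ast.transformed = true
}

// Toy "generate": turns the AST into a code string
function toyGenerate(ast: ToyRoot): { code: string } {
  return { code: `render(${ast.children.join(', ')})` }
}

function toyCompile(template: string | ToyRoot): { code: string; ast: ToyRoot } {
  // Same dispatch as baseCompile: parse strings, reuse a ready-made AST as-is
  const ast = typeof template === 'string' ? toyParse(template) : template
  toyTransform(ast)
  return { ...toyGenerate(ast), ast }
}
```

Passing a pre-built AST skips parsing entirely, which is why baseCompile accepts string | RootNode.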

export function baseParse(
  content: string,
  options: ParserOptions = {}
): RootNode {
  const context = createParserContext(content, options) // Create a parsing context object
  const start = getCursor(context) // Generate cursor information to record the parsing progress
  return createRoot( // Generate and return the root node
    parseChildren(context, TextModes.DATA, []), // Parse child nodes as the children attribute of the root node
    getSelection(context, start)
  )
}

I added comments to the baseParse function to explain the role of each call. First, the parsing context is created, and then the cursor information is obtained from the context. Since parsing has not started yet, the column, line, and offset attributes of the cursor all correspond to the starting position of template. Next, the root node is created and returned. At this point the ast tree has been generated and parsing is complete.
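The cursor bookkeeping can be sketched as below. This is a simplified assumption of how such helpers work, not the Vue source: CursorPos, SketchContext, createSketchContext and sliceSelection are hypothetical names (sliceSelection plays the role of getSelection), and only newlines are treated specially when advancing.

```typescript
// Simplified parser context with cursor helpers (hypothetical names;
// Vue's own getCursor/advanceBy/getSelection work along similar lines).
interface CursorPos {
  offset: number // 0-based index into the original template
  line: number // 1-based line number
  column: number // 1-based column number
}

interface SketchContext extends CursorPos {
  source: string // the part of the template not consumed yet
  originalSource: string // the full template, kept for slicing
}

function createSketchContext(content: string): SketchContext {
  // Nothing has been parsed yet, so the cursor sits at the very start
  return { source: content, originalSource: content, offset: 0, line: 1, column: 1 }
}

// Snapshot of the current position
function getCursor(ctx: SketchContext): CursorPos {
  return { offset: ctx.offset, line: ctx.line, column: ctx.column }
}

// Consume n characters, moving line/column forward as newlines are passed
function advanceBy(ctx: SketchContext, n: number): void {
  const consumed = ctx.source.slice(0, n)
  for (let i = 0; i < consumed.length; i++) {
    if (consumed[i] === '\n') {
      ctx.line++
      ctx.column = 1
    } else {
      ctx.column++
    }
  }
  ctx.offset += consumed.length
  ctx.source = ctx.source.slice(consumed.length)
}

// Location of everything consumed since `start` (the role of getSelection)
function sliceSelection(
  ctx: SketchContext,
  start: CursorPos
): { start: CursorPos; end: CursorPos; source: string } {
  const end = getCursor(ctx)
  return { start, end, source: ctx.originalSource.slice(start.offset, end.offset) }
}
```

Recording a start cursor before a node is parsed and slicing the consumed text afterwards is what gives every AST node its loc information.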

2. Create the root node of AST

export function createRoot(
  children: TemplateChildNode[],
  loc = locStub
): RootNode {
  return {
    type: NodeTypes.ROOT,
    children,
    helpers: [],
    components: [],
    directives: [],
    hoists: [],
    imports: [],
    cached: 0,
    temps: 0,
    codegenNode: undefined,
    loc
  }
}

Looking at the createRoot function, we can see that it returns a root node object of type RootNode, and the children argument we pass in becomes the children property of the root node. This is easy to understand if you picture the AST as a tree. The key to generating the ast therefore lies in the parseChildren function, which, as its name suggests, parses child nodes. Next, let's look at parseChildren, the most critical function in AST parsing. As before, I have simplified its logic to make it easier to understand.
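Because the AST is just nested objects, a recursive walk over children visits every node. A minimal sketch, assuming a hypothetical SketchNode shape that keeps only type, tag, content and children from Vue's much richer node types; createSketchRoot and describeTree are likewise hypothetical names:

```typescript
// Hypothetical cut-down node shape; real Vue nodes carry many more fields
interface SketchNode {
  type: 'ROOT' | 'ELEMENT' | 'TEXT'
  tag?: string
  content?: string
  children: SketchNode[]
}

// Same idea as createRoot: wrap the parsed children in a ROOT node
function createSketchRoot(children: SketchNode[]): SketchNode {
  return { type: 'ROOT', children }
}

// Depth-first, pre-order walk: one indented line per node
function describeTree(node: SketchNode, depth = 0, out: string[] = []): string[] {
  const label = node.tag || node.content || ''
  out.push('  '.repeat(depth) + node.type + (label ? ' ' + label : ''))
  for (const child of node.children) {
    describeTree(child, depth + 1, out)
  }
  return out
}
```

Any later pass over the AST, such as transform, follows the same pattern of visiting a node and then recursing into its children.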

3. Parsing child nodes

function parseChildren(
  context: ParserContext,
  mode: TextModes,
  ancestors: ElementNode[]
): TemplateChildNode[] {
  const parent = last(ancestors) // Get the parent node of the current node
  const ns = parent ? parent.ns : Namespaces.HTML
  const nodes: TemplateChildNode[] = [] // Store parsed nodes

  // Parse nodes until isEnd reports the end of the source or a closing tag of an ancestor
  while (!isEnd(context, mode, ancestors)) {/* Ignore logic*/}

  // Process whitespace characters to improve output efficiency
  let removedWhitespace = false
  if (mode !== TextModes.RAWTEXT && mode !== TextModes.RCDATA) {/* Ignore logic*/}

  // Filter out removed whitespace nodes and return the parsed node array
  return removedWhitespace ? nodes.filter(Boolean) : nodes
}
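The whitespace pass that the simplified code skips can be approximated as follows. This sketch assumes a hypothetical three-kind node type (WsNode), ignores the inPre and comment-removal details, and follows the rules of Vue 3's default whitespace condensing as I understand them: a whitespace-only text node is dropped when it is the first or last node, when it touches a comment, or when it sits between two elements and contains a newline; otherwise it becomes a single space. Dropped nodes are set to null and removed with filter(Boolean), which is what the return statement above relies on. condenseWhitespace is a hypothetical name.

```typescript
// Hypothetical simplified node types for this sketch
type WsNode =
  | { type: 'ELEMENT'; tag: string }
  | { type: 'COMMENT' }
  | { type: 'TEXT'; content: string }

// Approximates the whitespace pass at the end of parseChildren:
// removable whitespace-only text nodes become null and are filtered out.
function condenseWhitespace(input: WsNode[]): WsNode[] {
  const nodes: (WsNode | null)[] = input.map(n => ({ ...n })) // work on copies
  let removedWhitespace = false
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i]
    if (!node || node.type !== 'TEXT') continue
    if (!/[^\t\r\n\f ]/.test(node.content)) {
      // Whitespace-only text node: decide whether to drop or condense it
      const prev = nodes[i - 1]
      const next = nodes[i + 1]
      if (
        !prev || !next ||
        prev.type === 'COMMENT' || next.type === 'COMMENT' ||
        (prev.type === 'ELEMENT' && next.type === 'ELEMENT' && /[\r\n]/.test(node.content))
      ) {
        removedWhitespace = true
        nodes[i] = null
      } else {
        node.content = ' '
      }
    } else {
      // Text with real content: collapse each whitespace run into one space
      node.content = node.content.replace(/[\t\r\n\f ]+/g, ' ')
    }
  }
  return removedWhitespace ? (nodes.filter(Boolean) as WsNode[]) : (nodes as WsNode[])
}
```

Marking nodes as null inside the loop and filtering once at the end avoids shifting array indices while prev and next are still being inspected.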

From the code above, we can see that parseChildren receives three parameters: context, the parser context; mode, the text mode (TextModes); and ancestors, the array of ancestor nodes. When the function runs, it first takes the parent of the current node from the ancestor array, determines the namespace, and creates an empty array to store the parsed nodes. A while loop then runs until isEnd reports that parsing should stop, for example because the source is exhausted or a closing tag of an ancestor has been reached; inside the loop body, the source template string is classified and parsed. After the loop, whitespace characters are processed, and finally the array of parsed nodes is returned. Now that you have a preliminary understanding of how parseChildren runs, let's look at its core: the logic inside the while loop.
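The loop condition deserves a closer look. Below is a sketch of the DATA-mode part of isEnd, assuming only the tag name of each ancestor matters; AncestorSketch and isEndSketch are hypothetical names, and startsWithEndTagOpen here is a standalone reimplementation for illustration rather than an import from Vue. Checking every open ancestor, not only the direct parent, lets a malformed template such as <div><p>hi</div> still terminate: the inner loop stops at </div>, and parseElement for p can then report the missing end tag.

```typescript
// Sketch of the DATA-mode end check; other text modes are omitted
interface AncestorSketch {
  tag: string
}

// True if `source` begins with `</tag` followed by whitespace, '/', '>' or end of input
function startsWithEndTagOpen(source: string, tag: string): boolean {
  return (
    source.startsWith('</') &&
    source.slice(2, 2 + tag.length).toLowerCase() === tag.toLowerCase() &&
    /[\t\r\n\f />]/.test(source[2 + tag.length] || '>')
  )
}

// Stop when the source is exhausted, or when it starts with the end tag of
// any open ancestor (searched from the innermost element outward)
function isEndSketch(source: string, ancestors: AncestorSketch[]): boolean {
  if (source.startsWith('</')) {
    for (let i = ancestors.length - 1; i >= 0; --i) {
      if (startsWithEndTagOpen(source, ancestors[i].tag)) return true
    }
  }
  return !source
}
```

The character check after the tag name matters: without it, </pre> would be mistaken for the end of a p element.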

Inside the while loop, the parser first checks the text mode. Interpolations are only recognized when mode is TextModes.DATA or TextModes.RCDATA, and tags only in DATA mode; in every other case the content falls through to text parsing.

The first case checks whether the "Mustache" syntax (double curly braces) of Vue templates needs to be parsed. If the current context is not inside a v-pre directive, which skips compilation of expressions, and the source template string starts with the opening delimiter (context.options.delimiters[0], which is {{ by default), the source is parsed as an interpolation. This also means that if you do not want to use double curly braces for interpolation, you only need to change the delimiters option before compiling.
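Parsing an interpolation with configurable delimiters can be sketched like this. InterpolationSketch and parseInterpolationSketch are hypothetical names; the real parseInterpolation also records source locations, reports an error when the closing delimiter is missing, and returns a node whose content is a SIMPLE_EXPRESSION node, all of which the sketch leaves out.

```typescript
// Hypothetical result shape for the sketch
interface InterpolationSketch {
  type: 'INTERPOLATION'
  content: string // trimmed expression text
  rest: string // unparsed remainder of the source
}

// `delimiters` plays the role of context.options.delimiters (default ['{{', '}}'])
function parseInterpolationSketch(
  source: string,
  delimiters: [string, string] = ['{{', '}}']
): InterpolationSketch | undefined {
  const [open, close] = delimiters
  if (!source.startsWith(open)) return undefined
  // Search for the closing delimiter only after the opening one
  const closeIndex = source.indexOf(close, open.length)
  if (closeIndex === -1) {
    // Vue reports an error here; the sketch simply gives up
    return undefined
  }
  const raw = source.slice(open.length, closeIndex)
  return {
    type: 'INTERPOLATION',
    content: raw.trim(),
    rest: source.slice(closeIndex + close.length)
  }
}
```

With delimiters set to ['${', '}'], the same function parses ${ count + 1 } and leaves {{ x }} alone, which is exactly the effect of changing the option.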

Next, if the first character is "<" and the second is "!", the parser tries to parse a comment (<!--), a <!DOCTYPE, or a <![CDATA[ section. Of these three cases, the DOCTYPE is ignored: it is parsed as a bogus comment.

If the second character is "/", the "</" prefix looks like the start of a closing tag, so the parser tries to match one. If the third character is ">", the tag name is missing: an error is reported, and the parser advances three characters, skipping "</>".

If the source starts with "</" and the third character is a letter, the parser parses it as an end tag and moves on to the next iteration.

If the first character of the source template string is "<" and the second is a letter, the parseElement function is called to parse the element.

If none of these branches produces a node, the content is treated as text and parseText is called to parse it.

Finally, the generated node is added to the nodes array, which is returned at the end of the function.
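The branch selection described above can be condensed into a pure function. This sketch follows the simplified loop rather than every edge case of the real parser; it assumes DATA mode, the HTML namespace, no v-pre and the default {{ delimiter, and the Branch labels and the classifySource name are hypothetical. It uses charAt instead of indexing because indexing past the end of a string yields undefined, and /[a-z]/i.test(undefined) returns true, since undefined is converted to the string "undefined".

```typescript
// Which branch of the simplified while loop would handle the source `s`?
type Branch =
  | 'interpolation'
  | 'comment'
  | 'bogusComment'
  | 'missingEndTagName'
  | 'endTag'
  | 'element'
  | 'invalidTagStart'
  | 'text'

function classifySource(s: string): Branch {
  if (s.startsWith('{{')) return 'interpolation'
  if (s.charAt(0) === '<') {
    if (s.charAt(1) === '!') {
      if (s.startsWith('<!--')) return 'comment'
      if (s.startsWith('<!DOCTYPE')) return 'bogusComment'
      // Other '<!' forms (CDATA in HTML) create no node in the simplified loop
      return 'text'
    }
    if (s.charAt(1) === '/') {
      if (s.charAt(2) === '>') return 'missingEndTagName' // '</>' is skipped with an error
      if (/[a-z]/i.test(s.charAt(2))) return 'endTag'
      return 'bogusComment'
    }
    if (/[a-z]/i.test(s.charAt(1))) return 'element'
    if (s.charAt(1) === '?') return 'bogusComment'
    return 'invalidTagStart' // error reported, then parseText takes over
  }
  return 'text'
}
```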

This is the logic inside the while loop, and the most important part of parseChildren. In it we have seen how double curly brace interpolations, comments, start and end tags, and text content are parsed. The simplified code is in the box below; refer to the explanation above when reading it. The comments in the original source are also very detailed.

while (!isEnd(context, mode, ancestors)) {
  const s = context.source
  let node: TemplateChildNode | TemplateChildNode[] | undefined = undefined

  if (mode === TextModes.DATA || mode === TextModes.RCDATA) {
    if (!context.inVPre && startsWith(s, context.options.delimiters[0])) {
      /* Not inside v-pre and the source starts with the opening delimiter (`{{` by default): parse it as an interpolation */
      node = parseInterpolation(context, mode)
    } else if (mode === TextModes.DATA && s[0] === '<') {
      // The second character of the source template string is `!`
      if (s[1] === '!') {
        // Starts with '<!--': parse it as a comment
        if (startsWith(s, '<!--')) {
          node = parseComment(context)
        } else if (startsWith(s, '<!DOCTYPE')) {
          // Starts with '<!DOCTYPE': ignore the DOCTYPE and parse it as a bogus comment
          node = parseBogusComment(context)
        } else if (startsWith(s, '<![CDATA[')) {
          // Starts with '<![CDATA[' outside the HTML namespace (e.g. in SVG): parse CDATA
          if (ns !== Namespaces.HTML) {
            node = parseCDATA(context, ancestors)
          }
        }
      // The second character of the source template string is '/'
      } else if (s[1] === '/') {
        // '</>': the end tag name is missing, so report an error and skip the three characters
        if (s[2] === '>') {
          emitError(context, ErrorCodes.MISSING_END_TAG_NAME, 2)
          advanceBy(context, 3)
          continue
        // The third character is a letter: parse the end tag
        } else if (/[a-z]/i.test(s[2])) {
          parseTag(context, TagType.End, parent)
          continue
        } else {
          // Anything else: parse it as a bogus comment
          node = parseBogusComment(context)
        }
      // The second character is a letter: parse it as an element
      } else if (/[a-z]/i.test(s[1])) {
        node = parseElement(context, ancestors)

      // The second character is '?': parse it as a bogus comment
      } else if (s[1] === '?') {
        node = parseBogusComment(context)
      } else {
        // None of the above: report that the first character of the tag name is invalid
        emitError(context, ErrorCodes.INVALID_FIRST_CHARACTER_OF_TAG_NAME, 1)
      }
    }
  }

  // No node was created by the branches above: parse the content as text
  if (!node) {
    node = parseText(context, mode)
  }

  // If node is an array, push each item into nodes; otherwise push it directly
  if (isArray(node)) {
    for (let i = 0; i < node.length; i++) {
      pushNode(nodes, node[i])
    }
  } else {
    pushNode(nodes, node)
  }
}
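The pushNode call at the end of the loop does more than a plain push. A sketch of its text-merging behavior, assuming a hypothetical PushedNode type that keeps only start and end offsets instead of a full loc object (pushNodeSketch is also a hypothetical name): when a text node starts exactly where the previous text node ended, the two are merged. This happens, for example, with a < b, where the stray < is rejected as a tag start and ends up in a second text node.

```typescript
// Hypothetical cut-down node: only start/end offsets are kept for loc
type PushedNode =
  | { type: 'TEXT'; content: string; start: number; end: number }
  | { type: 'ELEMENT'; tag: string; start: number; end: number }

// A text node that directly follows another text node (no gap in offsets)
// is merged into it instead of being pushed as a separate node
function pushNodeSketch(nodes: PushedNode[], node: PushedNode): void {
  const prev = nodes[nodes.length - 1]
  if (node.type === 'TEXT' && prev && prev.type === 'TEXT' && prev.end === node.start) {
    prev.content += node.content
    prev.end = node.end
    return
  }
  nodes.push(node)
}
```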

4. Parsing template elements

In each branch of the while loop, node receives the return value of the parsing function for the corresponding node type. Here I will go through the parseElement function in detail, because elements are the most common nodes in templates.

I'll first simplify the source code of parseElement and paste it here, and then talk about the logic inside.

function parseElement(
  context: ParserContext,
  ancestors: ElementNode[]
): ElementNode | undefined {
  // Parse the start tag
  const parent = last(ancestors)
  const element = parseTag(context, TagType.Start, parent)
  
  // Self-closing tags and void tags (e.g. `<img>`, `<br>`, `<hr>`) have no children: return directly
  if (element.isSelfClosing || context.options.isVoidTag(element.tag)) {
    return element
  }

  // Recursively parse child nodes
  ancestors.push(element)
  const mode = context.options.getTextMode(element, parent)
  const children = parseChildren(context, mode, ancestors)
  ancestors.pop()

  element.children = children

  // Parse the end tag
  if (startsWithEndTagOpen(context.source, element.tag)) {
    parseTag(context, TagType.End, parent)
  } else {
    emitError(context, ErrorCodes.X_MISSING_END_TAG, 0, element.loc.start)
    if (context.source.length === 0 && element.tag.toLowerCase() === 'script') {
      const first = children[0]
      if (first && startsWith(first.loc.source, '<!--')) {
        emitError(context, ErrorCodes.EOF_IN_SCRIPT_HTML_COMMENT_LIKE_TEXT)
      }
    }
  }
  // Update the element's location so that it covers the whole element
  element.loc = getSelection(context, element.loc.start)

  return element
}

First we get the parent node of the current node, and then call the parseTag function to parse the start tag.

The parseTag function will be executed according to the following process:

  • First, match the tag name.
  • Parse the attributes of the element and store them in the props attribute.
  • Check whether there is a v-pre directive. If so, set the inVPre attribute of the context to true.
  • Detect self-closing tags. If the tag is self-closing, set its isSelfClosing property to true.
  • Determine the tagType: an ELEMENT, a COMPONENT, a SLOT, or a TEMPLATE.
  • Return the generated element object.
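The first steps of this list can be sketched for a start tag. The tag-name regular expression is modeled on the one in Vue 3's parser, but everything else is a simplifying assumption: TagSketch and parseStartTagSketch are hypothetical names, attribute values are not decoded, directives are kept as plain attribute names, no errors are reported, and the tagType step is left out.

```typescript
// Hypothetical result shape for the sketch
interface TagSketch {
  tag: string
  props: Record<string, string>
  isSelfClosing: boolean
  hasVPre: boolean
  rest: string // source remaining after the start tag
}

function parseStartTagSketch(source: string): TagSketch | undefined {
  // 1. Match the tag name
  const match = /^<\/?([a-z][^\t\r\n\f />]*)/i.exec(source)
  if (!match) return undefined
  const tag = match[1]
  let s = source.slice(match[0].length)

  // 2. Parse attributes until '>' or '/>' (values may be quoted or bare)
  const props: Record<string, string> = {}
  const attrRe = /^[\t\r\n\f ]*([^\t\r\n\f />=][^\t\r\n\f />=]*)(?:[\t\r\n\f ]*=[\t\r\n\f ]*("[^"]*"|'[^']*'|[^\t\r\n\f >]+))?/
  let attr: RegExpExecArray | null
  while ((attr = attrRe.exec(s))) {
    const raw = attr[2] || ''
    props[attr[1]] = /^["']/.test(raw) ? raw.slice(1, -1) : raw // strip quotes
    s = s.slice(attr[0].length)
  }
  s = s.replace(/^[\t\r\n\f ]+/, '')

  // 3. Detect v-pre and 4. detect a self-closing '/>'
  const isSelfClosing = s.startsWith('/>')
  s = s.slice(isSelfClosing ? 2 : s.startsWith('>') ? 1 : 0)
  return { tag, props, isSelfClosing, hasVPre: 'v-pre' in props, rest: s }
}
```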

After obtaining the element object, the parser checks whether element is a self-closing tag or a void tag such as <img>, <br>, or <hr>. If so, the element object is returned directly, since it has no children.

Otherwise, the parser goes on to the child nodes of element: element is pushed onto the ancestors stack, and parseChildren is called recursively to parse the child nodes.

const parent = last(ancestors)

Looking back at this line in parseChildren and parseElement, we can see that once element has been pushed onto the stack, the parent obtained in the recursive call is the current element. After its children have been parsed, ancestors.pop() pops element off the stack, the parsed children are assigned to the children attribute of element, and parsing of its child nodes is complete. This is a very clever design.

Finally, match the end tag, set the loc position information of the element, and return the parsed element object.
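The push / recurse / pop pattern can be reproduced in a toy recursive-descent parser. This is a sketch under strong assumptions: only start tags without attributes, matching end tags and plain text are supported, whitespace-only text is simply dropped, and MiniParser and its trace array are hypothetical names. The trace records stack operations, so it shows the order in which elements enter and leave the ancestors stack.

```typescript
// Toy parser mirroring the parseChildren/parseElement pattern
interface MiniElement {
  type: 'ELEMENT'
  tag: string
  children: MiniNode[]
}
interface MiniText {
  type: 'TEXT'
  content: string
}
type MiniNode = MiniElement | MiniText

class MiniParser {
  trace: string[] = [] // records stack pushes/pops for illustration
  constructor(private source: string) {}

  parseChildren(ancestors: MiniElement[]): MiniNode[] {
    const nodes: MiniNode[] = []
    while (!this.isEnd(ancestors)) {
      const s = this.source
      if (s.charAt(0) === '<' && /[a-z]/i.test(s.charAt(1))) {
        nodes.push(this.parseElement(ancestors))
      } else {
        // Text runs until the next '<' (or the end of the source)
        let end = s.indexOf('<', 1)
        if (end === -1) end = s.length
        const content = s.slice(0, end).trim() // simplification: trim instead of condensing
        this.source = s.slice(end)
        if (content) nodes.push({ type: 'TEXT', content })
      }
    }
    return nodes
  }

  private parseElement(ancestors: MiniElement[]): MiniElement {
    const match = /^<([a-z][^\t\r\n\f />]*)>/i.exec(this.source)
    if (!match) throw new Error('unsupported start tag')
    this.source = this.source.slice(match[0].length)
    const element: MiniElement = { type: 'ELEMENT', tag: match[1], children: [] }

    ancestors.push(element) // element is now the parent of everything parsed next
    this.trace.push(`push ${element.tag}`)
    element.children = this.parseChildren(ancestors)
    ancestors.pop()
    this.trace.push(`pop ${element.tag}`)

    // Consume the matching end tag, if present
    const endTag = `</${element.tag}>`
    if (this.source.startsWith(endTag)) this.source = this.source.slice(endTag.length)
    return element
  }

  // Stop at the end of the source or at an end tag of any open ancestor
  private isEnd(ancestors: MiniElement[]): boolean {
    const s = this.source
    if (s.startsWith('</')) {
      for (let i = ancestors.length - 1; i >= 0; --i) {
        if (s.startsWith(`</${ancestors[i].tag}>`)) return true
      }
    }
    return !s
  }
}
```

For the template in the example that follows, new MiniParser(template).parseChildren([]) returns a single div node whose only child is p, and trace ends up as push div, push p, pop p, pop div.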

5. Example: Template element parsing

Below is the template we are going to parse. The figure shows how the stack of nodes changes during parsing.

<div>
  <p>Hello World</p>
</div>

The yellow rectangle in the figure represents the stack. When parsing begins, parseChildren first encounters the div tag and calls the parseElement function. The div element is parsed by the parseTag function, pushed onto the stack, and its child nodes are parsed recursively. On the second call to parseChildren, the p element is encountered, and parseElement is called, which pushes the p element onto the stack. At this point the stack holds two elements, div and p. To parse the child nodes of p, parseChildren is called a third time. This time no tag matches, so none of the tag branches produces a node, and the parseText function is used instead: it parses "Hello World" as a text node and returns it.

Once this text node has been added to the children attribute of the p element, parsing of the children of p is complete and p is popped from the ancestor stack. The end tag is then parsed, and the element object corresponding to the p tag is returned.

The second call to parseChildren adds the node for p to its nodes array and returns that array.

After receiving the p node, the div element adds it to its own children attribute, and div is then popped from the stack, leaving the ancestor stack empty. After the end tag of div is parsed, the div element is returned.

Finally, the first call to parseChildren returns an array containing the node for div. This array is passed as the children argument of the createRoot function, which generates the root node object and completes the ast parsing.
