Analyzing Tomcat's architecture principles and design

1. Learning Objectives

1.1. Master Tomcat's architecture and design principles to strengthen your fundamentals

Macro View

As an " Http server + Servlet container", Tomcat shields us from the application layer protocol and network communication details, and provides us with standard Request and Response objects; the specific business logic is used as a variation point and is left to us to implement. We use frameworks such as SpringMVC , but we never need to consider TCP connections, Http protocol data processing and responses. Because Tomcat has done all this for us, we only need to focus on the specific business logic of each request.

Microscopic view

Internally, Tomcat likewise separates the points that change from those that do not, and adopts a componentized design to achieve "Russian doll"-style composability (the composite pattern). The lifecycle management of the components shares common features, which are extracted into interfaces and abstract classes, while the variation points are left to concrete subclasses: the template method pattern.

Today's popular microservices follow the same idea: a monolithic application is split into "microservices" by function, and the commonalities extracted during the split become core base services or shared libraries. The "middle platform" concept works the same way.

Design patterns are often a powerful tool for encapsulating changes. Reasonable use of design patterns can make our code and system design elegant and neat.

This is the "internal strength" that can be gained by learning excellent open source software. It will never become outdated, and the design ideas and philosophy contained therein are the fundamental way. Learn from their design experience, reasonably use design patterns to encapsulate changes and constants, and draw experience from their source code to improve your own system design capabilities.

1.2. Macro understanding of how a request is connected to Spring

At work we are already very familiar with Java syntax, have even "memorized" some design patterns, and have used many web frameworks, but we rarely get the chance to apply them in real projects. Designing a system independently often reduces to implementing one Service after another according to the requirements. Few of us hold a panoramic view of Java web development in our heads; for example, we may not know how a browser request ends up in Spring code.

To break through this bottleneck, why not stand on the shoulders of giants and study excellent open-source systems to see how their designers thought about these problems?

After studying Tomcat's principles, I found that Servlet technology is the origin of Java web development. Almost all Java web frameworks (such as Spring) are built on Servlets: a Spring application is itself driven by a Servlet (DispatcherServlet), while web containers such as Tomcat and Jetty are responsible for loading and running Servlets. As shown in the figure:

1.3. Improve your system design capabilities

While studying Tomcat, I also found that it uses many advanced Java techniques, such as multi-threaded concurrent programming, Socket network programming, and reflection. Previously I only knew these topics from memorizing interview questions, and there was always a gap between "knowing" and being able to apply them. Reading the Tomcat source taught me in which scenarios these techniques are actually used.

The same goes for system-design skills: interface-oriented programming, component composition (composite pattern), skeletal abstract classes, one-touch start and stop, object pooling, and design patterns such as template method, observer, and chain of responsibility. I later began imitating them and applying these design ideas in my own work.

2. Overall Architecture Design

Today we will analyze Tomcat's design ideas step by step. On the one hand, we learn Tomcat's overall architecture and how to design a complex system from a macro perspective: how to design the top-level modules and the relationships between them. On the other hand, this lays the groundwork for an in-depth study of how Tomcat works.

Tomcat startup process:

startup.sh -> catalina.sh start -> java ... org.apache.catalina.startup.Bootstrap.main()

Tomcat implements two core functions:

  • Processing Socket connections, converting network byte streams into Request and Response objects.
  • Loading and managing Servlets, and processing the specific Requests.

Therefore, Tomcat is designed around two core components: the connector (Connector) and the container (Container). The connector is responsible for external communication; the container is responsible for internal processing.

To support multiple I/O models and application-layer protocols, a single Tomcat container may be wired to multiple connectors, just as a room may have several doors.

  • Server corresponds to a Tomcat instance.
  • Service: by default there is only one, i.e. one Service per Tomcat instance.
  • Connector: a Service may have multiple connectors, each accepting a different connection protocol.
  • Container: the multiple connectors all feed one container, whose top level is the Engine.

Each component has its own lifecycle and must be started, and its internal subcomponents must be started too. For example, a Tomcat instance contains a Service, a Service contains multiple connectors and a container, a container contains multiple Hosts, a Host may contain multiple Context containers, and a Context may contain multiple Servlets. Tomcat therefore uses the composite pattern to manage these components, treating individual and composite components uniformly. Overall, the components nest like a "Russian doll".

2.1 Connectors

Before discussing connectors, let me first lay some groundwork on the I/O models and application-layer protocols supported by Tomcat.

The I/O models supported by Tomcat are:

  • NIO: non-blocking I/O, implemented with the Java NIO class library.
  • NIO2: asynchronous I/O, implemented with the NIO2 class library introduced in JDK 7.
  • APR: implemented with the Apache Portable Runtime, a native library written in C/C++.

The application layer protocols supported by Tomcat are:

  • HTTP/1.1 : This is the access protocol used by most Web applications.
  • AJP : Used for integration with web servers (such as Apache).
  • HTTP/2 : HTTP 2.0 significantly improves Web performance.

So one container may be served by multiple connectors. The connector shields the Servlet container from the differences in network protocols and I/O models: whether the protocol is HTTP or AJP, what the container receives is a standard ServletRequest object.

The connector's functional requirements, refined, are:

  • Listen on a network port.
  • Accept network connection requests.
  • Read the request byte stream from the network.
  • Parse the byte stream according to the application-layer protocol (HTTP/AJP) and generate a unified Tomcat Request object.
  • Convert the Tomcat Request into a standard ServletRequest.
  • Invoke the Servlet container and obtain a ServletResponse.
  • Convert the ServletResponse into a Tomcat Response object.
  • Convert the Tomcat Response into a network byte stream and write the response back to the browser.
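The eight steps above can be sketched end to end. Everything below is an illustrative stand-in (plain Strings instead of real request/response types, hypothetical names), not Tomcat's actual API:

```java
// Illustrative sketch of one request passing through a connector.
// All types and names here are simplified stand-ins, not Tomcat's real classes.
class ConnectorSketch {
    String handle(String rawBytes) {
        // Steps 1-3: listen, accept, read the byte stream (elided; assume rawBytes arrived)
        // Step 4: parse per the application-layer protocol into a Tomcat-style Request
        String tomcatRequest = parse(rawBytes);
        // Step 5: adapt it to a standard ServletRequest
        String servletRequest = adapt(tomcatRequest);
        // Step 6: call the container, obtaining a ServletResponse
        String servletResponse = callContainer(servletRequest);
        // Steps 7-8: convert back and write the byte stream to the client
        return toBytes(servletResponse);
    }
    String parse(String raw)          { return "REQ[" + raw + "]"; }
    String adapt(String req)          { return "SREQ[" + req + "]"; }
    String callContainer(String sreq) { return "RESP[" + sreq + "]"; }
    String toBytes(String resp)       { return "OUT[" + resp + "]"; }
}
```

Each helper marks its stage, so the final string shows the order in which the stages wrap one another.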

After the requirements are clearly listed, the next question we need to consider is, what sub-modules should the connector have? Excellent modular design should consider high cohesion and low coupling.

  • High cohesion means that functions with high relevance should be concentrated as much as possible and not dispersed.
  • Low coupling means that two related modules should reduce the dependencies and the degree of dependencies as much as possible, and avoid strong dependencies between the two modules.

We found that connectors need to complete three highly cohesive functions:

  • Network communication.
  • Application layer protocol analysis.
  • Conversion between Tomcat Request/Response and ServletRequest/ServletResponse .

Therefore, the designers of Tomcat created three components to implement these three functions: EndPoint, Processor, and Adapter.

The network I/O model varies and the application-layer protocol varies, but the overall processing logic stays the same: EndPoint provides byte streams to Processor, Processor provides Tomcat Request objects to Adapter, and Adapter provides ServletRequest objects to the container.
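That invariant hand-off chain can be sketched as three tiny interfaces; the names below are simplified stand-ins for Tomcat's real EndPoint/Processor/Adapter types, and the lambdas plugged in later are just one possible binding:

```java
// The fixed pipeline: EndPoint -> Processor -> Adapter -> container input.
interface EndPointStub  { byte[] receive(); }                    // yields the raw byte stream
interface ProcessorStub { TomcatRequestStub parse(byte[] in); }  // application-layer parsing
interface AdapterStub   { ServletRequestStub adapt(TomcatRequestStub r); }

class TomcatRequestStub {
    final String uri;
    TomcatRequestStub(String uri) { this.uri = uri; }
}

class ServletRequestStub {
    final String uri;
    ServletRequestStub(String uri) { this.uri = uri; }
}

class HandoffChain {
    // Swapping any one implementation leaves the overall flow unchanged.
    static ServletRequestStub run(EndPointStub e, ProcessorStub p, AdapterStub a) {
        return a.adapt(p.parse(e.receive()));
    }
}
```

Because each stage only depends on the interface before it, a new I/O model or protocol only replaces one stage.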

2.2 Encapsulating what changes and what stays the same

Therefore, Tomcat designed a series of abstract base classes to encapsulate the stable parts. The abstract base class AbstractProtocol implements the ProtocolHandler interface; each application-layer protocol has its own abstract base class, such as AbstractAjpProtocol and AbstractHttp11Protocol, and the concrete protocol implementation classes extend these protocol-layer base classes.

This is the application of Template Method design pattern.

In summary, the connector's three core components, EndPoint, Processor, and Adapter, each perform one of these tasks, and EndPoint and Processor are further grouped into the ProtocolHandler component. Their relationship is shown in the following figure.

ProtocolHandler component:

It mainly handles network connections and the application-layer protocol. It contains two important components, EndPoint and Processor, which together form the ProtocolHandler. Let me introduce how they work in detail.

EndPoint:

EndPoint is the communication endpoint, i.e. the interface on which communication is listened for. It is the concrete Socket receive/send processor and an abstraction of the transport layer: EndPoint implements reading and writing of TCP/IP data and, in essence, calls the operating system's socket API.

EndPoint is an interface whose abstract implementation class is AbstractEndpoint. Its concrete subclasses, such as NioEndpoint and Nio2Endpoint, contain two important subcomponents: Acceptor and SocketProcessor.

The Acceptor listens for Socket connection requests. SocketProcessor processes a Socket request received by the Acceptor; it implements the Runnable interface and, in its run method, calls the application-layer protocol component Processor for processing. To improve throughput, SocketProcessor tasks are submitted to a thread pool for execution.

We know that using a Java multiplexer boils down to two steps:

  • Create a Selector, register the events you are interested in on it, then call its select method and wait for an event of interest to occur.
  • When an event of interest occurs, such as data becoming readable, read the data from the Channel (typically on another thread).
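Those two steps can be seen in a runnable miniature. This is a toy, not Tomcat's NioEndpoint: the single-threaded loop below plays both the acceptor and the reader role, and the blocking client exists only to drive the demo:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class SelectorDemo {
    static String serveOnce(String clientMessage) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
            server.configureBlocking(false);
            // Step 1: register the event we are interested in (new connections)
            server.register(selector, SelectionKey.OP_ACCEPT);

            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            // A plain blocking client, used only to drive the demo
            SocketChannel client =
                    SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
            client.write(ByteBuffer.wrap(clientMessage.getBytes()));

            StringBuilder received = new StringBuilder();
            while (received.length() < clientMessage.length()) {
                selector.select(); // block until something of interest happens
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        // Step 2a: accept the connection, register it for reads
                        SocketChannel ch = server.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        // Step 2b: data is ready; read it from the Channel
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        int n = ((SocketChannel) key.channel()).read(buf);
                        for (int i = 0; i < n; i++) received.append((char) buf.get(i));
                    }
                }
                selector.selectedKeys().clear();
            }
            client.close();
            return received.toString();
        }
    }
}
```

In Tomcat, the Acceptor and the Poller split these roles across threads, and readable sockets are handed off to a pool instead of being read inline.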

In Tomcat, NioEndpoint is a concrete implementation of AbstractEndpoint. Although it contains many components, its processing logic still follows those two steps. It has five components: LimitLatch, Acceptor, Poller, SocketProcessor, and Executor, which cooperate to implement the handling of the whole TCP/IP layer.

LimitLatch is a connection controller that controls the maximum number of connections. The default value in NIO mode is 10,000. When this threshold is reached, the connection request is rejected.

The Acceptor runs on a separate thread, calling the accept method in an infinite loop to receive new connections. Once a new connection request arrives, accept returns a Channel object, which is then handed to the Poller for processing.

The Poller is essentially a Selector and also runs on its own thread. It maintains an internal array of Channels and, in an infinite loop, continuously checks their data-readiness; once a Channel becomes readable, it creates a SocketProcessor task object and sends it to the Executor for processing.

SocketProcessor implements the Runnable interface. Its run method calls getHandler().process(socketWrapper, event) to hand the socketWrapper to the protocol handler, eventually reaching the matching application-layer processor, i.e. the Http11Processor component, to handle the request. Http11Processor reads data from the Channel and produces the Tomcat Request object. Note that Http11Processor does not read the Channel directly: Tomcat supports both the synchronous non-blocking and the asynchronous I/O model, and the corresponding Java Channel classes differ (SocketChannel vs. AsynchronousSocketChannel). To shield Http11Processor from these differences, Tomcat introduces the wrapper class SocketWrapper; Http11Processor reads and writes data only through SocketWrapper methods.

The Executor is the thread pool responsible for running SocketProcessor tasks. SocketProcessor's run method calls Http11Processor to read and parse the request data. As noted, Http11Processor encapsulates the application-layer protocol: it calls the container to obtain the response and then writes the response out through the Channel.

The workflow is as follows:

Processor:

Processor implements the application-layer protocol, e.g. HTTP. It receives a Socket from the EndPoint, reads and parses the byte stream into Tomcat Request and Response objects, and submits the request to the container through the Adapter. Processor is the abstraction of the application-layer protocol.

From the figure we can see that after the EndPoint accepts a Socket connection, it creates a SocketProcessor task and submits it to the thread pool. SocketProcessor's run method calls the Processor component to parse the application-layer protocol; once the Processor has produced a Request object, it calls the Adapter's service method, which passes the request to the container with the following code.

// Calling the container
connector.getService().getContainer().getPipeline().getFirst().invoke(request, response);

Adapter component:

Because the protocols differ, Tomcat defines its own Request class to hold the request information, which reflects object-oriented encapsulation. However, this Request is not a standard ServletRequest, so Tomcat's Request cannot be passed to the container directly as a parameter.

The Tomcat designers' solution is to introduce CoyoteAdapter, a classic application of the adapter pattern. The connector calls CoyoteAdapter's service method and passes in the Tomcat Request object; CoyoteAdapter converts the Tomcat Request into a ServletRequest and then calls the container's service method.
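The adapter idea can be shown in miniature. The classes below are simplified stand-ins (the real types are org.apache.coyote.Request, the Servlet API's ServletRequest, and CoyoteAdapter):

```java
// The connector's native request type (stand-in for org.apache.coyote.Request).
class CoyoteRequestStub {
    final String uri;
    CoyoteRequestStub(String uri) { this.uri = uri; }
}

// Stand-in for the standard ServletRequest the container understands.
interface StandardRequest {
    String getRequestURI();
}

// The container only accepts the standard type.
interface ContainerStub {
    String service(StandardRequest req);
}

class CoyoteAdapterSketch {
    private final ContainerStub container;
    CoyoteAdapterSketch(ContainerStub container) { this.container = container; }

    // The adapter pattern: wrap the native request so it satisfies the
    // standard interface, then delegate to the container.
    String service(CoyoteRequestStub coyote) {
        StandardRequest adapted = () -> coyote.uri;
        return container.service(adapted);
    }
}
```

Neither side changes: the connector keeps its native type, the container keeps its standard one, and only the adapter knows about both.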

2.3 Container

The connector is responsible for external communication; the container is responsible for internal processing. Specifically, the connector handles Socket communication and application-layer protocol parsing to obtain a Servlet request, while the container is responsible for processing that Servlet request.

Container: as the name implies, a container holds things; the Tomcat container is used to load and manage Servlets.

Tomcat designs four kinds of containers: Engine, Host, Context, and Wrapper. (Server represents the Tomcat instance itself.)

It should be noted that these four containers are not in a parallel relationship, but in a parent-child relationship, as shown in the following figure:

You may ask: why design so many levels of containers? Doesn't this add complexity? The consideration behind it is that a layered architecture makes the Servlet container very flexible. One Host contains multiple Contexts, one Context contains multiple Servlets, and every component needs uniform lifecycle management, so the composite pattern is used to organize these containers.

Wrapper represents a Servlet. Context represents a web application, and one web application may contain multiple Servlets. Host represents a virtual host, i.e. a site; one Tomcat can be configured with multiple sites (Hosts), and one site can deploy multiple web applications. Engine is the engine that manages multiple sites (Hosts); a Service can have only one Engine.

You can use the Tomcat configuration file to gain a deeper understanding of its hierarchical relationship.

<Server port="8005" shutdown="SHUTDOWN"> // Top-level component, can contain multiple Services, represents a Tomcat instance<Service name="Catalina"> // Top-level component, contains an Engine, multiple connectors<Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" /> // Connector // Container component: an Engine handles all Service requests, including multiple Hosts
    <Engine name="Catalina" defaultHost="localhost">
	  //Container component: processes client requests under the specified Host, which can contain multiple Contexts
      <Host name="localhost" appBase="webapps"
            unpackWARs="true" autoDeploy="true">
			//Container component: handles all client requests for a specific Context Web application <Context></Context>
      </Host>
    </Engine>
  </Service>
</Server>

How are these containers managed? Notice that the containers form a parent-child tree structure. Does that bring the composite pattern to mind?

Tomcat uses the composite pattern to manage these containers. Concretely, all container components implement the Container interface, so the composite pattern lets callers treat leaf container objects and composite container objects uniformly. Here the leaf is the bottom-level Wrapper, and the composites are Context, Host, and Engine above it. The Container interface is defined as follows:

public interface Container extends Lifecycle {
    public void setName(String name);
    public Container getParent();
    public void setParent(Container container);
    public void addChild(Container child);
    public void removeChild(Container child);
    public Container findChild(String name);
}

Methods such as getParent, setParent, addChild, and removeChild confirm the composite pattern mentioned above. We also see that the Container interface extends Lifecycle: Tomcat uses Lifecycle to manage the lifecycle of all container components uniformly. All containers are managed through the composite pattern, and Lifecycle implements each component's lifecycle management; its main methods are init(), start(), stop(), and destroy().

2.4. The process of requesting to locate the Servlet

How is a request routed to the particular Wrapper (Servlet) that should handle it? The answer: Tomcat uses the Mapper component to accomplish this.

The Mapper component's job is to map the URL requested by the user to a Servlet. It works by storing the web applications' configuration information, which is really the mapping between container components and access paths: the domain names configured on Host containers, the web application paths of Context containers, and the Servlet mapping paths of Wrapper containers. You can think of this configuration information as a multi-level Map.

When a request arrives, the Mapper parses the domain name and path out of the request URL, then looks them up in the Map it has stored to locate a Servlet. Note that one request URL ultimately maps to exactly one Wrapper container, i.e. one Servlet.

If a user visits a URL, such as http://user.shopping.com:8080/order/buy in the figure, how does Tomcat locate this URL to a Servlet?

1. First, determine the Service and Engine from the protocol and port number. Tomcat's default HTTP connector listens on port 8080 and the default AJP connector on port 8009. The example URL uses port 8080, so the request is received by the HTTP connector. A connector belongs to a Service component, so the Service is determined. Besides its connectors, a Service also has exactly one container component, an Engine, so once the Service is determined, so is the Engine.

2. Select the Host by domain name. Once the Service and Engine are determined, the Mapper finds the Host container matching the domain name in the URL. In the example the domain accessed is user.shopping.com, so the Mapper finds the Host2 container.

3. Find the Context by URL path. Once the Host is determined, the Mapper matches the URL path against the web application paths. Here the path accessed is /order, so the Context container Context4 is found.

4. Find the Wrapper (Servlet) by URL path. Once the Context is determined, the Mapper locates the specific Wrapper and Servlet using the Servlet mapping paths configured in web.xml.
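The "multi-level Map" intuition can be made concrete with a toy lookup. Real Tomcat uses optimized matching structures and longest-prefix rules, not literal nested Maps, so treat this as an illustration only:

```java
import java.util.Map;

// Toy Mapper: host -> context path -> servlet path -> servlet name.
class MapperSketch {
    private final Map<String, Map<String, Map<String, String>>> routes;

    MapperSketch(Map<String, Map<String, Map<String, String>>> routes) {
        this.routes = routes;
    }

    // Returns the servlet name, or null when any level fails to match.
    String map(String host, String contextPath, String servletPath) {
        Map<String, Map<String, String>> contexts = routes.get(host);
        if (contexts == null) return null;
        Map<String, String> servlets = contexts.get(contextPath);
        if (servlets == null) return null;
        return servlets.get(servletPath);
    }
}
```

Each lookup level mirrors one container level: Host, then Context, then Wrapper.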

The Adapter in the connector calls the container's service method to execute the Servlet. The first container to receive the request is the Engine. After processing, the Engine passes the request to its child Host container, and so on, until the request reaches the Wrapper container, which calls the final Servlet. How is this calling chain implemented? The answer is the Pipeline-Valve mechanism.

Pipeline-Valve is a chain of responsibility: during the processing of a request, multiple processors handle it in turn, each responsible for its own part and then invoking the next one. A Valve represents one processing point (a "valve"), and its invoke method performs the processing.

public interface Valve {
  public Valve getNext();
  public void setNext(Valve valve);
  public void invoke(Request request, Response response);
}

Now look at the Pipeline interface:

public interface Pipeline {
  public void addValve(Valve valve);
  public Valve getBasic();
  public void setBasic(Valve valve);
  public Valve getFirst();
}

Pipeline has an addValve method and maintains a list of Valves internally; Valves can be inserted into the Pipeline to perform processing on the request. Notice that Pipeline has no invoke method, because the call chain is driven by the Valves themselves: after a Valve finishes its own processing, it calls getNext().invoke() to trigger the next Valve.

In fact, every container holds a Pipeline object; as long as the first Valve of a Pipeline is triggered, all Valves in that container's Pipeline get called. But how are the Pipelines of different containers chained together? For example, the Engine's Pipeline needs to call into the Pipeline of its lower-level Host container.

That is what Pipeline's getBasic method is for. The basic Valve sits at the end of the Valve list; it is a mandatory Valve of the Pipeline and is responsible for calling the first Valve of the child container's Pipeline.
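A stripped-down version of this chain can be sketched as follows. The names are simplified stand-ins (real Valves take Request/Response; here a StringBuilder records the path taken):

```java
import java.util.ArrayList;
import java.util.List;

// Toy Valve: each one does its work, then calls the next via the link.
abstract class ValveStub {
    ValveStub next;
    abstract void invoke(StringBuilder trace);
}

class PipelineSketch {
    private final List<ValveStub> valves = new ArrayList<>();
    private ValveStub basic; // tail valve; in Tomcat it calls the child container's pipeline

    void addValve(ValveStub v) { valves.add(v); }
    void setBasic(ValveStub v) { basic = v; }

    // Link the ordinary valves with the basic valve at the tail; return the head.
    ValveStub getFirst() {
        List<ValveStub> all = new ArrayList<>(valves);
        if (basic != null) all.add(basic);
        for (int i = 0; i + 1 < all.size(); i++) all.get(i).next = all.get(i + 1);
        return all.isEmpty() ? null : all.get(0);
    }
}
```

Triggering getFirst().invoke(...) runs every valve in order; in Tomcat the basic valve at the tail would then trigger the child container's pipeline the same way.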

The whole process is triggered by CoyoteAdapter in the connector, which calls the first Valve of the Engine:

@Override
public void service(org.apache.coyote.Request req, org.apache.coyote.Response res) {
    // Omit other code
    // Calling the container
    connector.getService().getContainer().getPipeline().getFirst().invoke(
        request, response);
    ...
}

The last Valve of the Wrapper container creates a Filter chain and calls its doFilter() method, which ultimately reaches the Servlet's service method.

Didn't we discuss Filters earlier? They seem to serve a similar purpose. So what is the difference between a Valve and a Filter? The differences are:

  • Valve is Tomcat's private mechanism, tightly coupled to Tomcat's internal API, whereas the Servlet API is a public standard: all web containers, including Jetty, support the Filter mechanism.
  • Another important difference is scope: a Valve works at the web-container level and intercepts requests to all applications, while a Servlet Filter works at the application level and can only intercept the requests of one web application. If you want an interceptor for the whole web container, you must use a Valve.

Lifecycle

Earlier we saw that the Container interface extends Lifecycle. For a system to provide services, its components must be created, assembled, and started; when the service stops, resources must be released and the components destroyed. This is a dynamic process: Tomcat needs to manage the lifecycle of these components dynamically.

How to uniformly manage the creation, initialization, start, stop and destruction of components? How to make the code logic clear? How to add or remove components easily? How to ensure that components are started and stopped without omission or duplication?

One-touch start and stop: the Lifecycle interface

Design means finding a system's points of change and points of stability. The stable part here is that every component goes through creation, initialization, and startup, and these states and state transitions never change. The variable part is that each concrete component's initialization and startup methods differ.

Tomcat therefore abstracts the stable part into a lifecycle-related interface called Lifecycle. The Lifecycle interface defines the methods init(), start(), stop(), and destroy(), and each concrete component (that is, each container) implements them.

In the parent component's init() method, the child components are created and their init() methods are called; likewise, the parent's start() calls the children's start(). The caller can therefore invoke init() and start() on any component uniformly. This is the composite pattern at work: calling init() and start() on the top-level component, the Server, starts the whole of Tomcat. Because the containers implement the Lifecycle interface and are managed with the composite pattern, the lifecycle of the entire tree can be driven with one touch, as if it were a single object.
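The cascade can be sketched in a few lines; the class below is illustrative, not Tomcat's real code, and a static list stands in for real lifecycle bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

// "One-touch start": a parent's init() does its own work, then cascades to
// its children, so calling init() on the top component initializes the tree.
class ComponentSketch {
    final String name;
    final List<ComponentSketch> children = new ArrayList<>();
    static final List<String> initOrder = new ArrayList<>(); // records the cascade

    ComponentSketch(String name) { this.name = name; }

    ComponentSketch add(ComponentSketch child) { children.add(child); return this; }

    void init() {
        initOrder.add(name);                         // this component's own init work
        for (ComponentSketch c : children) c.init(); // then initialize the children
    }
}
```

Building a Server > Service > Engine tree and calling server.init() initializes all three in parent-first order.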

Extensibility: Lifecycle events

Let's consider another issue: the system's extensibility. The concrete implementations of each component's init() and start() methods are complex and changeable. For example, the Host container's startup method must scan the webapps directory and create the corresponding Context containers. If new logic has to be added later, can we simply modify start()? That would violate the open-closed principle, which says that to extend a system's behavior you should not modify its existing classes but define new ones instead. So how do we solve this?

A component's init() and start() calls are triggered by its parent component's state changes: initializing the parent triggers initialization of the children, and starting the parent triggers starting the children. We can therefore model a component's lifecycle as a set of states and treat each state transition as an event. Events have listeners, in which extra logic can be implemented, and listeners can be added and removed easily. This is a textbook observer pattern.

The following is the definition of the Lifecycle interface:
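Based on the methods named in this section, the interface can be sketched roughly as follows; note this is an abridged reconstruction, and the real org.apache.catalina.Lifecycle also declares event-name constants and additional methods (such as getState()):

```java
// Abridged sketch of the lifecycle contract; names are simplified stand-ins.
interface LifecycleSketch {
    void addLifecycleListener(LifecycleListenerSketch listener);
    void removeLifecycleListener(LifecycleListenerSketch listener);
    void init() throws Exception;
    void start() throws Exception;
    void stop() throws Exception;
    void destroy() throws Exception;
}

interface LifecycleListenerSketch {
    void lifecycleEvent(String type); // notified on every state transition
}

// A trivial implementation, only to show the intended calling sequence.
class RecordingComponent implements LifecycleSketch {
    final StringBuilder log = new StringBuilder();
    public void addLifecycleListener(LifecycleListenerSketch l) { }
    public void removeLifecycleListener(LifecycleListenerSketch l) { }
    public void init()    { log.append("init "); }
    public void start()   { log.append("start "); }
    public void stop()    { log.append("stop "); }
    public void destroy() { log.append("destroy "); }
}
```

The listener methods are what make the observer pattern possible: state transitions fire events without the component knowing who is watching.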

Reusability: the LifecycleBase abstract base class

Here we meet the skeletal abstract class (abstract template) pattern again.

With the interface in place, we need classes to implement it. There is usually more than one implementation class, and different classes often share some logic when implementing the interface; if every subclass re-implements it, we get duplicate code. How can subclasses reuse that logic? Define a base class that implements the common logic and let each subclass inherit it.

Tomcat defines the base class LifecycleBase to implement the Lifecycle interface, placing the common logic in the base class: lifecycle state transitions and maintenance, lifecycle event firing, and listener addition and removal, while each subclass is responsible for implementing its own initialization, start, and stop methods.

public abstract class LifecycleBase implements Lifecycle {
    // Holds all observers
    private final List<LifecycleListener> lifecycleListeners = new CopyOnWriteArrayList<>();

    /**
     * Publish an event.
     *
     * @param type Event type
     * @param data Data associated with the event.
     */
    protected void fireLifecycleEvent(String type, Object data) {
        LifecycleEvent event = new LifecycleEvent(this, type, data);
        for (LifecycleListener listener : lifecycleListeners) {
            listener.lifecycleEvent(event);
        }
    }

    // Template method: defines the whole initialization process
    @Override
    public final synchronized void init() throws LifecycleException {
        // 1. Check the state
        if (!state.equals(LifecycleState.NEW)) {
            invalidTransition(Lifecycle.BEFORE_INIT_EVENT);
        }

        try {
            // 2. Transition to INITIALIZING, triggering that event's listeners
            setStateInternal(LifecycleState.INITIALIZING, null, false);
            // 3. Call the concrete subclass's initialization
            initInternal();
            // 4. Transition to INITIALIZED, triggering that event's listeners
            setStateInternal(LifecycleState.INITIALIZED, null, false);
        } catch (Throwable t) {
            ExceptionUtils.handleThrowable(t);
            setStateInternal(LifecycleState.FAILED, null, false);
            throw new LifecycleException(
                    sm.getString("lifecycleBase.initFail", toString()), t);
        }
    }
}

To achieve one-touch start/stop and clean lifecycle management, Tomcat balances extensibility and reusability, applying object-oriented thinking and design patterns thoroughly. The Container interface maintains the parent-child container relationships; the composite pattern plus Lifecycle implements component lifecycle management; and since each lifecycle has fixed and variable parts, the template method pattern is used. Altogether: the composite pattern, the observer pattern, skeletal abstract classes, and template methods.

If you need to maintain a bunch of entities with parent-child relationships, consider using the Composite pattern.

The observer pattern sounds "high-end", but it simply means that when an event occurs, a series of update operations must be performed; it provides a low-coupling, non-intrusive notification and update mechanism.

Container extends Lifecycle. StandardEngine, StandardHost, StandardContext and StandardWrapper are the concrete implementation classes of the corresponding container components. Because they are all containers, they extend the ContainerBase abstract base class, which implements the Container interface and also extends LifecycleBase. Their lifecycle-management interface and functional interface are kept separate, which follows the interface segregation principle.
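The event-publishing code above can be distilled into a runnable miniature. The sketch below is illustrative only (SimpleComponent and the string event types are made up for the demo; only the LifecycleListener idea mirrors Tomcat):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of Tomcat's Lifecycle observer idea; illustrative, not Tomcat code.
interface LifecycleListener {
    void lifecycleEvent(String type);
}

class SimpleComponent {
    private final List<LifecycleListener> listeners = new ArrayList<>();

    void addLifecycleListener(LifecycleListener l) { listeners.add(l); }

    // Publishing an event is just iterating over the registered observers
    void fireLifecycleEvent(String type) {
        for (LifecycleListener l : listeners) {
            l.lifecycleEvent(type);
        }
    }

    void init() {
        fireLifecycleEvent("before_init");
        // ... component-specific initialization would happen here ...
        fireLifecycleEvent("after_init");
    }
}

class ObserverDemo {
    public static void main(String[] args) {
        SimpleComponent component = new SimpleComponent();
        List<String> seen = new ArrayList<>();
        component.addLifecycleListener(seen::add); // the observer is just a callback
        component.init();
        System.out.println(seen); // [before_init, after_init]
    }
}
```

The component never knows who is listening; listeners are added and removed without touching the component's code, which is exactly the decoupling the pattern buys.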

3. Why Tomcat breaks the parent delegation mechanism

3.1. Parent Delegation

We know that JVM class loaders load classes via the parent delegation mechanism: a loader first hands the request to its parent; if the parent is null it checks whether Bootstrap has loaded the class, and only if that fails does it load the class itself. The JDK provides the abstract class ClassLoader, which defines three key methods: loadClass(String name) for external callers; loadClass(String name, boolean resolve), which subclasses override to break parent delegation; and findClass(String name), which defines how a class is actually located.

public Class<?> loadClass(String name) throws ClassNotFoundException {
    return loadClass(name, false);
}

protected Class<?> loadClass(String name, boolean resolve)
    throws ClassNotFoundException
{
    synchronized (getClassLoadingLock(name)) {
        // Check whether the class has already been loaded
        Class<?> c = findLoadedClass(name);
        // Not loaded yet
        if (c == null) {
            // Delegate loading to the parent loader (recursive call)
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                // Parent is null: check whether Bootstrap has loaded it
                c = findBootstrapClassOrNull(name);
            }
            // Still not loaded: call our own findClass
            if (c == null) {
                c = findClass(name);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}

protected Class<?> findClass(String name) {
    // 1. Locate the .class file for the given name under a specific
    //    directory and read it into memory ...

    // 2. Call defineClass to turn the byte array into a Class object
    return defineClass(buf, off, len);
}

// Parses the bytecode array into a Class object; implemented with a native method
protected final Class<?> defineClass(byte[] b, int off, int len) {
    ...
}

There are three class loaders in the JDK, and you can also define custom class loaders. Their hierarchy is as follows.

  • BootstrapClassLoader is a startup class loader implemented in C language. It is used to load the core classes required when JVM starts, such as rt.jar , resources.jar , etc.
  • ExtClassLoader is an extended class loader used to load JAR packages in the \jre\lib\ext directory.
  • AppClassLoader is the system class loader used to load classes under classpath . The application uses it to load classes by default.
  • Custom class loader, used to load classes under custom paths.

These class loaders all work the same way; they differ only in their loading paths, i.e. the paths that their findClass methods search. The parent delegation mechanism guarantees that a Java class is unique inside the JVM. If you accidentally write a class with the same name as a JRE core class, say Object, parent delegation ensures the JRE's Object is loaded rather than yours: when AppClassLoader is asked to load your Object, it delegates to ExtClassLoader, which delegates to BootstrapClassLoader; BootstrapClassLoader finds it has already loaded Object and returns it directly, so your Object is never loaded. Note that the highest parent reference reachable from code is ExtClassLoader; the bootstrap loader is represented as null.
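You can observe this delegation directly from user code. A small sketch (note: on Java 9 and later, ExtClassLoader was replaced by the platform class loader, so the exact loader names vary by JDK version):

```java
// Observing parent delegation from user code.
class DelegationDemo {
    public static void main(String[] args) throws Exception {
        // Core classes are loaded by the bootstrap loader, which Java exposes as null
        System.out.println(String.class.getClassLoader()); // null

        // Even when we explicitly ask the system (App) class loader for a core class,
        // delegation hands the request up and we get the bootstrap loader's copy back
        Class<?> c = Class.forName("java.lang.String", false, ClassLoader.getSystemClassLoader());
        System.out.println(c == String.class); // true

        // The highest parent reachable from code (Ext/Platform loader, depending on JDK)
        System.out.println(ClassLoader.getSystemClassLoader().getParent());
    }
}
```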

3.2. Tomcat hot loading

In essence, Tomcat runs a periodic task on a background thread that regularly checks class files for changes and reloads the classes when a change is detected. Let's look at how ContainerBackgroundProcessor is implemented.

protected class ContainerBackgroundProcessor implements Runnable {

    @Override
    public void run() {
        // Note: the argument is the enclosing container instance
        processChildren(ContainerBase.this);
    }

    protected void processChildren(Container container) {
        try {
            // 1. Call the current container's backgroundProcess method
            container.backgroundProcess();

            // 2. Traverse all child containers and call processChildren recursively,
            //    so every descendant of the current container gets processed
            Container[] children = container.findChildren();
            for (int i = 0; i < children.length; i++) {
                // The container base class has a backgroundProcessorDelay field: if it
                // is greater than 0, the child runs its own background thread and the
                // parent need not call processChildren on it
                if (children[i].getBackgroundProcessorDelay() <= 0) {
                    processChildren(children[i]);
                }
            }
        } catch (Throwable t) { ... }
    }
}

Tomcat's hot loading is implemented in the Context container, mainly by calling the reload method of the Context container. Putting aside the details, from a macro perspective, the main tasks are as follows:

  • Stop and destroy the Context container and all its child containers. The child container is actually the Wrapper, which means that the Servlet instance in the Wrapper is also destroyed.
  • Stop and destroy the Listener and Filter associated with the Context container.
  • Stop and destroy the Pipeline and various Valves under the Context.
  • Stop and destroy the Context's class loader and the class file resources loaded by the class loader.
  • Start the Context container, during which the resources destroyed in the previous four steps will be recreated.

In this process, class loaders play a key role. A Context container corresponds to a class loader. When the class loader is destroyed, all the classes it has loaded will also be destroyed. During the startup process, the Context container creates a new class loader to load new class files.
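The detection half of hot reloading reduces to a few lines: remember each watched file's lastModified timestamp and compare it on every background tick. The ModificationChecker below is an illustrative reduction, not Tomcat's actual WebappLoader code:

```java
import java.io.File;
import java.io.IOException;

// Sketch of the change detection behind hot reload: remember a file's
// lastModified timestamp and report when it moves. Illustrative only.
class ModificationChecker {
    private final File file;
    private long lastSeen;

    ModificationChecker(File file) {
        this.file = file;
        this.lastSeen = file.lastModified();
    }

    /** True exactly when the file changed since the last call. */
    boolean modified() {
        long now = file.lastModified();
        if (now != lastSeen) {
            lastSeen = now;
            return true; // a real loader would now tear down and rebuild the class loader
        }
        return false;
    }
}

class HotReloadDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("Servlet", ".class");
        f.deleteOnExit();
        ModificationChecker checker = new ModificationChecker(f);
        System.out.println(checker.modified()); // false: nothing changed yet

        // Simulate a recompiled class file landing on disk
        f.setLastModified(f.lastModified() + 10_000);
        System.out.println(checker.modified()); // true: time to reload
        System.out.println(checker.modified()); // false again
    }
}
```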

3.3. Tomcat Class Loader

Tomcat's custom class loader WebAppClassLoader breaks the parent delegation mechanism: it first tries to load a class itself and delegates to the parent loader only if it cannot find the class. Its purpose is to give priority to classes defined by the web application itself. Concretely, it overrides two ClassLoader methods: findClass and loadClass.
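This child-first order can be condensed into a runnable sketch. The hypothetical ChildFirstLoader below is not Tomcat's code; it only shows the reversed order, trying findClass first and falling back to normal parent delegation:

```java
// A minimal 'child-first' class loader: the opposite order of the JDK default.
// Illustrative sketch only; Tomcat's WebappClassLoader adds caching and more steps.
class ChildFirstLoader extends ClassLoader {

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 1. Try to load the class ourselves first (child-first)
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // 2. Only then fall back to normal parent delegation
                    c = super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // This loader defines no classes itself (the inherited findClass always
        // throws), so core classes still come back from the parent chain.
        Class<?> c = new ChildFirstLoader().loadClass("java.lang.String");
        System.out.println(c == String.class); // true
    }
}
```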

findClass method

org.apache.catalina.loader.WebappClassLoaderBase#findClass;

For ease of understanding and reading, I removed some details:

public Class<?> findClass(String name) throws ClassNotFoundException {
    ...

    Class<?> clazz = null;
    try {
        // 1. Search the web application's own directories first
        clazz = findClassInternal(name);
    } catch (RuntimeException e) {
        throw e;
    }

    if (clazz == null) {
        try {
            // 2. Not found locally: let the parent loader search
            clazz = super.findClass(name);
        } catch (RuntimeException e) {
            throw e;
        }
    }

    // 3. If the parent cannot find it either, throw ClassNotFoundException
    if (clazz == null) {
        throw new ClassNotFoundException(name);
    }

    return clazz;
}

1. First search for the class to be loaded in the local directory of the Web application.

2. If not found, it is handed over to the parent loader for search. Its parent loader is the system class loader AppClassLoader mentioned above.

3. If the parent loader also cannot find the class, a ClassNotFound exception is thrown.

loadClass method

Let's look at the implementation of loadClass method of the Tomcat class loader. I have also removed some details:

public Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {

    synchronized (getClassLoadingLock(name)) {

        Class<?> clazz = null;

        // 1. Check the local cache: has this loader already loaded the class?
        clazz = findLoadedClass0(name);
        if (clazz != null) {
            if (resolve)
                resolveClass(clazz);
            return clazz;
        }

        // 2. Check whether the system class loader's cache already has the class
        clazz = findLoadedClass(name);
        if (clazz != null) {
            if (resolve)
                resolveClass(clazz);
            return clazz;
        }

        // 3. Try loading the class with ExtClassLoader -- why?
        ClassLoader javaseLoader = getJavaseClassLoader();
        try {
            clazz = javaseLoader.loadClass(name);
            if (clazz != null) {
                if (resolve)
                    resolveClass(clazz);
                return clazz;
            }
        } catch (ClassNotFoundException e) {
            // Ignore
        }

        // 4. Try to find and load the class in the local web application directories
        try {
            clazz = findClass(name);
            if (clazz != null) {
                if (resolve)
                    resolveClass(clazz);
                return clazz;
            }
        } catch (ClassNotFoundException e) {
            // Ignore
        }

        // 5. Try the system class loader (i.e. AppClassLoader)
        try {
            clazz = Class.forName(name, false, parent);
            if (clazz != null) {
                if (resolve)
                    resolveClass(clazz);
                return clazz;
            }
        } catch (ClassNotFoundException e) {
            // Ignore
        }
    }

    // 6. Everything above failed to load the class: throw
    throw new ClassNotFoundException(name);
}

There are six main steps:

1. First check in the local cache whether the class has been loaded, that is, whether Tomcat's class loader has loaded this class.

2. If the Tomcat class loader has not loaded this class, check whether the system class loader has loaded it.

3. If neither has loaded it, let ExtClassLoader try. This step is critical: it prevents the web application's own classes from overriding the JRE's core classes. Since Tomcat breaks parent delegation, if a web application defined its own class called Object and that class were loaded first, it would shadow the JRE's Object. That is why Tomcat's class loader tries ExtClassLoader first: ExtClassLoader delegates to BootstrapClassLoader, which finds it has already loaded Object and returns it directly to Tomcat's class loader. Tomcat's loader therefore never loads the Object defined in the web application, avoiding the problem of overriding JRE core classes.

4. If ExtClassLoader fails to load the class, i.e. it is not a JRE core class, search for it in the local web application directories and load it from there.

5. If the class is not found in the local directories, it is not a class defined by the web application itself, so it is loaded by the system class loader. Note that the class is handed to the system class loader via Class.forName, passing the parent (system) class loader in explicitly.

6. If all the above loading processes fail, a ClassNotFound exception is thrown.

3.4. Tomcat class loader hierarchy

Tomcat, as a Servlet container, is responsible for loading our Servlet class. In addition, it is also responsible for loading the JAR package that Servlet depends on. And Tomcat itself is also a Java program, so it needs to load its own classes and dependent JAR packages. First, let us think about these questions:

1. Suppose we run two web applications in Tomcat, each containing a Servlet with the same name but different functionality. Tomcat must load and manage both same-named Servlet classes at the same time without conflict, so classes must be isolated between web applications.

2. If two web applications both depend on the same third-party JAR package, such as Spring, then after the Spring JAR is loaded into memory Tomcat must ensure both web applications share it, i.e. the Spring classes are loaded only once. Otherwise, as third-party JAR dependencies grow, the JVM's memory footprint balloons.

3. Like the JVM, we need to isolate the classes of Tomcat itself and the classes of the Web application.

1. WebAppClassLoader

Tomcat's solution is to customize a class loader WebAppClassLoader and create a class loader instance for each Web application. We know that the Context container component corresponds to a Web application, so each Context container is responsible for creating and maintaining a WebAppClassLoader loader instance. The rationale behind this is that classes loaded by different loader instances are considered to be different classes, even if they have the same class name. This is equivalent to creating mutually isolated Java class spaces inside the Java virtual machine. Each Web application has its own class space, and Web applications are isolated from each other through their own class loaders.
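This isolation can be demonstrated directly: the same bytecode defined by two different loader instances produces two distinct Class objects. The sketch below is illustrative; it reads its own compiled class bytes from the classpath as a stand-in for a web application class, which assumes the class was compiled to a regular .class file:

```java
import java.io.InputStream;

// Sketch: identical bytes defined by two loader instances yield two distinct
// Class objects -- exactly how Tomcat isolates web applications. Illustrative only.
class IsolatedLoader extends ClassLoader {
    IsolatedLoader() {
        super(null); // no parent: this loader cannot see the application's copy
    }

    Class<?> define(String name, byte[] bytes) {
        return defineClass(name, bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes;
        try (InputStream in = IsolatedLoader.class.getResourceAsStream("IsolatedLoader.class")) {
            if (in == null) throw new IllegalStateException("class bytes not on the classpath");
            bytes = in.readAllBytes();
        }
        Class<?> c1 = new IsolatedLoader().define("IsolatedLoader", bytes);
        Class<?> c2 = new IsolatedLoader().define("IsolatedLoader", bytes);
        // Same name, same bytes -- but the JVM treats them as different classes
        System.out.println(c1.getName().equals(c2.getName())); // true
        System.out.println(c1 == c2);                          // false
        System.out.println(c1 == IsolatedLoader.class);        // false
    }
}
```

A class's identity in the JVM is (defining loader, class name), so one loader instance per Context gives each web application its own class space.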

2. SharedClassLoader

The essential requirement is how to share library classes between two web applications and not load the same class repeatedly. In the parent delegation mechanism, each child loader can load classes through the parent loader, so isn't it enough to put the classes that need to be shared under the loading path of the parent loader?

Therefore, the designers of Tomcat added a class loader SharedClassLoader as the parent loader of WebAppClassLoader , which is specifically used to load classes shared between Web applications. If WebAppClassLoader itself does not load a class, it will delegate the parent loader SharedClassLoader to load the class. SharedClassLoader will load the shared class in the specified directory and then return it to WebAppClassLoader , so that the sharing problem is solved.

3. CatalinaClassLoader

How to isolate Tomcat's own classes from the Web application's classes?

Sharing is achieved through a parent-child relationship, while isolation requires a sibling relationship: two class loaders that sit in parallel, possibly under the same parent. Based on this, Tomcat designed another class loader, CatalinaClassLoader, dedicated to loading Tomcat's own classes.

There is a problem with this design. What should we do if some classes need to be shared between Tomcat and various Web applications?

The usual approach is to add another class loader, CommonClassLoader, as the parent of both CatalinaClassLoader and SharedClassLoader. Classes that CommonClassLoader can load are then usable by both CatalinaClassLoader and SharedClassLoader.
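The whole hierarchy can be wired up with plain URLClassLoaders to see the parent relationships. This is only a sketch: the URL arrays are empty placeholders, and unlike WebAppClassLoader a stock URLClassLoader still delegates parent-first:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of Tomcat's loader hierarchy using plain URLClassLoaders.
// The URL arrays are empty placeholders; in Tomcat each would point at the
// corresponding directory (common, server, shared, WEB-INF/classes, ...).
class HierarchyDemo {
    public static void main(String[] args) {
        URL[] none = new URL[0];

        ClassLoader common   = new URLClassLoader(none, null);   // shared by everything below
        ClassLoader catalina = new URLClassLoader(none, common); // Tomcat's own classes
        ClassLoader shared   = new URLClassLoader(none, common); // classes shared by all webapps
        ClassLoader webapp1  = new URLClassLoader(none, shared); // one loader per Context
        ClassLoader webapp2  = new URLClassLoader(none, shared);

        // catalina and shared are siblings (isolation), both children of common (sharing)
        System.out.println(catalina.getParent() == common); // true
        System.out.println(shared.getParent() == common);   // true
        System.out.println(webapp1.getParent() == shared);  // true
        System.out.println(webapp1 != webapp2);             // true
    }
}
```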

4. Summary of overall architecture design analysis

Through the previous study of Tomcat's overall architecture, we know what core components Tomcat has and the relationship between components. And how Tomcat handles an HTTP request. Let's review it through a simplified class diagram. From the diagram, you can see the hierarchical relationship of various components. The dotted line in the diagram represents the process of a request flowing through Tomcat.

4.1 Connectors

Tomcat's overall architecture consists of two core components: the connector, responsible for external communication, and the container, responsible for internal processing. The connector uses the ProtocolHandler interface to encapsulate the differences between communication protocols and I/O models. ProtocolHandler is divided internally into the EndPoint and Processor modules: EndPoint handles the underlying socket communication, and Processor handles application-layer protocol parsing. The connector calls into the container through the Adapter.

By studying the overall architecture of Tomcat, we can get some basic ideas for designing complex systems. First, we need to analyze the requirements and determine the sub-modules based on the principle of high cohesion and low coupling. Then, we need to find out the changing points and unchanging points in the sub-modules, use interfaces and abstract base classes to encapsulate the unchanging points, define template methods in the abstract base class, and let the subclasses implement the abstract methods themselves, that is, the specific subclasses implement the changing points.

4.2 Container

Containers are managed with the composite pattern, and startup events are published through the observer pattern to achieve decoupling and the open-closed principle. Skeleton abstract classes and the template method pattern separate what varies from what does not, leaving the varying parts to subclasses, which yields code reuse and flexible extension. Requests are handled with the chain of responsibility approach, which also makes cross-cutting steps such as access logging easy to add.

4.3 Class Loader

Tomcat's custom class loader WebAppClassLoader breaks the parent delegation mechanism in order to isolate web applications. It first tries to load a class itself and only delegates to the parent loader if it cannot find it, so classes defined by the web application itself take priority. To keep a web application's classes from overriding the JRE's core classes, it tries ExtClassLoader first for JRE classes; this way parent delegation is broken, yet core classes are still loaded safely.

5. Practical application scenarios

We have briefly analyzed Tomcat's overall architecture design, from connectors to containers, and explained the design ideas and design patterns of some components in detail. The next step is to apply what we have learned: borrow from the elegant designs and use them in day-to-day development. Learning begins with imitation.

5.1. Chain of Responsibility Pattern

At work there is a requirement: the user enters some information and chooses to check one or more of a company's modules, such as business registration information, judicial information, registration status, and so on; the modules also share some common logic that each module needs to reuse.

This is like a request that will be processed by multiple modules. We can therefore abstract each query module into a processing valve and keep the valves in a List. Adding a new module then only requires adding a new valve, which realizes the open-closed principle. At the same time, a pile of verification code is decoupled into separate concrete valves, and an abstract class extracts the "unchanging" functionality.

The specific sample code is as follows:

First, we abstract our processing valve. NetCheckDTO is the request information.

/**
 * Chain of responsibility pattern: a valve that handles one check module.
 */
public interface Valve {
    /**
     * Process the check request.
     * @param netCheckDTO the request information
     */
    void invoke(NetCheckDTO netCheckDTO);
}

Define abstract base classes to reuse code.

public abstract class AbstractCheckValve implements Valve {

    /** Get history records; code logic omitted. */
    public final AnalysisReportLogDO getLatestHistoryData(NetCheckDTO netCheckDTO, NetCheckDataTypeEnum checkDataTypeEnum) {
        // ...
    }

    /** Get the data-source configuration for the check; code logic omitted. */
    public final String getModuleSource(String querySource, ModuleEnum moduleEnum) {
        // ...
    }
}

Define each module's business logic, for example the handling of Baidu negative news:

@Slf4j
@Service
public class BaiduNegativeValve extends AbstractCheckValve {
    @Override
    public void invoke(NetCheckDTO netCheckDTO) {

    }
}

The last step is to assemble the modules the user selected to check. We keep them in a List and use it to trigger the required check modules.

@Slf4j
@Service
public class NetCheckService {

    // Inject all valves (bean name -> valve)
    @Autowired
    private Map<String, Valve> valveMap;

    /**
     * Send the check request.
     * @param netCheckDTO
     */
    @Async("asyncExecutor")
    public void sendCheckRequest(NetCheckDTO netCheckDTO) {
        // Valves for the modules the customer selected
        List<Valve> valves = new ArrayList<>();

        CheckModuleConfigDTO checkModuleConfig = netCheckDTO.getCheckModuleConfig();
        // Add the modules selected by the user to the valve chain
        if (checkModuleConfig.getBaiduNegative()) {
            valves.add(valveMap.get("baiduNegativeValve"));
        }
        // Omit some code ...
        if (CollectionUtils.isEmpty(valves)) {
            log.info("No check modules selected; nothing to do");
            return;
        }
        // Trigger processing
        valves.forEach(valve -> valve.invoke(netCheckDTO));
    }
}

5.2 Template Method Pattern

The requirement is to perform financial report analysis based on the financial report Excel data or company name entered by the customer.

For non-listed companies: parse the Excel -> verify that the data is legal -> perform the calculations.

For listed companies: check whether the company name exists; if not, send an email and stop -> pull financial report data from the database, initialize the inspection log, generate a report record, trigger the calculation -> update the task status on failure or success.

The key "changing" and "unchanging" parts:

  • What stays the same: the overall flow initializes the inspection log, initializes a report, validates the data up front (for listed companies a failed validation also requires building and sending email data), pulls financial report data from different sources and adapts it to a common shape, and then triggers the calculation; both task failure and success require updating the status.
  • What varies: the validation rules for listed and unlisted companies differ, the ways of obtaining financial report data differ, and each source's data needs its own adaptation.

The entire algorithm process is a fixed template, but the specific implementation of some changes within the algorithm needs to be deferred to different subclasses. This is the best scenario for the template method pattern.

@Slf4j
public abstract class AbstractAnalysisTemplate {

    /**
     * Template method: submits the financial report analysis and defines the skeleton process.
     * @param reportAnalysisRequest
     * @return
     */
    public final FinancialAnalysisResultDTO doProcess(FinancialReportAnalysisRequest reportAnalysisRequest) {
        FinancialAnalysisResultDTO analysisDTO = new FinancialAnalysisResultDTO();
        // Abstract method: pre-submission validation
        boolean prepareValidate = prepareValidate(reportAnalysisRequest, analysisDTO);
        log.info("prepareValidate validation result = {}", prepareValidate);
        if (!prepareValidate) {
            // Abstract method: build the data needed for the notification email
            buildEmailData(analysisDTO);
            log.info("Built email information, data = {}", JSON.toJSONString(analysisDTO));
            return analysisDTO;
        }
        String reportNo = FINANCIAL_REPORT_NO_PREFIX + reportAnalysisRequest.getUserId() + SerialNumGenerator.getFixLenthSerialNumber();
        // Create the analysis log
        initFinancialAnalysisLog(reportAnalysisRequest, reportNo);
        // Create the analysis record
        initAnalysisReport(reportAnalysisRequest, reportNo);

        try {
            // Abstract method: pull financial report data; implemented by subclasses
            FinancialDataDTO financialData = pullFinancialData(reportAnalysisRequest);
            log.info("Financial report data fetched; ready to calculate");
            // Calculate the indicators
            financialCalcContext.calc(reportAnalysisRequest, financialData, reportNo);
            // Mark the analysis log as successful
            successCalc(reportNo);
        } catch (Exception e) {
            log.error("Exception in the financial report calculation subtask", e);
            // Mark the analysis log as failed
            failCalc(reportNo);
            throw e;
        }
        return analysisDTO;
    }

    protected abstract boolean prepareValidate(FinancialReportAnalysisRequest request, FinancialAnalysisResultDTO analysisDTO);

    protected abstract void buildEmailData(FinancialAnalysisResultDTO analysisDTO);

    protected abstract FinancialDataDTO pullFinancialData(FinancialReportAnalysisRequest request);
}

Finally, create two subclasses to inherit the template and implement the abstract method. This decouples the processing logic of listed and non-listed types while reusing the code.
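As a self-contained miniature of the same shape (all names hypothetical, not the production code), here is a final template method with two concrete subclasses:

```java
// Minimal runnable template method: the skeleton is fixed and final,
// the varying steps are abstract. All names are hypothetical.
abstract class AnalysisTemplate {

    // The fixed skeleton: validate, pull data, calculate
    public final String doProcess(String input) {
        if (!validate(input)) {
            return "rejected";
        }
        String data = pullData(input);
        return "calculated:" + data;
    }

    protected abstract boolean validate(String input);

    protected abstract String pullData(String input);
}

class ListedAnalysis extends AnalysisTemplate {
    protected boolean validate(String input) { return input.startsWith("listed:"); }
    protected String pullData(String input)  { return "db[" + input + "]"; }
}

class UnlistedAnalysis extends AnalysisTemplate {
    protected boolean validate(String input) { return !input.isEmpty(); }
    protected String pullData(String input)  { return "excel[" + input + "]"; }
}

class TemplateDemo {
    public static void main(String[] args) {
        System.out.println(new ListedAnalysis().doProcess("listed:acme")); // calculated:db[listed:acme]
        System.out.println(new UnlistedAnalysis().doProcess("acme"));      // calculated:excel[acme]
        System.out.println(new ListedAnalysis().doProcess("acme"));        // rejected
    }
}
```

The final modifier on doProcess keeps the skeleton from being overridden; subclasses can only vary the steps the template exposes.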

5.3 Strategy Pattern

The requirement is a generic Excel interface that can recognize bank statements from different banks. Assume a standard statement contains fields such as transaction time, income, expenditure, balance, payer account number, payer name, payee name and payee account number. We first parse out the column index of each required field in the Excel header. But statements come in many variants:

1. Some contain all the standard fields.

2. Income and expenditure share one column and are distinguished by positive and negative amounts.

3. Income and expenditure share one column, with a transaction-type field to tell them apart.

4. Certain banks need special handling.

That is, we need to select the matching processing logic based on the parsed column indexes. We could write a pile of if-else in a single method, coupling all statement handling together; every new statement type would then force us to modify the old code again, and the method would end up long, ugly and hard to maintain.

Here we can use the strategy pattern: statements of different templates are handled by different processors, and the matching strategy is selected according to the template. Even if a new type appears later, we only need to add a new processor, which gives high cohesion, low coupling and extensibility.

Define a processor interface and implement each piece of handling logic in its own processor. Inject all processors into BankFlowDataContext and select the processor whose isSupport matches the current template.

public interface DataProcessor {
    /**
     * Process one row of statement data.
     * @param bankFlowTemplateDO the parsed header-index template
     * @param row one row of the statement
     * @return
     */
    BankTransactionFlowDO doProcess(BankFlowTemplateDO bankFlowTemplateDO, List<String> row);

    /**
     * Whether this processor can handle the template; each strategy decides
     * from the template data whether it supports parsing it.
     * @return
     */
    boolean isSupport(BankFlowTemplateDO bankFlowTemplateDO);
}

// Processor context
@Service
@Slf4j
public class BankFlowDataContext {

    // Inject all processors
    @Autowired
    private List<DataProcessor> processors;

    // Find the matching processor and let it handle the row
    public void process(BankFlowTemplateDO bankFlowTemplateDO, List<String> row) {
        for (DataProcessor processor : processors) {
            if (processor.isSupport(bankFlowTemplateDO)) {
                // row is one row of statement data
                processor.doProcess(bankFlowTemplateDO, row);
                break;
            }
        }
    }
}

Define a default processor for the standard template. To support a new template, just add another processor implementing DataProcessor.

/**
 * Default processor: handles the standard statement template.
 */
@Component("defaultDataProcessor")
@Slf4j
public class DefaultDataProcessor implements DataProcessor {

    @Override
    public BankTransactionFlowDO doProcess(BankFlowTemplateDO bankFlowTemplateDO, List<String> row) {
        // Processing details omitted
        return bankTransactionFlowDO;
    }

    @Override
    public boolean isSupport(BankFlowTemplateDO bankFlowTemplateDO) {
        // Omit the logic that decides whether this template can be parsed
        boolean isDefault = true;
        return isDefault;
    }
}

Through the strategy pattern, different processing logic is assigned to different processor classes: fully decoupled and easy to extend.
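The selection loop above can be boiled down to a runnable miniature with hypothetical toy types mirroring the isSupport/doProcess idea:

```java
import java.util.List;

// Minimal runnable strategy selection; all types are hypothetical stand-ins
// for the template/row objects in the text.
interface RowProcessor {
    boolean isSupport(String template);
    String doProcess(String row);
}

class SignedAmountProcessor implements RowProcessor {
    public boolean isSupport(String template) { return template.equals("signed"); }
    public String doProcess(String row)       { return "signed:" + row; }
}

class DefaultProcessor implements RowProcessor {
    public boolean isSupport(String template) { return true; } // fallback
    public String doProcess(String row)       { return "default:" + row; }
}

class StrategyDemo {
    // Order matters: the first processor whose isSupport matches wins
    static final List<RowProcessor> PROCESSORS =
            List.of(new SignedAmountProcessor(), new DefaultProcessor());

    static String process(String template, String row) {
        for (RowProcessor p : PROCESSORS) {
            if (p.isSupport(template)) {
                return p.doProcess(row);
            }
        }
        throw new IllegalArgumentException("no processor for " + template);
    }

    public static void main(String[] args) {
        System.out.println(process("signed", "r1"));   // signed:r1
        System.out.println(process("standard", "r2")); // default:r2
    }
}
```

Adding a new statement variant means adding one class to the list; the selection loop and the existing processors stay untouched.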

You can debug the source code with embedded Tomcat: https://github.com/UniqueDong/tomcat-embedded

This concludes the detailed analysis of applying Tomcat's architectural principles to architecture design.
