Due to work requirements, I recently spent some time researching how to convert HTML to PDF. The key difficulty in converting HTML to PDF is handling the complex CSS styles found in web pages. From the information available online, current solutions fall into three main categories:

Client mode: the front end or back end calls a client program and uses it to perform the PDF conversion. Tools tested: wkhtmltopdf and PhantomJS.
Java library mode: Java code parses the CSS styles and translates the HTML file into a PDF file. Libraries tested: iText, Flying Saucer, and PD4ML.
JavaScript front-end mode: JavaScript in the front end converts the HTML page into a PDF file. Library tested: html2canvas.

We tested each of the solutions described online against the needs of an actual project, and analyzed them in terms of performance and functionality.

1. Test page introduction

According to the conversion examples found online, simple HTML styles and ordinary table styles are supported when converting to PDF. To reflect actual business needs, however, this test deliberately used the CSS styles of Bootstrap (v3.3.6), and the page also used new CSS3 features. A static HTML page was written on this basis.

[Screenshot: the test page as rendered in the browser]

2. wkhtmltopdf test

wkhtmltopdf is a tool built on the WebKit rendering engine that converts HTML to PDF. It can be integrated with multiple scripting languages to convert documents.

Official website: http://wkhtmltopdf.org/

Technical features: wkhtmltopdf can directly convert a web page, as viewed in a browser, into a PDF. It is a standalone program that converts HTML pages into PDF and must be installed on the server.
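Because wkhtmltopdf is a separate program installed on the server, a Java backend would typically drive it by starting an external process. The following is only a minimal sketch of that idea, not the code used in this test. It assumes the `wkhtmltopdf` executable can be found on the PATH; the class name `WkhtmltopdfRunner`, its methods, and the page URL and output file name are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.util.List;

public class WkhtmltopdfRunner {

    // Builds the command line in the order the tool expects:
    // executable, input page (URL or file path), output PDF path.
    public static List<String> buildCommand(String exePath, String htmlSource, String pdfPath) {
        return List.of(exePath, htmlSource, pdfPath);
    }

    // Starts the converter, waits for it to finish, and returns its exit code.
    public static int convert(String exePath, String htmlSource, String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(exePath, htmlSource, pdfPath))
                .inheritIO() // forward the tool's progress output to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Hypothetical page URL and output file; adjust to the local setup.
            int exitCode = convert("wkhtmltopdf", "http://localhost:8080/test.html", "test.pdf");
            System.out.println("wkhtmltopdf exit code: " + exitCode);
        } catch (IOException e) {
            // Thrown when the executable cannot be started, e.g. it is not installed.
            System.out.println("Could not start wkhtmltopdf: " + e.getMessage());
        }
    }
}
```

Passing the arguments as a list, rather than as one concatenated command string, avoids quoting problems when a path contains spaces.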
In use, Java code can run the command line to convert a web page to PDF.

Functional test: enter the test command directly in cmd to watch the conversion progress.

The first parameter: the path to wkhtmltopdf.exe
The second parameter: the HTML page to convert to PDF
The third parameter: the path and file name of the output PDF

[Screenshot: the exported page]

Test description: Testing showed that wkhtmltopdf has good overall support for Bootstrap's CSS styles. Support for new CSS3 features, such as circular image styles, is poor, and some page styles are lost. For charts, exporting a page that contains an ECharts chart makes the program report an error, so this is not supported. However, ECharts provides an interface for converting a chart into an image, so the chart can still be exported to PDF by obtaining the image address.

3. PhantomJS Testing

PhantomJS is a headless browser based on the WebKit engine: it is a real browser without a user interface, so human actions such as clicking and paging must be implemented in code. It provides a JavaScript API, so a JavaScript program can interact directly with the WebKit engine. On top of this, it can be combined with Java and other languages, for example by having Java invoke the JavaScript, which removes the earlier limitation that high-quality WebKit-based crawlers could only be developed in C/C++. Installation packages are provided for Windows, Linux, and Mac, so crawler projects or automated tests can be built on different platforms.

Official website: http://phantomjs.org/

PhantomJS can be used for web page analysis and has many functions. This test used only its web page screenshot function.
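Since PhantomJS is driven by a JavaScript file passed on its command line, Java can start it as an external process and hand over the script, the page address, and the output file. The sketch below is an assumption-laden illustration rather than the setup used in this test: it assumes the `rasterize.js` example script from the PhantomJS distribution, which takes a URL, an output file name, and an optional paper format. The class name `PhantomJsRunner`, its methods, the paths, and the 60-second timeout are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PhantomJsRunner {

    // Builds: phantomjs <script> <url> <output> [paperFormat]
    public static List<String> buildCommand(String phantomPath, String scriptPath,
                                            String url, String outputPath, String paperFormat) {
        List<String> command = new ArrayList<>();
        command.add(phantomPath);
        command.add(scriptPath);
        command.add(url);
        command.add(outputPath);
        if (paperFormat != null && !paperFormat.isEmpty()) {
            command.add(paperFormat); // optional, e.g. "A4"
        }
        return command;
    }

    // Runs the command; returns true only if it finished in time with exit code 0.
    public static boolean render(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // a page that never finishes loading must not block the server
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical script location, page URL, and output file.
        List<String> command = buildCommand("phantomjs", "examples/rasterize.js",
                "http://localhost:8080/test.html", "test.pdf", "A4");
        try {
            System.out.println("Rendered successfully: " + render(command, 60));
        } catch (IOException e) {
            // Thrown when the executable cannot be started, e.g. it is not installed.
            System.out.println("Could not start phantomjs: " + e.getMessage());
        }
    }
}
```

The timeout is a design choice for server use: a headless browser waiting on a slow page would otherwise hold the calling thread indefinitely.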
[Screenshot: the PhantomJS test command in cmd]

[Screenshot: the exported test page]

Test description: Testing showed that PhantomJS has good support for Bootstrap styles. Support for new CSS3 features, such as circular image styles, is poor, and some page styles are lost. ECharts charts, on the other hand, can be exported directly.

[Screenshot: the exported chart]

3. iText and Flying Saucer

iText implements HTML-to-PDF conversion. It is fast but has poor error tolerance. It supports Chinese (the HTML must use Unicode encoding), but only one Chinese font. It is open source.

Flying Saucer also implements HTML-to-PDF conversion. It has poor error tolerance, supports multiple Chinese fonts (some styles are not recognized), and is open source.

Technical features: both parse and process the CSS styles of the HTML in Java code. Currently they only support relatively simple pages and styles. Compatibility with CSS3 and with complex, interdependent CSS styles is extremely poor, and processing is slow when the page content is long.

Reference: https://code.google.com/archive/p/flying-saucer/

Test results: the test page used in this experiment could not be rendered. A simple test page renders as follows:

[Screenshot: a simple test page converted to PDF]

Test description: Testing showed that the two open source projects, iText and Flying Saucer, are essentially incompatible with CSS3. The information I found indicated that this technology is relatively old and that the projects are no longer updated or maintained. For exporting simple tables and statistical data, newer options include the export features of bootstrap table and the EasyUI datagrid. Although this solution is often described online, it is not recommended.

4. PD4ML test

PD4ML is a pure Java class library that uses HTML and CSS as the page layout and content definition format for generating PDF documents. It simplifies the work of generating PDFs for end users.
Reference website: http://www.pd4ml.com

The advantages of this software are:

It supports a fairly complete range of HTML tags and CSS properties, with relatively little conversion distortion, so precise layout control can be achieved using HTML and CSS.
It is more tolerant of errors in HTML tags and CSS syntax.
It converts and outputs images without additional configuration.

The disadvantages of this software are:

It is not open source. After downloading and testing the latest demo version, I found that it does not support Chinese conversion; the commercial version must be purchased. (This was a pitfall: the garbled text could not be fixed during testing, and only later did I find out that Chinese was not supported in the first place.)
Some cracked older versions solve the garbled text problem, but they support fewer CSS styles than the new versions.

Test results:

[Screenshot: PD4ML export]

Test description: The new version produces garbled Chinese characters but supports some CSS styles. The cracked old version has poor style compatibility and weak support for Bootstrap. It can output basic data and display images without problems. Considering that it is paid software and its results are not perfect, it is not recommended; ordinary pages are better exported with templates or other tools.

5. html2canvas test

html2canvas is a very good JavaScript library that uses new HTML5 and CSS3 features to take screenshots of web pages on the client side. It reads the DOM and the style information of the page's elements and renders them into a canvas image, which produces the screenshot. No server-side rendering is required; the entire image is created in the client's browser. When the browser does not support Canvas, FlashCanvas or ExplorerCanvas is used instead.
The following browsers support the script well: Firefox 3.5+, Google Chrome, recent versions of Opera, and IE9 and above. Because each browser renders pages differently, the resulting images also differ. Although the library is still under development, it is promising.

Notes and limitations:

It depends on jQuery, and using the latest version is recommended.
Cross-domain images are not supported.
It cannot be used in browser plug-ins.
Some browsers do not support SVG images.
Flash is not supported.
iframes are not supported (the library's source code can be modified to support them).

When testing with html2canvas, I found that many project pages could be captured normally, including ECharts charts. Only a few new CSS3 features were not supported, and the screenshots looked good.

However, testing it in the application revealed a serious problem. After a page module called html2canvas to take a screenshot, part of the original page's CSS suddenly stopped working. Tracing and analysis showed that the html2canvas JavaScript had processed the CSS styles it could not recognize. Support is especially poor for modules that are hidden and shown.

[Screenshot: the page screenshot produced by html2canvas]

Afterwards, however, the CSS of the original page was broken: the page behaved abnormally, some styles were hidden, and the displayed styles were in disarray.

Test description: Testing showed that html2canvas has good support for Bootstrap styles. Support for new CSS3 features, such as circular image styles, is poor. Its main advantage is that it is a lightweight front-end solution. To work around the changes it makes to the original page's styles, export the image first and then refresh the page.

6. Summary

Testing the cases above shows that most of the HTML-to-PDF methods commonly described online work for simple HTML, but in real applications they still have many problems and are difficult to apply. Analyzing how these methods work leads to the following conclusions:

All of the solutions fall short of converting an HTML page to PDF completely. If only part of a page, such as a form, needs to be exported, avoid CSS3 properties in the HTML styles and use client mode or html2canvas.

Front-end HTML styling is developing rapidly: new CSS3 features are in use, and CSS keeps defining new rules and syntax. Java conversion libraries such as iText and Flying Saucer cannot keep up with these changes, because the conversion functions cannot be written in time; these open source projects are older technologies, and their teams have since stopped maintaining and updating them.

PD4ML is essentially a Java-based converter for CSS styles. It is commercial software with a team behind its CSS3 compatibility, and it is more capable in performance and functionality than iText and Flying Saucer. However, it does not support a few CSS styles well, and the garbled Chinese text is not easy to solve.

Among the client-mode tools based on a browser engine, PhantomJS is more powerful than wkhtmltopdf. Taking screenshots is only one of its minor functions; it can also be used for web page analysis. PhantomJS is the recommended choice.

The screenshot approach of html2canvas is flexible, and it is a lightweight front-end screenshot tool. Some functions are still incomplete, but the overall result is good. To deal with screenshots affecting the original page, save the screenshot first and then refresh the page once before exporting the screenshot to PDF.