A summary of HTML-to-PDF conversion approaches (image-heavy, recommended)


Because of a work requirement, I recently spent some time researching how to convert HTML to PDF. The hard part of the conversion is handling the complex CSS styles used in web pages. From the material I collected online, current solutions fall into three main categories:

Client mode: the front end or back end calls an external client program and relies on it to perform the PDF conversion. Tools tested: wkhtmltopdf and PhantomJS.

Java library (jar) parsing mode: Java code parses the CSS styles and translates the HTML file into a PDF file. Libraries tested: iText, Flying Saucer, and PD4ML.

JS front-end parsing mode: JavaScript running in the browser renders the HTML page so that it can be exported to PDF. Library tested: html2canvas.

We tested each of the solutions described online against the needs of a real project and analyzed them in terms of performance and functionality.

1. Test page introduction

According to the write-ups available online, all of these tools support simple HTML styles and ordinary table styles when converting to PDF. To reflect actual business needs, however, this test deliberately used the Bootstrap (v3.3.6) CSS and also applied new CSS3 features. A static HTML page was written on that basis. The page is displayed in the browser as follows:

2. wkhtmltopdf test

wkhtmltopdf is a tool built on the WebKit rendering engine that converts HTML to PDF. It can be called from many scripting languages to convert documents. Official website: http://wkhtmltopdf.org/

Technical features: wkhtmltopdf converts a web page into a PDF as it is rendered in a browser. It is a standalone program that must be installed on the server. In use, Java code can invoke it as a command-line (cmd) process to convert a web page to PDF.

Functional test: enter the test command directly in cmd and watch the processing progress. The command has three parts:

The first parameter: the path to wkhtmltopdf.exe

The second parameter: the HTML page to be converted to PDF

The third parameter: the path and file name of the output PDF file
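
As a minimal sketch of how these three parameters fit together when the tool is called from code, the following Python snippet builds the argument list and runs it. The article calls the tool from Java; Python is used here only for brevity, and the same list could be handed to any process API. The function names and the example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_wkhtmltopdf_command(exe_path, source, output_pdf):
    """Return the argument list: executable, input page, output PDF."""
    return [str(exe_path), str(source), str(output_pdf)]


def convert_html_to_pdf(exe_path, source, output_pdf, timeout=60):
    """Run wkhtmltopdf; raise CalledProcessError if it exits with an error."""
    command = build_wkhtmltopdf_command(exe_path, source, output_pdf)
    # A list (not a shell string) avoids quoting problems with spaces in paths.
    subprocess.run(command, check=True, timeout=timeout)
    return Path(output_pdf)


# Hypothetical usage (requires wkhtmltopdf to be installed):
# convert_html_to_pdf(r"C:\tools\wkhtmltopdf.exe",
#                     "http://localhost:8080/report.html",
#                     r"D:\export\report.pdf")
```

Keeping the command construction separate from the process call makes the argument order easy to check without the tool installed.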

The exported page looks as follows:

Test description:

Testing shows that wkhtmltopdf supports Bootstrap's CSS styles well overall. Support for new CSS3 features, such as circular image styles, is poor, and some page styles are lost. Chart display is not supported: exporting a page that contains an ECharts chart makes the program report an error. However, ECharts provides an interface for converting a chart into an image, so a chart can still be exported to PDF by obtaining the image address.
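
One way to use that chart-to-image interface is sketched below. It assumes the page calls the ECharts instance method `getDataURL` and sends the resulting base64 data URL to the server, which decodes it into image bytes that can replace the live chart before conversion. That workflow and the function name are assumptions for illustration, not the article's tested setup.

```python
import base64


def decode_image_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Hypothetical usage: write the decoded chart image next to the HTML page,
# then reference it with an <img> tag instead of the live chart.
# mime_type, raw = decode_image_data_url(posted_data_url)
# with open("chart.png", "wb") as handle:
#     handle.write(raw)
```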

3. PhantomJS test

PhantomJS is a headless browser built on the WebKit engine. It is a real browser but has no UI, so human actions such as clicking and paging must be implemented in code. It exposes a JavaScript API, so a JS script can interact with the WebKit engine directly. On top of that, it can be combined with Java and other languages, which call the JS scripts; this removes the old limitation that high-quality WebKit-based collectors (crawlers) could only be developed in C/C++. Installation packages are provided for Windows, Linux, and Mac, so collection projects or automated testing projects can be developed on different platforms. Official website: http://phantomjs.org/

PhantomJS has many functions, including web page analysis. This test uses only its page screenshot function. The test in cmd is as follows:
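
A PhantomJS run takes the executable, a JS script, and that script's own arguments. The sketch below assumes the `rasterize.js` example script that ships with PhantomJS, which accepts a URL, an output file name, and an optional paper format; the article does not say which script was actually used. The function names are hypothetical, and the extension check is a convenience added here, since PhantomJS chooses the output format from the file extension.

```python
import subprocess

# Extensions accepted by this sketch (an assumption made for illustration).
SUPPORTED_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".gif")


def build_phantomjs_command(phantomjs_exe, script, url, output_file, paper_format=None):
    """Return [phantomjs, script, url, output] plus an optional paper format such as 'A4'."""
    if not output_file.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError("unsupported output extension: " + output_file)
    command = [phantomjs_exe, script, url, output_file]
    if paper_format:
        command.append(paper_format)
    return command


def render_page(phantomjs_exe, script, url, output_file, paper_format=None, timeout=120):
    """Run PhantomJS; raise CalledProcessError if it exits with a non-zero status."""
    command = build_phantomjs_command(phantomjs_exe, script, url, output_file, paper_format)
    subprocess.run(command, check=True, timeout=timeout)
    return output_file
```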

The test page export effect is as follows:

Test description:

Testing shows that PhantomJS supports Bootstrap styles well. Support for new CSS3 features, such as circular image styles, is poor, and some page styles are lost. ECharts charts, however, can be exported directly. The result is as follows:

3. iText and Flying Saucer

iText implements HTML-to-PDF conversion. It is fast but tolerates malformed input poorly. It supports Chinese (the HTML must use Unicode encoding) but only one Chinese font. It is open source. Flying Saucer also implements HTML-to-PDF conversion and likewise tolerates malformed input poorly; it supports multiple Chinese fonts (although some styles are not recognized) and is open source.

Technical features: both libraries parse and process the HTML's CSS styles in Java code. At present they handle only relatively simple pages and styles; compatibility with CSS3 and with complex, interdependent CSS rules is extremely poor. Processing is slow when the page content is long. Reference: https://code.google.com/archive/p/flying-saucer/

Test results: the test page used in this experiment could not be rendered. The result for an ordinary, simple test page is as follows:

Test description:

Testing shows that the two open-source projects, iText and Flying Saucer, are essentially incompatible with CSS3. The material I consulted indicated that the technology is relatively old and that the projects were no longer being updated or maintained. For exporting simple tables and statistical data, newer options include the table export features of bootstrap table and the easyui datagrid. The iText and Flying Saucer approach described online is not recommended.

4. PD4ML test

PD4ML is a pure Java class library that uses HTML and CSS as page layout and content definition formats to generate PDF documents. It can simplify the work of generating PDFs for end users. Reference website: http://www.pd4ml.com

The advantages of this software are:

It supports a fairly complete range of HTML tags and CSS attributes with little conversion distortion, so precise layout control can be achieved with HTML and CSS. It tolerates errors in page markup and CSS syntax well. It converts and outputs images without additional handling.

The disadvantages of this software are:

It is not open source. After downloading and testing the latest demo version, I found that it does not support Chinese conversion; the commercial version must be purchased. (This was a trap: I could not get past the garbled-text problem in testing, and only later found out that Chinese was not supported in the first place.) Some cracked old versions solve the garbled-text problem, but they support fewer CSS styles than the new version.

Test results:

Test description:

The new version produces garbled Chinese characters but supports some CSS styles. The cracked old version has poor style compatibility and weak support for Bootstrap. Plain data output and image display basically work without problems. Because it is paid software and its results are not perfect, it is not recommended; for ordinary pages, template-based export or other tools are a better choice.

5. html2canvas test

html2canvas is a very good JavaScript library that uses HTML5 and CSS3 features to take screenshots of web pages on the client side. It reads the page's DOM and element style information and renders them onto a canvas, producing an image of the page. No server-side rendering is required; the whole image is created in the client browser. When the browser does not support canvas, FlashCanvas or ExplorerCanvas is used instead. The script works well in Firefox 3.5+, Google Chrome, recent versions of Opera, and IE9 and above. Because each browser renders pages differently, the resulting images also differ. Although it is still under development, it is promising. The plugin depends on jQuery, and using the latest version is recommended.

Its limitations: it does not support cross-domain images; it cannot be used in browser plug-ins; some browsers do not support SVG images; it does not support Flash; and it does not support iframes (the original JS code can be modified to support them).

In testing, html2canvas captured many project pages correctly, including ECharts charts. Only a few new CSS3 features were unsupported, and the screenshots looked good. In application testing, however, a fatal problem appeared: after a page module called html2canvas to take a screenshot, part of the original page's CSS suddenly stopped working. Tracing showed that html2canvas's JS functions had altered CSS styles they could not recognize. Support is especially unfriendly for modules that are shown and hidden.

The page screenshot effect is as follows:

Afterwards, however, the CSS of the original page is broken: the page behaves abnormally, some styles are hidden, and the displayed styles are chaotic.

Test description:

Testing shows that html2canvas supports Bootstrap styles well. Support for new CSS3 features, such as circular image styles, is poor. Its main advantage is that it is a lightweight, front-end-only tool. To work around the altered styles of the original page, export the image first and then refresh the page.

6. Summary

Testing the above cases shows that most of the HTML-to-PDF methods commonly described online work for simple HTML, but in real applications they still have many problems and are hard to apply. Analyzing how these methods work leads to the following conclusions:

No solution converts an HTML page to PDF completely faithfully. If only part of a page, such as a form, needs to be converted, avoid CSS3 properties in the HTML styles as far as possible and use client mode or html2canvas.

Front-end HTML styling is developing rapidly: the new features of CSS3 have come into effect, and CSS keeps defining new rules and syntax. Java conversion libraries such as iText and Flying Saucer are not compatible with these changes at all, because conversion functions cannot be written in time. These open-source projects are older technologies, and according to the material consulted their teams have stopped maintaining and updating them.

PD4ML is essentially also a Java-based converter of CSS styles. It is commercial software with a team behind its CSS3 compatibility, and it is stronger than iText and Flying Saucer in performance and functionality. Still, it handles a few CSS styles poorly, and the garbled-Chinese problem is not easy to solve.

In the client browser-engine mode, PhantomJS is more powerful than wkhtmltopdf. Taking screenshots is only one of its minor functions; it can also be used for web page analysis. PhantomJS is the recommended choice.

The screenshot mode of html2canvas is flexible, and it is a lightweight front-end screenshot tool. Some functions are still incomplete, but the overall result is good. To work around the problem that some screenshots affect the original page, save the screenshot first and then refresh the page once; the saved screenshot can then be exported to PDF.
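
The advice to avoid CSS3 properties can be checked mechanically before a page is handed to a converter. The sketch below scans a stylesheet for a short list of CSS3 properties and reports where they occur. The property list is an assumption chosen for illustration (`border-radius` corresponds to the circular image style that failed in the tests above); it is not a complete list of what each tool fails to render, and the function name is hypothetical.

```python
import re

# Assumed watch list, for illustration only; adjust it to the converter in use.
RISKY_PROPERTIES = ("border-radius", "box-shadow", "transform", "transition", "animation")


def find_risky_css(css_text, properties=RISKY_PROPERTIES):
    """Return (line_number, property) pairs for each watched property declaration."""
    names = "|".join(re.escape(name) for name in properties)
    # The lookbehind skips vendor-prefixed forms such as -webkit-box-shadow.
    pattern = re.compile(r"(?<![\w-])(" + names + r")\s*:")
    hits = []
    for line_number, line in enumerate(css_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits
```

A non-empty result does not mean the conversion will fail; it only marks the styles worth checking in the exported PDF.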
