How to detect whether a file is damaged using Apache Tika

Apache Tika is a library for file type detection and content extraction from files of various formats.

When files are uploaded to a server and parsed, you often need to determine whether they are damaged. Tika can be used for this check: if Tika cannot parse a file, the file is probably damaged.

Add the Maven dependencies as follows:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-app</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

If tika-app causes JAR conflicts, you can depend on tika-core and tika-parsers instead:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

Use Tika to detect whether the file is damaged:

If reading from the input stream fails, the parse method throws an IOException. If the document obtained from the stream cannot be parsed, a TikaException is thrown. If the SAX content handler cannot process an event, a SAXException is thrown.

A document that cannot be parsed is usually corrupted, although a TikaException can also have other causes, such as an encrypted document.

Example:

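import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaFileCheck {

  public static void main(String[] args) {
    try {
      // Assume Test.txt is in the root of drive D:
      File file = new File("D:\\Test.txt");
      boolean result = isParseFile(file);
    } catch (IOException e) {
      // The file could not be read at all (missing, locked, I/O error)
      e.printStackTrace();
    }
  }

  /**
   * Verify whether the file can be parsed, i.e. is not corrupted.
   *
   * @param file the file to check
   * @return true if Tika parsed the file, false if parsing failed
   * @throws IOException if the file cannot be read
   */
  private static boolean isParseFile(File file) throws IOException {
    try {
      Tika tika = new Tika();
      String filecontent = tika.parseToString(file);
      System.out.println(filecontent);
      return true;
    } catch (TikaException e) {
      // Tika could not parse the document
      return false;
    }
  }
}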

Output:

Test data---read text content
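
The exception rules described above can also be turned into a three-way result instead of a plain true/false. The following is only a minimal sketch under stated assumptions: the names FileCheckSketch, Verdict, Extractor and check are hypothetical and not part of Tika, and the parsing step is passed in as a lambda so the sketch runs without Tika on the classpath. In real use the lambda would be something like file -> new Tika().parseToString(file).

```java
import java.io.File;
import java.io.IOException;

class FileCheckSketch {

    /** Possible outcomes of a check (illustrative names, not a Tika type). */
    enum Verdict { OK, UNREADABLE, DAMAGED }

    /** Stand-in for the parsing step, e.g. file -> new Tika().parseToString(file). */
    interface Extractor {
        String extract(File file) throws Exception;
    }

    static Verdict check(File file, Extractor extractor) {
        try {
            extractor.extract(file);
            return Verdict.OK;
        } catch (IOException e) {
            // The bytes could not be read; this says nothing about the content itself.
            return Verdict.UNREADABLE;
        } catch (Exception e) {
            // Assumption: any other failure means the content could not be parsed.
            // With Tika plugged in, a TikaException would land in this branch.
            return Verdict.DAMAGED;
        }
    }

    public static void main(String[] args) {
        File file = new File("Test.txt");
        // The three lambdas simulate a successful parse, a read failure and a parse failure.
        System.out.println(check(file, f -> "text"));
        System.out.println(check(file, f -> { throw new IOException("disk error"); }));
        System.out.println(check(file, f -> { throw new IllegalStateException("bad structure"); }));
    }
}
```

Running main prints OK, UNREADABLE and DAMAGED on separate lines. Keeping UNREADABLE apart from DAMAGED lets an upload handler retry a read error rather than reject the file as corrupted.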

Summary

The above is how to use Apache Tika to detect whether a file is damaged. I hope it is helpful to you.