How to detect whether a file is damaged using Apache Tika

Apache Tika is a library for file type detection and content extraction from files of various formats.

When files are uploaded to a server and parsed, you often need to determine whether they are damaged. Tika can be used for this check.

Add the Maven dependencies as follows:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-app</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

If tika-app conflicts with other jars in your project, depend on tika-core and tika-parsers instead:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.18</version>
</dependency>
<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
</dependency>

Use Tika to detect whether a file is damaged:

Tika's parse method throws an IOException if reading from the input stream fails, a TikaException if the document obtained from the stream cannot be parsed, and a SAXException if the SAX content handler cannot process an event.

A TikaException usually indicates that the document is corrupted. Encrypted documents raise it as well (EncryptedDocumentException is a subclass of TikaException), so treat it as a strong hint rather than proof.
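The three failure modes above can be told apart by calling the parser directly instead of going through the Tika facade. The following is a minimal sketch, assuming tika-parsers is on the classpath; the class name ParseOutcomeDemo and the Outcome labels are hypothetical names chosen for illustration, not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ParseOutcomeDemo {

    /** Hypothetical labels, one per outcome of the parse call. */
    enum Outcome { OK, UNREADABLE, UNPARSABLE, HANDLER_FAILED }

    static Outcome check(InputStream in) {
        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables the handler's default write limit on extracted text
        BodyContentHandler handler = new BodyContentHandler(-1);
        try {
            parser.parse(in, handler, new Metadata(), new ParseContext());
            return Outcome.OK;
        } catch (IOException e) {
            // The stream itself could not be read
            return Outcome.UNREADABLE;
        } catch (TikaException e) {
            // The document could not be parsed: likely corrupted
            return Outcome.UNPARSABLE;
        } catch (SAXException e) {
            // The content handler could not process an event
            return Outcome.HANDLER_FAILED;
        }
    }

    public static void main(String[] args) {
        // Sample in-memory input; replace with a real file or upload stream
        InputStream in = new ByteArrayInputStream(
                "Test data".getBytes(StandardCharsets.UTF_8));
        System.out.println(check(in));
    }
}
```

Only the UNPARSABLE case points at the document itself; the other two usually mean the problem lies with I/O or with the handler.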

Complete example:

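import java.io.File;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaFileCheck {

  public static void main(String[] args) {
    try {
      // Adjust this path to point at the file you want to check
      File file = new File("D:\\Test.txt");
      // result is true when Tika could parse the file
      boolean result = isParseFile(file);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  /**
   * Checks whether Tika can parse the file.
   *
   * @param file the file to check
   * @return true if the file was parsed, false if Tika could not parse it
   * @throws Exception if the file cannot be read
   */
  private static boolean isParseFile(File file) throws Exception {
    try {
      Tika tika = new Tika();
      // parseToString throws TikaException when the document cannot be parsed
      String filecontent = tika.parseToString(file);
      System.out.println(filecontent);
      return true;
    } catch (TikaException e) {
      return false;
    }
  }
}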

Output:

Test data---read text content
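For uploads, the same check can run on an InputStream instead of a File. One caveat: by default the Tika facade stops extracting after 100,000 characters, so damage further into a large document may go unnoticed. The sketch below lifts that limit with setMaxStringLength(-1); the class name FullParseCheck and the method isParsable are hypothetical, and the byte array stands in for a real upload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class FullParseCheck {

    /** Returns true if Tika parsed the whole stream without a TikaException. */
    static boolean isParsable(InputStream in) throws IOException {
        Tika tika = new Tika();
        // -1 removes the default limit so the entire document is parsed
        tika.setMaxStringLength(-1);
        try {
            // The extracted text is discarded; only success or failure matters
            tika.parseToString(in);
            return true;
        } catch (TikaException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes of an uploaded file
        byte[] upload = "Test data".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(upload)) {
            System.out.println(isParsable(in));
        }
    }
}
```

Removing the limit means the full extracted text is held in memory, so for very large files it may be preferable to call the parser with a content handler that discards its output.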

Summary

That is how to use Apache Tika to detect whether a file is damaged: try to parse the file and treat a TikaException as a sign of corruption.