Java to PDF: Converting Word to PDF in Java (using the third-party libraries iText, POI, and Jsoup)

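First, the approach:

Step 1: Use POI to convert the Word document to HTML. Code for this is easy to find online and it all looks much the same, so there is not much to say about it.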

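One thing worth noting: when iText generates a file from HTML, it checks whether the HTML is well-formed, and the HTML that POI produces can contain tags that are never closed.

For example, in the HTML that I generated directly with POI, the META and img tags clearly have no closing tags. HTML like this cannot pass iText's validation, and the following exception is thrown: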

Error: "The element type "meta" must be terminated by the matching end-tag "</meta>"."

org.xhtmlrenderer.util.XRRuntimeException: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException: The element type "meta" must be terminated by the matching end-tag "</meta>".
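The exception comes from the org.xhtmlrenderer package, which reads its input as XML, so any tag that is not closed is a fatal error. The following minimal sketch reproduces that kind of failure with only the JDK's built-in XML parser. The class name WellFormedCheck, the method xmlError, and the sample markup are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether a string is well-formed XML.
class WellFormedCheck {

    /** Returns null if the markup parses as XML, otherwise the parser's error message. */
    static String xmlError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (Exception e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Sample markup (an assumption): <meta> is opened but never closed.
        String unclosed = "<html><head><meta charset=\"utf-8\"></head><body><p>hi</p></body></html>";
        // The same markup with <meta/> written as a self-closing tag.
        String closed = "<html><head><meta charset=\"utf-8\"/></head><body><p>hi</p></body></html>";

        System.out.println("unclosed: " + xmlError(unclosed));
        System.out.println("closed:   " + xmlError(closed));
    }
}
```

Running a check like this on the POI output before handing it to the PDF step makes it easier to see which tag the parser is complaining about.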

The error itself tells us that our HTML is not well-formed. With the third-party jar Jsoup, simply calling its parse method turns our HTML into standard markup!
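A minimal sketch of the Jsoup step, assuming the jsoup jar is on the classpath. The class name HtmlNormalizer, the method toXhtml, and the sample markup are hypothetical. Setting the output syntax to XML is an extra precaution not mentioned in the original post: depending on the Jsoup version, the default HTML output may still write void tags such as meta and img without a closing slash.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

// Hypothetical helper: turns loosely written HTML into XML-style markup.
class HtmlNormalizer {

    static String toXhtml(String html) {
        // Jsoup's HTML parser tolerates unclosed tags and builds a complete document tree.
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           // Write empty elements as <meta ... /> and <img ... /> instead of bare <meta ...>.
           .syntax(Document.OutputSettings.Syntax.xml)
           // Avoid named HTML entities such as &nbsp; that an XML parser does not know.
           .escapeMode(Entities.EscapeMode.xhtml);
        return doc.html();
    }

    public static void main(String[] args) {
        // Sample markup (an assumption) with unclosed META and img tags.
        String raw = "<html><head><META http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                   + "</head><body><img src=\"a.png\"><p>Hello</p></body></html>";
        System.out.println(toXhtml(raw));
    }
}
```

The string returned by toXhtml is what would be passed on to the PDF rendering step in place of the raw POI output.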

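This problem gave me a headache for half a day, and I did not expect it to be solved this easily, so I am posting this to help anyone else who runs into it!

Below is the code that converts Word to HTML with POI: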

package com.smart.sys.core.service.io.poi;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.jsoup.Jsoup;
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationE
