Notes on exporting HTML documents with images to Word on Linux
We have a web application (running on an Ubuntu web server) with a WYSIWYG HTML editor that lets users create report templates containing custom tags. These custom tags are then resolved into technical graph images, which are embedded into the report. The application allows users to:
- preview the report, including its graphs, in a web browser
- schedule report generation, where the report is exported to a Word document and delivered as an email attachment.
When generating the HTML document, the obvious choice, especially for previewing, is to write each IMG tag with the image content embedded as base64. That way we can send the whole HTML document to the client's web browser without worrying about how the IMG src paths resolve under the document root. However, this approach raises another issue. Currently, none of the conversion tools on Linux, such as AbiWord, LibreOffice, OpenOffice and wkhtmltopdf, can fully export an HTML document with embedded images: the images are missing when the result is opened in MS Word. Even opening the HTML document directly in MS Word shows no images. This type of HTML document is supported only by web browsers.
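As an illustration of the embedded (preview) form, here is a minimal Python sketch that turns raw image bytes into an IMG tag with a base64 data URI. It assumes the graphs are available as bytes in memory; the function name `embedded_img_tag` is hypothetical and not part of our application.

```python
import base64
import html
import mimetypes


def embedded_img_tag(image_bytes: bytes, filename: str, alt: str = "") -> str:
    """Return an <img> tag whose src is a base64 data URI (preview version)."""
    # Guess the MIME type from the file extension; fall back to PNG (assumption).
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "image/png"
    # Base64 output is pure ASCII, so it can be placed directly in the attribute.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}" alt="{html.escape(alt, quote=True)}">'
```

For example, `embedded_img_tag(b"\x89PNG\r\n\x1a\n", "cpu_graph.png", alt="CPU usage")` (just the 8-byte PNG signature, for illustration) yields `<img src="data:image/png;base64,iVBORw0KGgo=" alt="CPU usage">`. The whole image travels inside the HTML, which is why a browser needs nothing else to render it.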
The only alternative is to construct the HTML document with IMG tags that reference the image files by relative path. This approach works much better with AbiWord: the exported Word document shows the images in MS Word. MS Word can also display this type of HTML document, images included, when it is opened directly.
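A rough Python sketch of the relative-path (export) form follows. It writes each graph into an `images/` folder next to `report.html` and builds an AbiWord command line for the conversion. The folder layout and the helper names (`linked_img_tag`, `write_linked_report`, `abiword_command`) are hypothetical; the `--to=doc` option is AbiWord's command-line converter, but the exact output location of the `.doc` file should be checked against your AbiWord version.

```python
import html
import urllib.parse
from pathlib import Path


def linked_img_tag(name: str, img_subdir: str = "images") -> str:
    """Return an <img> tag whose src is a path relative to the HTML file."""
    # Percent-encode the file name so spaces etc. form a valid relative URL.
    src = f"{img_subdir}/{urllib.parse.quote(name)}"
    return f'<img src="{src}" alt="{html.escape(name, quote=True)}">'


def write_linked_report(out_dir: Path, images: dict[str, bytes]) -> Path:
    """Write images to out_dir/images/ and out_dir/report.html that links them."""
    img_dir = out_dir / "images"
    img_dir.mkdir(parents=True, exist_ok=True)
    for name, data in images.items():
        (img_dir / name).write_bytes(data)
    body = "\n".join(linked_img_tag(name) for name in images)
    html_path = out_dir / "report.html"
    # The relative src values resolve against report.html's own directory,
    # so the HTML file and the images/ folder must stay together.
    html_path.write_text(
        f'<html><head><meta charset="utf-8"></head><body>\n{body}\n</body></html>\n',
        encoding="utf-8",
    )
    return html_path


def abiword_command(html_path: Path) -> list[str]:
    """Build (but do not run) an AbiWord command converting the HTML file to .doc."""
    return ["abiword", "--to=doc", str(html_path)]
```

On a server with AbiWord installed, the conversion could then be run with `subprocess.run(abiword_command(path), check=True)`. Building the command separately keeps the HTML generation testable without AbiWord present.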
As a result, we create two versions of each HTML document with different types of IMG tags: embedded IMG tags for previewing, and IMG tags with src links for exporting to other document formats and for archiving.
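One way to produce both versions from a single template is to resolve each custom tag in one of two modes. The sketch below assumes a hypothetical tag syntax `[[graph:NAME]]` (these notes do not describe the editor's real tag format) and PNG graphs stored under `images/NAME.png` for the linked version.

```python
import base64
import re

# Hypothetical custom-tag syntax; the real editor's tag format may differ.
GRAPH_TAG = re.compile(r"\[\[graph:([A-Za-z0-9_-]+)\]\]")


def render(template: str, graphs: dict[str, bytes], mode: str) -> str:
    """Resolve graph tags into <img> tags.

    mode="embedded": base64 data URI, for browser preview.
    mode="linked":   relative path images/NAME.png, for export and archiving.
    """
    if mode not in ("embedded", "linked"):
        raise ValueError(f"unknown mode: {mode}")

    def replace(match: re.Match) -> str:
        name = match.group(1)  # restricted by the regex to safe characters
        if mode == "embedded":
            # Raises KeyError if the template references an unknown graph.
            data = base64.b64encode(graphs[name]).decode("ascii")
            src = f"data:image/png;base64,{data}"
        else:
            src = f"images/{name}.png"
        return f'<img src="{src}" alt="{name}">'

    return GRAPH_TAG.sub(replace, template)
```

With `template = "<p>CPU load</p>[[graph:cpu]]"` and `graphs = {"cpu": b"\x89PNG\r\n\x1a\n"}`, `render(template, graphs, "embedded")` gives `<p>CPU load</p><img src="data:image/png;base64,iVBORw0KGgo=" alt="cpu">`, while `render(template, graphs, "linked")` gives `<p>CPU load</p><img src="images/cpu.png" alt="cpu">`. Keeping the tag resolution in one place means the preview and export versions cannot drift apart in anything except the IMG src.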