Automate word to PDF conversion in python using LibreOffice

docx_to_pdf_libreoffice

Automate word to PDF conversion in python using LibreOffice

Table of Contents


  • LibreOffice provides a command line interface to to convert word files to PDF
  • The conversion commands can be run in python scripts using the subprocess python module

Windows

Installation

Conversion command

"C:\\Program Files\\LibreOffice\\program\\swriter.exe" --headless --convert-to pdf --outdir "C:\\Users\\Abcd\\Documents\\liber_test\\out" "C:\\Users\\Abcd\\Documents\\liber_test\\in\\test1.docx"

Ubuntu

Installation

  • Install from command line using the following commands
# install packages
sudo apt update

# install java runtime if not present. 
# Java installation can be verified using "java -version" command
# libreoffice-java-common may be required if you get warning like "Warning: failed to launch javaldx - java may not function correctly"
sudo apt install default-jre libreoffice-java-common

# install libreoffice for word to pdf conversion purpose
sudo apt install libreoffice --no-install-recommends

Conversion command

libreoffice --headless --convert-to pdf "/home/james/in/test1.docx" --outdir "/home/james/out/"

Python code in windows

# LibreOffice command line - https://help.libreoffice.org/latest/he/text/shared/guide/start_parameters.html

import subprocess

documentPath = r"C:\\Users\\Abcd\\Documents\\Python Projects\\taming_python\\liber_pdf_convert\\in\\test1.docx"
outFolder = r"C:\\Users\\Abcd\\Documents\\Python Projects\\taming_python\\liber_pdf_convert\\out"

# if running in Ubuntu, libreOfficePath = "libreoffice"
libreOfficePath = r"C:\\Program Files\\LibreOffice\\program\\swriter.exe"

commandStrings = [libreOfficePath, "--headless", "--convert-to", "pdf", "--outdir", outFolder, documentPath]
# print(" ".join(commandStrings))

retCode = subprocess.call(commandStrings)
# print(retCode)

if retCode == 0:
    print("PDF conversion completed!")
else:
    print(f"Looks like there is an error in pdf conversion process with return code {retCode}")

  • The output PDF file name cannot be controlled. So if required, the output file can be renamed as per requirement separately.
  • The output folder and the input file paths can also be relative paths like .\\out and .\\in\\test1.docx
  • use soffice.com instead of swriter.exe if you want to display the LibreOffice output in command line

Python code in Ubuntu

# LibreOffice command line - <https://help.libreoffice.org/latest/he/text/shared/guide/start_parameters.html>

import subprocess

documentPath = "/home/james/libre_test/in/test1.docx"
outFolder = "/home/james/libre_test/out"

# if running in Ubuntu, libreOfficePath = "libreoffice"
libreOfficePath = "libreoffice"

commandStrings = [libreOfficePath, "--headless", "--convert-to", "pdf", "--outdir", f"{outFolder}", f"{documentPath}"]
# print(" ".join(commandStrings))

retCode = subprocess.call(commandStrings)
# print(retCode)

if retCode == 0:
    print("PDF conversion completed!")
else:
    print(f"Looks like there is an error in pdf conversion process with return code {retCode}")

  • The output folder and the input file paths can also be relative paths like ./out and ./in/test1.docx

Video

Video for this post can be found here

References


Table of Contents

Comments