Automating word files to pdf files conversion using docx2pdf python library

docx2pdf

Skill - Automating word files to pdf files conversion using docx2pdf python library

Table of Contents

Skills Required

Please make sure to have all the skills mentioned above to understand and execute the code mentioned below. Go through the above skills if necessary for reference or revision


  • docx2pdf is a python library for converting word file to pdf file
  • This helps in automating word files to pdf files in bulk and without manual intervention
  • docx2pdf can be used as a command line tool and also in python program

Install docx2pdf python module

  • Open command prompt
  • Run the command python -m pip install docx2pdf
  • Now docx2pdf should be installed in your python environment

Prerequisites

docx2pdf requires Microsoft Word to be installed in the computer

Python example

import docx2pdf
# convert "abcd.docx" file to pdf in the same folder
docx2pdf.convert("abcd.docx")

# convert "abcd.docx" file to pdf with custom name and folder location
docx2pdf.convert("abcd.docx", "out/abc.pdf")

# convert all the docx files in a folder. 
# Output files will also be in the same folder
docx2pdf.convert("inp/")

# convert all the docx files in a folder and keep the pdfs in another folder
docx2pdf.convert("inp/", "out/")

Command line interface

  • Convert a single word file to pdf
docx2pdf file1.docx

The above command converts file1.docx to file1.pdf

  • convert all word files in a folder to pdf files
docx2pdf folder1/

The above command converts all the word files in folder1 to pdf files

  • Convert single docx file and output to a different explicit folder:
docx2pdf inputFile.docx outputFolder/
  • Batch convert docx folder. Output PDFs will go to a different explicit folder:
docx2pdf inpFoldr/ outDir/

Issues while packaging the code to exe with pyinstaller

  • After packaging the code with pyinstaller you can get errors like importlib.metadata.PackageNotFoundError: docx2pdf. To resolve this issue, create a virtual environment, and goto the Lib\site-packages\docx2pdf\__init__.py file and remove the following lines in line 7 to 13
try:
     # 3.8+
     from importlib.metadata import version
except ImportError:
     from importlib_metadata import version

__version__ = version(__package__)

and remove line 119

        print(__version__)

Now again convert your script to exe file in your virtual environment, then the issue of package not found error should be resolved.

  • After you fixed the package not found error, you might get errors like [ImportError with cx_freeze and pywin32: Module 'pythoncom' isn't in frozen sys.path](https://stackoverflow.com/questions/17891071/importerror-with-cx-freeze-and-pywin32-module-pythoncom-isnt-in-frozen-sys-p)
    Then this might be due to the packages incompatibility, in such case use the following versions of pyinstaller, pywin32 and docx2pdf in your python environment
pyinstaller==4.3
pywin32==302
docx2pdf==0.1.7

After installing the above package versions in your python environment, again convert your script to exe file using pyinstaller, then the Import error should be resolved

Video

Video for this post can be found here

References


Table of Contents

Comments