Step-by-Step Guide to Converting Word Files to PDF with Progress Bar in Python

This article walks through a Python script that converts .docx files to .pdf format while displaying a progress bar for each file. The script prompts the user for the source and destination folders, asks for confirmation before converting, and prints feedback at each step.

Overview

The script makes use of two key libraries:
  1. docx2pdf - to convert Word files (.docx) to PDF.
  2. tqdm - to display a progress bar, giving visual feedback during file conversion.

Because docx2pdf does not report how far a conversion has progressed, the progress bar is driven by simulated updates rather than real measurements.

Prerequisites

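Before starting, ensure you have Python installed along with the necessary libraries. Note that docx2pdf drives Microsoft Word to perform the conversion, so Word must be installed and the script runs on Windows or macOS. Install the required libraries by running: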

pip install docx2pdf tqdm
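
To confirm that both packages are importable before running the script, a small pre-flight check can help. This is a minimal sketch and not part of the original script; missing_packages is a hypothetical helper name, and it relies only on the standard library's importlib.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical pre-flight check for the two libraries this guide uses.
missing = missing_packages(["docx2pdf", "tqdm"])
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
```

The check only looks the packages up; it does not import them, so it does not start Word.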

Script Breakdown

Step 1: Import Necessary Libraries

The script begins by importing four modules:
  • os: To interact with the file system and build file paths.
  • docx2pdf: For converting .docx files to .pdf.
  • tqdm: For displaying a progress bar during the conversion process.
  • time: To simulate the conversion progress.

import os
from docx2pdf import convert
from tqdm import tqdm
import time  # To simulate progress updates

Step 2: Define Helper Function to List .docx Files

This function, list_docx_files, takes a folder path as input and returns the names (not the full paths) of all .docx files in that folder. It is a simple utility for identifying which files need to be converted.

def list_docx_files(folder):
    """List all .docx files in the given folder."""
    return [f for f in os.listdir(folder) if f.endswith('.docx')]
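
The simple version above matches only lowercase ".docx" and will also pick up the "~$" lock files that Word creates while a document is open. A stricter variant might look like the sketch below; list_docx_files_strict is a hypothetical name, and the sorting is an added assumption to give a stable order, since os.listdir returns entries in arbitrary order.

```python
import os


def list_docx_files_strict(folder):
    """List .docx files, ignoring case and skipping Word's '~$' lock files."""
    names = []
    for name in sorted(os.listdir(folder)):
        is_docx = name.lower().endswith(".docx")
        is_lock_file = name.startswith("~$")
        is_file = os.path.isfile(os.path.join(folder, name))
        if is_docx and not is_lock_file and is_file:
            names.append(name)
    return names
```

It returns file names only, like the original helper, so it could be swapped in without changing the rest of the script.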

Step 3: Ask for Input and Validate Folder Existence

The convert_docx_to_pdf_dynamic() function starts by asking the user to input the source folder path where .docx files are located. It checks whether the folder exists, and if it doesn’t, the program terminates with an error message.

def convert_docx_to_pdf_dynamic():
    try:
        # Step 1: Ask for the folder containing Word files
        source_folder = input("Enter the folder path containing .docx files: ").strip()
        if not os.path.exists(source_folder):
            print(f"The folder '{source_folder}' does not exist.")
            return
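
One caveat: os.path.exists also returns True when the path points to a regular file, in which case os.listdir would fail later. A slightly stricter check could be sketched as follows; is_usable_folder is a hypothetical helper, not part of the original script.

```python
import os


def is_usable_folder(path):
    """Return True only when `path` is non-empty and names an existing directory."""
    return bool(path) and os.path.isdir(path)
```

In the script, this would replace the os.path.exists(source_folder) test.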

Step 4: List and Confirm .docx Files for Conversion

The script proceeds to list all .docx files in the source folder. If no .docx files are found, it terminates early. The user is prompted to confirm whether they want to proceed with the conversion process.

        # Step 2: List all .docx files in the source folder
        docx_files = list_docx_files(source_folder)
        if not docx_files:
            print(f"No .docx files found in the folder '{source_folder}'.")
            return
        
        print("The following .docx files were found:")
        for i, docx_file in enumerate(docx_files, start=1):
            print(f"{i}. {docx_file}")
        
        # Step 3: Ask for confirmation to proceed
        proceed = input("Do you want to proceed with converting these files to PDF? (yes/no): ").strip().lower()
        if proceed != 'yes':
            print("Conversion aborted.")
            return
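
As written, only the exact word "yes" continues; typing "y" aborts the run. If a more forgiving prompt is wanted, the comparison could be moved into a small helper. This is a sketch under that assumption, and is_affirmative is a hypothetical name.

```python
def is_affirmative(answer):
    """Return True for common 'yes' spellings, ignoring case and surrounding spaces."""
    return answer.strip().lower() in {"y", "yes"}
```

The check in the script would then become: if not is_affirmative(proceed).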

Step 5: Define Destination Folder for PDF Files

Next, the script asks the user for the destination folder where the converted .pdf files will be saved. If the folder doesn’t exist, it is created automatically.

        # Step 4: Ask for the destination folder to save PDFs
        output_folder = input("Enter the destination folder to save PDFs: ").strip()
        if not os.path.exists(output_folder):
            os.makedirs(output_folder)  # Create the folder if it doesn't exist
            print(f"Created the destination folder: {output_folder}")
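
The check-then-create pattern can also be collapsed into a single call with exist_ok=True, which does nothing when the folder is already there. The sketch below wraps that in a hypothetical helper, ensure_folder, that reports whether a folder was actually created. Note that os.makedirs still raises FileExistsError if the path exists but is a file.

```python
import os


def ensure_folder(path):
    """Create `path` (and any missing parents); return True if it was created."""
    already_there = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    return not already_there
```

The return value can drive the "Created the destination folder" message.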

Step 6: Convert Each .docx File with Progress Indicator

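For each .docx file, the script prints the file name and opens a tqdm progress bar. The docx2pdf library does not report real-time progress, so the bar is cosmetic: convert() runs first and blocks until the PDF has been written, and then a loop of 100 steps with time.sleep(0.05) fills the bar to 100%. This adds roughly five seconds of simulated progress per file. Errors are caught per file, so one failed conversion does not stop the rest of the batch.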

        # Step 5: Convert each .docx file to PDF with individual progress indication
        for docx_file in docx_files:
            try:
                docx_path = os.path.join(source_folder, docx_file)
                print(f"Converting '{docx_file}' to PDF...")
                
                # Simulate progress for each file
                with tqdm(total=100, desc=f"Processing '{docx_file}'", unit="%", leave=True) as pbar:
                    # Actual conversion happens here
                    convert(docx_path, output_folder)
                    # Simulate progress
                    for _ in range(100):
                        time.sleep(0.05)  # Simulate work by sleeping
                        pbar.update(1)
                    
                print(f"Successfully converted '{docx_file}' to PDF.")
            except Exception as e:
                print(f"Error converting '{docx_file}': {e}")
        
        print("All conversions completed!")
    
    except Exception as e:
        print(f"An error occurred during the process: {e}")
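
If a bar that reflects real work is preferred over a simulated one, an alternative is to track completed files instead of a per-file percentage. The sketch below is not the original script's approach: convert_all is a hypothetical function, the converter is passed in as convert_one so the loop can be tried without Word installed, and the ImportError fallback is an assumption added so the example still runs where tqdm is missing.

```python
import os

try:
    from tqdm import tqdm
except ImportError:  # Fallback so the sketch still runs without tqdm installed
    def tqdm(iterable, **kwargs):
        return iterable


def convert_all(docx_files, source_folder, output_folder, convert_one):
    """Convert each file, advancing the bar by one step per finished file.

    `convert_one(src_path, output_folder)` is injected by the caller; in real
    use it would be docx2pdf's `convert`.
    Returns a list of (file_name, error_message) pairs for failed files.
    """
    failures = []
    for name in tqdm(docx_files, desc="Converting", unit="file"):
        try:
            convert_one(os.path.join(source_folder, name), output_folder)
        except Exception as exc:  # Keep going so one bad file doesn't stop the batch
            failures.append((name, str(exc)))
    return failures
```

With docx2pdf installed, the call would be convert_all(docx_files, source_folder, output_folder, convert). The bar then advances only when a file has actually been converted, and no artificial delay is added.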

Step 7: Run the Script

Finally, the script is executed by calling the convert_docx_to_pdf_dynamic() function. This initiates the entire process from taking user inputs to converting and saving the PDFs with visual progress feedback.

# Run the dynamic converter
convert_docx_to_pdf_dynamic()
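
Calling the function at module level means the prompts also start whenever the file is imported from another script. A common Python convention is to guard the call, as in this sketch; the function body here is only a stand-in for the one built in the steps above.

```python
def convert_docx_to_pdf_dynamic():
    """Stand-in for the converter function defined in the steps above."""
    print("Starting the converter...")


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    convert_docx_to_pdf_dynamic()
```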

Full Script

import os
from docx2pdf import convert
from tqdm import tqdm
import time  # To simulate progress updates

def list_docx_files(folder):
    """List all .docx files in the given folder."""
    return [f for f in os.listdir(folder) if f.endswith('.docx')]

def convert_docx_to_pdf_dynamic():
    try:
        # Step 1: Ask for the folder containing Word files
        source_folder = input("Enter the folder path containing .docx files: ").strip()
        if not os.path.exists(source_folder):
            print(f"The folder '{source_folder}' does not exist.")
            return
        
        # Step 2: List all .docx files in the source folder
        docx_files = list_docx_files(source_folder)
        if not docx_files:
            print(f"No .docx files found in the folder '{source_folder}'.")
            return
        
        print("The following .docx files were found:")
        for i, docx_file in enumerate(docx_files, start=1):
            print(f"{i}. {docx_file}")
        
        # Step 3: Ask for confirmation to proceed
        proceed = input("Do you want to proceed with converting these files to PDF? (yes/no): ").strip().lower()
        if proceed != 'yes':
            print("Conversion aborted.")
            return
        
        # Step 4: Ask for the destination folder to save PDFs
        output_folder = input("Enter the destination folder to save PDFs: ").strip()
        if not os.path.exists(output_folder):
            os.makedirs(output_folder)  # Create the folder if it doesn't exist
            print(f"Created the destination folder: {output_folder}")
        
        # Step 5: Convert each .docx file to PDF with individual progress indication
        for docx_file in docx_files:
            try:
                docx_path = os.path.join(source_folder, docx_file)
                print(f"Converting '{docx_file}' to PDF...")
                
                # Simulate progress for each file
                with tqdm(total=100, desc=f"Processing '{docx_file}'", unit="%", leave=True) as pbar:
                    # docx2pdf reports no internal progress, so the bar is display-only.
                    # The actual conversion happens here and blocks until the PDF is written.
                    convert(docx_path, output_folder)
                    # Simulate progress
                    for _ in range(100):
                        time.sleep(0.05)  # Simulate work by sleeping
                        pbar.update(1)
                    
                print(f"Successfully converted '{docx_file}' to PDF.")
            except Exception as e:
                print(f"Error converting '{docx_file}': {e}")
        
        print("All conversions completed!")
    
    except Exception as e:
        print(f"An error occurred during the process: {e}")

# Run the dynamic converter
convert_docx_to_pdf_dynamic()

Conclusion

This Python script provides a user-friendly way to convert .docx files to .pdf, with visual feedback for each file processed. Because docx2pdf does not report conversion progress, the bar is simulated: it fills after each conversion finishes, confirming that the file is done rather than measuring the conversion itself.


Rendi Julianto

Experienced programming developer with a passion for creating efficient, scalable solutions. Proficient in Python, JavaScript, and PHP, with expertise in web development, API integration, and software optimization. Adept at problem-solving and committed to delivering high-quality, user-centric applications.
