In this tutorial, we’ll create a Python script that merges multiple PDF files from a folder into a single PDF document, all while ensuring that the files are sorted alphabetically and numerically. By the end of this guide, you'll have a working solution that simplifies the process of merging PDFs in just a few steps.
Key Libraries Used:
- os: For interacting with the file system and retrieving file paths.
- PyPDF2 (PdfMerger): For merging multiple PDFs into one.
- re: For sorting the file names both alphabetically and numerically using regular expressions.
Step 1: Installation of Required Libraries
Before we start coding, we need to install the necessary libraries. We will
use the PyPDF2
library to handle PDF files. Follow these steps to
install the required packages:
1. Install Python (If not already installed)
Make sure Python is installed on your system. You can download it from the official Python website. To check if Python is installed, open a terminal and run:
python --version
2. Install PyPDF2
To handle PDF merging, install the PyPDF2
library using
pip
, Python’s package manager. Open your terminal or command
prompt and run the following command:
pip install PyPDF2
This will install the library necessary to merge PDF files.
Step 2: Import the Necessary Libraries
Once the libraries are installed, the first step is to import the essential
modules that enable file listing, sorting, and merging. We use the
os
module to navigate the file system, re
for
sorting logic, and PdfMerger
from the
PyPDF2
library to handle the PDF merging.
import os from PyPDF2 import PdfMerger import re
Step 3: Define a Function to List PDF Files
We create a function list_pdf_files()
that accepts a folder
path as input, retrieves all .pdf
files, and sorts them both
alphabetically and numerically. The sorting function ensures that files like
file1.pdf
, file2.pdf
, and
file10.pdf
are sorted properly.
def list_pdf_files(folder): """List all .pdf files in the given folder, sorted alphabetically and numerically.""" def alphanum_key(file): """Sort by numbers if present in the file name, otherwise alphabetically.""" return [int(s) if s.isdigit() else s.lower() for s in re.split(r'(\d+)', file)] pdf_files = [f for f in os.listdir(folder) if f.endswith('.pdf')] return sorted(pdf_files, key=alphanum_key)
Step 4: Define the Merge Function
This is the core of the script, which handles the merging process. The function asks for the folder path containing the PDF files, lists and sorts them, confirms whether the user wants to proceed, and finally merges the files into one.
def merge_pdfs(): try: # Step 1: Ask for the folder containing PDF files source_folder = input("Enter the folder path containing PDF files: ").strip() if not os.path.exists(source_folder): print(f"The folder '{source_folder}' does not exist.") return # Step 2: List all PDF files in the source folder, sorted alphabetically pdf_files = list_pdf_files(source_folder) if not pdf_files: print(f"No PDF files found in the folder '{source_folder}'.") return print("The following PDF files were found (sorted alphabetically):") for i, pdf_file in enumerate(pdf_files, start=1): print(f"{i}. {pdf_file}") # Step 3: Ask for confirmation to proceed proceed = input("Do you want to proceed with merging these files into one PDF? (yes/no): ").strip().lower() if proceed != 'yes': print("Merging aborted.") return # Step 4: Merge all PDFs into one file in alphabetical order merger = PdfMerger() for pdf_file in pdf_files: pdf_path = os.path.join(source_folder, pdf_file) print(f"Merging '{pdf_file}'...") merger.append(pdf_path) # Step 5: Save the merged PDF to a new file output_pdf_path = os.path.join(source_folder, "merged_output.pdf") merger.write(output_pdf_path) merger.close() print(f"All files merged into: {output_pdf_path}") except Exception as e: print(f"An error occurred during the process: {e}")
Step 5: Run the Script
The final part of the script is the execution. We simply call the
merge_pdfs()
function, which will prompt the user for input and
handle the entire merging process.
# Run the PDF merger merge_pdfs()
How It Works:
- The user provides the folder path containing the PDF files.
- The script lists all the
.pdf
files, sorts them alphabetically and numerically. - The user is asked whether they want to proceed with merging the files.
- The script merges all the PDFs in the folder and saves the final file as
merged_output.pdf
in the same directory.
Key Features:
-
Alphabetical and Numerical Sorting: The script
intelligently sorts files by name and number, ensuring that files with
numeric suffixes like
report1.pdf
andreport2.pdf
appear in the correct order - User Interaction: The user is asked to confirm the merging operation, which prevents accidental merges.
- Error Handling: If something goes wrong (e.g., an invalid file path), the script provides a helpful error message.
Conclusion
With this simple Python script, you can easily merge multiple PDF files into one without the hassle of manually arranging them. Whether you’re working with reports, research papers, or any other type of document, this tool offers a quick and efficient way to handle your PDF management needs. By following the steps outlined in this guide, you’ll be able to create your own PDF merger in Python. The script is flexible and can be easily modified or extended for additional functionalities, such as filtering specific PDFs or adding metadata to the final merged document.
Happy coding!