In this tutorial, we will build a simple audiobook player using Python and CustomTkinter.
You will learn how to convert text into speech using gTTS and play it with PyGame.
We will also implement play, pause, and stop controls, like those in a real audio player.
By the end, you will have a clean and functional desktop audiobook app.
Prerequisite:
This tutorial is part of the standalone tutorial.
📚 View the standalone tutorial
Preliminary
Before we begin, create and activate a virtual environment, then install the dependencies. The activation command shown here is for Windows.
python -m venv venv
venv\Scripts\activate
pip install customtkinter pillow gTTS pygame CTkMessagebox pypdf
Next, set up the file structure: app.py plus two folders, uploads and media. The media folder holds the icons the app needs (read, pause, and stop), as shown in the diagram.
The interface has four sections: the header, the upload area, the text body, and the button group.
(a) The header: a label with an emoji.
(b) The upload area: a frame containing a description label and an upload button.
(c) The text body: a textbox with a scrollbar on the right side.
(d) The button group: a row of three buttons (read, pause, and stop), each with a matching icon. All three start disabled because no audiobook is available yet.
Below is the code:
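from tkinter import W
import customtkinter as ctk
from PIL import Image
from customtkinter import CTkImage

# Create the main window (the widgets below attach to `root`)
root = ctk.CTk()

# Note: upload(), read(), pause(), and stop() are defined in the sections
# that follow; in app.py they must appear above the buttons that use them.

# The header
header = ctk.CTkLabel(root, text="Welcome to My Audiobook App 📖",
                      font=('Roboto', 24),
                      text_color="#1a1a1a")
header.pack(pady=20)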
# The upload area
my_frame = ctk.CTkFrame(root, height=300, width=500, corner_radius=15,
                        fg_color="#D2C8C8",
                        border_width=2, border_color="#cccccc")
my_frame.pack(pady=30)

my_label = ctk.CTkLabel(my_frame,
                        text="😄 Upload your preferred pdf file!",
                        width=500, height=40,
                        font=('Roboto', 16))
my_label.pack(pady=20)

# The button calls upload(), the function defined in the upload step
my_button = ctk.CTkButton(my_frame,
                          text="Submit your book",
                          width=150,
                          height=40,
                          command=upload)
my_button.pack(pady=20)
# The text body
book_label = ctk.CTkLabel(root, text="Book content:", font=('Roboto', 16),
                          text_color="#1a1a1a")
book_label.pack(anchor=W, padx=50)

my_content = ctk.CTkTextbox(root, width=500, height=200,
                            font=('Roboto', 14), activate_scrollbars=True,
                            border_width=2, border_color="#cccccc")
my_content.pack(pady=10)
# The button group - all icons are located in the media folder
button_frame = ctk.CTkFrame(root, height=50, width=500, corner_radius=10)
button_frame.pack(pady=20)

book_image = CTkImage(
    light_image=Image.open("media/book.png"),
    size=(30, 30)
)
pause_image = CTkImage(
    light_image=Image.open("media/pause.png"),
    size=(30, 30)
)
stop_image = CTkImage(
    light_image=Image.open("media/stop.png"),
    size=(30, 30)
)

read_button = ctk.CTkButton(button_frame,
                            text="Read",
                            image=book_image,
                            compound='left',
                            height=50,
                            width=100,
                            state="disabled",
                            command=read)
read_button.grid(row=0, column=0, padx=40, pady=10)

pause_button = ctk.CTkButton(button_frame,
                             text="Pause",
                             image=pause_image,
                             compound='left',
                             height=50,
                             width=100,
                             state="disabled",
                             command=pause)
pause_button.grid(row=0, column=1, padx=40, pady=10)

stop_button = ctk.CTkButton(button_frame,
                            text="Stop",
                            image=stop_image,
                            compound='left',
                            height=50,
                            width=100,
                            state="disabled",
                            command=stop)
stop_button.grid(row=0, column=2, padx=40, pady=10)

# Start the event loop (this must be the last statement in app.py)
root.mainloop()
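The UI code above opens three icons from the media folder, and the upload step copies files into the uploads folder, so a missing file or folder would stop the app at startup. As a minimal sketch (not part of the original app), a small check could list what is missing before the window is built. The names REQUIRED_ICONS and missing_assets are hypothetical; the icon file names are taken from the Image.open calls above.

```python
from pathlib import Path

# Icon file names as used by the Image.open(...) calls in the UI code.
REQUIRED_ICONS = ("book.png", "pause.png", "stop.png")


def missing_assets(base_dir="."):
    """Return the required paths (relative to base_dir) that do not exist."""
    base = Path(base_dir)
    required = [base / "uploads"]
    required += [base / "media" / name for name in REQUIRED_ICONS]
    # Report paths relative to base_dir, with forward slashes on every OS.
    return [p.relative_to(base).as_posix() for p in required if not p.exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing before launch:", ", ".join(missing))
    else:
        print("All folders and icons are in place.")
```

Running this from the project folder before launching app.py gives a readable message instead of a traceback from Image.open.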
When the user clicks the upload button, the app opens a file dialog and prompts for a PDF file.
Once a file is selected, the app extracts its text with PdfReader and inserts it into the textbox, so the user can follow the text while listening to the audio.
The upload button is then disabled to prevent accidental clicks, while the read and stop buttons are enabled. To switch to another PDF, the user clicks the stop button, which re-enables the upload button. The label changes to "Your book is ready to read!" once the PDF is loaded.
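The text-extraction step described above can be sketched on its own, without the GUI or a real PDF. This is an illustration under assumptions: join_page_texts is a hypothetical helper, and its input stands in for the per-page results of page.extract_text(), where a page without text yields an empty value.

```python
def join_page_texts(page_texts):
    """Join per-page strings, skipping pages that have no text."""
    parts = []
    for page_text in page_texts:
        # Skip empty strings and None (pages with nothing to extract).
        if page_text:
            parts.append(page_text)
    # One newline after each page, so pages do not run together.
    return "".join(part + "\n" for part in parts)


if __name__ == "__main__":
    pages = ["Chapter 1", "", None, "Chapter 2"]
    print(repr(join_page_texts(pages)))
```

Keeping this logic in a plain function makes it easy to test separately from the file dialog and the textbox.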
from tkinter import filedialog
import os
import shutil
from CTkMessagebox import CTkMessagebox
from pypdf import PdfReader

UPLOAD_FOLDER = "uploads"

def upload():
    # Declare global variables so they can be used outside this function
    global file_path, text
    # Open file dialog to select a PDF file
    file_path = filedialog.askopenfilename(title="Select a file",
                                           filetypes=[("PDF Files", "*.pdf")])
    # If the user cancelled the dialog, there is nothing to do
    if not file_path:
        return
    # Get just the filename (without full path)
    filename = os.path.basename(file_path)
    # Create destination path inside UPLOAD_FOLDER
    dest = os.path.join(UPLOAD_FOLDER, filename)
    # Make sure the uploads folder exists, then copy the selected file into it
    os.makedirs(UPLOAD_FOLDER, exist_ok=True)
    shutil.copy(file_path, dest)
    # Show success message
    CTkMessagebox(title="Info", message="File uploaded successfully!")
    # Create PDF reader object
    reader = PdfReader(file_path)
    # Initialize empty string to store extracted text
    text = ""
    # Loop through all pages in the PDF
    for page in reader.pages:
        # Extract text from current page
        page_text = page.extract_text()
        # If text exists on that page, add it to full text
        if page_text:
            text += page_text + "\n"
    # Clear the textbox before inserting new content
    my_content.delete("1.0", "end")
    # Insert extracted PDF text into textbox
    my_content.insert("1.0", text)
    # Update label to inform user the book is ready
    my_label.configure(text="✅ Your book is ready to read!")
    # Disable upload button after file is loaded
    my_button.configure(state="disabled")
    # Enable read and stop buttons
    read_button.configure(state="normal")
    stop_button.configure(state="normal")

Once the user clicks the read button, the extracted text is converted to speech with gTTS and saved as "audiobook.mp3". If the audio was previously paused, Pygame resumes it from that point; otherwise, when the audio is played for the first time, Pygame starts from the beginning.
In reading mode, the pause and stop buttons are enabled, while the read button is disabled.
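One case the flow above does not cover is a PDF with no extractable text, such as a scanned, image-only file. In that case the extracted string is empty and there is nothing to convert to speech. A minimal sketch of a guard that could run before the text-to-speech step; has_speakable_text is a hypothetical helper, not part of the original app.

```python
def has_speakable_text(text):
    """Return True if text contains at least one non-whitespace character."""
    return bool(text and text.strip())


if __name__ == "__main__":
    for sample in ["Hello, reader.", "", "  \n\n  ", None]:
        print(repr(sample), "->", has_speakable_text(sample))
```

In the app, the read function could call this first and show a message box instead of generating audio when it returns False.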
from gtts import gTTS
import pygame
import os

is_paused = False
is_playing = False
audio_file = "audiobook.mp3"

def read():
    # Use global variables so we can control playback state
    global is_paused, is_playing
    # Disable the Read button (to prevent double clicking)
    read_button.configure(state="disabled")
    # Enable Pause and Stop buttons
    pause_button.configure(state="normal")
    stop_button.configure(state="normal")
    # If audio was paused before → just resume instead of recreating audio
    if is_paused:
        # Resume paused audio playback
        pygame.mixer.music.unpause()
        # Update pause state
        is_paused = False
        # Exit function
        return
    # Create Text-to-Speech object using gTTS
    # (gTTS uses an online service, so this step needs an internet connection)
    tts = gTTS(text=text, lang='en', slow=False)
    # If audio file already exists → delete it first
    if os.path.exists(audio_file):
        os.remove(audio_file)
    # Save the generated speech into an MP3 file
    tts.save(audio_file)
    # Initialize pygame mixer (audio engine)
    pygame.mixer.init()
    # Load the saved audio file
    pygame.mixer.music.load(audio_file)
    # Start playing the audio
    pygame.mixer.music.play()
    # Update play state
    is_playing = True

When the user clicks the pause button, Pygame pauses the audio. The button states are the reverse of reading mode: the read button is enabled and the pause button is disabled. The stop button stays enabled.
import pygame

is_paused = False

def pause():
    # Use global variable so we can change pause state
    global is_paused
    # Enable Read button (so user can resume playback)
    read_button.configure(state="normal")
    # Disable Pause button (already paused, no need to press again)
    pause_button.configure(state="disabled")
    # Keep Stop button enabled
    stop_button.configure(state="normal")
    # Pause the currently playing audio
    pygame.mixer.music.pause()
    # Update pause state to True
    is_paused = True

When the user presses the stop button, Pygame stops playback and releases the audio file. The textbox content and the PDF copy in the uploads folder are both removed. The label reverts to "Upload your preferred pdf file!" and every button is disabled again except the upload button.
import pygame
from CTkMessagebox import CTkMessagebox
import os

is_paused = False
is_playing = False

def stop():
    # Use global variables so we can reset playback states
    global is_paused, is_playing
    # Disable all audio control buttons
    read_button.configure(state="disabled")
    pause_button.configure(state="disabled")
    stop_button.configure(state="disabled")
    # Enable upload button again
    my_button.configure(state="normal")
    # Reset label message
    my_label.configure(text="😄 Upload your preferred pdf file!")
    # Clear the text box content
    my_content.delete("1.0", "end")
    # Stop can be pressed right after an upload, before any audio has played,
    # so only touch the mixer if it has been initialized
    if pygame.mixer.get_init():
        # Stop audio playback immediately
        pygame.mixer.music.stop()
        # Unload the audio file from memory
        pygame.mixer.music.unload()
        # Close pygame mixer completely
        pygame.mixer.quit()
    # Reset playback state flags
    is_paused = False
    is_playing = False
    # Path of the copy that upload() placed in the uploads folder
    # (a hardcoded "audiobook.pdf" would only match a PDF with that exact name)
    uploaded_copy = os.path.join("uploads", os.path.basename(file_path))
    # If the file exists → delete it
    if os.path.isfile(uploaded_copy):
        os.remove(uploaded_copy)
        CTkMessagebox(title="Info", message="File deleted successfully!")
    else:
        CTkMessagebox(title="Info", message="File does not exist!")
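The button states set across the upload, read, pause, and stop steps can be summarized in one table. The following is a minimal sketch rather than the app's actual code: the mode names (idle, loaded, playing, paused), BUTTON_STATES, and apply_mode are hypothetical, and the states themselves are read off the four functions above.

```python
# Which buttons are enabled in each mode, as set by the functions above.
BUTTON_STATES = {
    "idle":    {"upload": "normal",   "read": "disabled", "pause": "disabled", "stop": "disabled"},
    "loaded":  {"upload": "disabled", "read": "normal",   "pause": "disabled", "stop": "normal"},
    "playing": {"upload": "disabled", "read": "disabled", "pause": "normal",   "stop": "normal"},
    "paused":  {"upload": "disabled", "read": "normal",   "pause": "disabled", "stop": "normal"},
}


def apply_mode(mode, widgets):
    """Set every widget's state for the given mode.

    `widgets` maps a button name to any object with a configure(state=...)
    method, such as a CTkButton.
    """
    for name, state in BUTTON_STATES[mode].items():
        widgets[name].configure(state=state)
```

In the app, widgets could be a dictionary such as {"upload": my_button, "read": read_button, "pause": pause_button, "stop": stop_button}, and each handler would make a single apply_mode call instead of several configure calls. Keeping the states in one place makes it easier to see that stop returns the app to the same state it started in.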
Here is my sample PDF: "Preliminary.
Before I begin, it is recommended to activate the virtual environment before installing the relevant
dependencies. I have created a blog post for MapView, and the link https://python
post.blogspot.com/2026/01/mapview.html. Here is the Reddit who have problem with the
MapView. How can I share my assistance with the Blog post I create for the topic of MapView!"
Click play to listen to the audio.
Final wrap-up
You’ve just built more than a simple app—you’ve created a bridge between text and voice. With CustomTkinter, gTTS, and pygame, your audiobook player brings written words to life in just a few lines of Python. From uploading a PDF to controlling playback like a real media app, you now understand the full workflow behind it.
Published: Feb 2026
Last Updated: Feb 2026
About the Author
Kelvin Loh is a Python developer focused on Flask, desktop applications, and business automation solutions. He shares practical tutorials and real-world coding projects to help developers and small businesses build useful applications.





