Table of contents
- Additional File Handling Techniques
- Pickle: Serialization and Deserialization
- Seek and Tell: Navigating Within Files
- Random Accessing Binary Files using mmap
- Zipping and Unzipping with zipfile
- Working with Directories
- Running Other Programs from a Python Program
- Handling Large Files with Chunking
- Memory-Efficient File Processing
- Working with Network Resources
- Parallel Processing of Files
- Monitoring File System Changes
In the first part of this blog series, we explored the basics of working with files in Python, understanding various file types, and opening, reading, and writing to files. Now, we'll delve into more advanced file handling techniques and libraries that can take your Python file manipulation skills to the next level.
Additional File Handling Techniques
Pickle: Serialization and Deserialization
The Python pickle module offers a robust solution for serializing and deserializing Python objects. Serialization is the process of converting a Python object into a format that can be easily stored or transmitted. Deserialization is the reverse process, where serialized data is converted back into Python objects. Pickle is particularly useful when you need to save and load complex data structures, making it an excellent choice for preserving program states or sharing data between Python programs. Note that pickle is not secure: never unpickle data from an untrusted source, as it can execute arbitrary code.
Example:
import pickle

# Serialize an object to a file
data = {'name': 'Sanjay Singh', 'age': 33, 'city': 'Lucknow'}
with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

# Deserialize the object from the file
with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)

print(loaded_data)
Seek and Tell: Navigating Within Files
When working with files, you may need to move to specific locations within the file, especially in the case of large files or when you want to perform random access. The seek() method moves the file cursor to a specific position, while tell() returns the current position.
Example:
with open('large_file.txt', 'r') as file:
    file.seek(10)  # Move to the 11th character
    data = file.read(5)  # Read the next 5 characters
    position = file.tell()  # Get the current position
    print(data, position)
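seek() also accepts a whence argument (via the os.SEEK_* constants) that makes the offset relative to the current position or the end of the file; relative seeks like this require the file to be opened in binary mode. A small self-contained sketch, using a throwaway sample.bin file created just for the demonstration:

```python
import os

# Write a small sample file so the example is self-contained
with open('sample.bin', 'wb') as f:
    f.write(b'0123456789ABCDEF')

with open('sample.bin', 'rb') as f:
    f.seek(-4, os.SEEK_END)   # 4 bytes back from the end of the file
    tail = f.read()           # b'CDEF'
    f.seek(0, os.SEEK_SET)    # Back to the very beginning
    head = f.read(4)          # b'0123'

print(head, tail)
os.remove('sample.bin')  # Clean up the throwaway file
```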
Random Accessing Binary Files using mmap
The mmap module provides a memory-mapping mechanism that allows you to map a file into memory, making it accessible as if it were an array. This is especially useful for large binary files, such as databases, where you can read, modify, and write specific parts of the file without reading or writing the entire file.
Example:
import mmap

with open('large_data.bin', 'r+b') as file:
    mmapped_file = mmap.mmap(file.fileno(), 0)
    # Slice assignment must match the replacement length: b'Updated Data' is 12 bytes
    mmapped_file[10:22] = b'Updated Data'  # Modify data at a specific location
    mmapped_file.close()  # Close the mmap when done
Zipping and Unzipping with zipfile
Python's zipfile module allows you to create, extract, and manipulate ZIP archives. You can use this library to compress and decompress files and directories, which is valuable for reducing file sizes and organizing data efficiently.
Example:
import zipfile

# Creating a ZIP archive
with zipfile.ZipFile('archive.zip', 'w') as zipf:
    zipf.write('file1.txt')
    zipf.write('file2.txt')

# Extracting files from a ZIP archive
with zipfile.ZipFile('archive.zip', 'r') as zipf:
    zipf.extractall('extracted_files')
Working with Directories
Python's os module provides a variety of functions for working with directories. You can create, remove, list, and navigate directories, which is essential for managing and organizing files and directories within your programs.
Example:
import os

# Create a new directory
os.mkdir('new_directory')

# List the files in a directory
files = os.listdir('directory_path')

# Change the current working directory
os.chdir('path_to_directory')

# Remove an empty directory (os.rmdir() fails if the directory is not empty)
os.rmdir('directory_to_remove')
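Since os.rmdir() only removes empty directories, non-empty trees are typically removed with shutil.rmtree(), and os.walk() handles recursive traversal. A minimal sketch that builds a throwaway demo_dir tree (hypothetical names), walks it, and then removes it:

```python
import os
import shutil

# Build a small directory tree for the demonstration
os.makedirs('demo_dir/sub', exist_ok=True)
with open('demo_dir/sub/note.txt', 'w') as f:
    f.write('hello')

# Recursively walk the tree, collecting every file with its full path
found = []
for root, dirs, files in os.walk('demo_dir'):
    for name in files:
        found.append(os.path.join(root, name))
print(found)

# shutil.rmtree removes a directory tree even when it is not empty,
# which os.rmdir cannot do
shutil.rmtree('demo_dir')
```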
Running Other Programs from a Python Program
Python allows you to execute external programs or scripts from within your Python code using the subprocess module. This capability is incredibly useful for automating tasks that involve other command-line tools or running separate scripts as part of your workflow.
Example:
import subprocess

# Run an external command and capture its output
# (passing a list of arguments avoids the pitfalls of shell=True)
result = subprocess.run(['ls', '-l'], capture_output=True, text=True)
print(result.stdout)
Handling Large Files with Chunking
When working with exceptionally large files that may not fit entirely into memory, it's essential to process them in smaller chunks. You can read and process data in manageable portions, making it feasible to work with files of any size.
Example:
chunk_size = 1024  # 1 KB
with open('large_file.txt', 'r') as file:
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        # Process the chunk of data here
Memory-Efficient File Processing
For memory-efficient processing of large files, you can utilize techniques like generators and iterators. These allow you to process data sequentially, reducing memory consumption.
Example:
def process_line(line):
    # Placeholder transformation: strip trailing whitespace (substitute real logic)
    return line.rstrip()

def process_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            # Yield one processed line at a time instead of loading the whole file
            yield process_line(line)

for result in process_large_file('large_data.txt'):
    print(result)  # Handle each result as it is produced
Working with Network Resources
Python can also interact with files from network resources, such as files hosted on web servers or FTP sites. Libraries like requests and ftplib allow you to download and upload files from remote servers.
Example:
import requests

url = 'https://bytescrum.com/somefile.txt'
response = requests.get(url)
if response.status_code == 200:
    with open('downloaded_file.txt', 'wb') as file:
        file.write(response.content)
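For FTP servers, ftplib offers a similar workflow. The sketch below wraps the download in a helper function with hypothetical host and file names; since it assumes a reachable FTP server, the call at the bottom is left commented out:

```python
from ftplib import FTP

def download_via_ftp(host, remote_name, local_name, user='anonymous', password=''):
    """Download a single file from an FTP server (hypothetical host/paths)."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_name, 'wb') as f:
            # RETR streams the remote file in binary mode, chunk by chunk
            ftp.retrbinary(f'RETR {remote_name}', f.write)

# Example call (requires a reachable FTP server):
# download_via_ftp('ftp.example.com', 'somefile.txt', 'downloaded_file.txt')
```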
Parallel Processing of Files
When dealing with a large number of files or processing tasks, Python's concurrent.futures or multiprocessing modules can be used to process files in parallel, improving efficiency.
Example:
import concurrent.futures

file_list = ['file1.txt', 'file2.txt', 'file3.txt']

def process_file(file_name):
    # Process each file individually
    pass

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(process_file, file_list)
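ThreadPoolExecutor suits I/O-bound work such as reading files, while CPU-bound processing usually benefits more from ProcessPoolExecutor, which runs each task in a separate process. A minimal sketch with a hypothetical count_words task (the __main__ guard is required on platforms that spawn worker processes):

```python
import concurrent.futures

def count_words(text):
    # A stand-in for CPU-bound work: count whitespace-separated words
    return len(text.split())

texts = ['one two three', 'four five', 'six']

if __name__ == '__main__':
    # Each task runs in its own process, sidestepping the GIL
    with concurrent.futures.ProcessPoolExecutor() as executor:
        counts = list(executor.map(count_words, texts))
    print(counts)  # [3, 2, 1]
```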
Monitoring File System Changes
In certain scenarios, you may need to monitor files and directories for changes, such as new files, modifications, or deletions. The watchdog library is a popular choice for real-time file system monitoring in Python.
Example:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return
        print(f'File modified: {event.src_path}')

observer = Observer()
observer.schedule(MyHandler(), path='path_to_watch')
observer.start()
try:
    while True:
        time.sleep(1)  # Keep the script alive while the observer runs
except KeyboardInterrupt:
    observer.stop()
observer.join()
Summary
As you continue your Python journey, remember that effective file handling opens doors to countless possibilities in software development. So, keep coding, keep exploring, and keep building amazing things with Python!
Happy coding! 🐍📂✨