An Introduction to the Art of Computer Programming Using Python in the Age of Generative AI

XI. File Handling and I/O

Introduction to File Handling

File handling is fundamental in most applications. It enables programs to read from and write to files on disk, preserving data between runs. Python provides a flexible set of tools for file operations, simplifying the storage of user inputs, configuration settings, logs, or any form of persistent data. Proper file handling is crucial for safe data exchange and preventing issues like data corruption, file lock conflicts, or memory overuse.

Reading and Writing Text Files

You can open text files using Python’s built-in open function. The mode parameter ('r' for reading, 'w' for writing, 'a' for appending, etc.) specifies how you interact with the file.

Crucial for AI: Always specify encoding='utf-8'. AI models work with data from around the world (emojis, multiple languages). If you rely on the system default encoding (which might be ASCII or CP1252 on Windows), your program may crash when it encounters a character like "🤖" or "é".

Using a with statement (a context manager) is best practice because it automatically closes the file, even if an error occurs, preventing resource leaks.
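Under the hood, a with statement is roughly equivalent to a try/finally block. The sketch below (using an illustrative filename, manual.txt) shows what you would otherwise have to write by hand:

```python
# What the with statement does for you, roughly:
file = open('manual.txt', 'w', encoding='utf-8')
try:
    file.write('Hello, World!')
finally:
    file.close()  # runs even if write() raises an exception
```

The with statement packages this pattern into a single line, which is why it is preferred.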


# Writing to a file (Explicitly using UTF-8)
with open('example.txt', 'w', encoding='utf-8') as file:
    file.write('Hello, World! 🌍')

# Reading from a file
with open('example.txt', 'r', encoding='utf-8') as file:
    content = file.read()
    print(f"Content: {content}")
        

Sometimes you only need part of a file, or want to process it line by line. file.readlines() loads every line into a list at once, which is fine for small files; for large files, iterate over the file object directly so that only one line is held in memory at a time:


# Reading a file line by line (Memory efficient)
with open('example.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(f"Line: {line.strip()}")
        
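When individual lines may themselves be very long, or you need fine-grained control over memory, you can read in fixed-size chunks instead. The sketch below is illustrative; the filename and chunk size are arbitrary:

```python
# Read a file in fixed-size chunks -- useful when even one line
# may be too large to hold comfortably in memory.
def read_in_chunks(path, chunk_size=4096):
    with open(path, 'r', encoding='utf-8') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            yield chunk

# Write a sample file, then stream it back in 8-character chunks
with open('chunk_demo.txt', 'w', encoding='utf-8') as f:
    f.write('Hello, World! This is a longer line of text.')

pieces = list(read_in_chunks('chunk_demo.txt', chunk_size=8))
print(f"Number of chunks: {len(pieces)}")
print(f"Reassembled: {''.join(pieces)}")
```

Joining the chunks reconstructs the original content exactly; the generator never holds more than one chunk at a time.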

Reading and Writing Binary Files

Not all data is text-based. Binary files (images, audio, executables, or serialized AI models like .pkl or .pt files) require opening in binary mode ('b'). This prevents Python from interpreting the data as text characters, ensuring it is read or written exactly as stored on disk (raw bytes).


# Writing binary data (bytes) to a file
with open('example.bin', 'wb') as file:
    file.write(b'\x00\x01\x02\x03')

# Reading binary data
with open('example.bin', 'rb') as file:
    data = file.read()
    print(f"Raw Bytes: {data}")
        
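A common real-world use of binary mode is hashing a file, for example to verify that a download was not corrupted. The sketch below uses the standard hashlib module and reads in chunks so that even very large files never fill memory:

```python
import hashlib

# Create a small binary file to hash
with open('example.bin', 'wb') as f:
    f.write(b'\x00\x01\x02\x03')

# Hash the file in chunks; iter() calls f.read(8192) until it
# returns the sentinel b'' (end of file)
sha = hashlib.sha256()
with open('example.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(8192), b''):
        sha.update(chunk)

print(f"SHA-256: {sha.hexdigest()}")
```

Note that the file must be opened in 'rb' mode: hashing operates on raw bytes, not decoded text.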
Generative AI Insight: Model Serialization
When you save a trained AI model (like a neural network), you are essentially writing a complex binary file. Libraries like PyTorch or TensorFlow serialize the model's weights (millions of floating-point numbers) into a binary format (e.g., .pt or .h5 files) so they can be loaded later without retraining.
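As a rough illustration of this idea, Python's built-in pickle module can serialize a plain dictionary of "weights" to a binary file and load it back. The dictionary here is just a stand-in for real model weights, and real frameworks use their own formats; also, only unpickle files you trust, since pickle can execute arbitrary code during loading:

```python
import pickle

# A stand-in for real model weights (frameworks store millions of floats)
weights = {"layer1": [0.12, -0.53, 0.98], "layer2": [0.07, 0.44]}

# Serialize to a binary file -- note the 'wb' mode
with open('model.pkl', 'wb') as f:
    pickle.dump(weights, f)

# Deserialize -- 'rb' mode restores the original Python object
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored["layer1"])
```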

Working with JSON Data

In the age of Generative AI, JSON (JavaScript Object Notation) is the standard format for exchanging data. Whether you are saving chat history, configuring model hyperparameters, or calling an API, you are likely using JSON. Python's built-in json library makes this easy.


import json

data = {
    "model": "gpt-4",
    "temperature": 0.7,
    "messages": [{"role": "user", "content": "Hello!"}]
}

# Saving JSON to a file (Serialization)
with open('config.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=4)

# Loading JSON from a file (Deserialization)
with open('config.json', 'r', encoding='utf-8') as f:
    loaded_data = json.load(f)
    print(f"Loaded Model: {loaded_data['model']}")
        
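JSON also travels as plain strings, for example in HTTP request and response bodies when calling an API. json.dumps and json.loads convert between Python objects and strings without touching the disk:

```python
import json

payload = {"model": "gpt-4", "temperature": 0.7}

# Serialize to a string (what you'd send as an HTTP request body)
text = json.dumps(payload)
print(f"As a string: {text}")

# Parse a string back into Python objects
parsed = json.loads(text)
print(f"Temperature: {parsed['temperature']}")
```

The naming is easy to remember: dump/load work with files, dumps/loads (with an "s") work with strings.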

Working with File Paths

The os.path and pathlib modules let you manipulate file paths in an OS-agnostic way, ensuring your code runs consistently across Windows, macOS, and Linux, where path structures differ (backslashes vs. forward slashes).

Note: pathlib is the modern standard in Python 3. It treats paths as objects rather than strings, which is less error-prone.


import os
from pathlib import Path

# Legacy: Using os.path
path_os = os.path.join('folder', 'file.txt')
print(f"os.path: {path_os}")

# Modern: Using pathlib (Preferred)
path_lib = Path('folder') / 'file.txt'
print(f"pathlib: {path_lib}")

# Checking extensions
print(f"Extension: {path_lib.suffix}")
        
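pathlib also covers everyday filesystem tasks such as creating directories, checking for existence, and matching filename patterns. The directory and file names below are illustrative:

```python
from pathlib import Path

base = Path('data') / 'logs'

# Create nested directories; no error if they already exist
base.mkdir(parents=True, exist_ok=True)

# Write and inspect a file directly through the Path object
log_file = base / 'run1.txt'
log_file.write_text('first entry\n', encoding='utf-8')

print(f"Exists: {log_file.exists()}")
print(f"Name: {log_file.name}, Suffix: {log_file.suffix}")

# Find all .txt files in the directory
for match in sorted(base.glob('*.txt')):
    print(f"Found: {match}")
```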

Handling I/O Errors

Files can be missing, corrupted, or inaccessible. Python raises exceptions like FileNotFoundError and PermissionError for such situations. Using try-except blocks ensures your program doesn’t crash when a file operation fails.


try:
    with open('nonexistent.txt', 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    content = 'Error: File not found.'
except PermissionError:
    content = 'Error: Permission denied.'

print(content)
        
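The alternative style is to check before acting ("look before you leap"), for example with pathlib. Note that a file could still disappear between the check and the read, so the try/except style above is generally the more robust idiom in Python:

```python
from pathlib import Path

path = Path('nonexistent.txt')

# Check first, then act. The try/except style is usually preferred
# because the file's state can change between the check and the open.
if path.exists():
    content = path.read_text(encoding='utf-8')
else:
    content = 'Error: File not found.'

print(content)
```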

Appending to Files

The 'a' mode in the open function appends data to an existing file or creates a new one if it doesn’t exist. This approach is useful for logging events or saving chat history line by line.


# Appending to a file
with open('example.txt', 'a', encoding='utf-8') as file:
    file.write('\nAppend this line.')

# Reading the appended file
with open('example.txt', 'r', encoding='utf-8') as file:
    content = file.read()
    print(content)
        
Generative AI Insight: JSONL for Large Datasets
When training AI models on massive datasets, loading one huge JSON file into memory is often impossible. Instead, AI engineers use JSONL (JSON Lines), where each line of the file is a separate, valid JSON object, so the file can be streamed and processed line by line.
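A minimal sketch of writing and streaming a JSONL file (the filename and records are illustrative):

```python
import json

records = [
    {"prompt": "Hi", "completion": "Hello!"},
    {"prompt": "2+2?", "completion": "4"},
]

# Write JSONL: one compact JSON object per line
with open('dataset.jsonl', 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

# Stream it back line by line -- only one record in memory at a time
with open('dataset.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        print(record["prompt"])
```

Because each line stands alone, a corrupted record affects only that line, and the file can be split or appended to without reparsing the whole dataset.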

Prompting Generative AI for Effective File Handling and I/O

Generative AI can help you design scripts for your specific I/O needs. By specifying file formats (e.g., CSV, JSON, or binary) and error-handling preferences, you can obtain examples that integrate reading, processing, and writing files seamlessly.

Example Prompt:
Generate a Python script that reads a CSV file, processes the data, and writes the results to a new CSV file, handling any file-related errors gracefully.

Resulting AI-generated code:


import csv

def process_csv(input_file, output_file):
    try:
        # newline='' is required for the csv module to handle line endings correctly
        with open(input_file, 'r', encoding='utf-8', newline='') as infile:
            reader = csv.reader(infile)
            data = [row for row in reader]

        if not data:
            return "Input file is empty."

        # Process data (example: convert all text to uppercase)
        processed_data = [[cell.upper() for cell in row] for row in data]

        with open(output_file, 'w', encoding='utf-8', newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerows(processed_data)

        return f"Processing complete. Saved to {output_file}"

    except FileNotFoundError:
        return f"Error: The file '{input_file}' was not found."
    except PermissionError:
        return f"Error: Permission denied for '{input_file}'."
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"

# Note: this returns an error message if input.csv does not exist
result = process_csv('input.csv', 'output.csv')
print(result)
        

Whether you require chunk-based reading for huge files, special file encodings, or structured data parsing, a well-crafted AI prompt can yield robust, targeted solutions. Always review and test any AI-generated code, particularly in production or security-sensitive environments.
