An Introduction to the Art of Computer Programming Using Python in the Age of Generative AI

XIX. Testing and Debugging

Introduction

Computer scientist Edsger Dijkstra, a pioneer of structured programming, wrote in 1970: "Program testing can be used to show the presence of bugs, but never to show their absence!" Testing is essential because it helps confirm that your code behaves correctly under a variety of circumstances, while significantly reducing the likelihood of undiscovered defects. Although no amount of testing can prove that your program is 100% bug-free, maintaining a solid testing strategy is a cornerstone of professional software development.

Black-Box Testing

Black-box testing involves validating a piece of software solely by examining its inputs and outputs, without direct knowledge of its internal code or structure. This approach focuses on functional requirements, ensuring the software produces correct results for a range of test cases and edge conditions. Because the tester doesn’t look inside the code, black-box testing is excellent for validating user-facing behavior and verifying compliance with specifications.
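
To make the idea concrete, here is a minimal sketch of black-box checks against a hypothetical slugify() function. The checks rely only on its documented behavior (lowercase the text and replace spaces with hyphens), never on how it is implemented internally.

# Hypothetical function under test; a black-box tester sees only its specification:
# "return the text lowercased, with spaces replaced by hyphens"
def slugify(text):
    return text.lower().replace(" ", "-")

# Each check pairs an input with the output the specification promises
assert slugify("Hello World") == "hello-world"   # typical input
assert slugify("") == ""                         # edge case: empty string
assert slugify("PYTHON") == "python"             # edge case: single word, all caps
print("All black-box checks passed")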

White-Box Testing

By contrast, white-box testing (sometimes called clear-box or glass-box testing) involves examining the software’s internal logic and code paths. Testers or developers use knowledge of the implementation details—such as specific functions, branches, and loops—to create tests that ensure each pathway in the code is exercised. White-box and black-box approaches often complement each other to provide broader coverage and assurance of software quality.
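
As a sketch of the difference, the checks below are white-box tests for a hypothetical classify_temperature() function: they were chosen by reading the implementation so that every branch of the if/elif/else is exercised at least once.

# Hypothetical function with three internal code paths
def classify_temperature(celsius):
    if celsius < 0:
        return "freezing"    # branch 1
    elif celsius < 25:
        return "moderate"    # branch 2
    else:
        return "hot"         # branch 3

# One check per branch, chosen with knowledge of the code above
assert classify_temperature(-5) == "freezing"    # covers branch 1
assert classify_temperature(10) == "moderate"    # covers branch 2
assert classify_temperature(30) == "hot"         # covers branch 3
print("All three code paths exercised")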

Using Debuggers

Debugging is the process of locating, diagnosing, and fixing errors within your code. Python’s built-in debugger, pdb, is a powerful tool for stepping through your program line by line, inspecting variables, and testing hypotheses about where the bug might be.

Modern Tip: In Python 3.7+, you can call the built-in function breakpoint(). By default it drops you into pdb just like pdb.set_trace(), but it needs no import and can be redirected to another debugger, or disabled entirely, via the PYTHONBREAKPOINT environment variable.


import pdb

def add_numbers(a, b):
    # Execution will pause here, opening the interactive debugger
    pdb.set_trace()
    # Or simply: breakpoint()
    return a + b

result = add_numbers(1, 2)
print(f"Result: {result}")
        

When the program stops, you can type short commands at the (Pdb) prompt to control execution: n runs the next line, s steps into a function call, c continues until the next breakpoint, p prints the value of an expression, l lists the surrounding source code, and q quits the debugger.
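
To skip every breakpoint() call without editing the code, set the environment variable to zero when launching the program, for example: PYTHONBREAKPOINT=0 python your_script.py (the script name here is just a placeholder).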

Writing Unit Tests

Unit tests validate the functionality of small, isolated parts (or “units”) of code. Python’s built-in unittest module offers test case classes and assertion methods.
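
In a full project you would typically save such tests in their own file (for example, a hypothetical test_math.py) and run them from the command line with python -m unittest, which discovers and executes them. The example below instead calls unittest.main() directly so that it also runs inside a notebook or a single script.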


import unittest

def multiply(a, b):
    return a * b

class TestMultiplication(unittest.TestCase):
    def test_multiply(self):
        # Happy path
        self.assertEqual(multiply(2, 3), 6)
        # Edge cases
        self.assertEqual(multiply(-1, 3), -3)
        self.assertEqual(multiply(0, 3), 0)

if __name__ == '__main__':
    # argv is overridden so unittest ignores any command-line arguments (handy in notebooks),
    # and exit=False stops unittest.main() from calling sys.exit() after the tests run
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

Modern Testing with Pytest

While unittest ships with Python, the de facto standard across much of the Python ecosystem, including data science and AI work, is the third-party pytest framework. It is less verbose and uses plain assert statements instead of special assertion methods.


# pytest style (simpler and cleaner); reuses multiply() from the unittest example above
def test_multiply_pytest():
    assert multiply(2, 3) == 6
    assert multiply(-1, 3) == -3
    assert multiply(0, 3) == 0
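
pytest also supports parametrized tests, which run one test function over a whole table of input/output pairs. Here is a short sketch, assuming pytest is installed; the multiply() definition is repeated so the snippet is self-contained, and you run it with the pytest command.

import pytest

def multiply(a, b):
    return a * b

# One test function, executed once per (a, b, expected) row below
@pytest.mark.parametrize("a, b, expected", [
    (2, 3, 6),     # happy path
    (-1, 3, -3),   # negative operand
    (0, 3, 0),     # zero operand
])
def test_multiply_parametrized(a, b, expected):
    assert multiply(a, b) == expected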
        

Mocking: Testing AI without APIs

When building AI applications, you cannot call the real API (like OpenAI) every time you run tests—it is slow, expensive, and non-deterministic. Instead, you use Mocking to simulate the API response.


from unittest.mock import MagicMock

# Simulate an AI client (like openai.OpenAI())
mock_ai_client = MagicMock()

# Define what the "fake" API should return when called
# This mimics the structure: client.chat.completions.create()
mock_ai_client.chat.completions.create.return_value = "Mocked AI Response"

# Run your code using the mock instead of the real client
# This verifies your logic without spending money or requiring internet
result = mock_ai_client.chat.completions.create(model="gpt-4", messages=[])

print(f"Result from mock: {result}")
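
Mocks also record every call made to them, so a test can verify not only what your code received but how it called the fake API. A short follow-up sketch of that idea:

from unittest.mock import MagicMock

mock_ai_client = MagicMock()
mock_ai_client.chat.completions.create.return_value = "Mocked AI Response"

# The code under test (here, a direct call) uses the mock in place of the real client
mock_ai_client.chat.completions.create(model="gpt-4", messages=[])

# Verify the interaction: the fake API was called exactly once, with these arguments
mock_ai_client.chat.completions.create.assert_called_once_with(model="gpt-4", messages=[])
print("The mock was called exactly once with the expected arguments")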
        

Test-Driven Development (TDD)

Test-Driven Development (TDD) is a process where you write unit tests before writing the code that satisfies those tests. It commonly follows the "Red-Green-Refactor" cycle: first write a test that fails because the feature does not exist yet (Red), then write just enough code to make the test pass (Green), and finally clean up the implementation while keeping every test passing (Refactor).
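
As a minimal sketch of the cycle, imagine a hypothetical is_palindrome() function. Under TDD the test class below is written first and fails while the function does not exist yet (Red); the one-line implementation is then added to make it pass (Green) and can later be reworked with the tests as a safety net (Refactor).

import unittest

# Step 1 (Red): the tests are written first and fail while is_palindrome() is missing
class TestIsPalindrome(unittest.TestCase):
    def test_palindrome(self):
        self.assertTrue(is_palindrome("racecar"))

    def test_non_palindrome(self):
        self.assertFalse(is_palindrome("python"))

# Step 2 (Green): just enough code to make both tests pass
def is_palindrome(text):
    return text == text[::-1]

# Step 3 (Refactor): improve the implementation while the tests stay green

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)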

Generative AI Insight: The Challenge of "Evals"
Traditional testing relies on exact matches (assert result == 10). However, Generative AI outputs are non-deterministic text. You cannot simply assert that a summarized essay "is correct."

In AI Engineering, testing is called Evaluation (Evals). This often involves "LLM-as-a-Judge," where you use a powerful model (like GPT-4) to grade the output of your application based on criteria like tone, accuracy, and safety.
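
Below is a minimal sketch of the pattern. The judge_model() function is a hypothetical stand-in: in a real eval it would send the grading prompt to a strong model (such as GPT-4) and return its verdict, but here it returns a canned answer so the sketch runs offline.

# Hypothetical judge; a real eval would send this prompt to a powerful LLM
def judge_model(grading_prompt):
    return "PASS"  # canned verdict so the example runs without an API key

def evaluate_summary(original, summary):
    grading_prompt = (
        "Reply PASS if the summary is accurate, concise, and faithful to the "
        "original text; otherwise reply FAIL.\n"
        f"Original: {original}\n"
        f"Summary: {summary}"
    )
    # Reduce the judge's free-text verdict to a pass/fail signal
    return judge_model(grading_prompt).strip().upper() == "PASS"

print(evaluate_summary(
    "Dijkstra argued that testing shows the presence of bugs, not their absence.",
    "Testing can reveal bugs but cannot prove a program is bug-free.",
))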

Prompting AI for Testing

Generative AI is excellent at writing boilerplate test code. You can paste a function into an LLM and ask it to generate comprehensive unit tests, including edge cases you might have missed.

Example Prompt:
I have a Python function `calculate_discount(price, rate)`. Write a `unittest` class for it. Include edge cases for negative prices, 0% rate, and 100% rate.

Resulting AI-generated code:


import unittest

def calculate_discount(price, rate):
    if price < 0 or rate < 0:
        raise ValueError("Inputs must be non-negative")
    return price * (1 - rate)

class TestDiscount(unittest.TestCase):
    def test_standard(self):
        self.assertEqual(calculate_discount(100, 0.2), 80.0)

    def test_full_discount(self):
        self.assertEqual(calculate_discount(100, 1.0), 0.0)

    def test_no_discount(self):
        self.assertEqual(calculate_discount(100, 0.0), 100.0)

    def test_negative_input(self):
        with self.assertRaises(ValueError):
            calculate_discount(-50, 0.1)

if __name__ == '__main__':
    # As before, exit=False keeps unittest.main() from calling sys.exit() when the tests finish
    unittest.main(argv=['first-arg-is-ignored'], exit=False)