An Introduction to the Art of Computer Programming Using Python in the Age of Generative AI

XVIII. Data Structures

Introduction to Data Structures

Data structures are fundamental computer science concepts for storing, organizing, and managing data efficiently. They directly impact how quickly and easily you can access, process, and modify your data. Choosing the right data structure for a given problem is crucial, as it can significantly affect both performance (time complexity) and memory usage (space complexity).

We introduced the hashed table (Dictionary) in Chapter VI. Another core type is the array, a fixed-size sequence of elements stored contiguously in memory. In Python, lists function as dynamic arrays. Beyond these, a wide range of data structures exist: linked lists, trees, stacks, queues, and graphs. Each structure has distinct strengths. For example, while arrays allow fast access ($O(1)$), linked lists allow fast insertion ($O(1)$) if you are at the correct position.

Linked Lists

A linked list is a linear collection of nodes, where each node contains two parts: the data value and a reference (pointer) to the next node. Unlike arrays, linked lists do not require contiguous memory, allowing for efficient dynamic sizing.


from typing import Any, Optional

class Node:
    def __init__(self, value: Any):
        self.value = value
        self.next: Optional['Node'] = None

class LinkedList:
    def __init__(self):
        self.head: Optional[Node] = None

    @staticmethod
    def create_node(value: Any) -> Node:
        return Node(value)

    def append(self, value: Any):
        new_node = self.create_node(value)
        if not self.head:
            self.head = new_node
        else:
            current = self.head
            while current.next:
                current = current.next
            current.next = new_node

    def prepend(self, value: Any):
        new_node = self.create_node(value)
        new_node.next = self.head
        self.head = new_node

    def delete(self, value: Any):
        if not self.head:
            return
        if self.head.value == value:
            self.head = self.head.next
            return

        current = self.head
        while current.next and current.next.value != value:
            current = current.next

        if current.next:
            current.next = current.next.next

    def __str__(self) -> str:
        values = []
        current = self.head
        while current:
            values.append(str(current.value))
            current = current.next
        return ' -> '.join(values) + " -> None"
        

linked_list = LinkedList()
linked_list.append(42)
linked_list.append("Mango")
linked_list.prepend(3.14)
print(f"List: {linked_list}")

linked_list.delete(3.14)
print(f"After deletion: {linked_list}")
        

Arrays

An array is a collection of elements in contiguous memory locations, providing constant-time indexing ($O(1)$). In Python, the standard list is dynamic and flexible. However, for numerical efficiency, Python provides the array module (for basic types) and the third-party NumPy library (the standard for AI).


import array

# 'i' type code = signed integer
arr = array.array('i', [1, 2, 3, 4, 5])

# Access elements (O(1))
print(f"Element at index 0: {arr[0]}")

# Modify elements
arr[1] = 10

# Append elements
arr.append(6)

print(f"Array: {arr}")
        
Generative AI Insight: Arrays are Tensors
In the world of AI, multi-dimensional arrays are called Tensors. Everything you feed into a Generative AI model (text, images, audio) is converted into numerical tensors. Libraries like PyTorch and TensorFlow are essentially highly optimized array manipulation engines that run on GPUs.

Stacks

A stack is a LIFO (Last In, First Out) structure. Think of a stack of plates: you add to the top and take from the top. In AI, stacks are used in Depth-First Search (DFS) algorithms.


from collections import deque

class Stack:
    def __init__(self):
        self.items = deque()

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if self.items:
            return self.items.pop()
        return None

    def __str__(self):
        return f"Stack: {list(self.items)}"

# Example usage
stack = Stack()
stack.push(1)
stack.push(2)
print(stack)
print(f"Popped: {stack.pop()}")
        

Queues

A queue is a FIFO (First In, First Out) structure. Elements are "enqueued" at the back and "dequeued" from the front. Queues are essential for Breadth-First Search (BFS) and managing asynchronous tasks.

Note: Using a Python list as a queue is inefficient because removing from the front (index 0) requires shifting all other elements ($O(N)$). The correct tool is collections.deque, which is $O(1)$ for both ends.


from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        self.items.append(item)

    def dequeue(self):
        if self.items:
            return self.items.popleft() # O(1) operation
        return None

    def __str__(self):
        return f"Queue: {list(self.items)}"

# Example usage
queue = Queue()
queue.enqueue('A')
queue.enqueue('B')
print(queue)
print(f"Dequeued: {queue.dequeue()}")
        
Generative AI Insight: Data Loading
Queues are the engine behind efficient AI training. While a GPU processes one batch of data, the CPU uses a Queue to pre-load and pre-process the next batch of images or text, ensuring the expensive GPU never sits idle.

Trees

A tree is a hierarchical structure. A common form is the Binary Search Tree (BST), where the left child is smaller than the parent and the right child is larger. Trees are the foundation of file systems, databases, and AI decision-making.

Generative AI Insight: Beam Search & Vector DBs
When an LLM generates text, it explores multiple potential future sentences simultaneously, creating a tree of probabilities (Beam Search). Additionally, Vector Databases (like Pinecone) use tree-based structures to store and retrieve high-dimensional AI embeddings efficiently.

class TreeNode:
    def __init__(self, value: int):
        self.value = value
        self.left: Optional[TreeNode] = None
        self.right: Optional[TreeNode] = None

class BinarySearchTree:
    def __init__(self):
        self.root: Optional[TreeNode] = None

    def insert(self, value: int):
        if not self.root:
            self.root = TreeNode(value)
        else:
            self._insert_recursive(self.root, value)

    def _insert_recursive(self, current: TreeNode, value: int):
        if value < current.value:
            if current.left:
                self._insert_recursive(current.left, value)
            else:
                current.left = TreeNode(value)
        else:
            if current.right:
                self._insert_recursive(current.right, value)
            else:
                current.right = TreeNode(value)

    def inorder_traversal(self) -> list[int]:
        # Returns sorted list for a BST
        elements = []
        self._inorder_recursive(self.root, elements)
        return elements

    def _inorder_recursive(self, current, elements):
        if current:
            self._inorder_recursive(current.left, elements)
            elements.append(current.value)
            self._inorder_recursive(current.right, elements)

# Example usage
tree = BinarySearchTree()
for num in [10, 5, 15, 3, 7]:
    tree.insert(num)

print(f"Sorted values: {tree.inorder_traversal()}")
        

Graphs

A graph consists of nodes (vertices) and edges that connect them. Unlike trees, graphs have no root and can contain cycles. They are used to model networks, such as social connections, maps, or the internet.

Generative AI Insight: Computational Graphs
Deep Learning frameworks like TensorFlow and PyTorch view your code as a "Computational Graph." Every variable is a node, and every operation (add, multiply) is an edge. When you train an AI, the framework traverses this graph backward (Backpropagation) to update the model.

# Simple Graph using Adjacency List
class Graph:
    def __init__(self):
        self.adj_list = {}

    def add_edge(self, u, v):
        if u not in self.adj_list: self.adj_list[u] = []
        if v not in self.adj_list: self.adj_list[v] = []
        # Undirected connection
        self.adj_list[u].append(v)
        self.adj_list[v].append(u)

    def display(self):
        for node, neighbors in self.adj_list.items():
            print(f"{node} -> {neighbors}")

g = Graph()
g.add_edge('Alice', 'Bob')
g.add_edge('Alice', 'Charlie')
g.display()
        

Prompting AI for Implementing Data Structures

If you’re leveraging AI to generate data structure implementations, start with a precise prompt describing your requirements and context. Instead of asking, “How do I use data structures?”, try: “Can you help me implement a linked list in Python with methods for adding, deleting, and displaying nodes?”

Example Prompt:
Write a functional-style program (using dictionaries) to simulate a linked list. Include functions to append, prepend, and delete nodes.

Resulting AI-generated code:


# Functional Linked List using Dictionaries
def create_node(value, next_node=None):
    return {'value': value, 'next': next_node}

def append(head, value):
    if head is None:
        return create_node(value)
    current = head
    while current['next'] is not None:
        current = current['next']
    current['next'] = create_node(value)
    return head

def prepend(head, value):
    return create_node(value, head)

def delete_node(head, value):
    if head is None:
        return None
    if head['value'] == value:
        return head['next']

    current = head
    while current['next'] is not None:
        if current['next']['value'] == value:
            current['next'] = current['next']['next']
            return head
        current = current['next']
    return head

def to_string(head):
    current = head
    values = []
    while current is not None:
        values.append(str(current['value']))
        current = current['next']
    return ' -> '.join(values) + ' -> None'

# Example usage
head = None
head = append(head, 42)
head = append(head, "Mango")
head = prepend(head, 3.14)

print(f"List: {to_string(head)}")

head = delete_node(head, 42)
print(f"After deleting 42: {to_string(head)}")
        
Back to Home