An Introduction to the Art of Computer Programming Using Python in the Age of Generative AI

III. Strings and Text Data

Introduction to Strings

Strings are sequences of characters and are an essential data type in any programming language. In the context of Generative AI, strings are the primary medium of communication; whether you are sending a "prompt" to a Large Language Model (LLM) or processing its response, you are manipulating text data.

In Python, strings are immutable, meaning they cannot be changed once created. This chapter explores the various aspects of strings in Python, including their creation, manipulation, and some common operations performed on them.

Creating Strings

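Strings in Python can be enclosed in single ('string') or double ("string") quotes. This flexibility lets you place one kind of quote inside a string delimited by the other, as in "It's a beautiful day!". Anything inside quotes is a string, even when it looks like a number: "255" denotes a string, not the integer 255.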

For longer text that spans multiple lines, such as complex AI system prompts or documentation, Python uses triple quotes ("""...""" or '''...'''). You can also use the + operator to concatenate (join) strings and the * operator to repeat a string an integer number of times.


greeting = 'Hello, World!'
print(greeting)

sentence = "It's a beautiful day!"

# Concatenation
print(greeting + ' ' + sentence)

# Repetition
print(3 * (greeting + ' '))

# Multi-line strings (ideal for prompts); line breaks inside the quotes are kept
ai_prompt = """
System: You are a helpful coding assistant.
User: Explain Python strings.
"""
print(ai_prompt)

# Error demonstration: You cannot multiply a string by a string
try:
    print(greeting * sentence)
except TypeError as e:
    print(f"Error: {e}")
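
As a small sketch of the quoting rules described at the start of this section, the snippet below shows both quote styles, a backslash escape, and the difference between the string "255" and the integer 255. The variable names are illustrative only and are not part of the chapter's running example.

```python
# Either quote style can wrap the other kind of quote.
with_apostrophe = "It's a beautiful day!"
with_quotes = 'She said "hello".'

# A backslash escapes a quote that matches the delimiter.
escaped = 'It\'s a beautiful day!'
print(with_apostrophe == escaped)  # True

# "255" is text: + concatenates, and int() is needed for arithmetic.
number_text = "255"
print(number_text + "1")       # 2551
print(int(number_text) + 1)    # 256
```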
        

Accessing and Slicing Strings

You can access individual characters in a string using indexing, and a range of characters in a string using slicing. Python uses 0-based indexing.

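A powerful feature in Python is negative indexing, where -1 refers to the last character, -2 to the second to last, and so on. The slicing syntax is [start:stop:step]: the character at start is included, the character at stop is excluded, and any part you leave out takes a default value. Keep in mind that strings are immutable; you cannot change a character in place, only build a new string.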


my_string = 'Hello, World!'
print(f"Length: {len(my_string)}")

# Accessing characters
print(f"First character: {my_string[0]}")
print(f"Last character: {my_string[-1]}")

# Slicing [start:stop]: index 7 is included, index 12 is not
print(f"Slice [7:12]: {my_string[7:12]}")

# Slicing with step (Reversing a string)
print(f"Reversed: {my_string[::-1]}")

# Attempting to modify a string directly causes an error
try:
    my_string[0] = 'B'
except TypeError as e:
    print(f"Error: {e}")
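
Because a string cannot be modified in place, the usual workaround is to build a new string from slices of the old one. The following is a minimal sketch reusing my_string from above; new_string is a name introduced only for this illustration. It also shows how omitted slice parts fall back to their defaults.

```python
my_string = 'Hello, World!'

# Build a new string: replacement character + everything after index 0.
new_string = 'J' + my_string[1:]
print(new_string)        # Jello, World!
print(my_string)         # Hello, World! (the original is unchanged)

# Omitted slice parts take defaults: start of string, end of string, step 1.
print(my_string[:5])     # Hello
print(my_string[-6:])    # World!
print(my_string[::2])    # Hlo ol!
```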
        

Common String Operations

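Python provides a wide range of built-in methods for performing common string operations. These are essential for data cleaning, that is, preparing raw text before feeding it into an AI model. Because strings are immutable, each method returns a new string and leaves the original untouched.

The split() method breaks a string into a list of substrings at a separator (any run of whitespace if no separator is given), while join() does the reverse, combining a list of strings into one.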


my_string = ' Hello, World! '

# Case conversion
print(f"Lower: {my_string.lower()}")
print(f"Upper: {my_string.upper()}")

# Cleaning whitespace
clean_string = my_string.strip()
print(f"Strip: '{clean_string}'")

# Replacement
print(f"Replace: {clean_string.replace('World', 'Python')}")

# Splitting on the comma into a list; the space after the comma stays in ' World!'
split_list = clean_string.split(',')
print(f"Split: {split_list}")

# Joining the pieces with a new separator
print(f"Join: {' - '.join(split_list)}")
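
As one illustration of data cleaning, calling split() with no argument splits on any run of whitespace, so pairing it with join() collapses irregular spacing. This is a sketch only; the sample text in raw_text is made up for the example.

```python
raw_text = "  Explain   Python\tstrings \n please  "

# split() with no separator splits on runs of whitespace and drops empty pieces.
words = raw_text.split()
print(words)             # ['Explain', 'Python', 'strings', 'please']

# join() rebuilds a single string with exactly one space between words.
normalized = " ".join(words)
print(normalized)        # Explain Python strings please

# Each method returns a new string, so calls can be chained.
print(normalized.lower().replace("please", "now"))  # explain python strings now
```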
        

Formatting Strings

Python 3.6 introduced f-strings (formatted string literals), which embed expressions directly inside a string literal prefixed with f. Anything between curly braces is evaluated when the string is created, so you can insert variables, perform math, and call functions within the string. For most new code, f-strings are the preferred formatting method.


name = 'World'
score = 95.6

# Simple variable embedding
print(f'Hello, {name}!')

# Expressions and formatting inside braces
# :.2f formats the float with exactly 2 decimal places (rounding or padding as needed)
print(f'The score is {score:.2f}, and half is {score / 2:.2f}')
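
The part after the colon inside the braces is a format specification, and it can do more than fix the number of decimals. The sketch below shows a few common specifications; the variables token_count, padded, and the others are made up for this illustration.

```python
score = 95.6
token_count = 1234567

# Width 8 with 2 decimals: the number is right-aligned and padded with spaces.
padded = f"{score:8.2f}"
print(f"[{padded}]")               # [   95.60]

# A comma inserts thousands separators.
with_commas = f"{token_count:,}"
print(with_commas)                 # 1,234,567

# % multiplies by 100 and appends a percent sign.
as_percent = f"{0.956:.1%}"
print(as_percent)                  # 95.6%

# < and > align text within a fixed width.
row = f"{'name':<8}|{'score':>8}|"
print(row)                         # name    |   score|
```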
        
Generative AI Insight: Prompt Engineering
Mastering string formatting is key to Prompt Engineering. When building applications with AI, you often create "prompt templates": strings with placeholders, such as f"Summarize this text: {user_input}". How carefully you construct the string directly affects the AI's output.
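
A prompt template might be sketched as a small function that returns an f-string. The helper build_prompt, its max_words parameter, and the sample input are hypothetical; no AI library is called here, and the function only assembles the text that would be sent to a model.

```python
def build_prompt(user_input: str, max_words: int = 50) -> str:
    """Fill a prompt template with cleaned user text (hypothetical helper)."""
    cleaned = " ".join(user_input.split())  # normalize whitespace first
    return (
        f"Summarize this text in at most {max_words} words:\n"
        f"---\n"
        f"{cleaned}\n"
        f"---"
    )


prompt = build_prompt(
    "  Python strings   are immutable\nsequences of characters. ",
    max_words=20,
)
print(prompt)
```

Wrapping the user's text in delimiter lines is one possible design choice: it makes it easier to see where the instructions end and the supplied text begins.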

Handling Unicode

Unicode is a standard that assigns a unique number, called a code point, to characters from virtually every writing system. Python 3 strings are sequences of Unicode code points, so characters from almost any written language (and emojis!) can be represented directly.


# Hebrew text
unicode_string = 'שלום עולם'
print(unicode_string)

# Emojis are also Unicode characters
emoji_string = "Python is cool 😎"
print(emoji_string)
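
To make the difference between characters and bytes concrete, the sketch below compares len() of a string with the length of its encoded form. UTF-8 is used here as an assumption because it is the most common encoding; the chapter itself does not prescribe one.

```python
unicode_string = 'שלום עולם'
emoji = '😎'

# len() counts Unicode code points, not bytes.
print(len(unicode_string))                   # 9
print(len(emoji))                            # 1

# encode() converts a str to bytes; each Hebrew letter takes 2 bytes in UTF-8.
encoded = unicode_string.encode('utf-8')
print(len(encoded))                          # 17
print(len(emoji.encode('utf-8')))            # 4

# decode() turns the bytes back into the original string.
print(encoded.decode('utf-8') == unicode_string)  # True

# ord() and chr() convert between a character and its code point.
print(hex(ord(emoji)))                       # 0x1f60e
print(chr(0x1F60E) == emoji)                 # True
```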
        