In the face of ambiguity, refuse the temptation to guess.
Tim Peters, Zen of Python
Programming tutorials traditionally begin with a "Hello, World!" program. Displaying output is a fundamental
way to interact with your code, since it gives you immediate feedback and helps debug logic errors. In
Python, printing output is done with the print() function, which can handle one
or many arguments in a single call.
print("Hello world!")
While print("Hello world!") is a classic example, Python's print()
is flexible and can combine multiple pieces of text, either by passing them as separate arguments (which print() joins
with single spaces) or by concatenating them with the plus (+) operator, which adds no spaces.
In modern Python (versions 3.6+), the preferred method for formatting text is using f-strings (Formatted String Literals). They allow you to embed expressions directly inside string literals, making the code more readable and concise.
print("Hello", "world", "!")    # prints: Hello world !
print("Hello" + "world" + "!")  # prints: Helloworld!
# Modern approach using f-strings
world_var = "world"
print(f"Hello {world_var}!")
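To make the difference between these call styles visible, the following sketch captures what print() writes into a text buffer instead of the screen. It also uses the `sep` and `end` keyword arguments of print(), which are not covered in the text above; the names `buffer` and `lines` are illustrative only.

```python
import io

# Capture print() output in a buffer so the exact text can be inspected.
buffer = io.StringIO()

# Separate arguments are joined with a single space by default.
print("Hello", "world", "!", file=buffer)
# Concatenation with + inserts nothing between the pieces.
print("Hello" + "world" + "!", file=buffer)
# sep replaces the space between arguments, end replaces the final newline.
print("Hello", "world", sep=", ", end="!\n", file=buffer)

lines = buffer.getvalue().splitlines()
for line in lines:
    print(repr(line))
```

Printing with repr() shows the quotes around each line, which makes stray or missing spaces easy to spot.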
Humans have long used their fingers, and in some cultures their toes, for basic calculations. As a result, most cultures use a decimal system (base-10). The first mechanical devices for decimal arithmetic were invented around 2500 BC. In the 20th century, electromechanical and later electronic computers were developed. In theory, an electronic device could represent the decimal system by assigning the ten digits to ten distinct current or voltage levels. In practice, effects such as temperature-dependent resistance and electrical interference make ten levels hard to tell apart reliably. A more robust system can be achieved by reducing the number system to two states: "off" (0) and "on" (1).
Thus, the binary system (base-2) emerged as the most reliable option, and modern computers rely on electrical circuits that can switch between these two states. A major milestone was the development of the transistor in 1947. Transistors are semiconductor devices that can be controlled to switch between binary states. Computer data storage is achieved by organizing these binary digits, or bits, with 8 bits typically making up a byte.
Bytes are essential for storing a range of data types, from simple numbers to complex multimedia content. One of the primary goals of computer science is to translate "human" data into this binary format. Computers can then store this data, perform calculations on it, and store the results. In the context of Generative AI, even the most complex Large Language Models (LLMs) ultimately process vast arrays of these binary numbers (represented as tensors) to generate text and images.
# Converting a decimal number to binary in Python
num = 13
binary_representation = bin(num)
print("The binary representation of", num, "is", binary_representation)
# bin() returns a string with the prefix "0b"; slicing the string removes it (for more information see: Chapter III. Strings and Text Data).
print("The binary representation of", num, "is", binary_representation[2:])
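As a rough illustration of what bin() does, the conversion can also be written by hand using repeated division by 2. This is a minimal sketch for non-negative integers only; the function name `to_binary` is hypothetical and not part of Python.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2                    # integer division drops the bit just recorded
    return "".join(reversed(digits))  # reverse so the most significant bit comes first


for value in (13, 255, 256):
    print(value, "->", to_binary(value), "| built-in:", bin(value)[2:])
```

Each division by 2 peels off one bit, so a number needs as many loop iterations as it has binary digits.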
As introduced earlier, computers can store data in bytes, which are typically made up of 8 bits. Thus, 1 byte can represent 2^8 = 256 different values. Positive integers are represented directly as binary numbers, while negative integers are represented using two's complement notation. Including zero, the range for a signed 8-bit integer is from -128 to 127. However, for unsigned 8-bit values (like colors), the range is from 0 to 255.
This is why each pixel in an 8-bit grayscale image has a gray value between 0 (black) and 255 (white). For example, 255 is 11111111 in binary and fits exactly into 8 bits, whereas 256 is 100000000, which requires 9 bits and therefore overflows a single byte.
Note on the code below: Python integers have arbitrary precision (they can grow as large as memory
allows) and do not have a fixed bit-width like C++ or Java integers. To visualize negative numbers in their
binary "two's complement" form, we often use a bitwise mask (like & 0xff) to
limit the output to 8 bits.
# Displaying the binary representation of a positive and negative integer
positive_num = 42
negative_num = -42
print("Binary representation of", positive_num, ":", bin(positive_num)[2:])
# Using & 0xff restricts the view to the last 8 bits, showing the 2's complement form
print("Binary representation of", negative_num, ":", bin(negative_num & 0xff)[2:])
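The masking trick can be wrapped into a pair of helper functions that convert in both directions. This is a sketch under the assumption of a fixed 8-bit width; the names `to_twos_complement` and `from_twos_complement` are made up for this example.

```python
BITS = 8  # assumed width for this sketch


def to_twos_complement(value, bits=BITS):
    """Return the two's complement bit string of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit into {bits} signed bits")
    mask = (1 << bits) - 1                     # 0xff for 8 bits
    return format(value & mask, f"0{bits}b")   # keep leading zeros


def from_twos_complement(bit_string):
    """Interpret a bit string as a signed two's complement integer."""
    bits = len(bit_string)
    raw = int(bit_string, 2)
    if raw >= 1 << (bits - 1):   # top bit set means the value is negative
        raw -= 1 << bits
    return raw


for number in (42, -42, 127, -128):
    encoded = to_twos_complement(number)
    print(number, "->", encoded, "->", from_twos_complement(encoded))
```

The range check mirrors the limits stated above: with 8 bits, only values from -128 to 127 can be encoded.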
Representing fractions (numbers between integers) in the binary system is more challenging. Let's take 0.125 as an example. We can write 0.125 as 1/8 or 1/(2^3). This is easily converted to binary as 0.001 (base 2). But what about 0.1 or 1/10? There is no finite representation for this number in the binary system, much like 1/3 has no finite representation in decimal (0.3333...). It must be approximated.
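The difference between 1/8 and 1/10 can be checked with a small experiment. The sketch below expands a fraction into binary digits by repeated doubling, using exact arithmetic from the standard `fractions` module; the function name `binary_fraction` and the `max_bits` limit are assumptions made for this example.

```python
from decimal import Decimal
from fractions import Fraction


def binary_fraction(value, max_bits=20):
    """Expand a fraction 0 <= value < 1 into binary digits by repeated doubling.

    Returns the digit string and True if the expansion ended within max_bits.
    """
    frac = Fraction(value)
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bit = int(frac >= 1)   # 1 if doubling carried past the binary point
        bits.append(str(bit))
        frac -= bit            # keep only the part after the binary point
    return "0." + "".join(bits), frac == 0


print(binary_fraction(Fraction(1, 8)))                # terminates after 3 bits
print(binary_fraction(Fraction(1, 10), max_bits=12))  # pattern 0011 keeps repeating

# Decimal(0.1) reveals the value that is actually stored for the literal 0.1.
print(Decimal(0.1))
```

Note that `Fraction(1, 10)` is the exact value one tenth, whereas the float literal `0.1` is already the rounded approximation that the computer stores.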
In programming languages like C++, users can choose the precision of the approximation by selecting a type: float (typically 4 bytes), double (typically 8 bytes), or long double (whose size depends on the platform). The IEEE Standard for Floating-Point Arithmetic (IEEE 754) additionally defines 2-byte and 16-byte formats. According to IEEE 754, floats are represented using a standard format that divides the binary representation into three parts: the sign, the exponent, and the fraction. This format allows a wide range of values to be represented but can lead to precision problems.
Python uses 8-byte (64-bit) floats, which provide a precision of about 15 to 17 significant decimal digits. In the field of AI, optimizing these types is crucial; models often use lower precision (such as 16-bit floats or quantized 8-bit and 4-bit formats) to save memory and speed up calculation.
The following example shows why it is important to be aware of how fractions are represented in computers. The loop (loops will be introduced in a later chapter) adds 0.1 ten times and then checks whether the result is exactly 1.
sum_val = 0
x = 0.1
for i in range(0, 10):  # range(0, 10) yields 0, 1, 2, ..., 9
    sum_val += x  # adds x to sum_val on each iteration
# Due to floating point imprecision, sum_val will not be exactly 1.0
if sum_val == 1:
    print("0.1 + ... + 0.1 = 1")
else:
    print(f"0.1 + ... + 0.1 = {sum_val} ?")
    print("Exact 1.0 match failed due to floating point error.")
# Solution: Use math.isclose() for comparisons
import math
if math.isclose(sum_val, 1.0):
    print("Using math.isclose(): The value is effectively 1.0")
# Displaying the binary representation of a floating-point number (64-bit)
import struct
floating_num = 13.5
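# Pack the float as a big-endian double ('!d') and read the same 8 bytes back as an unsigned 64-bit integer ('!Q')
raw_bits = struct.unpack('!Q', struct.pack('!d', floating_num))[0]
# format(..., '064b') keeps leading zeros, so all 64 bits (including the sign bit) are shown
binary_representation = format(raw_bits, '064b')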
print("Binary representation of", floating_num, ":", binary_representation)
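The three IEEE 754 parts described earlier (sign, exponent, fraction) can be read off the 64-bit pattern directly. The following is a minimal sketch that assumes a normal (not subnormal, infinite, or NaN) double; the function name `float_fields` is hypothetical.

```python
import struct


def float_fields(value):
    """Split a 64-bit float into its sign, exponent, and fraction bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(raw, "064b")   # all 64 bits, leading zeros included
    sign = bits[0]               # 1 bit
    exponent = bits[1:12]        # 11 bits, stored with a bias of 1023
    fraction = bits[12:]         # 52 bits after the implicit leading 1
    return sign, exponent, fraction


sign, exponent, fraction = float_fields(13.5)
print("sign    :", sign)
print("exponent:", exponent)
print("fraction:", fraction)

# Rebuild the number from its parts (valid for normal numbers only).
unbiased = int(exponent, 2) - 1023
significand = 1 + int(fraction, 2) / 2**52
rebuilt = (-1) ** int(sign) * significand * 2**unbiased
print("rebuilt :", rebuilt)
```

For 13.5 the parts correspond to 1.6875 × 2^3: the stored exponent is 3 + 1023 = 1026, and the fraction bits encode the 0.6875 after the implicit leading 1.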
ASCII (American Standard Code for Information Interchange) was an early character encoding standard. It uses 7 bits per character, allowing for 128 unique characters; later 8-bit extensions raised this to 256. While this was sufficient for English, with its relatively small set of characters and symbols, it was inadequate for languages with larger character sets (like Chinese or Arabic) or for emojis.
To solve this problem, Unicode was introduced. It is a comprehensive character set that assigns a unique number (a code point) to each character, with room for more than a million code points from different languages and cultures. Encodings such as UTF-8, UTF-16, and UTF-32 store these code points using between 8 and 32 bits per character. This ensures textual integrity and promotes global interoperability and communication.
# Finding the ASCII and Unicode code points of a character (for the 128 ASCII characters, both values are identical)
char = 'A'
ascii_value = ord(char)
unicode_value = ord(char)
print(f"ASCII value of '{char}': {ascii_value}")
print(f"Unicode value of '{char}': {unicode_value}")
# Emoji example (requires Unicode)
emoji = '🤖'
print(f"Unicode value of '{emoji}': {ord(emoji)}")
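A code point is only a number; how many bytes it occupies depends on the encoding. The sketch below uses UTF-8 as an example encoding (an assumption, since the text above does not name one) and shows that characters need between 1 and 4 bytes. The helper `utf8_length` is a made-up name.

```python
import unicodedata

# Escape sequences are used so the example does not depend on the editor's encoding:
# the letter A, e with acute accent, a CJK ideograph, and the robot emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F916"]


def utf8_length(char):
    """Return the number of bytes a character occupies in UTF-8."""
    return len(char.encode("utf-8"))


for char in samples:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {utf8_length(char)} byte(s) in UTF-8")

# Characters outside the first 128 code points cannot be encoded as ASCII.
try:
    "\U0001F916".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("Robot emoji fits in ASCII:", ascii_ok)
```

ASCII characters keep their single-byte representation in UTF-8, which is one reason this encoding is so widely used.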
Strings are ordered sequences of characters. In CPython, the standard Python implementation, the characters of a string are stored one after another in a contiguous block of memory, which makes retrieving a character by its position efficient. Python uses Unicode for its strings by default (since Python 3), which accommodates a wide range of characters from different languages and symbol systems.
A key characteristic of Python strings is that they are immutable. This means that once a string is created in memory, it cannot be changed. Any operation that appears to modify a string actually creates a new string object. Because a string can never change, Python can safely share and reuse string objects and use them as dictionary keys.
# Storing a string in Python
my_string = "Hello, World!"
print(f"The string is: {my_string}")
print(f"The memory address of the string is: {id(my_string)}")
# Immutability demonstration:
# Concatenating creates a NEW string at a NEW address
my_string = my_string + " How are you?"
print(f"New memory address after modification: {id(my_string)}")
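Immutability can also be observed directly: trying to change a single character raises an error. The following short sketch shows the failing assignment and the usual workaround of building a modified copy; the variable names are illustrative.

```python
greeting = "Hello, World!"

try:
    greeting[0] = "J"  # strings do not support item assignment
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")

# Building a modified copy leaves the original string untouched.
changed = "J" + greeting[1:]
print(greeting)
print(changed)
```

The same assignment on a list would succeed, because lists are mutable.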
Each location in a computer's memory has a unique address, much like each house on a street has a unique number. Data structures such as lists and dictionaries use these addresses to organize data. Lists keep data in sequential order so that it can be retrieved using an index. Dictionaries, on the other hand, use a key-value pair system for fast access using unique keys (hash maps).
In Python, the id() function returns the identity of an object, a number that is unique for as long as the object exists; in CPython, this number is the object's memory address.
Understanding this identity is important when distinguishing between variables that have the same
value (checked with ==) versus variables that point to the exact same
object in memory (checked with is).
# Memory addresses in Python data structures
my_list = [1, 2, 3]
my_dict = {'a': 1, 'b': 2}
print(f"Memory address of my_list: {id(my_list)}")
print(f"Memory address of my_dict: {id(my_dict)}")
# Identity vs Equality
list_a = [1, 2, 3]
list_b = [1, 2, 3]
print(f"Values are equal (==): {list_a == list_b}")
print(f"Identities are same (is): {list_a is list_b}") # False, because they are two different objects in memory
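The distinction matters most when two names refer to the same mutable object. In the sketch below (variable names are illustrative), `alias` is a second name for the same list, while `copy_of_original` is a separate object that merely starts out with equal contents.

```python
original = [1, 2, 3]
alias = original                   # a second name for the SAME list object
copy_of_original = list(original)  # a NEW list with the same values

alias.append(4)                    # modifies the one list both names share

print(f"original:         {original}")          # the change is visible here too
print(f"copy_of_original: {copy_of_original}")  # the copy is unaffected
print(f"alias is original:            {alias is original}")
print(f"copy_of_original is original: {copy_of_original is original}")
```

Assignment never copies an object in Python; it only binds another name to it, which is why an explicit copy is needed when the two lists should evolve independently.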