I. Computer Memory and Data Representation

In the face of ambiguity, refuse the temptation to guess.

Tim Peters, Zen of Python

Before We Start: Program Output

Programming tutorials traditionally begin with a "Hello, World!" program. Displaying output is a fundamental way to interact with your code, since it gives you immediate feedback and helps debug logic errors. In Python, printing output is done with the print() function, which can handle one or many arguments in a single call.


print("Hello world!")

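While print("Hello world!") is the classic example, print() can also combine several pieces of text. If you pass them as separate arguments, divided by commas, print() inserts a space between them. If you concatenate them with the plus (+) operator, they are joined exactly as written, without any added spaces.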


print("Hello", "world", "!")    # prints: Hello world !
print("Hello" + "world" + "!")  # prints: Helloworld!
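The spacing behaviour described above can be adjusted. The following is a minimal sketch using the sep and end keyword arguments of print(); the variable names (name, message) are chosen only for illustration.

```python
# Default: print() puts one space between arguments and ends with a newline
print("Hello", "world", "!")             # Hello world !

# sep="" removes the automatic spaces, so any spacing must be part of the strings
print("Hello", " world", "!", sep="")    # Hello world!

# end=" " replaces the final newline, so the next print() continues on the same line
print("Hello", end=" ")
print("world!")                          # Hello world!

# An f-string builds the full text first, which makes the spacing explicit
name = "world"
message = f"Hello {name}!"
print(message)                           # Hello world!
```

Building the text first and printing it afterwards is often the easiest way to control exactly where spaces appear.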

Binary Digits

Humans have intuitively used their fingers, and in some cultures their toes, for basic calculations. As a result, most cultures use a decimal system. The first mechanical devices for decimal arithmetic were invented around 2500 BC. In the 20th century, electromechanical and later electronic computers were developed.

In theory, the decimal system could be represented in an electronic device by assigning the 10 digits to 10 different electrical current or voltage levels. However, practical problems such as temperature-sensitive resistances and interference make it hard to tell 10 levels apart reliably. A more robust system is achieved by reducing the number system to two states: "off" (0) and "on" (1). Thus, the binary system emerged as the most reliable option, and modern computers rely on electrical circuits that switch between these two states. A major milestone was the development of the transistor in 1947. Transistors are semiconductor devices that can be controlled to switch between the two binary states.

Computer data storage is achieved by organizing these binary digits, or bits, into groups, with 8 bits typically making up a byte. Bytes are used to store all kinds of data, from simple numbers to complex multimedia content. A central task in computing is therefore to translate "human" data into this binary format. Computers can then store the data, perform calculations on it, and store the results.


# Converting a decimal number to binary in Python
num = 13
binary_representation = bin(num)
print("The binary representation of", num, "is", binary_representation)

# Python marks binary numbers with the prefix "0b". Slicing the string removes
# the prefix (for more information, see Chapter III. Strings and Text Data).
print("The binary representation of", num, "is", binary_representation[2:])
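To see what bin() does internally, the conversion can also be written by hand. The following is a sketch of the common repeated-division method; the function name to_binary is hypothetical and not part of Python.

```python
def to_binary(number):
    """Return the binary digits of a non-negative integer as a string."""
    if number == 0:
        return "0"
    digits = ""
    while number > 0:
        digits = str(number % 2) + digits  # the remainder is the next bit, from right to left
        number //= 2                       # integer division removes that bit
    return digits

print(to_binary(13))    # 1101
print(bin(13)[2:])      # 1101, the same result from the built-in function
print(int("1101", 2))   # 13, converting back from base 2 to base 10
```

Reading the result from left to right, 1101 stands for 1·8 + 1·4 + 0·2 + 1·1 = 13.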

Storing Integers

As introduced earlier, computers store data in bytes, which typically consist of 8 bits. One byte can therefore represent 2^8 = 256 different values. If the byte is interpreted as an unsigned (non-negative) integer, the binary number is stored directly and, including zero, the range is 0 to 255. This is why, for example, each pixel in an 8-bit grayscale image has a gray value between 0 (black) and 255 (white). The number 256 already requires 9 bits: 255 is 11111111 in binary, while 256 is 100000000. To store negative integers as well, two's complement notation is used, in which one byte covers the range from -128 to 127.



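# Displaying the 8-bit binary representation of a positive and a negative integer
positive_num = 42
negative_num = -42
# format(..., "08b") pads the result with leading zeros to 8 binary digits
print("Binary representation of", positive_num, ":", format(positive_num, "08b"))
# Python integers have no fixed width, so masking with 0xff keeps only the lowest
# 8 bits, which gives the 8-bit two's complement pattern of the negative number
print("Binary representation of", negative_num, ":", format(negative_num & 0xff, "08b"))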
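The masking trick generalizes to other bit widths. The following sketch wraps it in a helper that also checks the valid range; the name twos_complement and its bits parameter are assumptions made for this illustration.

```python
def twos_complement(value, bits=8):
    """Return the two's complement bit pattern of value as a string of `bits` digits."""
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit into {bits} signed bits")
    mask = 2 ** bits - 1                      # e.g. 0b11111111 for 8 bits
    return format(value & mask, f"0{bits}b")  # pad with leading zeros

print(twos_complement(42))     # 00101010
print(twos_complement(-42))    # 11010110
print(twos_complement(-128))   # 10000000
print((255).bit_length(), (256).bit_length())  # 8 9
```

In this notation the leftmost bit acts as the sign: it is 0 for non-negative numbers and 1 for negative numbers.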

Floating-Point Representation

Representing fractions (numbers between integers) in the binary system is more challenging. Take 0.125 as an example. We can write 0.125 as 1/8 or 1/(2^3), which has the exact binary representation 0.001 (zero halves, zero quarters, one eighth). But what about 0.1, or 1/10? Because the denominator 10 contains the factor 5, this number has no finite representation in the binary system (its binary digits repeat forever: 0.000110011...), so it must be approximated. In programming languages like C++, users choose the precision of the approximation by selecting a type, for example a 4-byte float or an 8-byte double. The IEEE Standard for Floating-Point Arithmetic (IEEE 754) defines a format that divides the binary representation into three parts: the sign, the exponent, and the fraction. This format can represent a very wide range of values, but only with limited precision. Python uses 8-byte floats (IEEE 754 double precision on virtually all platforms) by default, which provide about 15 to 17 significant decimal digits.


The following example shows why it is important to know how fractions are represented in computers. The loop (loops are introduced in a later chapter) adds 0.1 ten times and then checks whether the result is 1.


total = 0  # named "total" so that Python's built-in sum() is not overwritten
x = 0.1

for i in range(0, 10):  # range(0, 10) yields 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
    total += x  # adds x to total on each pass

if total == 1:
    print("0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 = 1")
else:
    print("0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 =", total, "?")
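Because of such rounding errors, floats are usually compared with a tolerance rather than with ==. The following is a minimal sketch using math.isclose() and the decimal module from the standard library; the variable name total is chosen for this illustration.

```python
import math
from decimal import Decimal

total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)              # False: the small rounding errors accumulate
print(math.isclose(total, 1.0))  # True: equal within a small relative tolerance

# Decimal(0.1) shows the exact value that is stored for the literal 0.1
print(Decimal(0.1))
```

The last line prints a number slightly larger than 0.1, which is the closest value an 8-byte float can represent.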

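# Displaying the binary representation of a floating-point number
import struct

floating_num = 13.5
# Pack the number as a 4-byte (single-precision) float, reinterpret the same bytes
# as an unsigned integer, and show all 32 bits, including leading zeros
binary_representation = format(struct.unpack('!I', struct.pack('!f', floating_num))[0], '032b')
print("Binary representation of", floating_num, ":", binary_representation)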
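The three IEEE 754 parts mentioned above can be made visible for Python's own 8-byte floats. The following is a sketch that assumes the usual double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits); the function name float_fields is hypothetical.

```python
import struct

def float_fields(value):
    """Split a Python float into the sign, exponent, and fraction bits of an 8-byte float."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))  # the 8 bytes as an unsigned integer
    bits = format(as_int, "064b")
    return bits[0], bits[1:12], bits[12:]  # 1 sign bit, 11 exponent bits, 52 fraction bits

sign, exponent, fraction = float_fields(13.5)
print(sign)                               # 0 (positive)
print(exponent, int(exponent, 2) - 1023)  # 10000000010 3
print(fraction[:8])                       # 10110000
```

This matches the value by hand: 13.5 is 1101.1 in binary, or 1.1011 × 2^3. The exponent 3 is stored with an offset of 1023, and the leading 1 before the binary point is not stored, so the fraction begins with 1011.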

Character Encoding: ASCII and Unicode

ASCII, the American Standard Code for Information Interchange, was an early character encoding standard. It uses 7 bits per character, which allows 128 unique characters; later 8-bit extensions (often called extended ASCII) raised this to 256. While this was sufficient for English, with its relatively small set of characters and symbols, it was inadequate for languages with larger character sets. Unicode was introduced to solve this problem. It assigns a unique number, called a code point, to each character and has room for 1,114,112 code points, enough for the writing systems of the world's languages as well as symbols and emoji. The first 128 code points are identical to ASCII. Encodings such as UTF-8, UTF-16, and UTF-32 define how code points are stored as bytes, using between 1 and 4 bytes per character. Because all systems agree on the same code points, text can be exchanged between programs, platforms, and languages without being corrupted.


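# Finding the ASCII and Unicode code points of a character
char = 'A'
# ord() returns the Unicode code point; for the 128 ASCII characters
# this number is identical to the ASCII value
ascii_value = ord(char)
unicode_value = ord(char)
print("ASCII value of", char, ":", ascii_value)
print("Unicode value of", char, ":", unicode_value)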
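A code point is only a number; an encoding decides how many bytes are needed to store it. The following sketch compares UTF-8 and UTF-32 for four sample characters chosen for this illustration, written as escape sequences so that the code does not depend on the editor's own encoding.

```python
# One sample per UTF-8 length class:
# "A", e with acute accent, euro sign, grinning-face emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

utf8_lengths = []
for char in samples:
    encoded = char.encode("utf-8")  # the bytes that would be stored
    utf8_lengths.append(len(encoded))
    print(f"U+{ord(char):04X}", "->", encoded.hex(), f"({len(encoded)} byte(s) in UTF-8)")

# UTF-32 spends 4 bytes on every character
# ("-be" fixes the byte order and omits the byte-order mark)
utf32_lengths = [len(char.encode("utf-32-be")) for char in samples]

print(utf8_lengths)   # [1, 2, 3, 4]
print(utf32_lengths)  # [4, 4, 4, 4]
print(chr(0x41))      # A, chr() is the inverse of ord()
```

UTF-8 keeps plain English text as compact as ASCII, while UTF-32 trades space for a fixed width per character.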

Storing Strings

Strings are ordered sequences of characters. Python strings are Unicode: each character is a code point, so text from a wide range of languages and symbol systems is handled in the same way, which makes Python well suited to software that requires multilingual support. Internally, the standard Python implementation (CPython) stores the characters of a string in one contiguous block of memory, so any character can be retrieved quickly by its position (index). Strings are also immutable: an operation that appears to change a string creates a new string object instead.


# Storing a string in Python
my_string = "Hello, World!"
print(f"The string is: {my_string}")
# id() returns a unique identifier of the object; in CPython this is its memory address
print(f"The memory address of the string is: {id(my_string)}")
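The sequence character of strings can be explored with indexing and ord(). The following sketch also shows that a string cannot be changed in place; the variable names text and changed are chosen for this illustration.

```python
text = "Hello, World!"

print(len(text))                   # 13 characters
print(text[0], text[7])            # H W, characters are retrieved by position (index)
print([ord(c) for c in text[:5]])  # [72, 101, 108, 108, 111]

# Strings are immutable: replacing a character in place is not allowed
try:
    text[0] = "J"
except TypeError as error:
    print("Cannot modify a string:", error)

# Instead, a new string object is built from parts of the old one
changed = "J" + text[1:]
print(changed)                     # Jello, World!
print(text)                        # Hello, World! (the original is unchanged)
```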

Memory Addresses and Data Structures

Each location in a computer's memory has a unique address, much like each house on a street has a unique number. Data structures such as lists and dictionaries build on these addresses to organize data. A Python list keeps references to its items in sequential order, so an item can be retrieved by its index. A dictionary stores key-value pairs and locates a value from its unique key by hashing, which gives fast access without searching through all entries. These ways of using memory addresses are the foundation of efficient data storage and retrieval, which is critical to the performance of data-intensive applications.


# Memory addresses in Python data structures
# (in CPython, id() returns the memory address of the object itself)
my_list = [1, 2, 3]
my_dict = {'a': 1, 'b': 2}
print(f"Memory address of my_list: {id(my_list)}")
print(f"Memory address of my_dict: {id(my_dict)}")
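One practical consequence of objects living at addresses is that two names can refer to the same object. The following is a minimal sketch; the variable names first, alias, duplicate, and ages are chosen for this illustration.

```python
first = [1, 2, 3]
alias = first            # a second name for the same list object
duplicate = list(first)  # a new list object with the same contents

print(alias is first)      # True: same object, same id()
print(duplicate is first)  # False: a different object
print(duplicate == first)  # True: equal contents

alias.append(4)          # modifying the list through one name ...
print(first)             # [1, 2, 3, 4] ... is visible through the other
print(duplicate)         # [1, 2, 3]

# Lists are accessed by position, dictionaries by key
ages = {"a": 1, "b": 2}
print(first[0], ages["b"])  # 1 2
```

The is operator compares identities (the values returned by id()), while == compares contents.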