1  Python Basics

This section covers the topics required in the following chapters. We recommend it for readers who have no previous knowledge of Python programming.

1.1 Data types, numerical and text objects

Python is a programming language that lets us work quickly and integrate systems more effectively. It contains many data types as part of the core language (Foundation 2023).

The entities that we can create and manipulate in Python are called objects. We create objects with the assignment operator (‘=’).

For example, we create the object “x”, which is assigned the value 4.

x=4
x
4

Each object in Python has its own characteristics, including a type and a value.

Object type tells Python what kind of object it’s dealing with. A type could be a number, a string, a list, or something else. In this book, we will use those types of objects. We will also cover more complex data structures such as dictionaries, arrays, and data frames.

The function type() shows us the object type. For example, the object x is an integer (int):

type(x)
int

In this example, the object type is integer (int), and the value is 4.

Besides integers, Python provides other numeric types: floating-point numbers and complex numbers (for example, 5j). For example:

type(1.23)
float

An example of a string (str) would be:

y="Apple"
type(y)
str

The quotation marks (“ ”) tell Python that Apple is a string.
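As a brief sketch of the types mentioned in this section, the following lines apply type() to an integer, a float, a complex number, and a string. The variable names are illustrative and are not part of the examples above.

```python
# Illustrative values; the names below are assumptions for this sketch.
n = 4            # integer
r = 1.23         # floating-point number
z = 5j           # complex number with imaginary part 5
word = "Apple"   # string

# type() returns the class of each object; __name__ gives its short name
types = [type(v).__name__ for v in (n, r, z, word)]
print(types)     # ['int', 'float', 'complex', 'str']

# Adding an integer and a float produces a float
total = n + r
print(type(total).__name__)  # float
```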

1.2 List and object attributes

A list is another helpful object in Python: an ordered sequence of elements, which can be integers, strings, or both.

liste=[1,2,3]
liste
[1, 2, 3]
type(liste)
list

Objects whose values can change are called mutable, whereas objects whose values are unchangeable after they’ve been created are called immutable.

4 # is immutable

# but liste is mutable: we can change one of its elements in place

liste[0]="a"
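One way to see the difference between mutable and immutable objects is to try to change an element in place. The sketch below uses hypothetical names (nums, text) and the built-in function id(), which returns the identity of an object.

```python
nums = [1, 2, 3]
before = id(nums)          # identity of the list object
nums[0] = "a"              # change one element in place
same_object = (id(nums) == before)
print(nums, same_object)   # ['a', 2, 3] True

text = "Apple"
try:
    text[0] = "a"          # strings are immutable, so this raises TypeError
    changed = True
except TypeError:
    changed = False
print(changed)             # False
```

The list keeps its identity after the change, whereas the string cannot be modified at all; to get a different string we have to create a new object.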

Most Python objects have data or functions associated with them. These are known as attributes. The name of the attribute follows the name of the object, separated by a dot. The two types of attributes are data attributes and methods.

The “list” data type has several methods. The function dir() returns the names of all the attributes of an object; here are some of those of “list” objects:

s=[1,2,3,4]
dir(s)[1:10]
['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__']

We printed only nine of the names (positions 1 to 9 of the result), but the full result of dir(s) contains many more. This section covers some examples, including the method append, which appends an object to the end of the list.

s.append(6)
s
[1, 2, 3, 4, 6]
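The names that start with underscores are special attributes that Python uses internally. As a small sketch, we can filter them out to see only the methods meant for everyday use; the variable name public is an assumption for this example.

```python
s = [1, 2, 3, 4]

# Keep only the attribute names that do not start with an underscore
public = [name for name in dir(s) if not name.startswith("_")]
print(public)

# extend is another of these methods: it appends every element of another list
s.extend([7, 8])
print(s)  # [1, 2, 3, 4, 7, 8]
```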

A data attribute contains information about the object. Lists do not store their length in a data attribute; to get the number of elements of a list, we use the built-in function “len”.

len(s)
5

There are other ways to manipulate a “list” without a method. For instance, to concatenate two lists:

t=[12,14,16]
s=s+t
s
[1, 2, 3, 4, 6, 12, 14, 16]

To select an element of a list, for example the second element:

s[1]
2

As you can see, instead of typing s[2], we type s[1] because Python starts counting at zero. The first element is therefore s[0]:

s[0]
1

To replace an element of a list, for example to replace the number 3 in list “s” with the number 100:

s[2]=100
s
[1, 2, 100, 4, 6, 12, 14, 16]

To remove an element by its value, for example the number 2:

s.remove(2)
s
[1, 100, 4, 6, 12, 14, 16]
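The following sketch collects a few more ways to select and remove elements. It starts from the list as it stands at this point; the names last, first_three, and popped are illustrative.

```python
s = [1, 100, 4, 6, 12, 14, 16]

last = s[-1]            # negative positions count from the end
first_three = s[0:3]    # a slice includes position 0 and excludes position 3
print(last, first_three)  # 16 [1, 100, 4]

s.remove(100)           # remove looks for the value 100, not position 100
popped = s.pop(0)       # pop removes by position and returns the element
print(popped, s)        # 1 [4, 6, 12, 14, 16]
```

Note that remove deletes only the first element equal to the given value and raises a ValueError if the value is not in the list.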

1.3 Dictionaries

Dictionaries are helpful objects for performing fast look-ups on unordered data.

grades_dict={"Paulina": 100, "Coral":95} 
grades_dict
{'Paulina': 100, 'Coral': 95}

Dictionaries have two components: keys and values. The keys:

grades_dict.keys()
dict_keys(['Paulina', 'Coral'])

And values:

grades_dict.values()
dict_values([100, 95])

Dictionaries are mutable, so we can modify their content by adding a new element:

grades_dict["Alejandra"]=120
grades_dict
{'Paulina': 100, 'Coral': 95, 'Alejandra': 120}

Or modifying an element:

# modify an element
grades_dict["Alejandra"]=grades_dict["Alejandra"]-30
grades_dict
{'Paulina': 100, 'Coral': 95, 'Alejandra': 90}

We could ask whether a key is in the dictionary:

# members

"Karina" in grades_dict
False
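Asking for a key that does not exist with square brackets raises a KeyError. A minimal sketch of a safer look-up uses the method get, which returns None (or a default that we choose) when the key is missing; the names missing, default, and pairs are illustrative.

```python
grades_dict = {"Paulina": 100, "Coral": 95, "Alejandra": 90}

missing = grades_dict.get("Karina")      # key not found, returns None
default = grades_dict.get("Karina", 0)   # key not found, returns the default 0
found = grades_dict.get("Coral")         # key found, returns its value
print(missing, default, found)           # None 0 95

# items() gives the key-value pairs together
pairs = list(grades_dict.items())
print(pairs[0])                          # ('Paulina', 100)
```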

We could combine two or more lists into a dictionary. Here the dictionary has two keys, and each key holds a list of four values:

names=["Eugenio","Luis","Isa","Gisell"]
numbers=[9,17,80,79]
combine = {"Letthers":names,"Colors":numbers}
combine
{'Letthers': ['Eugenio', 'Luis', 'Isa', 'Gisell'], 'Colors': [9, 17, 80, 79]}

Or we may want to use one list for the keys and the other for the values:

stud= dict(zip(names, numbers))
stud
{'Eugenio': 9, 'Luis': 17, 'Isa': 80, 'Gisell': 79}

1.4 Python modules

Python also contains built-in functions that all Python programs can use. Users can also write their own functions or use functions that someone else has developed and distributed in a library.

Python modules are code libraries that we load with the import statement. Two of the most popular libraries in Python are Pandas and NumPy.
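As a small sketch of the import statement, the lines below use two modules from the Python standard library, math and statistics, chosen here only for illustration. The second import gives the module a short alias with the keyword as.

```python
import math               # import the whole module
import statistics as st   # import with a short alias

root = math.sqrt(16)          # call a function from the module
avg = st.mean([10, 8, 5, 7])  # call a function through the alias
print(root, avg)              # 4.0 7.5
```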

1.5 Pandas data frames

Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive. It is the fundamental high-level building block for practical, real-world data analysis in Python. Additionally, it aims to become the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

We will use it to create and manipulate data frames, which are two-dimensional, size-mutable, potentially heterogeneous tabular data.

To install the Pandas module, we type “pip install pandas” in a notebook cell or at the Conda terminal prompt.

pip install pandas

To import the library we use the “import” statement:

import pandas as pd

After the name pandas, we use the keyword “as” to give the library a short name, because it is easier to type pd than pandas. For example, to create a data frame, we write “pd.” followed by DataFrame, which indicates that we are calling DataFrame from the library pandas:

name=["Vale","Diana","Ivan","Vivi"]
id=["A0145","A568","A5678","A0567"]
grade1=[10,8,5,7]
gen=["fem","fem","male","fem"]
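The call that builds the data frame does not appear in the lines above. A minimal sketch of what it could look like follows; the column name “Column 1” is an assumption borrowed from the later example that changes the index.

```python
import pandas as pd

name = ["Vale", "Diana", "Ivan", "Vivi"]

# Assumption: the column name "Column 1" is taken from the later example
df = pd.DataFrame(name, columns=["Column 1"])
print(df)

# With no index argument, rows are labeled 0, 1, 2, 3
print(list(df.index))  # [0, 1, 2, 3]
```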

In the previous example, one argument of the function is columns, which sets the column names of the data frame. To learn more about the function and its arguments, we could ask for help like this:

help(pd.DataFrame) # or like this: pd.DataFrame?

In the previous example, by default, the index (the left column without a title) is numbered from 0 to 3. If we want to change the index, we pass the argument index:

name=["Vale","Diana","Ivan","Vivi"]
numbers=[51,11,511,50]
df = pd.DataFrame(name,columns=["Column 1"],index=numbers)
df
     Column 1
51       Vale
11      Diana
511      Ivan
50       Vivi

We can also create a data frame from a Python dictionary, which is a collection of key-value pairs. Each key becomes a column name, and its list of values becomes the column:

df=pd.DataFrame({"name":name,"grade":grade1,"gender":gen},index=id)
df
        name  grade  gender
A0145   Vale     10     fem
A568   Diana      8     fem
A5678   Ivan      5    male
A0567   Vivi      7     fem

To use a method or an attribute of the data frame, we type the name of the Pandas object and a dot before the name of the method or attribute. A practical attribute is “shape,” which gives us the number of rows and columns.

df.shape 
(4, 3)

The result is a tuple with the number of rows (4) and the number of columns (3) of the data frame.
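Because shape is a data attribute, it is written without parentheses. The sketch below rebuilds the same data frame and reads shape together with two other data attributes, columns and index; the names n_rows, n_cols, cols, and ids are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
    index=["A0145", "A568", "A5678", "A0567"],
)

n_rows, n_cols = df.shape   # shape is a tuple (rows, columns)
cols = list(df.columns)     # column names
ids = list(df.index)        # row labels
print(n_rows, n_cols)       # 4 3
print(cols)                 # ['name', 'grade', 'gender']
```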

To rename a data frame column:

df.rename(columns={"name": "New name"})
       New name  grade gender
A0145      Vale     10    fem
A568      Diana      8    fem
A5678      Ivan      5   male
A0567      Vivi      7    fem
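Two details of rename are easy to miss: it matches column names exactly, including upper and lower case, and it returns a new data frame instead of changing the original. A short sketch with a reduced, illustrative data frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Vale", "Diana"], "grade": [10, 8]},
                  index=["A0145", "A568"])

# "Name" does not match "name", so nothing is renamed and no error is raised
unchanged = df.rename(columns={"Name": "New name"})
print(list(unchanged.columns))  # ['name', 'grade']

# The exact name is renamed, but only in the returned data frame
renamed = df.rename(columns={"name": "New name"})
print(list(renamed.columns))    # ['New name', 'grade']
print(list(df.columns))         # ['name', 'grade']
```

To keep the new name, we assign the result to a variable, for example df = df.rename(columns={"name": "New name"}).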

To select a column, we could type the column name:

 df["grade"]
A0145    10
A568      8
A5678     5
A0567     7
Name: grade, dtype: int64

To select one or more columns by label, we use the indexer “.loc”. The colon before the comma selects all the rows:

 df.loc[:,("gender","grade")]
       gender  grade
A0145     fem     10
A568      fem      8
A5678    male      5
A0567     fem      7

The “loc” indexer also works for selecting rows and columns at the same time:

df.loc[["A568"],("gender","grade")]
      gender  grade
A568     fem      8
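As a further sketch of label-based selection, the lines below pass lists of labels, a single pair of labels, and a slice of labels to loc. The variable names rows, one, and sl are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
    index=["A0145", "A568", "A5678", "A0567"],
)

# Lists of labels select several rows and several columns
rows = df.loc[["A568", "A5678"], ["gender", "grade"]]
print(rows.shape)       # (2, 2)

# One row label and one column label return a single value
one = df.loc["A568", "grade"]
print(one)              # 8

# A slice of labels includes both endpoints, unlike a slice of positions
sl = df.loc["A0145":"A568", "grade"]
print(list(sl.index))   # ['A0145', 'A568']
```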

Sometimes it is useful to select by position. In this case, we use the indexer .iloc[rows, columns]. For example, to select the second and third columns:

 df.iloc[:,1:3]
       grade  gender
A0145     10     fem
A568       8     fem
A5678      5    male
A0567      7     fem

The left side of the comma is for selecting rows, and the right side is for columns. Remember that Python starts counting at zero, so name is column zero and grade is column one. Also, when we select 1:3, we tell Python to select the columns in positions one and two; position three is not included.
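The same position rules apply to rows. The following sketch, with illustrative variable names, selects by position in a few different ways:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
    index=["A0145", "A568", "A5678", "A0567"],
)

sub = df.iloc[:, 1:3]        # all rows, columns in positions 1 and 2
print(list(sub.columns))     # ['grade', 'gender']

first_row = df.iloc[0]       # first row, returned as a Series
print(first_row["name"])     # Vale

last_two = df.iloc[-2:, 0]   # last two rows of the column in position 0
print(list(last_two))        # ['Ivan', 'Vivi']
```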

To insert a new column, grade2, into the data frame df:

grade2=[9,10,9,8]
grade2
[9, 10, 9, 8]
df["grade2"]=grade2
df
        name  grade  gender  grade2
A0145   Vale     10     fem       9
A568   Diana      8     fem      10
A5678   Ivan      5    male       9
A0567   Vivi      7     fem       8

To drop one or more columns:

df2=df.drop(columns=["gender"])
df2
        name  grade  grade2
A0145   Vale     10       9
A568   Diana      8      10
A5678   Ivan      5       9
A0567   Vivi      7       8
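Like rename, drop returns a new data frame and leaves the original untouched, which is why the result above is stored in df2. A short sketch with a reduced data frame; the name df3 and the row example are illustrative additions.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana"], "grade": [10, 8], "gender": ["fem", "fem"]},
    index=["A0145", "A568"],
)

df2 = df.drop(columns=["gender"])   # drop a column
df3 = df.drop(index=["A568"])       # drop a row by its index label

print(list(df2.columns))  # ['name', 'grade']
print(list(df3.index))    # ['A0145']
print(list(df.columns))   # ['name', 'grade', 'gender'] (df is unchanged)
```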

1.6 Reading Excel and csv files

You can download the Excel file by copying the following link and pasting it into a browser:

https://github.com/abernal30/ML_python/blob/main/df.xlsx

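I stored the file in a sub-directory named “data,” and I called it “df.xlsx”.

In the following code, we read a csv file directly from a URL with the function read_csv of the Pandas library. The argument index_col=0 uses the first column of the file as the index: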

import pandas as pd

data=pd.read_csv("https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/df_act.csv",index_col=0)
data
       Student  Activity 1  Activity 2  Activity 3
A0145     Vale          10           9           8
A568     Diana           8          10           9
A5678     Ivan           5           9          10
baja      Vivi           7           8           5
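To try read_csv without an internet connection, a minimal sketch can read the csv text from memory with io.StringIO. The text below is a made-up, shortened version of the data, and the header “ID” for the first column is an assumption for this example.

```python
import io
import pandas as pd

# Hypothetical csv content; "ID" is an assumed header for the first column
csv_text = """ID,Student,Activity 1,Activity 2
A0145,Vale,10,9
A568,Diana,8,10
"""

# index_col=0 turns the first column of the file into the index
data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(data)
print(list(data.index))   # ['A0145', 'A568']
```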

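To read the Excel file, I use the function read_excel of the Pandas library. In this case, I use the argument sheet_name=sheets[0], equivalent to sheet_name=“Sheet1”, where sheets is assumed to hold the list of sheet names (it can be obtained, for example, with pd.ExcelFile(“data/df.xlsx”).sheet_names).

If we want to concatenate two data frames, we use the function concat. With the argument axis=1, the data frames are placed side by side and the rows are matched by the index, which is the student number. An index value that appears in only one of the data frames produces missing values (NaN) in the columns of the other: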

data2=pd.concat([df2,data],axis=1)
data2 
        name  grade  grade2  Student  Activity 1  Activity 2  Activity 3
A0145   Vale   10.0     9.0     Vale        10.0         9.0         8.0
A568   Diana    8.0    10.0    Diana         8.0        10.0         9.0
A5678   Ivan    5.0     9.0     Ivan         5.0         9.0        10.0
A0567   Vivi    7.0     8.0      NaN         NaN         NaN         NaN
baja     NaN    NaN     NaN     Vivi         7.0         8.0         5.0
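The index alignment can be seen more easily on two very small data frames. In this sketch, left and right are hypothetical names, and join="inner" is an optional argument of concat that keeps only the index values present in both data frames.

```python
import pandas as pd

left = pd.DataFrame({"grade": [10, 8]}, index=["A0145", "A0567"])
right = pd.DataFrame({"Activity 1": [10, 7]}, index=["A0145", "baja"])

# Default behavior: keep every index value found in either data frame
outer = pd.concat([left, right], axis=1)
print(outer.shape)                    # (3, 2)
print(int(outer.isna().sum().sum()))  # 2 missing values

# join="inner": keep only the index values found in both data frames
inner = pd.concat([left, right], axis=1, join="inner")
print(list(inner.index))              # ['A0145']
```

The grades appear as 10.0 and 8.0 after the default concatenation because NaN is a floating-point value, so a column of integers that receives a NaN is converted to floats.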

For more about merge, concatenate, and join, see https://pandas.pydata.org/docs/user_guide/merging.html

1.7 Filtering

To filter the rows of a data frame by a number, for example to keep the grades higher than 7:

data2[data2["grade"]>7]
        name  grade  grade2  Student  Activity 1  Activity 2  Activity 3
A0145   Vale   10.0     9.0     Vale        10.0         9.0         8.0
A568   Diana    8.0    10.0    Diana         8.0        10.0         9.0

To filter the rows of a data frame by a string (text), for example to keep the rows where Student is Diana:

data2[data2["Student"]=="Diana"]
       name  grade  grade2  Student  Activity 1  Activity 2  Activity 3
A568  Diana    8.0    10.0    Diana         8.0        10.0         9.0
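Conditions can also be combined. As a sketch on the smaller data frame from the earlier sections, the operator & means “and” and the operator | means “or”; each condition needs its own parentheses. The names mask, both, and either are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
    index=["A0145", "A568", "A5678", "A0567"],
)

# The comparison returns a Series of True/False values, one per row
mask = (df["grade"] > 7) & (df["gender"] == "fem")
both = df[mask]
print(list(both["name"]))    # ['Vale', 'Diana']

# Rows that satisfy at least one of the two conditions
either = df[(df["grade"] < 6) | (df["name"] == "Vale")]
print(list(either["name"]))  # ['Vale', 'Ivan']
```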

1.8 NumPy module

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, introductory linear algebra, basic statistical operations, random simulation and much more.

It is useful to work with NumPy arrays in machine learning models. NumPy’s main object is the homogeneous multidimensional array. For example, we could define the variable X as an array:

import numpy as np

X=np.array([[1,2,3],[4,5,6]])
X
array([[1, 2, 3],
       [4, 5, 6]])

In the same way, we can create a second matrix, Y:

Y=np.array([[2,4,6],[8,10,12]])
Y
array([[ 2,  4,  6],
       [ 8, 10, 12]])
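Arithmetic on arrays of the same shape works element by element. The sketch below reuses X and Y; the names total, ratio, and col_means are illustrative.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.array([[2, 4, 6], [8, 10, 12]])

print(X.shape)               # (2, 3): two rows and three columns

total = X + Y                # element-by-element sum
ratio = Y / X                # element-by-element division, every entry is 2.0
col_means = X.mean(axis=0)   # mean of each column
print(total)
print(col_means)             # [2.5 3.5 4.5]
```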

Sometimes it is useful to simulate a missing value:

np.nan
nan
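A missing value behaves differently from an ordinary number. The following sketch, with illustrative names, shows that np.nan is not equal to itself, so we detect it with np.isnan, and that functions such as np.nanmean ignore it.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

equal_to_itself = (np.nan == np.nan)   # comparisons with nan are False
mask = np.isnan(values)                # True where the value is missing
print(equal_to_itself, mask)           # False [False  True False]

mean_all = np.mean(values)             # nan: the missing value propagates
mean_valid = np.nanmean(values)        # 2.0: the missing value is ignored
print(mean_all, mean_valid)            # nan 2.0
```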