This section covers the topics required in the following chapters. We recommend it for readers who have no previous knowledge of Python programming.
Python is a programming language that lets us work quickly and integrate systems more effectively. It also includes many data types as part of the core language (Foundation 2023).
The entities that we can create and manipulate in Python are called objects. We create objects and give them names with the assignment operator (‘=’).
For example, we create the object “x”, which is assigned the value 4.
x=4
x
4
Each object in Python has its own characteristics, including a type and a value.
The object type tells Python what kind of object it is dealing with. A type could be a number, a string, a list, or something else. In this book, we will use those types of objects. We will also cover more complex data structures such as dictionaries, arrays, and data frames.
The function type() shows us the object type. For example, the object x is an integer (int):
type(x)
int
In this example, the object type is an integer (int), and the value is 4.
Besides integers, Python provides other numeric types: floating-point numbers and complex numbers (for example, 5j). For instance:
type(1.23)
float
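As a minimal sketch of the numeric types mentioned above, the following code checks the type of one value of each kind. The variable names n, f, z, and total are chosen here only for illustration.

```python
# Three built-in numeric types, inspected with type()
n = 4        # integer
f = 1.23     # floating-point number
z = 5j       # complex number: real part 0, imaginary part 5

print(type(n).__name__)   # int
print(type(f).__name__)   # float
print(type(z).__name__)   # complex

# Adding an int and a float produces a float
total = n + f
print(type(total).__name__)   # float
```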
An example of a string (str) would be:
y="Apple"
type(y)
str
We wrap Apple in quotation marks (" ") to tell Python that it is a string.
A list is another helpful object in Python: an ordered collection of integers, strings, or both.
liste=[1,2,3]
liste
[1, 2, 3]
type(liste)
list
Objects whose values can change are called mutable, whereas objects whose values are unchangeable after they’ve been created are called immutable.
4 # the integer 4 is immutable
# but liste is mutable
liste=["a","b"] # the elements of this list can be changed later
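The difference between mutable and immutable objects can be sketched as follows. This is an illustrative example, not part of the original text; the names alias and word are hypothetical.

```python
liste = [1, 2, 3]
alias = liste        # both names refer to the same list object
liste[0] = "a"       # in-place change: the list object itself is modified
print(alias)         # ['a', 2, 3]

word = "Apple"       # strings are immutable
try:
    word[0] = "a"    # item assignment is not allowed on a str
except TypeError as err:
    print("TypeError:", err)

x = 4
x = x + 1            # builds a new int object and rebinds the name x
print(x)             # 5
```

The list changes in place, so every name that refers to it sees the change. The integer does not change: the name x simply points to a new value.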
Most Python objects have data or functions associated with them. These are known as attributes. The name of the attribute follows the name of the object, separated by a dot. The two types of attributes are called data attributes and methods.
The “list” data type has many methods. The function dir() lists the names of the attributes of an object; here are some of them for a “list” object:
s=[1,2,3,4]
dir(s)[1:10]
['__class__',
'__class_getitem__',
'__contains__',
'__delattr__',
'__delitem__',
'__dir__',
'__doc__',
'__eq__',
'__format__']
We printed only nine of the attribute names (positions 1 through 9 of the result), but dir(s) returns many more. This section covers some examples, including the method append, which appends an object to the end of the list.
s.append(6)
s
[1, 2, 3, 4, 6]
A data attribute contains information about the object. Not all information is exposed as an attribute, though: to get the number of elements of a list, we use the built-in function len().
len(s)
5
There are other ways to manipulate a “list” without a method. For instance, to concatenate two different lists, we use the + operator:
t=[12,14,16]
s=s+t
s
[1, 2, 3, 4, 6, 12, 14, 16]
To select an element of a list, we write its position in square brackets. For example, to select the second element of the list:
s[1]
2
As you can see, instead of typing s[2], we type s[1], because Python starts counting at zero. The first element is therefore s[0]:
s[0]
1
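A short sketch of zero-based indexing follows. The negative index and the slice are additions that go slightly beyond the text above; the variable names are illustrative.

```python
s = [1, 2, 3, 4, 6]

first = s[0]      # position zero holds the first element
second = s[1]     # position one holds the second element
last = s[-1]      # negative positions count from the end
middle = s[1:3]   # slice: positions 1 and 2; the stop position 3 is excluded

print(first, second, last, middle)   # 1 2 6 [2, 3]
```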
To replace an element of a list, we assign a new value to its position. For example, to replace the number 3 in the list “s” with the number 100:
s[2]=100
s
[1, 2, 100, 4, 6, 12, 14, 16]
To remove an element by its value, for example, the number 2, we use the method remove:
s.remove(2)
s
[1, 100, 4, 6, 12, 14, 16]
Dictionaries are helpful objects for performing fast look-ups by key on data stored as key-value pairs.
grades_dict={"Paulina": 100, "Coral":95}
grades_dict
{'Paulina': 100, 'Coral': 95}
Dictionaries have two components: keys and values. To get the keys:
grades_dict.keys()
dict_keys(['Paulina', 'Coral'])
And values:
grades_dict.values()
dict_values([100, 95])
Dictionaries are mutable, so we can modify their content by adding a new element:
grades_dict["Alejandra"]=120
grades_dict
{'Paulina': 100, 'Coral': 95, 'Alejandra': 120}
Or by modifying an existing element:
# modify an element
grades_dict["Alejandra"]=grades_dict["Alejandra"]-30
grades_dict
{'Paulina': 100, 'Coral': 95, 'Alejandra': 90}
We can check whether a key is in the dictionary:
# membership test
"Karina" in grades_dict
False
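The look-up and membership test can be combined as in the sketch below. The method get() with a default value is one common way to handle a missing key; the variable karina is a name chosen for this example.

```python
grades_dict = {"Paulina": 100, "Coral": 95, "Alejandra": 90}

# Looking up an existing key returns its value
print(grades_dict["Coral"])           # 95

# Square brackets raise KeyError for a missing key,
# so we test membership first or use get() with a default
if "Karina" in grades_dict:
    karina = grades_dict["Karina"]
else:
    karina = None
print(karina)                         # None
print(grades_dict.get("Karina", 0))   # 0
```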
We can combine two or more lists into a dictionary. Here, the dictionary has two keys, and each key holds a list of four values:
names=["Eugenio","Luis","Isa","Gisell"]
numbers=[9,17,80,79]
combine= {"Letthers":names,"Colors":numbers}
combine
{'Letthers': ['Eugenio', 'Luis', 'Isa', 'Gisell'], 'Colors': [9, 17, 80, 79]}
Or we may want to use one list for the keys and the other for the values:
stud = dict(zip(names, numbers))
stud
{'Eugenio': 9, 'Luis': 17, 'Isa': 80, 'Gisell': 79}
Python also contains built-in functions that all Python programs can use, such as type() and len(). Users can also create their own functions or use functions that someone else has developed and distributed in a library.
Python modules are code libraries, and we import them with the import statement. Two of the most popular libraries in Python are pandas and NumPy.
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive. It is the fundamental high-level building block for practical, real-world data analysis in Python. Additionally, it aims to become the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
We will use it to create and manipulate data frames, which are two-dimensional, size-mutable, potentially heterogeneous tabular data structures.
To install the pandas module, we run “pip install pandas” in a notebook cell or at the Conda terminal prompt.
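As a minimal sketch of a user-defined function that relies on built-in functions, consider the following. The function name average and the list grades are hypothetical and serve only as an illustration.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # sum() and len() are built-in functions
    return sum(values) / len(values)

grades = [10, 8, 5, 7]
print(max(grades))       # built-in function: 10
print(average(grades))   # user-defined function: 7.5
```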
pip install pandas
To import the library we use the “import” statement:
import pandas as pd
After the name pandas, we use the keyword “as” to give the library a short name, because it is easier to type pd than pandas. For example, to create a data frame, we write pd.DataFrame, which indicates that we are calling DataFrame from the pandas library. We start by defining the lists that hold the data:
name=["Vale","Diana","Ivan","Vivi"]
id=["A0145","A568","A5678","A0567"] # note: this name shadows Python's built-in id()
grade1=[10,8,5,7]
gen=["fem","fem","male","fem"]
One argument of the function DataFrame is columns, which sets the column names of the data frame. To learn more about the function and its arguments, we can ask for help like this:
help(pd.DataFrame) # or like this: pd.DataFrame?
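A minimal sketch of the columns argument follows, assuming we build the data frame from a single list. The name df_default is chosen for this example. Because no index is given, pandas numbers the rows starting at zero.

```python
import pandas as pd

name = ["Vale", "Diana", "Ivan", "Vivi"]

# columns sets the column label; with no index argument,
# the rows are numbered 0, 1, 2, 3
df_default = pd.DataFrame(name, columns=["Column 1"])

print(df_default)
print(list(df_default.index))     # [0, 1, 2, 3]
print(list(df_default.columns))   # ['Column 1']
```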
By default, the index, the left column without a title, is numbered from zero (zero to 3 for four rows). If we want to change the index, we pass the argument index:
name=["Vale","Diana","Ivan","Vivi"]
numbers=[51,11,511,50]
df= pd.DataFrame(name,columns=["Column 1"],index=numbers)
df
| | Column 1 |
|---|---|
| 51 | Vale |
| 11 | Diana |
| 511 | Ivan |
| 50 | Vivi |
A Python dictionary is a collection of key-value pairs. We can also build a data frame from a dictionary: each key becomes a column name, and each value becomes the content of that column.
df=pd.DataFrame({"name":name,"grade":grade1,"gender":gen},index=id)
df
| | name | grade | gender |
|---|---|---|---|
| A0145 | Vale | 10 | fem |
| A568 | Diana | 8 | fem |
| A5678 | Ivan | 5 | male |
| A0567 | Vivi | 7 | fem |
To use a method or an attribute of the data frame, we type the name of the pandas object, a dot, and then the name of the method or attribute. A practical attribute is “shape,” which gives us the number of rows and columns.
df.shape
(4, 3)
The result is a tuple with the number of rows (4) and columns (3) of the data frame.
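To rename a data frame column, we pass a dictionary that maps the current name to the new one. The key must match the current column name exactly, including upper and lower case. The method rename returns a new data frame and leaves df unchanged:
df.rename(columns={"name": "New name"})
| | New name | grade | gender |
|---|---|---|---|
| A0145 | Vale | 10 | fem |
| A568 | Diana | 8 | fem |
| A5678 | Ivan | 5 | male |
| A0567 | Vivi | 7 | fem |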
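The behavior of rename can be sketched with a small data frame. The names renamed and same, and the new label "student", are hypothetical. The sketch relies on the default behavior of rename, which ignores keys that match no column.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Vale", "Diana"], "grade": [10, 8]},
                  index=["A0145", "A568"])

# rename returns a new data frame; df itself is not modified
renamed = df.rename(columns={"name": "student"})
print(list(renamed.columns))   # ['student', 'grade']
print(list(df.columns))        # ['name', 'grade']

# A key that matches no column (note the capital N) is ignored by default
same = df.rename(columns={"Name": "student"})
print(list(same.columns))      # ['name', 'grade']
```

To keep the new name, we assign the result back, for example df = df.rename(columns={"name": "student"}).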
To select a column, we type the column name in square brackets:
df["grade"]
A0145 10
A568 8
A5678 5
A0567 7
Name: grade, dtype: int64
To select one or more columns by label, we use the indexer “.loc[rows, columns]”. The colon before the comma selects all rows:
df.loc[:,["gender","grade"]]
| | gender | grade |
|---|---|---|
| A0145 | fem | 10 |
| A568 | fem | 8 |
| A5678 | male | 5 |
| A0567 | fem | 7 |
The “loc” indexer also selects rows and columns at the same time:
df.loc[["A568"],["gender","grade"]]
| | gender | grade |
|---|---|---|
| A568 | fem | 8 |
Sometimes it is useful to select by position. In this case, we use the indexer .iloc[rows, columns]. For example, to select the second and third columns:
df.iloc[:,1:3]
| | grade | gender |
|---|---|---|
| A0145 | 10 | fem |
| A568 | 8 | fem |
| A5678 | 5 | male |
| A0567 | 7 | fem |
The left side of the comma selects rows, and the right side selects columns. Remember that Python starts counting at zero, so name is column zero and grade is column one. Also, when we select 1:3, we tell Python to select columns one and two: the slice starts at position one and stops before position three.
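As an illustrative sketch, the same cells can be selected by label with .loc and by position with .iloc. The data frame is rebuilt here so that the example runs on its own; the names by_label and by_position are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
    index=["A0145", "A568", "A5678", "A0567"],
)

# By label: row "A568", columns "grade" and "gender"
by_label = df.loc[["A568"], ["grade", "gender"]]

# By position: row 1, columns 1 and 2 (the stop position 3 is excluded)
by_position = df.iloc[[1], 1:3]

print(by_label)
print(by_label.equals(by_position))   # True
```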
To insert a new column, grade2, into the data frame df, we create a list with the new values and assign it to the new column name:
grade2=[9,10,9,8]
grade2
[9, 10, 9, 8]
df["grade2"]=grade2
df
| | name | grade | gender | grade2 |
|---|---|---|---|---|
| A0145 | Vale | 10 | fem | 9 |
| A568 | Diana | 8 | fem | 10 |
| A5678 | Ivan | 5 | male | 9 |
| A0567 | Vivi | 7 | fem | 8 |
To drop one or more columns:
df2=df.drop(columns=["gender"])
df2
| | name | grade | grade2 |
|---|---|---|---|
| A0145 | Vale | 10 | 9 |
| A568 | Diana | 8 | 10 |
| A5678 | Ivan | 5 | 9 |
| A0567 | Vivi | 7 | 8 |
You can download the Excel file by copying and pasting the following link into a browser:
https://github.com/abernal30/ML_python/blob/main/df.xlsx
I stored the file in a sub-directory named “data” and called it “df.xlsx”.
The following code reads a CSV file with the activity grades directly from a URL and uses the first column as the index:
import pandas as pd
data=pd.read_csv("https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/df_act.csv",index_col=0)
data
| | Student | Activity 1 | Activity 2 | Activity 3 |
|---|---|---|---|---|
| A0145 | Vale | 10 | 9 | 8 |
| A568 | Diana | 8 | 10 | 9 |
| A5678 | Ivan | 5 | 9 | 10 |
| baja | Vivi | 7 | 8 | 5 |
To read the Excel file instead, I use the function read_excel of the pandas library. In that case, I use the argument sheet_name=sheets[0], which is equivalent to sheet_name=“Sheet1”.
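The effect of index_col=0 in read_csv can be sketched without a file or a network connection by reading from an in-memory string. The CSV text and the name data_demo are made up for this illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file or URL
csv_text = """id,Student,Activity 1
A0145,Vale,10
A568,Diana,8
"""

# index_col=0 turns the first column into the index
data_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(data_demo)
print(list(data_demo.index))     # ['A0145', 'A568']
print(list(data_demo.columns))   # ['Student', 'Activity 1']
```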
If we want to concatenate two data frames, we use the function concat. With axis=1, it places the data frames side by side and aligns the rows by the index, which is the student number. Index labels that appear in only one of the data frames, such as A0567 and baja, get NaN in the columns of the other.
data2=pd.concat([df2,data],axis=1)
data2
| | name | grade | grade2 | Student | Activity 1 | Activity 2 | Activity 3 |
|---|---|---|---|---|---|---|---|
| A0145 | Vale | 10.0 | 9.0 | Vale | 10.0 | 9.0 | 8.0 |
| A568 | Diana | 8.0 | 10.0 | Diana | 8.0 | 10.0 | 9.0 |
| A5678 | Ivan | 5.0 | 9.0 | Ivan | 5.0 | 9.0 | 10.0 |
| A0567 | Vivi | 7.0 | 8.0 | NaN | NaN | NaN | NaN |
| baja | NaN | NaN | NaN | Vivi | 7.0 | 8.0 | 5.0 |
For more about merge, concat, and join, see https://pandas.pydata.org/docs/user_guide/merging.html
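The index alignment performed by concat with axis=1 can be sketched with two small data frames. The names left, right, outer, and inner are hypothetical. The argument join="inner" is shown as one option for keeping only the labels present in both data frames.

```python
import pandas as pd

left = pd.DataFrame({"grade": [10, 8]}, index=["A0145", "A0567"])
right = pd.DataFrame({"Activity 1": [9, 7]}, index=["A0145", "baja"])

# Default (outer) join: union of the index labels, NaN where data is missing
outer = pd.concat([left, right], axis=1)

# Inner join: only the labels found in both data frames
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(list(inner.index))   # ['A0145']
```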
We can also filter a data frame. For example, to filter the data for a number and keep the rows with a grade higher than 7:
data2[data2["grade"]>7]
| | name | grade | grade2 | Student | Activity 1 | Activity 2 | Activity 3 |
|---|---|---|---|---|---|---|---|
| A0145 | Vale | 10.0 | 9.0 | Vale | 10.0 | 9.0 | 8.0 |
| A568 | Diana | 8.0 | 10.0 | Diana | 8.0 | 10.0 | 9.0 |
Or we can filter the data for a string (text). For example, to keep the rows where Student equals Diana:
data2[data2["Student"]=="Diana"]
| | name | grade | grade2 | Student | Activity 1 | Activity 2 | Activity 3 |
|---|---|---|---|---|---|---|---|
| A568 | Diana | 8.0 | 10.0 | Diana | 8.0 | 10.0 | 9.0 |
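Filtering works through a Boolean mask: the comparison returns one True or False per row, and the data frame keeps the rows marked True. The sketch below uses a small data frame built for this example; combining two conditions with & is an addition to the text above.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Vale", "Diana", "Ivan", "Vivi"],
     "grade": [10, 8, 5, 7],
     "gender": ["fem", "fem", "male", "fem"]},
)

mask = df["grade"] > 7    # Boolean Series, one value per row
print(mask.tolist())      # [True, True, False, False]

# Each condition goes in parentheses; & means "and"
both = df[(df["grade"] > 5) & (df["gender"] == "fem")]
print(both["name"].tolist())   # ['Vale', 'Diana', 'Vivi']
```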
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, introductory linear algebra, basic statistical operations, random simulation and much more.
It is useful to work with NumPy arrays in machine learning models. NumPy’s main object is the homogeneous multidimensional array. For example, we could define the variables X and Y as arrays.
import numpy as np
X=np.array([[1,2,3],[4,5,6]])
X
array([[1, 2, 3],
[4, 5, 6]])
Or we can create a matrix for several variables.
Y=np.array([[2,4,6],[8,10,12]])
Y
array([[ 2, 4, 6],
[ 8, 10, 12]])
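A few basic operations on these arrays can be sketched as follows. The operations shown (shape, element selection, and element-wise arithmetic) are common starting points, not an exhaustive list.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.array([[2, 4, 6], [8, 10, 12]])

print(X.shape)               # (2, 3): two rows, three columns
print(X[0, 1])               # row 0, column 1: 2
print((X + Y).tolist())      # element-wise sum: [[3, 6, 9], [12, 15, 18]]
print((2 * X == Y).all())    # every element of Y is twice that of X: True
```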
Sometimes it is useful to simulate a missing value:
np.nan
nan
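A missing value behaves differently from ordinary numbers, as the following sketch shows. The array values is made up for this example; np.isnan and np.nanmean are NumPy functions for detecting and skipping missing values.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)            # False: nan is not equal to itself
print(np.isnan(values).tolist())   # [False, True, False]
print(np.mean(values))             # nan: the missing value propagates
print(np.nanmean(values))          # 2.0: the mean ignoring the missing value
```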