1 Python Basics
This section covers the topics required in the following chapters. We recommend it for readers who have no previous knowledge of Python programming.
1.1 Jupyter Notebook
In this book, we will work on Jupyter Notebook, which is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.
See more at https://jupyter.org/.
1.2 Data types, numerical and text objects
Python is a programming language that lets us work quickly and integrate systems more effectively. It contains many data types as part of the core language (py?).
The entities that we can create and manipulate in Python are called objects. We create objects with the assignment operator (=).
For example, we create the object x, which is assigned the value 4.
x=4
x
#> 4
Each object in Python has two characteristics: object type and object value.
Object type tells Python what kind of object it is dealing with. A type could be a number, a string, a list, or something else. In this book we use all of these types, and we also cover more complex data structures such as dictionaries, arrays and data frames.
The function type() shows us the object type. For example, the object x is an integer (int):
type(x)
#> <class 'int'>
In this example, the object type is integer (int), and the value is 4.
Besides integers, Python provides other numeric types: floating point numbers (float) and complex numbers (for example, 5j). For example:
type(1.23)
#> <class 'float'>
An example of a string (str) would be:
y="Apple"
type(y)
#> <class 'str'>
The quotation marks tell Python that Apple is a string.
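As a quick check of the types described in this section, the following minimal sketch stores one value of each kind and inspects it. The variable names a, b, c, d and is_text are only illustrative.

```python
# One value of each basic type discussed in this section.
a = 4          # integer
b = 1.23       # floating point number
c = 5j         # complex number
d = "Apple"    # string

# type() returns the type of the object; __name__ gives its short name.
print(type(a).__name__)  # int
print(type(b).__name__)  # float
print(type(c).__name__)  # complex
print(type(d).__name__)  # str

# isinstance() checks whether an object has a given type.
is_text = isinstance(d, str)
print(is_text)  # True
```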
1.3 List and object attributes
A list is another useful object in Python: an ordered sequence that can hold integers, strings, or both.
liste=[1,2,3]
liste
#> [1, 2, 3]
type(liste)
#> <class 'list'>
Objects whose value can change are called mutable objects, whereas objects whose value is unchangeable after they’ve been created are called immutable.
4 # is immutable
#> 4
# but liste is mutable
liste=["a","b"]
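The difference between mutable and immutable objects can be sketched by trying to change one element of a list and one character of a string. The names word and changed are illustrative, and the try/except statement is used here only to capture the error that Python raises.

```python
liste = [1, 2, 3]
liste[0] = 10            # allowed: lists are mutable
print(liste)             # [10, 2, 3]

word = "Apple"
try:
    word[0] = "a"        # not allowed: strings are immutable
    changed = True
except TypeError:
    # Python raises a TypeError when we try to modify a string in place.
    changed = False

print(changed)           # False
print(word)              # Apple
```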
Most Python objects have data, functions, or both associated with them. These are known as attributes. An attribute is written after the name of the object, separated from it by a dot. There are two types of attributes: data attributes and methods.
The “list” data type has several methods. The function dir() lists the attributes of an object; here are some of those of a “list” object:
s=[1,2,3,4]
dir(s)[1:10]
#> ['__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__']
We printed only nine of the attributes (positions 1 through 9); the complete listing contains more. In this section, we cover some examples, starting with the method append, which appends an object to the end of the list.
s.append(6)
s
#> [1, 2, 3, 4, 6]
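The dir() listing is dominated by special names that start with underscores. A minimal sketch for keeping only the public methods, such as append, is to filter those names out; the variable public_methods is an illustrative name.

```python
s = [1, 2, 3, 4]

# Keep only the attribute names that do not start with an underscore.
public_methods = [name for name in dir(s) if not name.startswith("_")]
print(public_methods)

# The methods used in this section are among them.
print("append" in public_methods)
print("remove" in public_methods)
```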
A data attribute contains information about the object. To get the number of elements of a list, however, we use the built-in function len() rather than an attribute:
len(s)
#> 5
There are other ways to manipulate a “list” without a method. For instance, to concatenate two different lists:
t=[12,14,16]
s=s+t
s
#> [1, 2, 3, 4, 6, 12, 14, 16]
To select an element of a list, for example the second element:
s[1]
#> 2
As you can see, we write s[1] instead of s[2] because Python starts counting at zero.
s[0]
#> 1
To replace an element of a list, for example replacing the number 3 in list “s” with the number 100:
s[2]=100
s
#> [1, 2, 100, 4, 6, 12, 14, 16]
To remove an element, for example, number 2:
s.remove(2)
s
#> [1, 100, 4, 6, 12, 14, 16]
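Note that remove deletes by value, not by position, and only the first matching element. The following sketch contrasts it with two standard ways of deleting by position, the method pop and the del statement. The list contents are made up for illustration.

```python
s = [1, 100, 4, 6, 12, 14, 16, 4]

s.remove(4)       # by value: removes only the first 4
print(s)          # [1, 100, 6, 12, 14, 16, 4]

last = s.pop()    # by position: removes and returns the last element
print(last)       # 4
print(s)          # [1, 100, 6, 12, 14, 16]

del s[0]          # by position: removes the element at position 0
print(s)          # [100, 6, 12, 14, 16]
```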
1.4 Dictionaries
Dictionaries are useful objects for performing fast look-ups on unordered data.
grades_dict={"Paulina": 100, "Coral":95}
grades_dict
#> {'Paulina': 100, 'Coral': 95}
Dictionaries have two components, keys and values. The keys:
grades_dict.keys()
#> dict_keys(['Paulina', 'Coral'])
And values:
grades_dict.values()
#> dict_values([100, 95])
Dictionaries are mutable, so we can modify their content, for example by adding a new element:
grades_dict["Alejandra"]=120
grades_dict
#> {'Paulina': 100, 'Coral': 95, 'Alejandra': 120}
Or modifying an element:
# modify an element
grades_dict["Alejandra"]=grades_dict["Alejandra"]-30
grades_dict
#> {'Paulina': 100, 'Coral': 95, 'Alejandra': 90}
We can also check whether a key is in the dictionary:
# members
"Karina" in grades_dict
#> False
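Two other common look-up patterns can be sketched with standard dictionary methods: get, which returns a default instead of raising an error when the key is missing, and items, which iterates over key-value pairs. The names missing, default and best are illustrative.

```python
grades_dict = {"Paulina": 100, "Coral": 95, "Alejandra": 90}

# get() returns None (or a default we choose) when the key is absent.
missing = grades_dict.get("Karina")
default = grades_dict.get("Karina", 0)
print(missing)   # None
print(default)   # 0

# items() yields (key, value) pairs.
for name, grade in grades_dict.items():
    print(name, grade)

# The key with the highest value.
best = max(grades_dict, key=grades_dict.get)
print(best)      # Paulina
```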
We could combine two or more lists into a dictionary, here with two keys and four values each:
names=["Eugenio","Luis","Isa","Gisell"]
numbers=[9,17,80,79]
combine= {"Letthers":names,"Colors":numbers}
combine
#> {'Letthers': ['Eugenio', 'Luis', 'Isa', 'Gisell'], 'Colors': [9, 17, 80, 79]}
Or we may want to use one list for the keys and the other for the values:
stud= dict(zip(names, numbers))
stud
#> {'Eugenio': 9, 'Luis': 17, 'Isa': 80, 'Gisell': 79}
1.5 Python modules
Python also contains built-in functions that all Python programs can use. Other functions can be written by the user or developed by someone else and distributed in a library.
Python modules are code libraries, and we import them with the import statement. Two of the most popular libraries in Python are Pandas and NumPy.
1.5.1 Pandas data frames
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive. It is the fundamental high-level building block for practical, real-world data analysis in Python. Additionally, it aims to become the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
We will use it to create and manipulate data frames, which are two-dimensional, size-mutable, potentially heterogeneous tabular data structures.
To install the Pandas module, we run “pip install pandas” in a notebook cell or at the Conda terminal prompt.
pip install pandas
To import the library we use the “import” statement:
import pandas as pd
After the name pandas, we use the keyword “as” to give the library a short name, so that later it is easier to type pd than pandas. For example, to create a data frame we write pd followed by a dot and DataFrame, which indicates that we are calling DataFrame from the pandas library:
name=["Vale","Diana","Ivan","Vivi"]
df = pd.DataFrame(name,columns=["Column 1"])
df
| | Column 1 |
---|---|
0 | Vale |
1 | Diana |
2 | Ivan |
3 | Vivi |
In the previous example, columns is an argument of the function, which sets the column names of the data frame. To learn more about the function and its arguments, we can ask for help like this:
help(pd.DataFrame) # or like this: pd.DataFrame?
In the previous example, the index (the left column without a title) is numbered by default from 0 to 3. We can change the index with the argument index:
name=["Vale","Diana","Ivan","Vivi"]
numbers=[51,11,511,50]
df = pd.DataFrame(name,columns=["Column 1"],index=numbers)
df
| | Column 1 |
---|---|
51 | Vale |
11 | Diana |
511 | Ivan |
50 | Vivi |
We could take advantage of the creation of a dictionary and transform it into a data frame:
my_dict=dict(zip(name, numbers))
sn=pd.DataFrame(my_dict,index=["Student_number"])
sn
| | Vale | Diana | Ivan | Vivi |
---|---|---|---|---|
Student_number | 51 | 11 | 511 | 50 |
If we want to have Student_number as a column instead of a row, we could transpose the data frame:
sn.transpose()
| | Student_number |
---|---|
Vale | 51 |
Diana | 11 |
Ivan | 511 |
Vivi | 50 |
We can also create a data frame with several columns from a dictionary of lists:
name=["Estefanía","Laura Yanet","María Guadalupe","Karla Lizette"]
nick=["Estef","Yanet","Lupita","Karla"]
number=[1,2,3,4]
# We use two lists to create the dictionary
combine = {"Nick_name":nick,"name":name}
# and from the dictionary we create the data frame
df=pd.DataFrame(combine,index=number)
df
| | Nick_name | name |
---|---|---|
1 | Estef | Estefanía |
2 | Yanet | Laura Yanet |
3 | Lupita | María Guadalupe |
4 | Karla | Karla Lizette |
To use an attribute or a method of a data frame, we type the name of the Pandas object and a dot before it. A useful attribute is “shape,” which gives us the number of rows and columns.
df.shape
#> (4, 2)
To rename a data frame column:
df=df.rename(columns={"Nick_name": "Nick", "name": "Names"})
df
| | Nick | Names |
---|---|---|
1 | Estef | Estefanía |
2 | Yanet | Laura Yanet |
3 | Lupita | María Guadalupe |
4 | Karla | Karla Lizette |
The same applies to the index:
df=df.rename(index={1: "x", 2: "y", 3: "z",4:"w"})
df
| | Nick | Names |
---|---|---|
x | Estef | Estefanía |
y | Yanet | Laura Yanet |
z | Lupita | María Guadalupe |
w | Karla | Karla Lizette |
1.5.2 Selecting rows and columns in a data frame
To select a column, we could type the column name:
df["Nick"]
#> x Estef
#> y Yanet
#> z Lupita
#> w Karla
#> Name: Nick, dtype: object
The resulting object is a pandas Series: a one-dimensional object, similar to a list but with an index, in this case the data frame index.
type(df["Nick"])
#> <class 'pandas.core.series.Series'>
If we want to keep the data frame type, we use double square brackets:
df[["Nick"]]
| | Nick |
---|---|
x | Estef |
y | Yanet |
z | Lupita |
w | Karla |
Or two columns at once:
df[["Nick","Names"]]
| | Nick | Names |
---|---|---|
x | Estef | Estefanía |
y | Yanet | Laura Yanet |
z | Lupita | María Guadalupe |
w | Karla | Karla Lizette |
To select rows, we use the “.loc” indexer:
df.loc[["y"]]
| | Nick | Names |
---|---|---|
y | Yanet | Laura Yanet |
df.loc[["y","z"]]
| | Nick | Names |
---|---|---|
y | Yanet | Laura Yanet |
z | Lupita | María Guadalupe |
The “loc” indexer also works for selecting one or more columns:
df.loc[:, ["Nick","Names"]]
| | Nick | Names |
---|---|---|
x | Estef | Estefanía |
y | Yanet | Laura Yanet |
z | Lupita | María Guadalupe |
w | Karla | Karla Lizette |
Sometimes it is useful to select by position. In this case we use .iloc[rows, columns]. For example, to select from the second column onward:
df.iloc[: , 1:]
| | Names |
---|---|
x | Estefanía |
y | Laura Yanet |
z | María Guadalupe |
w | Karla Lizette |
The left side of the comma is for selecting rows, and the right is for columns. Another example:
df.iloc[: , 0:2]
| | Nick | Names |
---|---|---|
x | Estef | Estefanía |
y | Yanet | Laura Yanet |
z | Lupita | María Guadalupe |
w | Karla | Karla Lizette |
For rows:
df.iloc[2: , ]
| | Nick | Names |
---|---|---|
z | Lupita | María Guadalupe |
w | Karla | Karla Lizette |
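One detail worth sketching is how slices behave: with loc (labels) both ends of the slice are included, while with iloc (positions) the stop is excluded, as with lists. The example rebuilds a data frame like the one above so that it is self-contained; the variable names on the left-hand side are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Nick": ["Estef", "Yanet", "Lupita", "Karla"],
        "Names": ["Estefanía", "Laura Yanet", "María Guadalupe", "Karla Lizette"],
    },
    index=["x", "y", "z", "w"],
)

# A single cell, by label and by position.
by_label = df.loc["y", "Nick"]
by_position = df.iloc[1, 0]
print(by_label, by_position)       # Yanet Yanet

# Label slices include both ends; position slices exclude the stop.
label_slice = df.loc["y":"z"]      # rows y and z
position_slice = df.iloc[1:2]      # only row y
print(list(label_slice.index))     # ['y', 'z']
print(list(position_slice.index))  # ['y']
```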
To insert a new column:
num_2=list(range(4))
df["Numbers_2"]=num_2
df
| | Nick | Names | Numbers_2 |
---|---|---|---|
x | Estef | Estefanía | 0 |
y | Yanet | Laura Yanet | 1 |
z | Lupita | María Guadalupe | 2 |
w | Karla | Karla Lizette | 3 |
The range function returns an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, …, j-1.
To drop one or more columns:
df.drop(columns=['Names'])
| | Nick | Numbers_2 |
---|---|---|
x | Estef | 0 |
y | Yanet | 1 |
z | Lupita | 2 |
w | Karla | 3 |
df.drop(columns=['Names',"Numbers_2"])
| | Nick |
---|---|
x | Estef |
y | Yanet |
z | Lupita |
w | Karla |
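In the examples above, drop returns a new data frame and leaves df unchanged, which is why “Names” could be dropped twice. A minimal sketch of this behavior, using a small made-up data frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"Nick": ["Estef", "Yanet"], "Numbers_2": [0, 1]},
    index=["x", "y"],
)

# drop() returns a new data frame; the original keeps its columns.
smaller = df.drop(columns=["Numbers_2"])
print(list(df.columns))       # ['Nick', 'Numbers_2']
print(list(smaller.columns))  # ['Nick']

# To keep the change, assign the result back to the same name.
df = df.drop(columns=["Numbers_2"])
print(list(df.columns))       # ['Nick']
```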
1.6 Reading Excel and csv files
You can download the Excel file by copying the following link and pasting it into a browser:
https://github.com/abernal30/ML_python/blob/main/df.xlsx
I stored the file in a sub-directory named “data” and called it “df.xlsx”.
To verify the names of the sheets, we use the following code:
import pandas as pd
sheets=pd.ExcelFile("data/df.xlsx").sheet_names
sheets
#> ['Sheet1', 'Sheet2', 'Sheet3']
I use the function read_excel of the Pandas library to read the Excel file. In this case, I use the argument sheet_name=sheets[0], equivalent to sheet_name=“Sheet1”.
data=pd.read_excel("data/df.xlsx",sheet_name=sheets[0])
data
| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
---|---|---|---|---|
0 | nan | Título HOJA 1 | nan | nan |
1 | nan | nan | X | Y |
2 | A | nan | 2 | 10 |
3 | B | nan | 50 | nan |
4 | C | nan | nan | 25 |
5 | nan | nan | nan | nan |
6 | D | nan | 20 | 34 |
7 | E | nan | 200 | 23 |
Sometimes it is useful to read the Excel file and use one of its columns as the index. In this case, we want the column “Unnamed: 0”.
data=pd.read_excel("data/df.xlsx",sheet_name=sheets[0],index_col="Unnamed: 0")
data
| | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
---|---|---|---|
nan | Título HOJA 1 | nan | nan |
nan | nan | X | Y |
A | nan | 2 | 10 |
B | nan | 50 | nan |
C | nan | nan | 25 |
nan | nan | nan | nan |
D | nan | 20 | 34 |
E | nan | 200 | 23 |
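Reading a csv file follows the same pattern with the function read_csv. The sketch below is only an illustration: the csv content and the column name “id” are made up, and the text is read from memory with io.StringIO so that the example runs without a file. With a real file we would pass its path, for example a hypothetical "data/df.csv".

```python
import io
import pandas as pd

# Hypothetical csv content; empty fields become missing values (NaN).
csv_text = """id,X,Y
A,2,10
B,50,
C,,25
"""

# index_col sets the column "id" as the index of the data frame.
data_csv = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(data_csv)
print(data_csv.shape)  # (3, 2)
```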
1.7 NumPy module
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
In machine learning models it is useful to work with NumPy arrays. NumPy’s main object is the homogeneous multidimensional array. For example, we could define the variables X and Y as arrays.
import numpy as np
X=np.array([[1,2,3],[4,5,6]])
X
#> array([[1, 2, 3],
#>        [4, 5, 6]])
Or we can create a matrix for several variables.
Y=np.array([[2,4,6],[8,10,12]])
Y
#> array([[ 2,  4,  6],
#>        [ 8, 10, 12]])
Sometimes it is useful to simulate a missing value:
np.nan
#> nan
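As a brief sketch of why arrays are convenient, arithmetic on them is applied element by element, and NumPy offers functions that detect or skip missing values. The array Z and the names total, ratio, mask and mean_ignoring_nan are illustrative.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.array([[2, 4, 6], [8, 10, 12]])

print(X.shape)   # (2, 3): two rows and three columns

# Arithmetic operates element by element.
total = X + Y
ratio = Y / X
print(total)
print(ratio)

# An array with a simulated missing value.
Z = np.array([1.0, np.nan, 3.0])
mask = np.isnan(Z)                 # True where the value is missing
mean_ignoring_nan = np.nanmean(Z)  # mean of the non-missing values
print(mask)
print(mean_ignoring_nan)           # 2.0
```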
1.8 Missing value management
Suppose we read the Excel file we used in the section “Reading Excel and csv files.” We want to work with the data that starts in the “Unnamed: 2” column, so as the header we use the row that contains the names X and Y (header=2, counting rows from zero).
data=pd.read_excel("data/df.xlsx",sheet_name="Sheet1",index_col="Unnamed: 0",header=2)
data
| | Unnamed: 1 | X | Y |
---|---|---|---|
A | nan | 2.000000 | 10.000000 |
B | nan | 50.000000 | nan |
C | nan | nan | 25.000000 |
nan | nan | nan | nan |
D | nan | 20.000000 | 34.000000 |
E | nan | 200.000000 | 23.000000 |
To get a better look at our data frame, we select the second and third columns by position:
data=data.iloc[:,1:]
data
| | X | Y |
---|---|---|
A | 2.000000 | 10.000000 |
B | 50.000000 | nan |
C | nan | 25.000000 |
nan | nan | nan |
D | 20.000000 | 34.000000 |
E | 200.000000 | 23.000000 |
A first step to detect missing values is counting them per column:
data.isna().sum()
#> X 2
#> Y 2
#> dtype: int64
It tells us that both columns, “X” and “Y”, have two missing values.
We have many alternatives to manage them. The first is eliminating every row that has at least one missing value.
data.dropna()
| | X | Y |
---|---|---|
A | 2.000000 | 10.000000 |
D | 20.000000 | 34.000000 |
E | 200.000000 | 23.000000 |
Another alternative is filling them with a value, for example the last valid value above them in the same column (forward fill).
data.ffill()
| | X | Y |
---|---|---|
A | 2.000000 | 10.000000 |
B | 50.000000 | 10.000000 |
C | 50.000000 | 25.000000 |
nan | 50.000000 | 25.000000 |
D | 20.000000 | 34.000000 |
E | 200.000000 | 23.000000 |
Or with a value such as zero:
data_clean=data.replace(np.nan,0)
# which is equivalent to
#data.fillna(0)
data_clean
| | X | Y |
---|---|---|
A | 2.000000 | 10.000000 |
B | 50.000000 | 0.000000 |
C | 0.000000 | 25.000000 |
nan | 0.000000 | 0.000000 |
D | 20.000000 | 34.000000 |
E | 200.000000 | 23.000000 |
We still have a missing value in the index, after the letter “C”. The next function removes the rows whose index label is missing or is one of a few punctuation marks.
# This function skips the index elements of a data frame that are missing values, space or: ".",",",";",";","'",'""'.
# It returns a data frame without the ignored elements.
# Parameters:
# df: data frame. The object for which the method is called
#---- Do not change anything from here ----
def clean_na_index2(df):
    skips=[".",",",";",";","'",'""'," "]
    # pd.isna detects missing labels; the list covers the punctuation marks
    con=[name_ind for name_ind in df.index if not pd.isna(name_ind) and name_ind not in skips]
    return df.loc[con, ]
#----- To here ------------
#Run the code so that Python can execute the function
df_index_clean=clean_na_index2(data_clean)
df_index_clean
| | X | Y |
---|---|---|
A | 2.000000 | 10.000000 |
B | 50.000000 | 0.000000 |
C | 0.000000 | 25.000000 |
D | 20.000000 | 34.000000 |
E | 200.000000 | 23.000000 |
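The steps of this section can be sketched end to end without the Excel file. The data frame below is typed by hand to resemble the one read from “Sheet1” (an assumption made only so the example is self-contained). The last step uses the index method notna as an alternative way to drop the row with a missing label; it does not handle the punctuation marks that clean_na_index2 also skips.

```python
import numpy as np
import pandas as pd

# Hand-typed data resembling the sheet, including a missing index label.
data = pd.DataFrame(
    {
        "X": [2, 50, np.nan, np.nan, 20, 200],
        "Y": [10, np.nan, 25, np.nan, 34, 23],
    },
    index=["A", "B", "C", np.nan, "D", "E"],
)

counts = data.isna().sum()   # missing values per column
dropped = data.dropna()      # rows without any missing value
filled = data.ffill()        # forward fill within each column
zeros = data.fillna(0)       # replace missing values with zero

# Keep only the rows whose index label is not missing.
clean = zeros[zeros.index.notna()]

print(counts)
print(list(dropped.index))   # ['A', 'D', 'E']
print(list(clean.index))     # ['A', 'B', 'C', 'D', 'E']
```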
1.9 Merge, join or concatenate data frames
Suppose we want to merge the object df_index_clean of the previous section, with the data frame in the “Sheet2” of the Excel file “df.xlsx”:
#,header=2
data=pd.read_excel("data/df.xlsx",sheet_name="Sheet2",index_col="Unnamed: 0")
data
| | W | Z |
---|---|---|
A | 2 | 10 |
B | 50 | 30 |
C | 30 | 25 |
D | 20 | 34 |
E | 200 | 23 |
In this case, both data frames have the same index:
print(data.index)
#> Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
print(df_index_clean.index)
#> Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
Then we can use the function concat. The argument axis=1 places the data frames side by side, aligning the rows by index, so the result contains the columns of both.
pd.concat([df_index_clean,data],axis=1)
| | X | Y | W | Z |
---|---|---|---|---|
A | 2.000000 | 10.000000 | 2 | 10 |
B | 50.000000 | 0.000000 | 50 | 30 |
C | 0.000000 | 25.000000 | 30 | 25 |
D | 20.000000 | 34.000000 | 20 | 34 |
E | 200.000000 | 23.000000 | 200 | 23 |
Otherwise, with the default axis=0, it concatenates along the index (rows):
pd.concat([df_index_clean,data])
| | X | Y | W | Z |
---|---|---|---|---|
A | 2.000000 | 10.000000 | nan | nan |
B | 50.000000 | 0.000000 | nan | nan |
C | 0.000000 | 25.000000 | nan | nan |
D | 20.000000 | 34.000000 | nan | nan |
E | 200.000000 | 23.000000 | nan | nan |
A | nan | nan | 2.000000 | 10.000000 |
B | nan | nan | 50.000000 | 30.000000 |
C | nan | nan | 30.000000 | 25.000000 |
D | nan | nan | 20.000000 | 34.000000 |
E | nan | nan | 200.000000 | 23.000000 |
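When the two data frames do not share exactly the same index, the pandas method join and the function merge let us choose which labels to keep. The following sketch uses two small made-up data frames, left and right, whose indexes overlap only partially.

```python
import pandas as pd

left = pd.DataFrame({"X": [2, 50, 0]}, index=["A", "B", "C"])
right = pd.DataFrame({"W": [2, 50, 20]}, index=["A", "B", "D"])

# concat with axis=1 keeps every label found in either data frame.
outer = pd.concat([left, right], axis=1)

# join with how="inner" keeps only the labels present in both.
inner = left.join(right, how="inner")

# merge on the indexes with how="left" keeps the labels of the left one.
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(outer)
print(inner)
print(merged)
```

Labels that exist in only one data frame get missing values (NaN) in the columns that come from the other.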
1.10 APIs (Application Programming Interfaces)
1.10.1 The “yfinance” library
The yfinance library was designed to download market data from Yahoo! Finance. To see how to install it and more information: https://pypi.org/project/yfinance/
import yfinance as yf
msft= yf.Ticker("MSFT")
The following code shows a dictionary that contains information such as company address, business summary, etc.
msft.info
To download the ticker's prices:
msft.history(period="1mo").head()
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
2023-05-01 00:00:00-04:00 | 306.300422 | 307.926871 | 304.484384 | 304.893494 | 21294100 | 0.000000 | 0.000000 |
2023-05-02 00:00:00-04:00 | 307.088685 | 308.505570 | 303.247077 | 304.743805 | 26404400 | 0.000000 | 0.000000 |
2023-05-03 00:00:00-04:00 | 305.951182 | 307.936831 | 303.426702 | 303.736023 | 22360800 | 0.000000 | 0.000000 |
2023-05-04 00:00:00-04:00 | 305.571981 | 307.088685 | 302.738180 | 304.743805 | 22519900 | 0.000000 | 0.000000 |
2023-05-05 00:00:00-04:00 | 305.053143 | 311.289510 | 303.606293 | 309.972382 | 28181200 | 0.000000 | 0.000000 |
As you can see in the method help (help(msft.history)), valid periods are: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
There are also other parameters, such as interval=‘1d’, whose valid values are: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
For some of the procedures we will apply in this book, we require the dates in a specific format, for example “%Y-%m-%d”. The following function downloads the prices and returns them with the index in that format.
import yfinance as yf
# This function downloads market data from Yahoo! Finance.
# It returns a data frame with a specific date format.
# Parameters:
# ticker: Yahoo finance ticker symbol
# per: valid periods are: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# inter: intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# price_type: 'Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits'
# format_date: index format date
# d_ini: initial date of the subset
# d_fin: final date of the subset
#---- Do not change anything from here ----
def my_yahoo(ticker,per,inter,price_type,format_date,d_ini,d_fin):
    import pandas as pd
    x = yf.Ticker(ticker)
    hist = x.history(period=per,interval=inter)
    # convert the Timestamp index into text with the requested format
    date =list(hist.index)
    hist_date=[d.strftime(format_date) for d in date]
    price=list(hist[price_type])
    hist={ticker:price}
    hist=pd.DataFrame(hist,index=hist_date)
    return hist.loc[d_ini:d_fin]
#----- To here ------------
#Run the code cell so that Python can execute the function
format_date="%Y-%m-%d"
per="5y"
inter='1mo'
price_type="Close"
ticker="^MXX"
d_ini="2021-01-01"
d_fin="2022-12-01"
ipc=my_yahoo(ticker,per,inter,price_type,format_date,d_ini,d_fin)
ipc.head()
| | ^MXX |
---|---|
2021-01-01 | 42985.730469 |
2021-02-01 | 44592.910156 |
2021-03-01 | 47246.261719 |
2021-04-01 | 48009.718750 |
2021-05-01 | 50885.949219 |
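The date handling inside my_yahoo can be sketched offline. The prices below are made up, and the dates are generated with date_range instead of being downloaded; the point is only to show how strftime turns a date index into text and how loc then selects a range of those text labels.

```python
import pandas as pd

# Made-up monthly prices; "MS" means one date at the start of each month.
dates = pd.date_range("2021-01-01", periods=4, freq="MS")
prices = pd.DataFrame({"price": [10.0, 11.0, 12.0, 13.0]}, index=dates)

# Convert the Timestamp index into text with the format "%Y-%m-%d".
prices.index = prices.index.strftime("%Y-%m-%d")
print(list(prices.index))

# With text labels in this format, loc selects a range, both ends included.
subset = prices.loc["2021-02-01":"2021-03-01"]
print(subset)
```

The format “%Y-%m-%d” has the convenient property that sorting the text labels also sorts them by date, which is what makes the range selection work.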
1.10.2 Banxico API
The sie-banxico library downloads historical information from the Mexican central bank (BANXICO), such as interest rates, exchange rates and other macroeconomic information. To see how to install it and more information: https://pypi.org/project/sie-banxico/
We need to get a token from the following web page: https://www.banxico.org.mx/SieAPIRest/service/v1/token?locale=en
The token must look like the following, and you have to store it in an object, for example:
token = "e3980208bf01ec653aba9aee3c2d6f70f6ae8b066d2545e379b9e0ef92e9de25"
On the same web page you can see the series catalog, which lists the ID of each variable:
from sie_banxico import SIEBanxico
# This function downloads information from BANXICO
# It returns a data frame with a specific date format
# Parameters:
# token: The token object
# my_series: Banxico's series IDs
# my_series_name: The short name we want to assign to each series.
# d_in: initial date of the subset
# d_fin: final date of the subset
# format_date: index format date
#---- Do not change anything from here ----
def my_banxico_py(token,my_series,my_series_name,d_in,d_fin,format_date):
    import pandas as pd
    le=len(my_series)
    ser=0
    if(le==1):
        ser=0
        api = SIEBanxico(token = token, id_series = my_series[ser])
        timeseries_range=api.get_timeseries_range(init_date=d_in,end_date=d_fin)
        timeseries_range=timeseries_range['bmx']['series'][0]['datos']
        data=pd.DataFrame(timeseries_range)
        dates=[pd.Timestamp(date).strftime(format_date) for date in list(data["fecha"])]
        data=pd.DataFrame({my_series_name[ser]:list(data["dato"])},index=dates)
    else:
        ser=0
        api = SIEBanxico(token = token, id_series = my_series[ser])
        timeseries_range=api.get_timeseries_range(init_date=d_in, end_date=d_fin)
        timeseries_range=timeseries_range['bmx']['series'][0]['datos']
        data=pd.DataFrame(timeseries_range)
        dates=[pd.Timestamp(date).strftime(format_date) for date in list(data["fecha"])]
        data=pd.DataFrame({my_series_name[ser]:list(data["dato"])},index=dates)
        for ser in range(1,le):
            api = SIEBanxico(token = token, id_series = my_series[ser])
            timeseries_range=api.get_timeseries_range(init_date=d_in, end_date=d_fin)
            timeseries_range=timeseries_range['bmx']['series'][0]['datos']
            data2=pd.DataFrame(timeseries_range)
            dates2=[pd.Timestamp(date).strftime(format_date) for date in list(data2["fecha"])]
            data2=pd.DataFrame({my_series_name[ser]:list(data2["dato"])},index=dates2)
            data=pd.concat([data,data2],axis=1)
    # convert every column from text to numbers
    ban_names=list(data.columns)
    for col_i in range(data.shape[1]):
        cel_num=[float(cel) for cel in data[ban_names[col_i]]]
        data[ban_names[col_i]]=cel_num
    return data
#----- To here ------------
For this example, we want to download the following Series:
SF17908: Exchange rate Pesos per US dollar
SF282: 28 days Mexican treasury bills
SP74660: Mexican inflation rate
SR16734: Global indicator of Mexican economic activity
#Run the code cell so that Python can execute the function
my_series=['SF17908' ,'SF282',"SP74660","SR16734"]
my_series_name=["TC","Cetes_28","Mex_inflation","igae"]
d_in='2021-01-01'
d_fin='2022-12-01'
format_date="%Y-%d-%m"
my_banxico_py(token,my_series,my_series_name,d_in,d_fin,format_date).head()
| | TC | Cetes_28 | Mex_inflation | igae |
---|---|---|---|---|
2021-01-01 | 19.921500 | 4.220000 | 0.360000 | 105.545700 |
2021-02-01 | 20.309700 | 4.120000 | 0.390000 | 102.802900 |
2021-03-01 | 20.755500 | 4.050000 | 0.540000 | 111.518600 |
2021-04-01 | 20.015300 | 4.070000 | 0.370000 | 107.834900 |
2021-05-01 | 19.963100 | 4.060000 | 0.530000 | 111.176800 |
1.11 Plots or graphs
For example, we use the method plot to plot the APPLE historical price. First, we download the prices.
format_date="%Y-%m-%d"
per="5y"
inter='1mo'
price_type="Close"
ticker="AAPL"
d_ini="2023-01-01"
d_fin="2023-05-01"
apple=my_yahoo(ticker,per,inter,price_type,format_date,d_ini,d_fin)
Then we use the method plot:
apple.plot(title="APPLE close price", ylabel="Price in $",xlabel="Date");

1.12 Dates management
In this section we manage the data frame dates.
data=pd.read_excel("data/df.xlsx",sheet_name="Sheet3")
#,index_col="Unnamed: 0"
data
| | date | Sales |
---|---|---|
0 | Ene 2021 | 5 |
1 | Feb 2021 | 7 |
2 | Mar 2021 | 60 |
3 | Abr 2021 | 20 |
4 | May 2021 | 21 |
For some analyses in machine learning, we need the index of the data frame to be in a date format. The index of the previous data frame is made of integers:
type(data.index[0])
#> <class 'int'>
Then, we use the following function:
# This function transforms the data frame index into a date format
# (Timestamp).
# It returns a data frame with the new date index
# Parameters:
# data: data frame with two columns, a date column and another one
# i_date: is the start date of the new index
# freq_i: frequency of the new index, "y" for year, "m" month, "d" day, "h" hour.
# col_name: name of the column in the data frame data that is not the date
# date_name: name of the column in the data that is the date
#---- Do not change anything from here ----
def index_date(data,i_date,freq_i,col_name,date_name):
    dat=data.set_index(date_name)
    ventas_s= pd.Series(
        list(dat[col_name]), index=pd.date_range(i_date, periods=len(dat),
        freq=freq_i), name=col_name)
    return pd.DataFrame(ventas_s)
#----- To here ------------
#Run the code so that Python can execute the function
i_date="1-1-2020"
freq_i="m"
col_name="Sales"
date_name="date"
data_ind_date=index_date(data,i_date,freq_i,col_name,date_name)
data_ind_date
Now the index is in a “Timestamp” format. For the moment, let’s say that it is a date format.
| | Sales |
---|---|
2020-01-31 00:00:00 | 5 |
2020-02-29 00:00:00 | 7 |
2020-03-31 00:00:00 | 60 |
2020-04-30 00:00:00 | 20 |
2020-05-31 00:00:00 | 21 |
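The function index_date builds a new index from a start date and a frequency, so it does not read the text in the date column. If we wanted the index to follow the dates written in that column, one possible sketch is to translate the Spanish month abbreviations ourselves. The dictionary months and the helper parse_spanish_month are hypothetical names introduced for this example, and the data frame is typed by hand to resemble “Sheet3”.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "date": ["Ene 2021", "Feb 2021", "Mar 2021", "Abr 2021", "May 2021"],
        "Sales": [5, 7, 60, 20, 21],
    }
)

# Hypothetical helper table: Spanish month abbreviations to month numbers.
months = {
    "Ene": 1, "Feb": 2, "Mar": 3, "Abr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Ago": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dic": 12,
}

def parse_spanish_month(text):
    # "Ene 2021" -> Timestamp for the first day of January 2021
    abbr, year = text.split()
    return pd.Timestamp(year=int(year), month=months[abbr], day=1)

# Use the parsed dates as the index and keep only the Sales column.
data_dated = data.set_index(data["date"].map(parse_spanish_month))[["Sales"]]
print(data_dated)
```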