Pandas Dataframes
Let’s see how to filter a dataframe.
Code:
import pandas as pd
df = pd.read_excel('europe_cities_by_sunshine.xlsx')
df[df['Year'] > 3000]
Explanation:
The last line returns every row whose Year value is above 3000. The syntax looks strange at first, so start with the inside part: df['Year'] > 3000 takes the Year column and compares each value with 3000. The result is not a dataframe but a Series of True/False values, one per row, marking which rows to keep. If the first row of the overall dataframe is Cyprus and its Year value is above 3000, that row gets True; if the second row is the UK and its value is not above 3000, it gets False. Putting this True/False Series inside the dataframe ( df[ ] ) then filters it, returning only the rows marked True. A further explanation of this nuance is on the next tab.
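To make the two steps easier to see, here is a minimal sketch that splits the filter into its inner and outer parts. It does not read the spreadsheet; the small dataframe and its values are made up for illustration, and the names mask and filtered are chosen here, not taken from the original code.

```python
import pandas as pd

# Hypothetical stand-in data; the real spreadsheet's contents are not shown here.
df = pd.DataFrame({
    "Country": ["Cyprus", "UK", "Spain"],
    "Year": [3300, 1400, 2800],
})

# Step 1: the comparison yields a boolean Series, one True/False per row.
mask = df["Year"] > 3000

# Step 2: indexing the dataframe with that Series keeps only the rows marked True.
filtered = df[mask]

print(mask.tolist())
print(filtered)
```

Writing df[df['Year'] > 3000] on one line does the same thing without storing the intermediate Series in a variable.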