Pandas DataFrames

Let’s see how to filter a DataFrame so that it keeps only the rows matching a condition on one column.

Code:

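import pandas as pd

# Read the spreadsheet into a DataFrame (reading .xlsx files requires the openpyxl package)
df = pd.read_excel('europe_cities_by_sunshine.xlsx')

# Keep only the rows where the Year column is greater than 3000
df[df['Year'] > 3000]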
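If europe_cities_by_sunshine.xlsx is not at hand, the same filter can be tried on a small DataFrame built in memory. This is a minimal sketch: only the Year column name comes from the example above; the Country and City columns and all numbers are invented for illustration and are not real sunshine figures.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; every value here is made up
df = pd.DataFrame({
    "Country": ["Cyprus", "United Kingdom", "Spain", "Norway"],
    "City": ["City A", "City B", "City C", "City D"],
    "Year": [3300, 1500, 3100, 1200],
})

# Keep only the rows whose Year value is above 3000
sunny = df[df["Year"] > 3000]
print(sunny)
```

Note that the filtered rows keep their original index labels (0 and 2 here); call sunny.reset_index(drop=True) if you want them renumbered from 0.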

Explanation:

The result contains every row whose Year value is above 3000. The syntax of the last line looks strange at first, so read it from the inside out. The inner part, df['Year'] > 3000, compares each value in the Year column with 3000. It does not filter anything by itself: it returns a Series of True/False values, one per row. For example, if the first row of the DataFrame is a city in Cyprus, its entry is True, and if the second row is a city in the UK, its entry is False.

The outer df[ ] then uses this True/False Series to choose rows: rows marked True are returned and rows marked False are dropped. A further explanation of this nuance is on the next tab.
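To make the two steps visible, the sketch below stores the True/False result in a variable (named mask here purely for illustration) before using it. The data is again made up; only the Year column name comes from the example above.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "Country": ["Cyprus", "United Kingdom"],
    "Year": [3300, 1500],
})

# Step 1: the comparison yields a boolean Series (a "mask"), one value per row
mask = df["Year"] > 3000
print(type(mask))      # <class 'pandas.core.series.Series'>
print(mask.tolist())   # [True, False]

# Step 2: putting the mask inside df[...] keeps only the rows marked True
filtered = df[mask]
print(filtered)

# .loc accepts the same mask and selects the same rows
print(filtered.equals(df.loc[mask]))  # True
```

Writing df.loc[mask] is an equivalent, more explicit spelling that also lets you pick columns at the same time, e.g. df.loc[mask, "Country"].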