Lee Hodg

Pandas Snippets

Imputing missing values


 From the mean of a feature

Country Name  YEAR  GDP
ARUBA         1965  NULL
ARUBA         1966  5.872478e+08

Say you have a dataframe of GDP by `Country Name` and `year`, but some years have missing values. One way to deal with the missing values is to fill each one with the mean GDP of that country, as follows:

df['GDP_filled'] = df.groupby('Country Name')['GDP'].transform(lambda x: x.fillna(x.mean()))
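To make the effect concrete, here is a minimal sketch on a toy dataframe. The countries `A` and `B` and all GDP values are made up for illustration; only the column names follow the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the countries and GDP values are invented.
df = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "year": [1965, 1966, 1967, 1965, 1966, 1967],
    "GDP": [np.nan, 10.0, 20.0, 100.0, np.nan, 300.0],
})

# For each country, replace missing GDP values with that country's mean GDP.
# transform() returns a Series with the same index as df, so it can be
# assigned straight back as a new column.
df["GDP_filled"] = df.groupby("Country Name")["GDP"].transform(
    lambda x: x.fillna(x.mean())
)

print(df)
```

Country `A` has its 1965 gap filled with the mean of 10 and 20, and country `B` has its 1966 gap filled with the mean of 100 and 300. The means are computed per group, so one country's values never leak into another's.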

With forward fill


We can also use the ffill option from Pandas.

First we need to sort the data by `year`, then group by `Country Name` so that the forward fill stays within each country:
# equivalent to fillna(method='ffill'), which newer pandas versions deprecate
df['GDP_filled'] = df.sort_values('year').groupby('Country Name')['GDP'].ffill()
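As a rough sketch of how this behaves, the example below runs a grouped forward fill on deliberately unsorted toy data. The data is invented, and the example uses the `ffill()` method on the groupby object, which does the same thing as `fillna` with `method='ffill'` (the latter is, as far as I know, deprecated in recent pandas releases).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; note that the rows for country A are not in year order.
df = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "year": [1967, 1965, 1966, 1965, 1966, 1967],
    "GDP": [np.nan, 10.0, np.nan, np.nan, 200.0, np.nan],
})

# Sort by year so "forward" means "later in time", then fill within each country.
ffilled = df.sort_values("year").groupby("Country Name")["GDP"].ffill()

# The result keeps the original index labels, so assignment aligns each value
# with its original row even though the fill ran on sorted data.
df["GDP_ffill"] = ffilled

print(df)
```

Country `B` has no value before 1965, so that row stays missing: a forward fill can only carry earlier observations forward, and grouping prevents it from borrowing the last value of country `A`.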

With backward fill


Of course there is backward fill too:
# equivalent to fillna(method='bfill'), which newer pandas versions deprecate
df['GDP_filled'] = df.sort_values('year').groupby('Country Name')['GDP'].bfill()
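The sketch below shows the backward fill on invented toy data, again using the `bfill()` method on the groupby object rather than `fillna` with `method='bfill'`. It also illustrates why the grouping matters.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: country A is missing its first and last year.
df = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "year": [1965, 1966, 1967, 1965, 1966, 1967],
    "GDP": [np.nan, 10.0, np.nan, np.nan, np.nan, 300.0],
})

# Backward fill within each country: each gap takes the next later observation.
df["GDP_bfill"] = (
    df.sort_values("year").groupby("Country Name")["GDP"].bfill()
)

print(df)
```

Country `A` has no observation after 1967, so that row stays missing. Without the `groupby`, a plain backward fill over the rows in this order would have filled it with country `B`'s 300, which is exactly the cross-country leakage the grouping avoids.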
