Pandas Snippets - Lee Hodgkinson

leehodg | September 1, 2022

Imputing missing values

From the mean of a feature

Country Name	year	GDP
ARUBA	1965	NULL
ARUBA	1966	5.872478e+08

Say you have a DataFrame of GDP by `Country Name` and `year`, but some years have missing values. One way to handle them is to fill each gap with the mean GDP for that country:

df['GDP_filled'] = df.groupby('Country Name')['GDP'].transform(lambda x: x.fillna(x.mean()))
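To see what this does, here is a minimal sketch on a tiny made-up DataFrame. The country rows and GDP figures are hypothetical and chosen only so the means are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the GDP figures are made up for illustration.
df = pd.DataFrame({
    "Country Name": ["ARUBA", "ARUBA", "ARUBA", "CHAD", "CHAD"],
    "year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 100.0, 300.0, 50.0, np.nan],
})

# Fill each missing GDP with the mean of the non-missing values
# for the same country. transform() keeps the original row order.
df["GDP_filled"] = df.groupby("Country Name")["GDP"].transform(
    lambda x: x.fillna(x.mean())
)

print(df)
```

Note that a country with no GDP values at all has a mean of NaN, so its rows stay missing after this step.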

With forward fill

We can also use pandas' forward fill (`ffill`), which carries the last known value forward into the gaps.

First we sort the data by `year`, then we group by `Country Name` so that the forward fill stays within each country and never carries a value over from a different one.

df.sort_values('year').groupby('Country Name')['GDP'].ffill()
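As a rough illustration, the sketch below runs a grouped forward fill on made-up rows that are deliberately out of year order. It uses the dedicated `ffill()` group method, since `fillna(method=...)` is deprecated in recent pandas releases. The data and the `GDP_ffill` column name are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, deliberately out of year order.
df = pd.DataFrame({
    "Country Name": ["ARUBA", "CHAD", "ARUBA", "CHAD", "ARUBA"],
    "year": [1967, 1965, 1965, 1966, 1966],
    "GDP": [np.nan, 50.0, 100.0, np.nan, np.nan],
})

# Sort by year so "previous value" means the previous year,
# then forward fill within each country.
ffilled = df.sort_values("year").groupby("Country Name")["GDP"].ffill()

# Column assignment aligns on the index, so the sorted result
# lands back on the matching original rows.
df["GDP_ffill"] = ffilled

print(df.sort_values(["Country Name", "year"]))
```

A forward fill cannot fill a gap in a country's earliest year, because there is no earlier value to carry forward.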

With backward fill

There is a backward fill too, which fills each gap with the next known value:

df.sort_values('year').groupby('Country Name')['GDP'].bfill()
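The backward fill can be sketched the same way. The example below uses made-up data and a hypothetical `GDP_bfill` column, and calls the `bfill()` group method rather than `fillna(method='bfill')`, which is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; ARUBA is missing its first and last year.
df = pd.DataFrame({
    "Country Name": ["ARUBA", "ARUBA", "ARUBA", "CHAD", "CHAD"],
    "year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 100.0, np.nan, np.nan, 50.0],
})

# Sort by year, then fill each gap with the next known value
# from the same country.
df["GDP_bfill"] = (
    df.sort_values("year").groupby("Country Name")["GDP"].bfill()
)

print(df)
```

Here the 1965 gaps are filled from 1966, but ARUBA's 1967 value stays missing because no later year exists to fill it from.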
