Pandas Snippets
Imputing missing values
From the mean of a feature
| Country Name | year | GDP |
| --- | --- | --- |
| ARUBA | 1965 | NULL |
| ARUBA | 1966 | 5.872478e+08 |
Say you have a DataFrame of GDP by `Country Name` and `year`, but some years are missing values. One way to handle the gaps is to fill each one with the mean GDP of that country:
df['GDP_filled'] = df.groupby('Country Name')['GDP'].transform(lambda x: x.fillna(x.mean()))
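To see what the one-liner above does, here is a small self-contained sketch. The countries, years, and GDP numbers are made up for illustration and are not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (values invented for this example).
df = pd.DataFrame({
    "Country Name": ["ARUBA", "ARUBA", "ARUBA", "BELIZE", "BELIZE"],
    "year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 100.0, 300.0, 50.0, np.nan],
})

# transform() returns a Series aligned to df's index, so it can be
# assigned straight back as a new column. Each group's NaNs are
# replaced by that group's own mean.
df["GDP_filled"] = df.groupby("Country Name")["GDP"].transform(
    lambda x: x.fillna(x.mean())
)

print(df)
```

Note that a country with no GDP values at all has a mean of NaN, so its rows stay missing.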
With forward fill
First sort the data by `year`, then group by `Country Name` so that the forward fill stays within each country. Each missing value takes the most recent earlier value for the same country:
df['GDP_ffill'] = df.sort_values('year').groupby('Country Name')['GDP'].ffill()
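The sketch below shows why the sort matters, using made-up rows that are deliberately out of year order. The column names `ffill_unsorted` and `ffill_sorted` are just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the ARUBA rows are not in year order.
df = pd.DataFrame({
    "Country Name": ["ARUBA", "ARUBA", "ARUBA", "BELIZE", "BELIZE"],
    "year": [1965, 1967, 1966, 1965, 1966],
    "GDP": [100.0, 300.0, np.nan, 50.0, np.nan],
})

# Without sorting, the 1966 gap is filled from the row above it,
# which happens to be 1967 (a later year).
df["ffill_unsorted"] = df.groupby("Country Name")["GDP"].ffill()

# With sorting, the 1966 gap is filled from 1965. The result keeps the
# original index labels, so assignment realigns it to df's row order.
df["ffill_sorted"] = (
    df.sort_values("year").groupby("Country Name")["GDP"].ffill()
)

print(df)
```

The sorted version only changes the order used for filling. The rows of `df` themselves stay where they were.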
With backward fill
Backward fill works the same way, except each missing value takes the next available later value for the same country:
df['GDP_bfill'] = df.sort_values('year').groupby('Country Name')['GDP'].bfill()
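A quick way to compare the two directions is to run both on the same frame. This is a minimal sketch with invented values, where each country is missing its first year and ARUBA is also missing its last.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (values invented for this example).
df = pd.DataFrame({
    "Country Name": ["ARUBA", "ARUBA", "ARUBA", "BELIZE", "BELIZE"],
    "year": [1965, 1966, 1967, 1965, 1966],
    "GDP": [np.nan, 100.0, np.nan, np.nan, 50.0],
})

grouped = df.sort_values("year").groupby("Country Name")["GDP"]

# Backward fill covers gaps at the start of a country's series but
# leaves trailing gaps, since there is no later value to copy.
df["GDP_bfill"] = grouped.bfill()

# Forward fill is the mirror image: trailing gaps are covered and
# leading gaps stay missing.
df["GDP_ffill"] = grouped.ffill()

print(df)
```

ARUBA's 1967 gap is not filled from BELIZE in the backward fill, because the grouping keeps each fill inside its own country.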