Pandas Cheat Sheet
I will use this blog post to keep track of typical operations I need to do over pandas DataFrame, after realising I need to do them whenever I explore a DataFrames, but I keep forgetting them.
Columns
Remove columns from a DataFrame
df_rels_in_scope.drop(['ent1','ent2','label'], axis=1)
Rename columns names with a dictionary
df = df.rename(columns={'rel_type':'arg_ent_type'})
Add a new column by applying a function to other columns
def replace_arg(arg_type_str):
if arg_type_str is None:
return None
return re.sub(r'[0-9]+','',arg_type_str)
norm_arg = df_bookings_only.apply(lambda row: replace_arg(row.arg_type), axis=1)
df_bookings_only.insert(len(df_bookings_only.columns), "arg_type_norm", norm_arg)
Sort after a groupby count
df_entities[['entity_text','entity_type']].groupby('entity_type').count().sort_values(by='entity_text', ascending=False)
Select rows whose column value equals some value
df.loc[df['column_name'] == some_value]
Display
Number of rows to display
pd.set_option('display.max_rows', 1500)
pandas
cheat-sheet
reference-post
]