python - pandas - Why can't DataFrame.apply be used to set some columns Categorical
I have a pandas DataFrame, and I'd like to efficiently turn multiple columns into categorical columns. My first thought was to use pandas.DataFrame.apply to convert the relevant columns. Using the following example data:
    import pandas as pd

    pdf = pd.DataFrame(dict(name=('earl', 'eve', 'alan', 'randall', 'danielle'),
                            age=(29, 17, 73, 31, 62),
                            gender=('m', 'f', 'm', 'm', 'f'),
                            nationality=('us', 'uk', 'can', 'can', 'us'),
                            height=(182.9, 167.6, 175.3, 170.2, 172.8)),
                       columns=('name', 'age', 'gender', 'nationality', 'height'))
    pdf = pdf.set_index('name')

    >>> print(pdf)
              age gender nationality  height
    name
    earl       29      m          us   182.9
    eve        17      f          uk   167.6
    alan       73      m         can   175.3
    randall    31      m         can   170.2
    danielle   62      f          us   172.8

You can see that the apply approach is not working:
    cat_list = {'gender', 'nationality'}
    # Convert only the columns named in cat_list; return the others unchanged
    set_cat_list = lambda x: x.astype('category') if x.name in cat_list else x
    dfa = pdf.apply(set_cat_list)

    >>> print('applied subset: dtype={}'.format(dfa['gender'].dtype))
    applied subset: dtype=object

This does not throw an error; it just silently fails to convert the column to categorical. To check that the function is firing correctly, I added a probe:
    import sys

    in_cl = lambda x: x.name in cat_list
    # Same conversion as above, but also report each column name and whether it matched
    set_cat_list_alert = lambda x: (set_cat_list(x),
                                    sys.stdout.write('{}: {}\n'.format(x.name, in_cl(x))))[0]
    dfa = pdf.apply(set_cat_list_alert)

    >>> print('applied subset: dtype={}'.format(dfa['gender'].dtype))
    age: False
    age: False
    gender: True
    nationality: True
    height: False
    applied subset: dtype=object

Evidently it fires off correctly. To test whether this approach can work at all, I tried converting all of the columns, and that apparently works fine:
    set_cat = lambda x: x.astype('category')
    dfb = pdf.apply(set_cat)

    >>> print('applied whole frame: dtype={}'.format(dfb['gender'].dtype))
    applied whole frame: dtype=category

Finally, I tried using a for loop to duplicate the intended result, to make sure that mixed categorical / non-categorical columns can coexist like this:
    dfc = pdf.copy()
    for cat in cat_list:
        dfc[cat] = pdf[cat].astype('category')

    >>> print('for loop: dtype={}'.format(dfc['gender'].dtype))
    for loop: dtype=category

So my question is: why can't DataFrame.apply() be used to set only some of these columns to categorical? What am I missing here?
This is a bug, as indicated in the issue here, and it is fixed in the upcoming 0.17.0 release, which is due in the first week of October.
You can install 0.17.0rc1 with:

    conda install pandas -c pandas
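If upgrading is not an option, the conversion can also be done without apply at all. The following is only a sketch, not part of the fix described above: it assumes a pandas version whose DataFrame.astype accepts a mapping of column name to dtype, which is not guaranteed for older releases. The names `cat_cols` and `converted` are made up for illustration; the data is the example frame from the question.

```python
import pandas as pd

# Example frame from the question
pdf = pd.DataFrame(
    {
        "name": ["earl", "eve", "alan", "randall", "danielle"],
        "age": [29, 17, 73, 31, 62],
        "gender": ["m", "f", "m", "m", "f"],
        "nationality": ["us", "uk", "can", "can", "us"],
        "height": [182.9, 167.6, 175.3, 170.2, 172.8],
    }
).set_index("name")

# Hypothetical name: the columns that should become categorical
cat_cols = ["gender", "nationality"]

# Assumption: this pandas version lets astype take a {column: dtype} mapping.
# Columns not listed in the mapping keep their existing dtype.
converted = pdf.astype({col: "category" for col in cat_cols})

print(converted.dtypes)
```

On versions where that mapping form is not available, the for loop from the question remains the straightforward fallback.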