python - pandas - Why can't DataFrame.apply be used to set some columns as Categorical?


I have a pandas DataFrame, and I'd like to efficiently turn multiple columns into categorical columns. My first thought was to use pandas.DataFrame.apply to convert only the relevant columns. Using the following example data:

import pandas as pd

pdf = pd.DataFrame(dict(name=       ('earl', 'eve', 'alan', 'randall', 'danielle'),
                        age=        (    29,    17,     73,        31,         62),
                        gender=     (   'm',   'f',    'm',       'm',        'f'),
                        nationality=(  'us',  'uk',  'can',     'can',       'us'),
                        height=     ( 182.9, 167.6,  175.3,     170.2,      172.8)),
                   columns=('name', 'age', 'gender', 'nationality', 'height'))
pdf = pdf.set_index('name')

>>> print(pdf)
          age gender nationality  height
name
earl       29      m          us   182.9
eve        17      f          uk   167.6
alan       73      m         can   175.3
randall    31      m         can   170.2
danielle   62      f          us   172.8

You can see that the apply approach is not working:

cat_list = {'gender', 'nationality'}
set_cat_list = lambda x: x.astype('category') if x.name in cat_list else x
dfa = pdf.apply(set_cat_list)

>>> print('applied subset: dtype={}'.format(dfa['gender'].dtype))
applied subset: dtype=object

This does not throw an error, but the categorical conversion is silently lost at some point. To check that the function is firing correctly, I added a probe:

import sys

in_cl = lambda x: x.name in cat_list
# Return the converted column, and print the column name and membership as a side effect.
set_cat_list_alert = lambda x: (set_cat_list(x),
                                sys.stdout.write('{}: {}\n'.format(x.name, in_cl(x))))[0]
dfa = pdf.apply(set_cat_list_alert)

>>> print('applied subset: dtype={}'.format(dfa['gender'].dtype))
age: False
age: False
gender: True
nationality: True
height: False
applied subset: dtype=object

Evidently, it fires off correctly. So, as a test to see whether this approach can work at all, I tried converting all the columns, and apparently that works fine:

set_cat = lambda x: x.astype('category')
dfb = pdf.apply(set_cat)

>>> print('applied whole frame: dtype={}'.format(dfb['gender'].dtype))
applied whole frame: dtype=category

Finally, I tried using a for loop to duplicate the intended result, to make sure that mixed categorical / non-categorical columns can coexist like this:

dfc = pdf.copy()
for cat in cat_list:
    dfc[cat] = pdf[cat].astype('category')

>>> print('for loop: dtype={}'.format(dfc['gender'].dtype))
for loop: dtype=category

So my question is: why can't DataFrame.apply() be used to set only some of these columns to categorical? What am I missing here?

This is a bug, as indicated in the issue here, and it is fixed in the upcoming 0.17.0 release, which is due the first week of October.

You can install 0.17.0rc1 with:

conda install pandas -c pandas
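For readers on a more recent pandas, here is a minimal sketch of another way to convert only a subset of columns, without going through apply at all. It assumes that DataFrame.astype accepts a column-to-dtype mapping, which is the case in current pandas but needs a release newer than the 0.17.0 discussed here. The name dfd is only illustrative and is not part of the original question.

```python
import pandas as pd

# Small frame mirroring the example data from the question.
pdf = pd.DataFrame(
    {
        "name": ["earl", "eve", "alan", "randall", "danielle"],
        "age": [29, 17, 73, 31, 62],
        "gender": ["m", "f", "m", "m", "f"],
        "nationality": ["us", "uk", "can", "can", "us"],
        "height": [182.9, 167.6, 175.3, 170.2, 172.8],
    }
).set_index("name")

cat_list = ["gender", "nationality"]

# Assumption: this pandas version supports a {column: dtype} mapping in astype.
# Only the listed columns are converted; the others keep their dtypes.
dfd = pdf.astype({col: "category" for col in cat_list})

# Show the resulting per-column dtypes.
print(dfd.dtypes)
```

Because astype returns a new frame, pdf itself is left unchanged, the same way the for loop above works on a copy.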

