scala - What is the performance impact of select statements on Spark DataFrames?
I am using many select statements or expressions on Spark DataFrames, and I wonder what their performance impact is on subsequent transformations once an action is triggered.
Given a DataFrame df with 10 columns, a to j:
1. What is the effect if I use as(...) to rename every column?
   df.select( df("a").as("1"), ..., df("j").as("10"))
2. What if I select only a subset (e.g. 5 columns)?
   a. val df2 = df.select( df("a"), ..., df("e") )
   b. How does Spark handle this projection? Is df still kept, with df2 (as a projection) using df as a kind of reference? Or is df2 created freshly and df discarded? (Neglecting persist here.)
3. What is the effect of general Column expressions used in select?
4. Are performance tests for the cases above available? Are performance measurements available somewhere in general? If not, what is the best way to measure performance?
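To make the cases above concrete, here is a minimal sketch of how such a comparison might be set up. It is not a benchmark: it assumes Spark 2.x or later on the classpath and a local session, the object name SelectPlanSketch and the time helper are hypothetical, and df is a three-column stand-in for the 10-column DataFrame. It prints the query plans for a renaming select and a subset select, then takes a crude wall-clock timing of an action on each.

```scala
import org.apache.spark.sql.SparkSession

object SelectPlanSketch {

  // Hypothetical helper: crude wall-clock timing of a block.
  // Results are noisy; repeat runs and discard warm-up before comparing.
  def time[T](label: String)(block: => T): T = {
    val start = System.nanoTime()
    val result = block
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-plan-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data: three columns instead of the ten in the question.
    val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")

    // Case 1: rename every column with as(...).
    val renamed = df.select(df("a").as("1"), df("b").as("2"), df("c").as("3"))

    // Case 2: select a subset of the columns.
    val df2 = df.select(df("a"), df("b"))

    // Print the parsed, analyzed, optimized and physical plans.
    renamed.explain(true)
    df2.explain(true)

    // Nothing runs until an action is called; collect() is fine for tiny data only.
    time("renamed.collect")(renamed.collect())
    time("df2.collect")(df2.collect())

    spark.stop()
  }
}
```

Comparing the plans printed by explain(true) for the two cases is probably more informative than the timings, since the timings on such small data mostly reflect job startup overhead.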