R - Getting a summary data frame for all the combinations of categories represented in two columns
I am working with a data frame corresponding to the example below:

    set.seed(1)
    dta <- data.frame("cata" = rep(c("a", "b", "c"), 4),
                      "catnum" = rep(1:2, 6),
                      "someval" = runif(12))

I would like to build a data frame that has the sum of values for all combinations of the categories derived from cata and catnum, as well as for the categories derived from each column separately. On the primitive example above, for the first couple of combinations, this can be achieved with the use of simple code:
    df_sums <- data.frame(
      "category" = c("total a", "total a, 1", "total a, 2"),
      "sum" = c(sum(dta$someval[dta$cata == 'a']),
                sum(dta$someval[dta$cata == 'a' & dta$catnum == 1]),
                sum(dta$someval[dta$cata == 'a' & dta$catnum == 2]))
    )

This produces an informative data frame of sums:
        category       sum
    1    total a 2.1801780
    2 total a, 1 1.2101839
    3 total a, 2 0.9699941

This solution would be grossly inefficient when applied to a data frame with multiple categories. I would like to achieve the following:
- cycle through all the categories, including the categories derived from each column separately as well as from both columns at the same time
- achieve some flexibility with respect to how the function is applied; for instance, I may want to apply mean instead of sum
- save the "total for" string in a separate object that I could edit when applying a function other than sum.
I was thinking of using dplyr, along these lines:

    require(dplyr)
    df_sums_experiment <- dta %>%
      group_by(cata, catnum) %>%
      summarise(totval = sum(someval))

But it's not clear to me how to apply multiple groupings simultaneously. As stated, I'm interested in grouping by each column separately and by the combination of both columns. I would also like to create a string column that indicates what is combined and in what order.
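To make the "multiple groupings" requirement concrete, here is a rough sketch of one way the three groupings could be computed separately and then stacked with bind_rows. The intermediate names (by_cata, by_catnum, by_both) and the "_" separator in the label are assumptions chosen for illustration, and this is not claimed to be the most compact approach.

```r
library(dplyr)

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Grouping by cata only; the label is the category itself.
by_cata <- dta %>%
  group_by(cata) %>%
  summarise(totval = sum(someval)) %>%
  transmute(category = as.character(cata), totval)

# Grouping by catnum only.
by_catnum <- dta %>%
  group_by(catnum) %>%
  summarise(totval = sum(someval)) %>%
  transmute(category = as.character(catnum), totval)

# Grouping by both columns; the label records the order cata, then catnum.
by_both <- dta %>%
  group_by(cata, catnum) %>%
  summarise(totval = sum(someval)) %>%
  ungroup() %>%
  transmute(category = paste(cata, catnum, sep = "_"), totval)

# Stack the three summaries into one data frame.
df_sums_experiment <- bind_rows(by_cata, by_catnum, by_both)
```

With the example data this gives one row per single category (a, b, c, 1, 2) and one per combination (a_1 through c_2).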
You could use tidyr to unite the columns and gather the data, then use dplyr to summarise:
    library(dplyr)
    library(tidyr)
    dta %>%
      unite(measurevar, cata, catnum, remove = FALSE) %>%
      gather(key, val, -someval) %>%
      group_by(val) %>%
      summarise(sum(someval))

         val sum(someval)
       (chr)        (dbl)
    1      1    2.8198078
    2      2    3.0778622
    3      a    2.1801780
    4    a_1    1.2101839
    5    a_2    0.9699941
    6      b    1.4405782
    7    b_1    0.4076565
    8    b_2    1.0329217
    9      c    2.2769138
    10   c_1    1.2019674
    11   c_2    1.0749464
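The question also asks for a swappable function (mean instead of sum) and an editable label string. A minimal base R sketch of that idea follows; summarise_combos and its arguments (cols, value, fun, label) are hypothetical names introduced here, not part of dplyr or tidyr, and the "_" separator simply mirrors the output above.

```r
# Hypothetical helper: summarise `value` for each column in `cols` on its own
# and for all of `cols` combined, applying `fun` and prefixing rows with `label`.
summarise_combos <- function(df, cols, value, fun = sum, label = "total") {
  # One grouping per single column, plus one grouping with all columns together.
  groupings <- unique(c(as.list(cols), list(cols)))
  pieces <- lapply(groupings, function(g) {
    # Build a key such as "a" or "a_1" from the grouping columns, in order.
    keys <- do.call(paste, c(df[g], sep = "_"))
    res <- tapply(df[[value]], keys, fun)
    data.frame(category = paste(label, names(res)),
               value = as.vector(res),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

sums  <- summarise_combos(dta, c("cata", "catnum"), "someval")
means <- summarise_combos(dta, c("cata", "catnum"), "someval",
                          fun = mean, label = "mean")
```

Here the label lives in a single argument, so switching from sum to mean only requires changing fun and label in the call.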