r - Getting a summary data frame for all the combinations of categories represented in two columns
I am working with a data frame corresponding to the example below:
set.seed(1)
dta <- data.frame("cata" = rep(c("a","b","c"), 4),
                  "catnum" = rep(1:2,6),
                  "someval" = runif(12))
I would like to build a data frame holding the sums of the values for all combinations of categories derived from cata and catnum, as well as for the categories derived from each column separately. On the primitive example above, the first couple of combinations can be achieved with simple code:
df_sums <- data.frame(
  "category" = c("total a", "total a, 1", "total a, 2"),
  "sum" = c(sum(dta$someval[dta$cata == 'a']),
            sum(dta$someval[dta$cata == 'a' & dta$catnum == 1]),
            sum(dta$someval[dta$cata == 'a' & dta$catnum == 2]))
)

This produces an informative data frame of sums:

    category       sum
1    total a 2.1801780
2 total a, 1 1.2101839
3 total a, 2 0.9699941
This solution would be grossly inefficient when applied to a data frame with multiple categories. I would like to achieve the following:
- cycle through all the categories, including the categories derived from each column separately and from both columns at the same time
- achieve flexibility with respect to how the function is applied; for instance, I may want to apply mean instead of sum
- save the "total for" string as a separate object that I could edit when applying a function other than sum.
I was thinking of using dplyr, along these lines:
require(dplyr)
df_sums_experiment <- dta %>%
  group_by(cata, catnum) %>%
  summarise(totval = sum(someval))
But it's not clear to me how to apply multiple groupings simultaneously. As stated, I'm interested in grouping by each column separately and by the combination of both columns. I would also like to create a string column that indicates which categories are combined and in what order.
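For reference, the three groupings described above can also be computed one at a time and stacked. The following is only a minimal sketch, assuming dplyr 1.0 or later; the names summary_fun, label_prefix and value are hypothetical and chosen for illustration, and the label format simply mirrors the "total a, 1" style of the hand-written example.

```r
library(dplyr)

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Hypothetical settings: swap in mean (or another function) and edit the label
summary_fun  <- sum
label_prefix <- "total"

# One summary per grouping, each reduced to a label column and a value column
by_cata <- dta %>%
  group_by(cata) %>%
  summarise(value = summary_fun(someval)) %>%
  transmute(category = paste(label_prefix, cata), value)

by_catnum <- dta %>%
  group_by(catnum) %>%
  summarise(value = summary_fun(someval)) %>%
  transmute(category = paste(label_prefix, catnum), value)

by_both <- dta %>%
  group_by(cata, catnum) %>%
  summarise(value = summary_fun(someval), .groups = "drop") %>%
  transmute(category = paste0(label_prefix, " ", cata, ", ", catnum), value)

# Stack the three summaries into a single data frame
df_sums <- bind_rows(by_cata, by_catnum, by_both)
```

This is explicit rather than compact: every extra category column adds more groupings to write out by hand, which is the repetition the question is trying to avoid.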
You could use tidyr to unite the columns and gather the data, then use dplyr to summarise:
library(dplyr)
library(tidyr)

dta %>%
  unite(measurevar, cata, catnum, remove=FALSE) %>%  # combined key such as "a_1"
  gather(key, val, -someval) %>%                     # stack all three key columns
  group_by(val) %>%
  summarise(sum(someval))

     val sum(someval)
   (chr)        (dbl)
1      1    2.8198078
2      2    3.0778622
3      a    2.1801780
4    a_1    1.2101839
5    a_2    0.9699941
6      b    1.4405782
7    b_1    0.4076565
8    b_2    1.0329217
9      c    2.2769138
10   c_1    1.2019674
11   c_2    1.0749464
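To address the request for flexibility in the applied function, the same unite-and-reshape idea can be wrapped in a function. This is a sketch rather than part of the original answer: summarise_combos, fun and result are hypothetical names, the column names are hard-coded to the example data, and pivot_longer (tidyr 1.0 or later) is swapped in for gather. Unlike the pipeline above, it also groups by key, so the output records which column each value came from.

```r
library(dplyr)
library(tidyr)

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Hypothetical wrapper: `fun` is applied to someval within every category
summarise_combos <- function(data, fun = sum) {
  data %>%
    # pivot_longer needs the stacked columns to share one type
    mutate(catnum = as.character(catnum)) %>%
    # Add a combined key such as "a_1" while keeping the original columns
    unite("combined", cata, catnum, remove = FALSE) %>%
    # Stack the three key columns into key/val pairs
    pivot_longer(c(combined, cata, catnum),
                 names_to = "key", values_to = "val") %>%
    group_by(key, val) %>%
    summarise(result = fun(someval), .groups = "drop")
}

sums  <- summarise_combos(dta)        # sums, as in the answer above
means <- summarise_combos(dta, mean)  # same groupings, different function
```

The key column makes it possible to build a descriptive label afterwards, for example by pasting a prefix onto val.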