r - Getting a summary data frame for all the combinations of categories represented in two columns -


I am working with a data frame corresponding to the example below:

    set.seed(1)
    dta <- data.frame("cata" = rep(c("a", "b", "c"), 4),
                      "catnum" = rep(1:2, 6),
                      "someval" = runif(12))

I would like to build a data frame that has the sum of values for all the combinations of categories derived from cata and catnum, as well as for the categories derived from each column separately. On the primitive example above, the first couple of combinations can be obtained with some simple code:

    df_sums <- data.frame(
      "category" = c("total a",
                     "total a, 1",
                     "total a, 2"),
      "sum" = c(sum(dta$someval[dta$cata == 'a']),
                sum(dta$someval[dta$cata == 'a' & dta$catnum == 1]),
                sum(dta$someval[dta$cata == 'a' & dta$catnum == 2]))
    )

This produces an informative data frame of sums:

        category       sum
    1    total a 2.1801780
    2 total a, 1 1.2101839
    3 total a, 2 0.9699941

This solution would be grossly inefficient when applied to a data frame with many categories. I would like to achieve the following:

  1. Cycle through all the categories, including the categories derived from each column separately and from both columns at the same time.
  2. Keep some flexibility in which function is applied; for instance, I may want to apply mean instead of sum.
  3. Save the "total" string as a separate object that I can edit when applying a function other than sum.

I was thinking of using dplyr, along these lines:

    require(dplyr)
    df_sums_experiment <- dta %>%
      group_by(cata, catnum) %>%
      summarise(totval = sum(someval))

But it's not clear to me how to apply multiple groupings simultaneously. As stated, I'm interested in grouping by each column separately and by the combination of both columns. I would also like to create a string column that indicates what is combined and in what order.

You could use tidyr to unite the columns and gather the data, then use dplyr to summarise:

    library(dplyr)
    library(tidyr)

    dta %>%
      # build the combined key, e.g. "a_1", keeping the original columns
      unite(measurevar, cata, catnum, remove = FALSE) %>%
      # stack measurevar, cata and catnum into a single grouping column
      gather(key, val, -someval) %>%
      group_by(val) %>%
      summarise(sum(someval))

         val sum(someval)
       (chr)        (dbl)
    1      1    2.8198078
    2      2    3.0778622
    3      a    2.1801780
    4    a_1    1.2101839
    5    a_2    0.9699941
    6      b    1.4405782
    7    b_1    0.4076565
    8    b_2    1.0329217
    9      c    2.2769138
    10   c_1    1.2019674
    11   c_2    1.0749464
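The pipeline above always applies sum and does not add a label, so the question's second and third points (a swappable function and an editable "total" string) remain open. One possible way to cover them is sketched below in base R. The helper name `summarise_groups` and its arguments `fun` and `label` are hypothetical, made up for this illustration rather than taken from the original answer, and the column names `cata`, `catnum` and `someval` are hard-coded to match the example data.

```r
# Hypothetical helper (an assumption for illustration, not part of the answer):
# apply `fun` to someval for each level of cata, each level of catnum,
# and each cata/catnum combination, prefixing every row with `label`.
summarise_groups <- function(df, fun = sum, label = "total") {
  # One grouping key per row for each of the three groupings
  keys <- list(
    as.character(df$cata),
    as.character(df$catnum),
    paste(df$cata, df$catnum, sep = "_")
  )
  # Summarise within each grouping, then stack the three results
  pieces <- lapply(keys, function(k) {
    res <- tapply(df$someval, k, fun)
    data.frame(category = paste(label, names(res)),
               value = as.vector(res),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Sums labelled "total ...", then means labelled "mean ..."
df_sums  <- summarise_groups(dta)
df_means <- summarise_groups(dta, fun = mean, label = "mean")
```

Each call returns one row per single-column category and one per combination, so changing the function or the label is a matter of changing an argument rather than editing the pipeline.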
