r - Getting a summary data frame for all the combinations of categories represented in two columns -

June 15, 2010

I am working with a data frame corresponding to the example below:

set.seed(1)
dta <- data.frame("cata" = rep(c("a","b","c"), 4), "catnum" = rep(1:2,6),
                  "someval" = runif(12))

I would like to build a data frame that has the sums of values for all combinations of categories derived from cata and catnum, as well as for the categories derived from each column separately. On the primitive example above, for the first couple of combinations, this can be achieved with simple code:

df_sums <- data.frame(
  "category" = c("total for a",
                 "total for a and 1",
                 "total for a and 2"),
  "sum" = c(sum(dta$someval[dta$cata == 'a']),
            sum(dta$someval[dta$cata == 'a' & dta$catnum == 1]),
            sum(dta$someval[dta$cata == 'a' & dta$catnum == 2]))
)

This produces an informative data frame of sums:

           category       sum
1       total for a 2.1801780
2 total for a and 1 1.2101839
3 total for a and 2 0.9699941

This solution would be grossly inefficient when applied to a data frame with multiple categories. I would like to achieve the following:

cycle through all the categories, including the categories derived from each column separately and from both columns at the same time
achieve flexibility with respect to how the function is applied, for instance I may want to apply mean instead of sum
save the "total for" string in a separate object that I can edit when applying a function other than sum.

I was thinking of using dplyr, along these lines:

require(dplyr)
df_sums_experiment <- dta %>%
  group_by(cata, catnum) %>%
  summarise(totval = sum(someval))

But it's not clear to me how to apply multiple groupings simultaneously. As stated, I'm interested in grouping by each column separately and by the combination of both columns. I would also like to create a string column that indicates what is combined and in what order.
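One way to read the request, as a rough sketch rather than a definitive answer: compute each grouping on its own and stack the results. The column names grouping and category and the object names by_cata, by_catnum, by_both and df_sums_all are made up for illustration; grouping records which columns were combined and in what order.

```r
library(dplyr)

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Totals for each level of cata on its own
by_cata <- dta %>%
  group_by(cata) %>%
  summarise(totval = sum(someval)) %>%
  mutate(grouping = "cata", category = as.character(cata)) %>%
  select(grouping, category, totval)

# Totals for each level of catnum on its own
by_catnum <- dta %>%
  group_by(catnum) %>%
  summarise(totval = sum(someval)) %>%
  mutate(grouping = "catnum", category = as.character(catnum)) %>%
  select(grouping, category, totval)

# Totals for every cata/catnum pair; ungroup() drops the leftover grouping
by_both <- dta %>%
  group_by(cata, catnum) %>%
  summarise(totval = sum(someval)) %>%
  ungroup() %>%
  mutate(grouping = "cata, catnum",
         category = paste(cata, catnum, sep = "_")) %>%
  select(grouping, category, totval)

# Stack the three summaries into one data frame (hypothetical name)
df_sums_all <- bind_rows(by_cata, by_catnum, by_both)
```

This is more verbose than a single pipeline, but each grouping stays explicit, and swapping sum for mean only touches the summarise() calls.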

You can use tidyr to unite the columns and gather the data. Then use dplyr to summarise:

library(dplyr)
library(tidyr)
dta %>% unite(measurevar, cata, catnum, remove=FALSE) %>%
        gather(key, val, -someval)  %>%
        group_by(val) %>%
        summarise(sum(someval))

     val sum(someval)
   (chr)        (dbl)
1      1    2.8198078
2      2    3.0778622
3      a    2.1801780
4    a_1    1.2101839
5    a_2    0.9699941
6      b    1.4405782
7    b_1    0.4076565
8    b_2    1.0329217
9      c    2.2769138
10   c_1    1.2019674
11   c_2    1.0749464
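To cover the other two wishes (swapping sum for another function and keeping the label text in an editable object), the same idea could be wrapped in a function. This is only a sketch: summarise_combos, its fun and label arguments, and the cata_catnum, category and value column names are hypothetical, and it uses pivot_longer(), which tidyr now offers as the successor to gather(). Both category columns are converted to character first, since pivot_longer() will not stack columns of different types.

```r
library(dplyr)
library(tidyr)

set.seed(1)
dta <- data.frame(cata = rep(c("a", "b", "c"), 4),
                  catnum = rep(1:2, 6),
                  someval = runif(12))

# Hypothetical helper: `fun` is the summary function, `label` the text
# prefixed to every category name.
summarise_combos <- function(df, fun = sum, label = "total for") {
  df %>%
    # make both category columns character so they can share one column
    mutate(cata = as.character(cata), catnum = as.character(catnum)) %>%
    # add the combined category (e.g. "a_1") and keep the originals
    unite(cata_catnum, cata, catnum, remove = FALSE) %>%
    # stack the three category columns into key/val pairs
    pivot_longer(-someval, names_to = "key", values_to = "val") %>%
    group_by(val) %>%
    summarise(value = fun(someval)) %>%
    mutate(category = paste(label, val)) %>%
    select(category, value)
}

sums  <- summarise_combos(dta)
means <- summarise_combos(dta, fun = mean, label = "mean for")
```

As in the pipeline above, grouping is by val alone, so this assumes the two columns never share a level name.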
