r - How can I loop through multiple dataframes, for a list of strings, search for columns containing string and create multiple new files? -

July 15, 2014

i have 24 data files (bsls). each file contains fixed number of rows variable number of columns (sites). have clean list of 23 sites cannot exact match column names pertaining each site contains additional information.

i've read these files r using following code:

#list files dir , read, skipping rows until 'q num' temp <- list.files() # e.g. info-stuff-nameofbsl-otherstuff.csv  # read.xls , strip bsl name file , assign object name for(i in temp){     assign(unlist(strsplit(i, split = '-', fixed = t))[3],            read.xls(i, pattern = "q num")) }  #create list of dataframes (24 bsls) bsls <- filter(function(x) is(x, "data.frame"), mget(ls()))  #clean list of site names sites <- ("newyork","london","sydney","paris","manchester","angers","venice","bangkok","glasgow","boston","perth","canberra","lyons","washington","milan","cardiff","dublin","frankfurt","ottawa","toronto","el.salvador","taltal","caldera")

example of first 3 lines of 1 of 24 bsls datasets

e.g. bsl1

qnum,   questiontext,   % unrelatedcol, newyork_other_info, london_some_other_info, venice_other_diff_info,  q17a,   question?,                 74%,              69%,                     81%,                  76%, q17b,   question?,         72%,              73%,                     77%,                  74%,

the outcome need each of 23 sites have .csv file containing columns found within 24 data files (bsls).

my current attempt...

for(site in sites){                             #for each site     assign(site, data.frame())                  #create empty data frame add vectors     for(bsl in dfs){                            #for each dataset         if (grepl(site, colnames(bsl))){        #substring match            next                                 #go loop         }     assign(site$bsl, bsl[,grepl("site", colnames(bsl))]) #assign column dataframe     }  }

the solution this...

e.g london.csv

qnum,   questiontext, bslname1_other_info,  bslname2_some_other_info, bsl5other_diff_info,  q17a,   question?,                 74%,              69%,                     81%,                  76%, q17b,   question?,         72%,              73%,                     77%,                  74%,

there 23 files, 1 each site, containing columns pertaining site 24 input bsl files.

edit - worth stating each of bsls not called bsl1, bsl2... etc in fact unique strings e.g. unit,section,team... etc.

library(dplyr) library(stringi) library(tidyr)   bind_rows(bsls, .id = bsl) %>%   gather(variable, value,           matches(sites %>% paste(collapse = "|") ),           na.rm = true ) %>%   separate(variable, c("site", "new_variable",             sep = "_", = "merge") %>%   unite(final_variable, bsl, new_variable, sep = "_") %>%   spread(final_variable, value) %>%   group_by(site) %>%   do(write.csv(., paste("site", first(.$site), ".csv") ) )

Search This Blog

TSQL

r - How can I loop through multiple dataframes, for a list of strings, search for columns containing string and create multiple new files? -

Comments

Post a Comment

Popular posts from this blog

java - WARN : org.springframework.web.servlet.PageNotFound - No mapping found for HTTP request with URI [/board/] in DispatcherServlet with name 'appServlet' -

android - How to create dynamically Fragment pager adapter -

1111. appearing after print sequence - php -