r - How can I loop through multiple dataframes, for a list of strings, search for columns containing string and create multiple new files? -


i have 24 data files (bsls). each file contains fixed number of rows variable number of columns (sites). have clean list of 23 sites cannot exact match column names pertaining each site contains additional information.

i've read these files r using following code:

#list files dir , read, skipping rows until 'q num' temp <- list.files() # e.g. info-stuff-nameofbsl-otherstuff.csv  # read.xls , strip bsl name file , assign object name for(i in temp){     assign(unlist(strsplit(i, split = '-', fixed = t))[3],            read.xls(i, pattern = "q num")) }  #create list of dataframes (24 bsls) bsls <- filter(function(x) is(x, "data.frame"), mget(ls()))  #clean list of site names sites <- ("newyork","london","sydney","paris","manchester","angers","venice","bangkok","glasgow","boston","perth","canberra","lyons","washington","milan","cardiff","dublin","frankfurt","ottawa","toronto","el.salvador","taltal","caldera") 

example of first 3 lines of 1 of 24 bsls datasets

e.g. bsl1

qnum,   questiontext,   % unrelatedcol, newyork_other_info, london_some_other_info, venice_other_diff_info,  q17a,   question?,                 74%,              69%,                     81%,                  76%, q17b,   question?,         72%,              73%,                     77%,                  74%, 

the outcome need each of 23 sites have .csv file containing columns found within 24 data files (bsls).

my current attempt...

for(site in sites){                             #for each site     assign(site, data.frame())                  #create empty data frame add vectors     for(bsl in dfs){                            #for each dataset         if (grepl(site, colnames(bsl))){        #substring match            next                                 #go loop         }     assign(site$bsl, bsl[,grepl("site", colnames(bsl))]) #assign column dataframe     }  } 

the solution this...

e.g london.csv

qnum,   questiontext, bslname1_other_info,  bslname2_some_other_info, bsl5other_diff_info,  q17a,   question?,                 74%,              69%,                     81%,                  76%, q17b,   question?,         72%,              73%,                     77%,                  74%, 

there 23 files, 1 each site, containing columns pertaining site 24 input bsl files.

edit - worth stating each of bsls not called bsl1, bsl2... etc in fact unique strings e.g. unit,section,team... etc.

library(dplyr) library(stringi) library(tidyr)   bind_rows(bsls, .id = bsl) %>%   gather(variable, value,           matches(sites %>% paste(collapse = "|") ),           na.rm = true ) %>%   separate(variable, c("site", "new_variable",             sep = "_", = "merge") %>%   unite(final_variable, bsl, new_variable, sep = "_") %>%   spread(final_variable, value) %>%   group_by(site) %>%   do(write.csv(., paste("site", first(.$site), ".csv") ) ) 

Comments

Popular posts from this blog

html - Outlook 2010 Anchor (url/address/link) -

javascript - Why does running this loop 9 times take 100x longer than running it 8 times? -

Getting gateway time-out Rails app with Nginx + Puma running on Digital Ocean -