r - How can I loop through multiple dataframes, for a list of strings, search for columns containing string and create multiple new files? -
i have 24 data files (bsls
). each file contains fixed number of rows variable number of columns (sites
). have clean list of 23 sites
cannot exact match column names pertaining each site contains additional information.
i've read these files r
using following code:
#list files dir , read, skipping rows until 'q num' temp <- list.files() # e.g. info-stuff-nameofbsl-otherstuff.csv # read.xls , strip bsl name file , assign object name for(i in temp){ assign(unlist(strsplit(i, split = '-', fixed = t))[3], read.xls(i, pattern = "q num")) } #create list of dataframes (24 bsls) bsls <- filter(function(x) is(x, "data.frame"), mget(ls())) #clean list of site names sites <- ("newyork","london","sydney","paris","manchester","angers","venice","bangkok","glasgow","boston","perth","canberra","lyons","washington","milan","cardiff","dublin","frankfurt","ottawa","toronto","el.salvador","taltal","caldera")
example of first 3 lines of 1 of 24 bsls
datasets
e.g. bsl1
qnum, questiontext, % unrelatedcol, newyork_other_info, london_some_other_info, venice_other_diff_info, q17a, question?, 74%, 69%, 81%, 76%, q17b, question?, 72%, 73%, 77%, 74%,
the outcome need each of 23 sites
have .csv
file containing columns found within 24 data files (bsls
).
my current attempt...
for(site in sites){ #for each site assign(site, data.frame()) #create empty data frame add vectors for(bsl in dfs){ #for each dataset if (grepl(site, colnames(bsl))){ #substring match next #go loop } assign(site$bsl, bsl[,grepl("site", colnames(bsl))]) #assign column dataframe } }
the solution this...
e.g london.csv
qnum, questiontext, bslname1_other_info, bslname2_some_other_info, bsl5other_diff_info, q17a, question?, 74%, 69%, 81%, 76%, q17b, question?, 72%, 73%, 77%, 74%,
there 23 files, 1 each site, containing columns pertaining site 24 input bsl
files.
edit - worth stating each of bsls
not called bsl1
, bsl2
... etc in fact unique strings e.g. unit
,section
,team
... etc.
library(dplyr) library(stringi) library(tidyr) bind_rows(bsls, .id = bsl) %>% gather(variable, value, matches(sites %>% paste(collapse = "|") ), na.rm = true ) %>% separate(variable, c("site", "new_variable", sep = "_", = "merge") %>% unite(final_variable, bsl, new_variable, sep = "_") %>% spread(final_variable, value) %>% group_by(site) %>% do(write.csv(., paste("site", first(.$site), ".csv") ) )
Comments
Post a Comment