R - How can I split a character string in a data frame into multiple columns
I'm working with a data frame in which one column contains values that are mostly numeric but may contain non-numeric entries. I would like to split this column into multiple columns. One of the new columns should contain the numeric portion of the original entry, and the other column should contain any non-numeric elements.
Here is a sample data frame:
df <- data.frame(id=1:4,x=c('< 0.1','100','a 2.5', '200'))
Here is what I would like the data frame to look like:
id  x1  x2
1   <   0.1
2       100
3   a   2.5
4       200
One feature of the data that I am taking advantage of is the structure of the character strings: the non-numeric elements (if they exist) always precede the numeric elements, and the two elements are separated by a space.
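As a quick sanity check, the structure described above can be tested with a regular expression before any splitting is attempted. This is only a sketch in base R: the pattern is an assumption about what counts as "numeric" here (digits and a decimal point), and the vector x simply repeats the sample values.

```r
# Sample values from the data frame above
x <- c("< 0.1", "100", "a 2.5", "200")

# Assumed pattern: an optional non-space token followed by one space,
# then a run of digits and decimal points
ok <- grepl("^([^ ]+ )?[0-9.]+$", x)
ok
```

Any entry for which ok is FALSE would not fit the "prefix, space, number" layout and would need separate handling.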
I can use colsplit from the reshape package to split the column based on whitespace. The problem is that it replicates any entry that it can't split into two elements:
require(reshape)
# names must be passed as a character vector
df <- transform(df, x=colsplit(x, split=" ", names=c("x1","x2")))
df
id  x1   x2
1   <    0.1
2   100  100
3   a    2.5
4   200  200
This is not terribly problematic, as I can do some post-processing to remove the numeric elements from column "x1".
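That post-processing step might look roughly like the following. It is a sketch only: res is a hypothetical stand-in for the colsplit result shown above (built by hand so the example runs without the reshape package), and "numeric" is taken to mean anything as.numeric can parse.

```r
# Hypothetical stand-in for the colsplit() output: entries that could not
# be split appear in both x1 and x2
res <- data.frame(id = 1:4,
                  x1 = c("<", "100", "a", "200"),
                  x2 = c("0.1", "100", "2.5", "200"),
                  stringsAsFactors = FALSE)

# Flag x1 entries that parse as numbers (the warning for "<" and "a" is expected)
is_num <- !is.na(suppressWarnings(as.numeric(res$x1)))

# Blank out the duplicated numeric entries in x1
res$x1[is_num] <- ""
res
```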
I can also accomplish what I want using strsplit inside a function:
library(data.table)  # provides rbindlist()

# Split the entry for one id on the space; pad x1 when there is no prefix
split.fn <- function(id){
  new.val <- unlist(strsplit(as.character(df$x[df$id==id])," "))
  if(length(new.val)==1){
    return(data.frame(id=id,x1="NA",x2=new.val))
  }else{
    return(data.frame(id=id,x1=new.val[1],x2=new.val[2]))
  }
}
data.frame(rbindlist(lapply(unique(df$id),split.fn)))
id  x1  x2
1   <   0.1
2   NA  100
3   a   2.5
4   NA  200
But this seems cumbersome.
Basically, both options I've outlined here work, but I suspect there is a more elegant or direct way to get the desired data frame.
You can use separate() from tidyr:
tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
#   id   x1  x2
# 1  1    < 0.1
# 2  2 <NA> 100
# 3  3    a 2.5
# 4  4 <NA> 200
If you absolutely need to remove the NA values, you can do
tdy <- tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
tdy[is.na(tdy)] <- ""
and now we have
tdy
#   id x1  x2
# 1  1  < 0.1
# 2  2    100
# 3  3  a 2.5
# 4  4    200
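Note that x2 is still a character column after the split. If the numeric portion is needed as actual numbers, one possible follow-up is a plain as.numeric conversion. The sketch below rebuilds the result by hand (an assumption, so that it runs without tidyr installed) and then converts the column.

```r
# Hand-built copy of the result shown above, so the sketch is self-contained
tdy <- data.frame(id = 1:4,
                  x1 = c("<", "", "a", ""),
                  x2 = c("0.1", "100", "2.5", "200"),
                  stringsAsFactors = FALSE)

# Convert the numeric portion from character to numeric
tdy$x2 <- as.numeric(tdy$x2)
str(tdy)
```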