Load XML "rows" into R data table
I have data shaped like this:

    <people>
      <person first="mary" last="jane" sex="f" />
      <person first="susan" last="smith" sex="f" height="168" />
      <person last="black" first="joseph" sex="m" />
      <person first="jessica" last="jones" sex="f" />
    </people>
I want a data frame that looks like this:

        first  last sex height
    1    mary  jane   f     NA
    2   susan smith   f    168
    3  joseph black   m     NA
    4 jessica jones   f     NA
I've gotten this far:

    library(XML)
    xpeople <- xmlRoot(xmlParse(xml))
    lst <- xmlApply(xpeople, xmlAttrs)
    names(lst) <- 1:length(lst)
But I can't for the life of me figure out how to get the list into a data frame. I can make the list "square" (i.e. fill in the gaps) and then put it into a data frame:

    lst <- xmlApply(xpeople, function(node) {
      attrs = xmlAttrs(node)
      if (!("height" %in% names(attrs))) {
        attrs[["height"]] <- NA
      }
      attrs
    })
    df = as.data.frame(lst)
But I have the following problems:
- The data frame is transposed
- first and last are factors, not chr
- height is a factor, not numeric
- The first and last names got swapped around for joseph black (not a big issue since the data is consistent, but annoying nonetheless)
How can I get the data frame into the correct form?
    txt <- '<people>
      <person first="mary" last="jane" sex="f" />
      <person first="susan" last="smith" sex="f" height="168" />
      <person last="black" first="joseph" sex="m" />
      <person first="jessica" last="jones" sex="f" />
    </people>'

    library(XML)         # xmlTreeParse
    library(data.table)  # rbindlist(...)

    xml <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
    rbindlist(lapply(xml["//person"], function(x) as.list(xmlAttrs(x))), fill = TRUE)
    #      first  last sex height
    # 1:    mary  jane   f     NA
    # 2:   susan smith   f    168
    # 3:  joseph black   m     NA
    # 4: jessica jones   f     NA
You need as.list(xmlAttrs(...)) instead of xmlAttrs(...) because rbindlist(...) wants each argument to be a list, not a vector.
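As a rough sketch of the same idea without the XML step, the snippet below builds a few named character vectors by hand (the `attrs` object is a made-up stand-in for what xmlAttrs() returns per node), binds them by name, and then converts height to numeric, since XML attribute values always arrive as strings. It assumes data.table is installed.

```r
library(data.table)

# Hypothetical stand-in for per-node xmlAttrs() results: named character vectors
attrs <- list(
  c(first = "mary",   last = "jane",  sex = "f"),
  c(first = "susan",  last = "smith", sex = "f", height = "168"),
  c(last  = "black",  first = "joseph", sex = "m")
)

# as.list() turns each vector into a one-row list; fill = TRUE pads missing
# attributes with NA and matches columns by name, so attribute order is irrelevant
people <- rbindlist(lapply(attrs, as.list), fill = TRUE)

# Attribute values are strings, so convert height explicitly
people[, height := as.numeric(height)]

str(people)
```

Because columns are matched by name, the swapped first/last attributes in the third entry should end up in the right columns.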