Load XML "rows" into R data table


I have data shaped like this:

    <people>
      <person first="mary" last="jane" sex="f" />
      <person first="susan" last="smith" sex="f" height="168" />
      <person last="black" first="joseph" sex="m" />
      <person first="jessica" last="jones" sex="f" />
    </people>

I'd like a data frame that looks like this:

        first  last sex height
    1    mary  jane   f     NA
    2   susan smith   f    168
    3  joseph black   m     NA
    4 jessica jones   f     NA

I've gotten this far:

    library(XML)
    xpeople <- xmlRoot(xmlParse(xml))
    lst <- xmlApply(xpeople, xmlAttrs)
    names(lst) <- 1:length(lst)

But I can't for the life of me figure out how to turn that list into a data frame. I can make the list "square" (i.e. fill in the gaps) and then put it into a data frame:

    lst <- xmlApply(xpeople, function(node) {
      attrs = xmlAttrs(node)
      if (!("height" %in% names(attrs))) {
        attrs[["height"]] <- NA
      }
      attrs
    })
    df = as.data.frame(lst)

but it has the following problems:

  1. The data frame is transposed.
  2. The first and last columns are factors, not character.
  3. The height column is a factor, not numeric.
  4. The first and last names got swapped around for joseph black (not a big issue, since the data is consistent, but annoying nonetheless).

How can I get the data frame into the correct form?
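All four problems come from `as.data.frame()` turning each per-person vector into a column and lining values up by position rather than by attribute name (and, in R versions before 4.0.0, converting strings to factors by default). A minimal base-R sketch of a fix, assuming the XML package is installed and that the column list is known in advance (`cols` is a hypothetical name introduced here), indexes each attribute vector by name and binds the results as rows:

```r
library(XML)

txt <- '<people>
  <person first="mary" last="jane" sex="f" />
  <person first="susan" last="smith" sex="f" height="168" />
  <person last="black" first="joseph" sex="m" />
  <person first="jessica" last="jones" sex="f" />
</people>'

doc <- xmlParse(txt, asText = TRUE)

# One named character vector of attributes per <person> element
rows <- xpathApply(doc, "//person", xmlAttrs)

# Hypothetical fixed column order, assumed rather than discovered from the data
cols <- c("first", "last", "sex", "height")

# Indexing by name makes attribute order irrelevant (fixes joseph black);
# a missing attribute comes back as NA. rbind() makes one row per person.
mat <- do.call(rbind, lapply(rows, function(a) unname(a[cols])))
colnames(mat) <- cols

# Keep strings as character, then convert the numeric column explicitly
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df$height <- as.numeric(df$height)
df
```

Fixing `cols` up front keeps the sketch dependency-free; if the set of attributes is not known ahead of time, it could instead be built with `unique(unlist(lapply(rows, names)))`.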

    txt <- '<people>
              <person first="mary" last="jane" sex="f" />
              <person first="susan" last="smith" sex="f" height="168" />
              <person last="black" first="joseph" sex="m" />
              <person first="jessica" last="jones" sex="f" />
            </people>'
    library(XML)         # xmlTreeParse
    library(data.table)  # rbindlist(...)
    xml <- xmlTreeParse(txt, asText=TRUE, useInternalNodes = TRUE)
    rbindlist(lapply(xml["//person"], function(x) as.list(xmlAttrs(x))), fill=TRUE)
    #      first  last sex height
    # 1:    mary  jane   f     NA
    # 2:   susan smith   f    168
    # 3:  joseph black   m     NA
    # 4: jessica jones   f     NA

You need `as.list(xmlAttrs(...))` instead of `xmlAttrs(...)` because `rbindlist(...)` wants each element to be a list, not a vector.
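The `rbindlist()` result above still stores `height` as character, because XML attribute values are always strings, so problem 3 from the question is not yet solved. A sketch, assuming the XML and data.table packages are installed, that converts the column in place with data.table's `:=` operator (it selects the nodes with `getNodeSet()`):

```r
library(XML)
library(data.table)

txt <- '<people>
  <person first="mary" last="jane" sex="f" />
  <person first="susan" last="smith" sex="f" height="168" />
  <person last="black" first="joseph" sex="m" />
  <person first="jessica" last="jones" sex="f" />
</people>'

doc <- xmlParse(txt, asText = TRUE)
nodes <- getNodeSet(doc, "//person")

# fill = TRUE matches list elements by name, so joseph's reordered attributes
# land in the right columns and the missing heights are filled with NA
dt <- rbindlist(lapply(nodes, function(x) as.list(xmlAttrs(x))), fill = TRUE)

# Attribute values arrive as character strings; convert height by reference
dt[, height := as.numeric(height)]
str(dt)
```

Because `:=` modifies `dt` by reference, no reassignment is needed; `first`, `last` and `sex` are already character, since `rbindlist()` does not create factors from character input.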

