html - Cannot find the text inside `<span>`
I have the HTML below:
<div class="info"> <h5> <a href="/aaa/">aaa </a> </h5> <span class="date"> 8:27am, sep 30</span> </div>
I'm using Ruby and want to get the text "8:27am, sep 30" inside <span class="date">, but the command below cannot find it:
find('div.info span.date').text
Could you please tell me why it doesn't work? If I look for the text inside h5 with the following command, I get "aaa" correctly:
find('div.info h5').text
Full Ruby code:
Then(/^you should see (\d+) latest items$/) do |arg1|
  within("div.top-feature-list") do
    # Validate that the images of the items exist and print a report
    expect(all("img").size.to_s).to eq(arg1)
    puts "The number of items on the current site: " + all("img").size.to_s

    # List each item's details (image, headline, introduction, identifier, URL)
    (1..arg1.to_i).each do |i|
      item = "ul.category-index li.item-#{i}"
      puts "Item no. #{i}"
      puts " - image: " + find("#{item} img")[:src].to_s
      puts " - headline: " + find("#{item} div.info h5").text
      puts " - introduction: " + find("#{item} div.summary").text
      puts " - url: " + find("#{item} div.info h5 a")[:href].to_s
      puts " - created date: " + find("#{item} div.info span.date").text
      puts " - identifier: " + find("#{item} div.img a.section-name").text
      puts " - subsection: " + find("#{item} div.img a.section-name")[:href].to_s
    end
  end
end
More HTML:
<div class="top-feature-list">
  <ul class="category-index">
    <li class="group">
      <ul>
        <li class="item-1 left ">
          <a name="item-1"></a>
          <div class="img">
            <a href="/health-lifestyle/item1.html"> <img alt="how to" src="//image_url"> </a>
            <a class="section-name test" href="/health-lifestyle/"> lifestyle </a>
          </div>
          <div class="info">
            <h5> <a href="/health-lifestyle/item1.html"> how </a> </h5>
            <span class="date"> 10:20am, sep 30</span>
          </div>
          <div class="summary"> <p> summary text</p> </div>
        </li>
        ....
env.rb:
require 'parallel_tests'
require 'capybara/cucumber'
require 'capybara/poltergeist'
require 'rspec'
Parsing HTML is super easy in Ruby. You just need to require two libraries in your program (open-uri from the standard library and the nokogiri gem):
require 'open-uri'
require 'nokogiri'

# Set the page you're going to scan.
page = Nokogiri::HTML(URI.open("http://google.com/"))

# (Updated to reflect the date class provided in the question.)
# Extract specific elements via a CSS selector:
# first select the <span> tags,
# then narrow them down to the ones with the class ".date".
# Use .strip to remove the surrounding whitespace from the HTML.
page.css('span').css('.date').text.strip
# => "8:27am, sep 30"
If you want more information on parsing HTML in Ruby, you'll need to do some googling and reading. One great resource to get started is here.