python - Beautiful Soup is Missing Tables from Wikipedia


I am trying to scrape the La Liga league table from Wikipedia, but I can't seem to use find_all to get the table I'm trying to scrape. Moreover, the exact same code I wrote scrapes the EPL data from Wikipedia just fine.

The full HTML is here: view-source:https://en.wikipedia.org/wiki/2015%e2%80%9316_la_liga

The part in question is here:

    <h2><span class="mw-headline" id="league_table">league table</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=2015%e2%80%9316_la_liga&amp;action=edit&amp;section=6" title="edit section: league table">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
    <h3><span class="mw-headline" id="standings">standings</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=2015%e2%80%9316_la_liga&amp;action=edit&amp;section=7" title="edit section: standings">edit</a><span class="mw-editsection-bracket">]</span></span></h3>
    <table class="wikitable" style="text-align:center;">
      <tr>
        <th scope="col" width="28"><abbr title="position">pos</abbr>
        </th>
        <th scope="col" width="190">team
          <div class="plainlinks hlist navbar mini" style="float:right">
            <ul>
              <li class="nv-view"><a href="/wiki/template:2015%e2%80%9316_la_liga_table" title="template:2015–16 la liga table"><span title="view template">v</span></a>
              </li>
              <li class="nv-talk"><a href="/wiki/template_talk:2015%e2%80%9316_la_liga_table" title="template talk:2015–16 la liga table"><span title="discuss template">t</span></a>
              </li>
              <li class="nv-edit"><a class="external text" href="//en.wikipedia.org/w/index.php?title=template:2015%e2%80%9316_la_liga_table&amp;action=edit"><span title="edit template">e</span></a>
              </li>
            </ul>
          </div>
        </th>

This is how I request the page and do some cleansing of the code before I try to find any of the tables:

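    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/2015-16_la_liga").text, "html.parser")
    # Remove superscript footnote markers before searching for tables
    for superscript in soup.find_all("sup"):
        superscript.decompose()
    print(len(soup.find_all("table", attrs={"class": "wikitable"})))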

However, I am getting a length of 2 when, looking at the page HTML, I should be getting at least 14 tables with those attributes.

I have no idea where to go from here; any help is appreciated.

--edit--

Printing the output of soup shows the wikitable is still there...
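One detail visible in the question itself: the view-source URL carries a percent-encoded character between "2015" and "16", while the URL passed to requests uses a plain hyphen. The sketch below only decodes the two titles and compares them. Whether the difference affects the table count depends on how Wikipedia redirects the hyphenated title, which is not verified here. The variable names are illustrative.

```python
from urllib.parse import unquote

# Title as it appears in the view-source URL quoted in the question
page_source_title = unquote("2015%e2%80%9316_la_liga")
# Title as it appears in the requests.get() call
requested_title = "2015-16_la_liga"

# Inspect the character that sits between "2015" and "16" in each title
print(repr(page_source_title[4]), hex(ord(page_source_title[4])))
print(repr(requested_title[4]), hex(ord(requested_title[4])))
print(page_source_title == requested_title)
```

The first title contains an en dash (U+2013) and the second an ASCII hyphen (U+002D), so the two strings are not equal.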

Everything works fine.

PyQuery version

    from pyquery import PyQuery

    pq = PyQuery(url="https://en.wikipedia.org/wiki/2015-16_la_liga")
    all_tables = pq(".wikitable")
    print(len(all_tables))

BeautifulSoup version

    __author__ = "Leonard Richardson (leonardr@segfault.org)"
    __version__ = "4.3.2"
    __copyright__ = "Copyright (c) 2004-2013 Leonard Richardson"
    __license__ = "MIT"

    from bs4 import BeautifulSoup
    import requests

    soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/2015-16_la_liga").text, "html.parser")
    for superscript in soup.find_all("sup"):
        superscript.decompose()
    print(len(soup.find_all("table", attrs={"class": "wikitable"})))

Both versions return 13.
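If the two libraries ever disagree, a third count that depends on neither can help separate a parser problem from a problem with the fetched HTML. The following is a minimal sketch using only the standard library's html.parser. The names WikitableCounter and count_tables are hypothetical, and the sample markup is made up for illustration rather than taken from the Wikipedia page.

```python
from html.parser import HTMLParser


class WikitableCounter(HTMLParser):
    """Counts <table> start tags whose class list includes a target class."""

    def __init__(self, target_class="wikitable"):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # attrs is a list of (name, value) pairs; the value may be None
        class_value = dict(attrs).get("class") or ""
        if self.target_class in class_value.split():
            self.count += 1


def count_tables(html, target_class="wikitable"):
    """Return how many tables in the HTML string carry target_class."""
    parser = WikitableCounter(target_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Illustrative markup only: two wikitables and one unrelated table
sample = """
<table class="wikitable" style="text-align:center;"><tr><th>pos</th></tr></table>
<table class="wikitable sortable"><tr><td>1</td></tr></table>
<table class="infobox"><tr><td>x</td></tr></table>
"""
print(count_tables(sample))  # prints 2
```

To apply it to the real page, pass the text of the response (for example requests.get(...).text) to count_tables and compare the result with the find_all count.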

Maybe you should install version 4.3.2 of BeautifulSoup, or use PyQuery?
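Before reinstalling, it may be worth checking which version is actually installed. The sketch below assumes Python 3.8 or later (for importlib.metadata) and that the package was installed under the distribution name beautifulsoup4. The helper installed_version is a hypothetical name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment
        return None


print(installed_version("beautifulsoup4"))
```

On older interpreters, importing bs4 and printing bs4.__version__ gives the same information.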

