Python - urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re

random.seed(datetime.datetime.now())

def getlinks(articleurl):
    html = urlopen("http://en.wikipedia.org" + articleurl)
    bsobj = BeautifulSoup(html)
    return bsobj.find("div", {"id": "bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))

getlinks('http://en.wikipedia.org')
My OS is Linux. The above script spits out "urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>". I looked through a number of attempts to solve this that I found on Google, but none of them fixed my problem (attempted solutions include changing an env variable and adding nameserver 8.8.8.8 to my resolv.conf file).
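You should call getlinks() with a valid URL. The function already prepends "http://en.wikipedia.org" to its argument, so passing the full URL again produces a hostname that cannot be resolved. Pass only the article path: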
>>> getlinks('/wiki/Main_Page')
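To see why the original call fails, here is a small sketch that repeats the string concatenation from the question without touching the network. The helper name `build_url` is made up for illustration; `urlsplit` is from the standard library.

```python
from urllib.parse import urlsplit

BASE = "http://en.wikipedia.org"  # same prefix the question's function uses


def build_url(article_url):
    """Mimic the concatenation done inside getlinks() (hypothetical helper)."""
    return BASE + article_url


# What the question's call ends up requesting.
bad = build_url("http://en.wikipedia.org")
# What the call should request instead.
good = build_url("/wiki/Main_Page")

print(bad)
print("host the resolver is asked for:", urlsplit(bad).hostname)
print(good)
print("host the resolver is asked for:", urlsplit(good).hostname)
```

The first hostname is not a real domain, which is consistent with the "Name or service not known" error; changing DNS settings would not help in that case.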
Besides, in the function, you should call .read() to get the response content before passing it to BeautifulSoup:
>>> html = urlopen("http://en.wikipedia.org" + articleurl).read()
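If you also want a readable message when a URL cannot be opened, something along these lines might help. This is only a sketch: `fetch_html` and its `opener` parameter are hypothetical names that are not part of the question, and `opener` exists only so the function can be tried without network access. The returned bytes could then be passed to BeautifulSoup as described above.

```python
from urllib.error import URLError
from urllib.request import urlopen

BASE_URL = "http://en.wikipedia.org"  # same prefix as in the question


def fetch_html(article_url, opener=urlopen):
    """Return the page body as bytes, or None if the URL cannot be opened.

    Hypothetical helper: `opener` defaults to urlopen and can be swapped
    for a stand-in when testing offline.
    """
    url = BASE_URL + article_url
    try:
        with opener(url) as response:
            # .read() returns the raw response body as bytes.
            return response.read()
    except URLError as exc:
        # exc.reason holds the underlying cause, e.g. a failed DNS lookup.
        print(f"Could not open {url!r}: {exc.reason}")
        return None
```

Calling `fetch_html('/wiki/Main_Page')` would request the article path, while `fetch_html('http://en.wikipedia.org')` would print the malformed URL next to the failure reason instead of raising a traceback.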