Python - urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re

random.seed(datetime.datetime.now())

def getlinks(articleurl):
    html = urlopen("http://en.wikipedia.org"+articleurl)
    bsobj = BeautifulSoup(html)
    return bsobj.find("div", {"id":"bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))

getlinks('http://en.wikipedia.org')

The OS is Linux. The above script spits out "urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>". I have looked through a number of attempts to solve this that I found on Google, but none of them fixed my problem (attempted solutions include changing an environment variable and adding nameserver 8.8.8.8 to my resolv.conf file).
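You should call getlinks() with a path relative to the site root rather than a full URL. The function already prepends "http://en.wikipedia.org", so passing the full URL makes it request "http://en.wikipedia.orghttp://en.wikipedia.org", whose host name cannot be resolved:
>>> getlinks('/wiki/Main_Page')
Besides, inside the function, you can call .read() to get the response content before passing it to BeautifulSoup (BeautifulSoup also accepts the file-like response object directly, so this step is optional):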
>>> html = urlopen("http://en.wikipedia.org" + articleurl).read()
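
To make the failure mode easier to see, here is a minimal sketch that guards against passing a full URL where a path is expected. The names build_article_url, fetch_article_html and BASE_URL are hypothetical helpers introduced for illustration, not part of the original script, and wrapping URLError in a RuntimeError is just one possible way to report the failure.

```python
from urllib.error import URLError
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen

# Assumed base URL, taken from the string hard-coded in the question's function.
BASE_URL = "http://en.wikipedia.org"


def build_article_url(article_path):
    """Return the absolute URL for a site-relative path such as '/wiki/Main_Page'."""
    parts = urlsplit(article_path)
    if parts.scheme or parts.netloc:
        # A full URL here would be glued onto BASE_URL and yield a bogus host name.
        raise ValueError("expected a path like '/wiki/Main_Page', got %r" % article_path)
    return urljoin(BASE_URL, article_path)


def fetch_article_html(article_path):
    """Download the page and return its raw bytes, ready to hand to BeautifulSoup."""
    url = build_article_url(article_path)
    try:
        with urlopen(url) as response:
            return response.read()
    except URLError as exc:
        # URLError covers DNS failures such as "Name or service not known".
        raise RuntimeError("could not fetch %s: %s" % (url, exc.reason)) from exc


# What the original call effectively requested: inspect the host name that
# results from concatenating the base URL with another full URL.
broken = urlsplit(BASE_URL + "http://en.wikipedia.org")
print(broken.hostname)

# The intended usage: a site-relative path joined onto the base URL.
print(build_article_url("/wiki/Main_Page"))  # http://en.wikipedia.org/wiki/Main_Page
```

With this in place, fetch_article_html('/wiki/Main_Page') returns the page bytes (network access required), while build_article_url('http://en.wikipedia.org') fails early with a ValueError instead of a DNS lookup error.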