python - Numbering Items in Scrapy
So I have the following in items.py:

    import scrapy

    class ScrapyItem(scrapy.Item):
        source = scrapy.Field()
        link = scrapy.Field()

and the JSON output is:
[{"source": "some source", "link":"www.somelink.com"}, {"source": "some source again", "link":"www.somelink.org"}]
Is there a way to change the output to:
[{"source1": "some source", "link1":"www.somelink.com"}, {"source2": "some source again", "link2":"www.somelink.org"}]
From the docs, I saw that I can manipulate item values. Can I do the same with the item keys themselves?
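One way to get numbered keys (an assumption on my part, not something the Scrapy docs prescribe) is to renumber plain dicts after scraping, because a scrapy.Item only accepts the fields it declares and raises KeyError for a key like "source1". Scrapy also accepts plain dicts yielded from a callback. A minimal sketch of the renaming step, using a hypothetical helper number_keys:

```python
import json


def number_keys(items, start=1):
    """Return copies of the dicts in `items` with each key suffixed by its position.

    Hypothetical helper: {"source": ...} at position 1 becomes {"source1": ...}.
    """
    numbered = []
    for n, item in enumerate(items, start=start):
        # Build a new dict so the original items are left untouched.
        numbered.append({"%s%d" % (key, n): value for key, value in item.items()})
    return numbered


if __name__ == "__main__":
    scraped = [
        {"source": "some source", "link": "www.somelink.com"},
        {"source": "some source again", "link": "www.somelink.org"},
    ]
    print(json.dumps(number_keys(scraped)))
```

The drawback is that every record ends up with different key names, so anything consuming the JSON can no longer read the same field from each record.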
Edit:

Here's the new code I'm using to output article_id as an item field:
    import feedparser
    import lxml.html
    from scrapy import Request
    # NewsbyteItem is defined in items.py

    article_id = [0]  # running counter shared across feeds

    def parse_common(self, response):  # spider method
        feed = feedparser.parse(response.body)
        # +1 so a new feed doesn't reuse the previous feed's last id
        for entry_n, entry in enumerate(feed.entries, start=article_id[-1] + 1):
            try:
                item = NewsbyteItem()
                item['source'] = response.url
                item['title'] = lxml.html.fromstring(entry.title).text
                item['link'] = entry.link
                item['description'] = entry.description
                item['article_id'] = '%d' % entry_n
                article_id.append(entry_n)
                request = Request(
                    entry.link,
                    callback=getattr(self, response.meta['method']),
                    dont_filter=response.meta.get('dont_filter', False),
                )
                request.meta['item'] = item
                request.meta['entry'] = entry
                yield request
            except Exception as e:
                print('%s: %s' % (type(e), e))
                print(entry)
The problem is that entry_n restarts whenever the URL changes. That's why I used a list.
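A counter that keeps going across feed URLs can also be held in itertools.count instead of a growing list. This is a swapped-in technique, not the code above: a minimal sketch, assuming the counter lives on the spider instance. The class FeedSpiderSketch and method number_entries are hypothetical, and feed entries are stubbed as plain strings:

```python
import itertools


class FeedSpiderSketch(object):
    """Hypothetical stand-in for the spider; only the numbering logic is shown."""

    def __init__(self):
        # One counter per spider instance, shared by every feed it parses.
        self.article_ids = itertools.count(1)

    def number_entries(self, entries):
        # next() hands out the next id, so numbering continues across calls
        # instead of restarting for each feed URL.
        return [(next(self.article_ids), entry) for entry in entries]


if __name__ == "__main__":
    spider = FeedSpiderSketch()
    print(spider.number_entries(["a", "b"]))  # [(1, 'a'), (2, 'b')]
    print(spider.number_entries(["c"]))       # [(3, 'c')]
```

In a real spider, parse_common would call next(self.article_ids) where the code above uses entry_n.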
From the discussion:

The purpose of the identifier is that if an item has missing data, or includes data that isn't needed, you can find that dictionary right away and refactor the code accordingly.

With that purpose in mind, I'd suggest generating UUIDs. Same effect, less hassle:
    import uuid

    import scrapy

    # item definition
    class ScrapyItem(scrapy.Item):
        source = scrapy.Field()
        link = scrapy.Field()
        uuid = scrapy.Field()

    # processing
    def parse_common(self, response):
        ...
        # str() so the JSON exporter can serialize the value
        item['uuid'] = str(uuid.uuid4())
        ...
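Since the point of the identifier is to find a problematic record quickly, the exported JSON can be indexed by it. A minimal sketch, assuming the items were exported as a JSON list with a string "uuid" field; the helper index_by_uuid and the sample data are hypothetical:

```python
import json
import uuid


def index_by_uuid(exported_json):
    """Map each item's 'uuid' value to the item dict (hypothetical helper)."""
    return {item["uuid"]: item for item in json.loads(exported_json)}


if __name__ == "__main__":
    # Stand-in for the exporter output; str() makes the UUID JSON-serializable.
    items = [
        {"source": "some source", "link": "www.somelink.com", "uuid": str(uuid.uuid4())},
        {"source": "some source again", "link": "www.somelink.org", "uuid": str(uuid.uuid4())},
    ]
    lookup = index_by_uuid(json.dumps(items))
    wanted = items[1]["uuid"]
    print(lookup[wanted]["link"])  # www.somelink.org
```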