python - Numbering Items in Scrapy -


So I have an items.py with the following:

import scrapy

class ScrapyItem(scrapy.Item):
    source = scrapy.Field()
    link = scrapy.Field()

and my JSON output is:

[{"source": "some source", "link":"www.somelink.com"},  {"source": "some source again", "link":"www.somelink.org"}] 

Is there a way to change the output to:

[{"source1": "some source", "link1":"www.somelink.com"},  {"source2": "some source again", "link2":"www.somelink.org"}] 

From the docs, I saw that I can manipulate item values. Can I do the same with the item fields themselves?

Edit

Here's the new code I'm using to output an article_id item field:

import feedparser
import lxml.html
from scrapy import Request

article_id = [1]  # module-level list that remembers the last id used

def parse_common(self, response):
    feed = feedparser.parse(response.body)
    for entry_n, entry in enumerate(feed.entries, start=article_id[-1]):
        try:
            item = NewsbyteItem()
            item['source'] = response.url
            item['title'] = lxml.html.fromstring(entry.title).text
            item['link'] = entry.link
            item['description'] = entry.description
            item['article_id'] = '%d' % entry_n
            article_id.append(entry_n)
            request = Request(
                entry.link,
                callback=getattr(self, response.meta['method']),
                dont_filter=response.meta.get('dont_filter', False)
            )

            # pass the partially filled item on to the next callback
            request.meta['item'] = item
            request.meta['entry'] = entry

            yield request
        except Exception as e:
            print('%s: %s' % (type(e), e))
            print(entry)

The problem is that entry_n restarts whenever the URL changes. That's why a list is used.
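A running number that does not restart per URL can be sketched without Scrapy at all. The example below is only an illustration under assumptions: `FeedNumberer`, its `parse_common` signature, and the example URLs are hypothetical stand-ins for the spider, and plain dicts stand in for items. The idea is to hold a single `itertools.count` on the object instead of calling `enumerate` again for every response.

```python
import itertools


class FeedNumberer:
    """Hypothetical stand-in for a spider that numbers entries across feeds."""

    def __init__(self, start=1):
        # itertools.count keeps its position between calls,
        # so the numbering does not restart for each new URL
        self._ids = itertools.count(start)

    def parse_common(self, url, entries):
        """Yield one dict per entry, numbered across all calls."""
        for entry in entries:
            yield {
                "source": url,
                "link": entry,
                "article_id": "%d" % next(self._ids),
            }


numberer = FeedNumberer()
first = list(numberer.parse_common("http://a.example/feed", ["a1", "a2"]))
second = list(numberer.parse_common("http://b.example/feed", ["b1"]))
print([item["article_id"] for item in first + second])  # ['1', '2', '3']
```

In a real crawl the numbers would follow the order in which responses arrive, which is not guaranteed to match the order of the requested URLs.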

From the discussion:

The purpose of the identifier is that if an item has data missing, or includes data that isn't needed, you can find the dictionary right away and refactor the code accordingly.

With that purpose in mind, I'd suggest generating UUIDs. Same effect, less hassle:

import uuid

import scrapy

# item definition
class ScrapyItem(scrapy.Item):
    source = scrapy.Field()
    link = scrapy.Field()
    uuid = scrapy.Field()

# processing
def parse_common(self, response):
    ...
    item['uuid'] = str(uuid.uuid4())  # str() keeps the value JSON-serializable
    ...
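To show what the tagged output could look like and how a single record can be found again, here is a minimal sketch that uses plain dicts in place of Scrapy items. The helper names `tag_items` and `find_by_uuid` are made up for this illustration and are not part of Scrapy.

```python
import json
import uuid


def tag_items(items):
    """Return copies of the items, each with a string UUID added."""
    return [dict(item, uuid=str(uuid.uuid4())) for item in items]


def find_by_uuid(items, wanted):
    """Return the first item whose uuid matches, or None if there is none."""
    return next((item for item in items if item["uuid"] == wanted), None)


scraped = [
    {"source": "some source", "link": "www.somelink.com"},
    {"source": "some source again", "link": "www.somelink.org"},
]

tagged = tag_items(scraped)
print(json.dumps(tagged, indent=2))

# Look a record up again by the identifier seen in the output.
some_id = tagged[1]["uuid"]
print(find_by_uuid(tagged, some_id)["link"])
```

Unlike a running counter, the identifiers need no shared state between callbacks, at the cost of not being sequential.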
