python - Numbering Items in Scrapy -

January 15, 2014

So I have an items.py with the following:

    import scrapy

    class ScrapyItem(scrapy.Item):
        source = scrapy.Field()
        link = scrapy.Field()

and the JSON output is:

[{"source": "some source", "link":"www.somelink.com"},  {"source": "some source again", "link":"www.somelink.org"}]

Is there a way to change the output to:

[{"source1": "some source", "link1":"www.somelink.com"},  {"source2": "some source again", "link2":"www.somelink.org"}]

From the docs, I saw that I can manipulate item values. Can I do the same with the items themselves?
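For reference, the requested shape can be produced as a post-processing step on the exported list. This is only a minimal sketch: `number_keys` is a hypothetical helper, not part of Scrapy, and it works on plain dictionaries rather than on `Item` objects.

```python
def number_keys(items, start=1):
    """Return copies of the dicts with the item's position appended to every key."""
    return [
        {"%s%d" % (key, n): value for key, value in item.items()}
        for n, item in enumerate(items, start=start)
    ]


items = [
    {"source": "some source", "link": "www.somelink.com"},
    {"source": "some source again", "link": "www.somelink.org"},
]
numbered = number_keys(items)
```

Keys that differ per item make the output harder to consume, because a reader has to know an item's position before it can look up any field.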

Edit

Here's the new code I'm using to output article_id as an item field:

    import feedparser
    import lxml.html
    from scrapy.http import Request

    # Ids handed out so far; the last element seeds the next enumerate()
    article_id = [1]

    def parse_common(self, response):
        feed = feedparser.parse(response.body)
        for entry_n, entry in enumerate(feed.entries, start=article_id[-1]):
            try:
                item = NewsbyteItem()
                item['source'] = response.url
                item['title'] = lxml.html.fromstring(entry.title).text
                item['link'] = entry.link
                item['description'] = entry.description
                item['article_id'] = '%d' % entry_n
                article_id.append(entry_n)
                request = Request(
                    entry.link,
                    callback=getattr(self, response.meta['method']),
                    dont_filter=response.meta.get('dont_filter', False)
                )

                # Pass the partly filled item along to the next callback
                request.meta['item'] = item
                request.meta['entry'] = entry

                yield request
            except Exception as e:
                print('%s: %s' % (type(e), e))
                print(entry)

The problem is that entry_n restarts whenever the URL changes. That's why the list is used.
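The restart comes from `enumerate` beginning a fresh sequence on every call. Seeding it with `article_id[-1]` also reuses the last id of one feed as the first id of the next. One alternative, sketched here on the assumption that a single module-level counter is acceptable, is to draw every id from one `itertools.count`. The names `article_ids` and `number_entries` are hypothetical, and plain lists stand in for `feed.entries`.

```python
import itertools

# One counter for the whole run; it is never reset between feeds.
article_ids = itertools.count(1)


def number_entries(entries):
    """Yield (article_id, entry) pairs, drawing ids from the shared counter."""
    for entry in entries:
        yield next(article_ids), entry


first_feed = list(number_entries(["a", "b", "c"]))
second_feed = list(number_entries(["d", "e"]))
```

Because each id is taken from the counter at the moment an entry is processed, the numbering continues across feeds without tracking the previous value by hand.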

From the discussion:

The purpose of the identifier is that, if an item has data missing or includes data that isn't needed, you can find its dictionary right away and refactor the code accordingly.

With that purpose in mind, I'd suggest generating UUIDs. Same effect, less hassle:

    import uuid

    import scrapy

    # item definition
    class ScrapyItem(scrapy.Item):
        source = scrapy.Field()
        link = scrapy.Field()
        uuid = scrapy.Field()

    # processing
    def parse_common(self, response):
        ...
        # str() keeps the value serializable in the JSON output
        item['uuid'] = str(uuid.uuid4())
        ...
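To illustrate the lookup this enables, here is a minimal sketch that runs without Scrapy. Plain dictionaries stand in for the scraped items, and `by_uuid` is a hypothetical index built from the exported JSON.

```python
import json
import uuid

# Plain dicts stand in for scraped items here.
items = [
    {"source": "some source", "link": "www.somelink.com"},
    {"source": "some source again", "link": "www.somelink.org"},
]
for item in items:
    # The string form can be dumped to JSON as-is.
    item["uuid"] = str(uuid.uuid4())

# Round-trip through JSON, as the exported feed would be.
output = json.loads(json.dumps(items))

# Index the output by uuid so a suspect item can be found directly.
by_uuid = {item["uuid"]: item for item in output}
wanted = items[1]["uuid"]
found = by_uuid[wanted]
```

Unlike a running counter, the identifiers need no shared state between callbacks, so the order in which responses arrive does not matter.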
