
Scrapy error log: Missing scheme in request url: h

Popularity: 88   Published: 2023-12-12 07:57:05

While writing a Scrapy spider, the run failed with this error: Missing scheme in request url: h

The spider.py code is below.

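Pay attention to start_urls, which holds the URLs the spider starts crawling from. It must be a list.

It must not be a plain string, which is what I wrote here: parentheses without a trailing comma do not make a tuple.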

# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector
from miao.items import MiaoItem


class MmiaoSpider(scrapy.Spider):
    name = 'mmiao'
    offset = 0
    allowed_domains = ["tencent.com"]
    url = 'http://hr.tencent.com/position.php?&start='
    # BUG: this is a plain string, not a list
    start_urls = ('http://hr.tencent.com/position.php?&start=' + str(offset))
    addurl = 'https://hr.tencent.com/'

    def parse(self, response):
        for each in response.xpath("//tr[@class='even']|//tr[@class='odd']"):
            item = MiaoItem()
            item['positionname'] = each.xpath('./td[1]/a/text()').extract()[0]
            item['positionlink'] = self.addurl + each.xpath('./td[1]/a/@href').extract()[0]
            try:
                item['positiontype'] = each.xpath('./td[2]/text()').extract()[0]
            except IndexError:
                # some rows have no position type
                pass
            item['peoplenum'] = each.xpath('./td[3]/text()').extract()[0]
            item['worklocation'] = each.xpath("./td[4]/text()").extract()[0]
            # publish time
            item['publishtime'] = each.xpath("./td[5]/text()").extract()[0]
            yield item
        # request the next page until the last offset is reached
        if self.offset < 1680:
            self.offset += 10
            yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
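Why the message ends in "h": Scrapy loops over start_urls to build its first requests, and looping over a string yields one character at a time, so the first "URL" it tries is just h. The sketch below reproduces this in plain Python, without Scrapy. The variable names are illustrative and not part of the spider.

```python
offset = 0
base = 'http://hr.tencent.com/position.php?&start='

as_string = (base + str(offset))    # parentheses only group: still a str
as_tuple = (base + str(offset),)    # a trailing comma makes a 1-tuple
as_list = [base + str(offset)]      # a list, as in the fix

# Take the first element the way a "for url in start_urls" loop would.
first_from_string = next(iter(as_string))
first_from_list = next(iter(as_list))

print(type(as_string).__name__, repr(first_from_string))
print(type(as_list).__name__, repr(first_from_list))
```

The first element taken from the string is a single character with no scheme such as http://, which is what the error reports.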

The corrected code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector
from miao.items import MiaoItem


class MmiaoSpider(scrapy.Spider):
    name = 'mmiao'
    offset = 0
    allowed_domains = ["tencent.com"]
    url = 'http://hr.tencent.com/position.php?&start='
    # start_urls is now a list containing one URL
    start_urls = ['http://hr.tencent.com/position.php?&start=' + str(offset)]
    addurl = 'https://hr.tencent.com/'

    def parse(self, response):
        for each in response.xpath("//tr[@class='even']|//tr[@class='odd']"):
            item = MiaoItem()
            item['positionname'] = each.xpath('./td[1]/a/text()').extract()[0]
            item['positionlink'] = self.addurl + each.xpath('./td[1]/a/@href').extract()[0]
            try:
                item['positiontype'] = each.xpath('./td[2]/text()').extract()[0]
            except IndexError:
                # some rows have no position type
                pass
            item['peoplenum'] = each.xpath('./td[3]/text()').extract()[0]
            item['worklocation'] = each.xpath("./td[4]/text()").extract()[0]
            # publish time
            item['publishtime'] = each.xpath("./td[5]/text()").extract()[0]
            yield item
        # request the next page until the last offset is reached
        if self.offset < 1680:
            self.offset += 10
            yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
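One way to catch this kind of mistake before the crawl starts is to validate the URLs up front. The helper below is a hypothetical sketch, not a Scrapy API: the name normalize_start_urls is invented here, and it relies only on the standard library's urllib.parse.urlparse to check that each entry has a scheme.

```python
from urllib.parse import urlparse


def normalize_start_urls(value):
    """Return a list of URLs, wrapping a bare string in a list.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # A bare string would be iterated character by character, so wrap it.
    urls = [value] if isinstance(value, str) else list(value)
    for url in urls:
        # urlparse returns an empty scheme for values such as 'h'.
        if not urlparse(url).scheme:
            raise ValueError(f"Missing scheme in URL: {url!r}")
    return urls


print(normalize_start_urls('http://hr.tencent.com/position.php?&start=0'))
```

In a spider it could be used as start_urls = normalize_start_urls(...), so that a string passed by mistake still ends up as a one-element list.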

With that change the error is gone and the spider runs.

That's all for this post.
