
Scraping NetEase BUFF item data and Steam market data with Python for a discount on top of a discount

Published: 2024-02-24 14:06:03

Contents

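  • Preface
  • Environment
  • Before starting
  • Step 1: Scrape [item name] and [BUFF price] from BUFF
    • 1. Get the cookie and headers
    • 2. Request BUFF and get the response
    • 3. Extract [item name] and [BUFF price] with re
  • Step 2: Scrape [Steam price] and [Steam 24-hour sales volume] from Steam
    • 1. Request Steam and get the response
    • 2. Extract [Steam price] and [Steam 24-hour sales volume] with re
  • Step 3: Process the collected data
    • 1. Get [Steam 24-hour sales volume] from [item name]
    • 2. Use [Steam 24-hour sales volume] to decide whether to drop the row or fetch [Steam price]
    • 3. Delete the rows
    • 4. Save
  • Summary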

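Preface

Recently, a change in Steam's policy has made discounted Steam gift cards hard to get. The Taobao shop I always bought from now charges ¥270 for $50, close to 80% of face value, and it still requires my account name and password to top up the wallet on my behalf, which is a security concern. So I decided to write a Python scraper that collects BUFF and Steam market data and works out the discount ratio of buying items on BUFF and selling them on the Steam market, which has the same effect as a discounted gift card.
This approach builds on what I learned from an article by Charles-D.

PS. The examples in this article use Dota 2; items for other games on BUFF work the same way.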

Environment

import requests
import re
import pandas as pd
import time

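Based on the modules imported above, the required libraries are:
requests, for fetching the BUFF and Steam responses; install it with pip install requests.
re, for extracting the fields we need with regular expressions; part of the standard library.
pandas, for saving the final result; install it with pip install pandas.
time, for pausing between requests (requests sent too fast get flagged and the response comes back as null) and for timing the run; part of the standard library.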

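Before starting

With the environment set up, let's sort out the logic. The final result should contain [item name], [BUFF price], [Steam price], [Steam 24-hour sales volume] and [discount ratio].
So:
Step 1: scrape [item name] and [BUFF price] from BUFF.
Step 2: scrape [Steam price] and [Steam 24-hour sales volume] from Steam.
Step 3: process the collected data.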

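Step 1: Scrape [item name] and [BUFF price] from BUFF

The first obstacle when scraping BUFF is logging in.
It can be handled by sending requests with the cookie of a logged-in browser session.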

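1. Get the cookie and headers

Go to https://buff.163.com/ and log in, press F12 to open the developer tools, select the Network tab and the Headers pane, refresh the page, and find Cookie and User-Agent in the request headers.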


    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 Edg/85.0.564.63'}
    # BUFF cookie: paste the Cookie value copied from your own logged-in browser
    cookie_str = r''
    cookies = {}
    for line in cookie_str.split(';'):
        # Skip empty pieces and strip the space that follows each ';'
        if '=' in line:
            key, value = line.strip().split('=', 1)
            cookies[key] = value
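The hand-written loop can also be replaced with the standard library's http.cookies.SimpleCookie, which already understands the "key=value; key2=value2" format of a copied Cookie header. A minimal sketch, assuming the header holds only the plain name/value pairs a browser sends; parse_cookie_header is a hypothetical helper name and the sample cookie is made up:

```python
from http.cookies import SimpleCookie


def parse_cookie_header(cookie_str):
    """Turn a browser 'Cookie' header value into a dict usable as requests' cookies=."""
    jar = SimpleCookie()
    # load() splits on ';', trims whitespace and keeps '=' characters inside values
    jar.load(cookie_str)
    return {key: morsel.value for key, morsel in jar.items()}


# Made-up sample value, not a real session
cookies = parse_cookie_header('Device-Id=abc123; game=dota2; csrf_token=x.y.z==')
print(cookies)
```

The resulting dict is passed to requests.get(..., cookies=cookies) in the same way as the one built by the loop.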

2. Request BUFF and get the response

Setting a price filter on BUFF removes part of the data up front; I chose ¥35 to ¥200.


Requesting https://buff.163.com/api/market/goods?game=dota2&page_num=1&min_price=35&max_price=200 returns JSON like the following (first item shown):

"items": [{"appid": 570, "bookmarked": false, "buy_max_price": "131", "buy_num": 45, "can_search_by_tournament": false, "description": null, "game": "dota2", "goods_info": {"icon_url": "https://g.fp.ps.netease.com/market/file/5a0e956d6f049424e570876aRCofBmRW", "info": {"tags": {"hero": {"category": "hero", "internal_name": "npc_dota_hero_phantom_assassin", "localized_name": "\u5e7b\u5f71\u523a\u5ba2"}, "rarity": {"category": "rarity", "internal_name": "arcana", "localized_name": "\u81f3\u5b9d"}, "slot": {"category": "slot", "internal_name": "weapon", "localized_name": "\u6b66\u5668"}, "type": {"category": "type", "internal_name": "wearable", "localized_name": "\u53ef\u4f69\u5e26"}}}, "item_id": 7247, "original_icon_url": "https://g.fp.ps.netease.com/market/file/59926f895e60273b4cf3f424sv02msLE", "steam_price": "29.48", "steam_price_cny": "200.19"}, "has_buff_price_history": true, "id": 14575, "market_hash_name": "Exalted Manifold Paradox", "market_min_price": "0", "name": "\u5c0a\u4eab \u65e0\u53cc\u8be1\u9b45", "quick_price": "131.28", "sell_min_price": "131.78", "sell_num": 284, "sell_reference_price": "131.78", "steam_market_url": "https://steamcommunity.com/market/listings/570/Exalted%20Manifold%20Paradox", "transacted_num": 0},

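Opening that item's "steam_market_url", https://steamcommunity.com/market/listings/570/Exalted%20Manifold%20Paradox, shows that it is indeed the first item on the page.
So the URL to request is https://buff.163.com/api/market/goods?game=dota2&page_num=i&min_price=35&max_price=200, with i running over the page numbers.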

    for i in range(5):
        # Reference URL: https://buff.163.com/api/market/goods?game=dota2&page_num=1&min_price=35&max_price=200
        buff_dota2_url = 'https://buff.163.com/api/market/goods?game=dota2&page_num=' + str(i + 1) + '&min_price=35&max_price=200'
        buff_dota2_text = requests.get(url=buff_dota2_url, headers=headers, cookies=cookies).text
        print(buff_dota2_text)
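Gluing the page number into the URL by string concatenation works, but urllib.parse.urlencode builds the same query string from a dict and keeps the filter values in one place. A sketch; buff_goods_url is a hypothetical helper, and its defaults simply mirror the Dota 2, ¥35 to ¥200 filter used here:

```python
from urllib.parse import urlencode

BUFF_GOODS_API = 'https://buff.163.com/api/market/goods'


def buff_goods_url(page_num, game='dota2', min_price=35, max_price=200):
    """Build the BUFF goods-list URL for one page (page_num starts at 1)."""
    query = urlencode({
        'game': game,
        'page_num': page_num,
        'min_price': min_price,
        'max_price': max_price,
    })
    return BUFF_GOODS_API + '?' + query


# Same five pages as the loop above
for i in range(5):
    print(buff_goods_url(i + 1))
```

requests can also encode the query itself: requests.get(BUFF_GOODS_API, params={...}, headers=headers, cookies=cookies).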

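3. Extract [item name] and [BUFF price] with re

Next, use regular expressions to find [item name] and [BUFF price].
[Item name] follows "steam_market_url". Searching the response of https://buff.163.com/api/market/goods?game=dota2&page_num=1&min_price=35&max_price=200 for "steam_market_url": "https://steamcommunity.com/market/listings/570/(.*)", gives exactly 20 matches, one per item, so that is the matching rule for [item name]. The captured name is already URL-encoded (Exalted%20Manifold%20Paradox), which is the form Steam expects later. [BUFF price] (sell_min_price) works the same way.
For how to use re.findall, see the article by 悲恋花丶无心之人.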

    for i in range(5):
        # Reference URL: https://buff.163.com/api/market/goods?game=dota2&page_num=1&min_price=35&max_price=200
        buff_dota2_url = 'https://buff.163.com/api/market/goods?game=dota2&page_num=' + str(i + 1) + '&min_price=35&max_price=200'
        buff_dota2_text = requests.get(url=buff_dota2_url, headers=headers, cookies=cookies).text
        # Item names; [^"]* stops at the closing quote, so this also works when the JSON is on one line
        names_list_temp = re.findall(r'"steam_market_url":\s*"https://steamcommunity.com/market/listings/570/([^"]*)"', buff_dota2_text)
        # Lowest BUFF sell price
        price_list_temp = re.findall(r'"sell_min_price":\s*"([^"]*)"', buff_dota2_text)
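Because the response is JSON, it can also be parsed with the json module instead of regular expressions, which does not depend on spacing or line breaks. A sketch under assumptions: the field names market_hash_name and sell_min_price come from the excerpt above, but the location of the item list (assumed here to be data -> items, falling back to a top-level items key) is a guess to check against a real response. urllib.parse.quote percent-encodes the plain name so it can be appended to the Steam URL in the same Exalted%20Manifold%20Paradox form as steam_market_url. parse_buff_items is a hypothetical name.

```python
import json
from urllib.parse import quote


def parse_buff_items(response_text):
    """Return (url_encoded_name, buff_price) pairs from a BUFF goods-list response.

    Assumption: the item list lives under data -> items; adjust if it does not.
    """
    payload = json.loads(response_text)
    items = payload.get('data', payload).get('items', [])
    result = []
    for item in items:
        # market_hash_name is the plain English name; Steam URLs need it percent-encoded
        name = quote(item['market_hash_name'], safe='')
        result.append((name, item['sell_min_price']))
    return result


# Values taken from the excerpt above, wrapped in the assumed data -> items shape
sample = '{"data": {"items": [{"market_hash_name": "Exalted Manifold Paradox", "sell_min_price": "131.78"}]}}'
print(parse_buff_items(sample))
```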

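Step 2: Scrape [Steam price] and [Steam 24-hour sales volume] from Steam

1. Request Steam and get the response

The only place I had seen [Steam 24-hour sales volume] was when inspecting an item in my inventory, so open your inventory, press F12 to open the developer tools, select Network, refresh the page and click any item.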

The .json request that then appears in the Network list contains exactly what we want.
Requesting https://steamcommunity.com/market/priceoverview/?country=CN&currency=23&appid=570&market_hash_name=Exalted%20Manifold%20Paradox returns:

{"success":true,"lowest_price":"¥ 201.02","volume":"64","median_price":"¥ 167.51"}
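        # Steam priceoverview endpoint (appid 570 = Dota 2, currency 23 = CNY)
        url = r'https://steamcommunity.com/market/priceoverview/?country=CN&currency=23&appid=570&market_hash_name='
        steam_time = len(names_list_temp)
        # Fetch Steam price and 24-hour sales volume for every item on this page
        for k in range(steam_time):
            item = names_list_temp[k]
            steam_item_text = requests.get(url=url + item, headers=headers).text
            print(steam_item_text)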

2. Extract [Steam price] and [Steam 24-hour sales volume] with re

Note that re.findall returns a list, so take its first element before comparing or converting it.

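    # findall returns a list; [0] takes the first match. Steam may write larger volumes as "1,234"
    steam_24h_qty = int(re.findall(r'"volume":"([0-9,]+)"', steam_item_text)[0].replace(',', ''))
    # Skip the currency prefix (e.g. "¥ ") and capture the number right before the closing quote
    price_steam_temp = re.findall(r'"lowest_price":"[^"]*?([0-9][0-9.,]*)"', steam_item_text)[0]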
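The priceoverview response is also JSON, so json.loads plus a little clean-up avoids fragile price regexes. A sketch assuming the CNY format shown above (currency symbol, space, number) and that Steam may add thousands separators to larger numbers; parse_priceoverview is a hypothetical name:

```python
import json
import re


def parse_priceoverview(response_text):
    """Return (lowest_price, volume) from a Steam priceoverview response.

    lowest_price is None when there is no listing; volume is 0 when the
    'volume' key is absent (nothing sold in the last 24 hours).
    """
    data = json.loads(response_text)
    if not data or not data.get('success'):
        return None, 0  # e.g. a throttled request whose body is just 'null'
    price_text = data.get('lowest_price')
    # Drop currency symbol, spaces and thousands separators: '¥ 1,201.02' -> '1201.02'
    price = float(re.sub(r'[^0-9.]', '', price_text)) if price_text else None
    volume = int(data.get('volume', '0').replace(',', ''))
    return price, volume


sample = '{"success":true,"lowest_price":"¥ 201.02","volume":"64","median_price":"¥ 167.51"}'
print(parse_priceoverview(sample))
```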

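Step 3: Process the collected data

First, the logic. We already have [item name] and [BUFF price], and [item name] gives us [Steam price] and [Steam 24-hour sales volume]. When [Steam 24-hour sales volume] is below some threshold (10 in the code below), the item sells too slowly on Steam, so the row should be dropped and there is no need to extract [Steam price]. That is:
1. Get [Steam 24-hour sales volume] from [item name]
2. Use [Steam 24-hour sales volume] to decide whether to drop the row or fetch [Steam price]
3. Delete the rows
4. Save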

1. Get [Steam 24-hour sales volume] from [item name]

        steam_time = len(names_list_temp)
        # Fetch Steam price and 24-hour sales volume for every item
        for k in range(steam_time):
            item = names_list_temp[k]
            steam_item_text = requests.get(url=url + item, headers=headers, cookies=steam_cookies).text
            steam_24h_qty_temp = int(re.findall(r'"volume":"([0-9,]+)"', steam_item_text)[0].replace(',', ''))

2. Use [Steam 24-hour sales volume] to decide whether to drop the row or fetch [Steam price]

        cleanlist = []           # indices of rows to delete
        price_steam_temp = []    # Steam lowest price of each kept row
        sell_num_list_temp = []  # Steam 24-hour volume of each kept row
        steam_time = len(names_list_temp)
        # Fetch Steam price and 24-hour sales volume for every item
        for k in range(steam_time):
            item = names_list_temp[k]
            steam_item_text = requests.get(url=url + item, headers=headers, cookies=steam_cookies).text
            print(k + 1, "/", steam_time, ":", steam_item_text, item)
            try:
                steam_24h_qty_temp = int(re.findall(r'"volume":"([0-9,]+)"', steam_item_text)[0].replace(',', ''))
            except IndexError:
                # No "volume" key: nothing sold in 24 hours, or an empty/null response
                steam_24h_qty_temp = 0
            if steam_24h_qty_temp < 10:
                cleanlist.append(k)
            else:
                try:
                    price_steam_temp0 = re.findall(r'"lowest_price":"[^"]*?([0-9][0-9.,]*)"', steam_item_text)[0]
                    price_steam_temp.append(price_steam_temp0.replace(',', ''))
                    sell_num_list_temp.append(steam_24h_qty_temp)
                except IndexError:
                    cleanlist.append(k)
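The loop above keeps names, prices and volumes in parallel lists and records indices to delete later. An alternative sketch keeps each item's values together in one tuple, so dropping a row is just skipping it. filter_by_volume and its arguments are hypothetical, the threshold of 10 mirrors the code above, and the Steam data is assumed to have been fetched already (one priceoverview request returns both price and volume):

```python
def filter_by_volume(items, steam_data, min_volume=10):
    """Join BUFF rows with Steam data and drop illiquid or unpriced items.

    items:      list of (name, buff_price) pairs from BUFF
    steam_data: dict mapping name -> (steam_price, volume); a missing name counts as volume 0
    Returns (name, buff_price, steam_price, volume) rows that pass the filter.
    """
    rows = []
    for name, buff_price in items:
        steam_price, volume = steam_data.get(name, (None, 0))
        if volume < min_volume or steam_price is None:
            continue  # sells too slowly on Steam, or no Steam listing: drop it
        rows.append((name, float(buff_price), steam_price, volume))
    return rows


# Made-up numbers, only to show the shape of the data
items = [('Item%20A', '100.00'), ('Item%20B', '50.00'), ('Item%20C', '70.00')]
steam_data = {'Item%20A': (150.0, 64), 'Item%20B': (80.0, 3)}
print(filter_by_volume(items, steam_data))
```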

3. Delete the rows

        # Delete from the back so that the remaining indices in cleanlist stay valid
        for k in range(len(cleanlist) - 1, -1, -1):
            names_list_temp.pop(cleanlist[k])
            price_list_temp.pop(cleanlist[k])
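Why delete from the back: popping an element shifts every later element one place to the left, so indices collected earlier go stale if you delete front to back. A small self-contained demonstration with made-up values:

```python
names = ['a', 'b', 'c', 'd', 'e']
cleanlist = [1, 3]  # indices to drop, collected in ascending order

# Popping from the back leaves the smaller indices untouched
kept = names.copy()
for k in reversed(cleanlist):
    kept.pop(k)
print(kept)  # ['a', 'c', 'e']

# Popping from the front: after pop(1), index 3 points at 'e' instead of 'd'
wrong = names.copy()
for k in cleanlist:
    wrong.pop(k)
print(wrong)  # ['a', 'c', 'd']
```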

4. Save

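        # range(len(...)) rather than len(...) - 1, otherwise the last item of each page is lost
        # and the later lists drift out of line with name_list
        for k in range(len(names_list_temp)):
            # What the seller receives after Steam's ~15% fee when selling at the lowest listing
            soldprice_temp0 = float(price_steam_temp[k]) / 1.15
            # Discount ratio: BUFF price paid per unit of Steam balance received
            percentage_temp0 = float(price_list_temp[k]) / soldprice_temp0
            soldprice_temp.append(soldprice_temp0)
            percentage_temp.append(percentage_temp0)
        # Item names
        name_list.extend(names_list_temp)
        # BUFF prices
        price_list.extend(price_list_temp)
        # Steam prices
        price_steam_list.extend(price_steam_temp)
        # Steam 24-hour sales volume
        sell_num_list.extend(sell_num_list_temp)
        # After-fee proceeds when selling at the Steam lowest price
        soldprice.extend(soldprice_temp)
        # Discount ratio
        percentage.extend(percentage_temp)
    # Combine everything into a table and save it
    csv_name = ["name", "BUFF price", "steam price", "steam 24hour sold qty", "steam sellprice", "percentage"]
    csv_data = zip(name_list, price_list, price_steam_list, sell_num_list, soldprice, percentage)
    items_information = pd.DataFrame(columns=csv_name, data=csv_data)
    items_information.to_csv("items_information.csv")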
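The arithmetic of the save step, spelled out: selling at Steam's lowest price p, the seller receives about p / 1.15 (the author's divisor, roughly the 5% Steam fee plus the 10% Dota 2 fee), and the discount ratio is the BUFF price divided by that amount, i.e. how many yuan spent on BUFF buy one yuan of Steam Wallet balance. A sketch; discount_ratio is a hypothetical name, and Steam's own fee rounding can make real proceeds differ slightly. It uses the BUFF and Steam prices quoted earlier for Exalted Manifold Paradox:

```python
STEAM_FEE_MULTIPLIER = 1.15  # the author's divisor: ~5% Steam fee + ~10% game fee


def discount_ratio(buff_price, steam_lowest_price, fee=STEAM_FEE_MULTIPLIER):
    """Yuan spent on BUFF per yuan of Steam Wallet balance received after selling."""
    seller_receives = steam_lowest_price / fee
    return buff_price / seller_receives


# BUFF sell_min_price 131.78 and Steam lowest_price ¥ 201.02 from the examples above
ratio = discount_ratio(131.78, 201.02)
print(round(ratio, 4))  # 0.7539
```

A ratio of about 0.75 means each ¥1 of Steam balance costs about ¥0.75, which beats the roughly 80% gift-card price mentioned in the preface.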

Summary

Full code:

import requests
import re
import pandas as pd
import time


def main():
    time_start = time.time()
    # Steam appid=570 is Dota 2; currency=23 returns prices in CNY
    url = r'https://steamcommunity.com/market/priceoverview/?country=CN&currency=23&appid=570&market_hash_name='
    # Steam cookie: paste your own Cookie value here
    steam_cookie_str = r''
    steam_cookies = {}
    for line in steam_cookie_str.split(';'):
        if '=' in line:
            key, value = line.strip().split('=', 1)
            steam_cookies[key] = value
    # Request headers
    headers = {
        'User-Agent': ''}
    # BUFF cookie: paste your own Cookie value here
    cookie_str = r''
    cookies = {}
    for line in cookie_str.split(';'):
        if '=' in line:
            key, value = line.strip().split('=', 1)
            cookies[key] = value
    # Results accumulated over all pages
    name_list = []
    price_list = []
    price_steam_list = []
    sell_num_list = []
    soldprice = []
    percentage = []
    for i in range(5):
        time_page_start = time.time()
        dec = time_page_start - time_start
        minute = int(dec / 60)
        second = dec % 60
        print("%02d:%02d page" % (minute, second), i)
        # Reference URL: https://buff.163.com/api/market/goods?game=dota2&page_num=1&min_price=35&max_price=200
        buff_dota2_url = 'https://buff.163.com/api/market/goods?game=dota2&page_num=' + str(i + 1) + '&min_price=35&max_price=200'
        buff_dota2_text = requests.get(url=buff_dota2_url, headers=headers, cookies=cookies).text
        # Item names
        names_list_temp = re.findall(r'"steam_market_url":\s*"https://steamcommunity.com/market/listings/570/([^"]*)"', buff_dota2_text)
        # Lowest BUFF sell price
        price_list_temp = re.findall(r'"sell_min_price":\s*"([^"]*)"', buff_dota2_text)
        cleanlist = []
        price_steam_temp = []
        soldprice_temp = []
        percentage_temp = []
        sell_num_list_temp = []
        print("BUFF page done, querying Steam")
        steam_time = len(names_list_temp)
        # Fetch Steam price and 24-hour sales volume
        for k in range(steam_time):
            item = names_list_temp[k]
            steam_item_text = requests.get(url=url + item, headers=headers, cookies=steam_cookies).text
            print(k + 1, "/", steam_time, ":", steam_item_text, item)
            # Pause so that Steam does not start answering with null
            time.sleep(5)
            try:
                steam_24h_qty_temp = int(re.findall(r'"volume":"([0-9,]+)"', steam_item_text)[0].replace(',', ''))
            except IndexError:
                steam_24h_qty_temp = 0
            if steam_24h_qty_temp < 10:
                cleanlist.append(k)
            else:
                try:
                    price_steam_temp0 = re.findall(r'"lowest_price":"[^"]*?([0-9][0-9.,]*)"', steam_item_text)[0]
                    price_steam_temp.append(price_steam_temp0.replace(',', ''))
                    sell_num_list_temp.append(steam_24h_qty_temp)
                except IndexError:
                    cleanlist.append(k)
        # Delete from the back so the remaining indices stay valid
        for k in range(len(cleanlist) - 1, -1, -1):
            names_list_temp.pop(cleanlist[k])
            price_list_temp.pop(cleanlist[k])
        for k in range(len(names_list_temp)):
            soldprice_temp0 = float(price_steam_temp[k]) / 1.15
            percentage_temp0 = float(price_list_temp[k]) / soldprice_temp0
            soldprice_temp.append(soldprice_temp0)
            percentage_temp.append(percentage_temp0)
        # Item names, BUFF prices, Steam prices, Steam 24-hour volume,
        # after-fee Steam proceeds, discount ratio
        name_list.extend(names_list_temp)
        price_list.extend(price_list_temp)
        price_steam_list.extend(price_steam_temp)
        sell_num_list.extend(sell_num_list_temp)
        soldprice.extend(soldprice_temp)
        percentage.extend(percentage_temp)
        time_page_end = time.time()
        dec = time_page_end - time_page_start
        minute = int(dec / 60)
        second = dec % 60
        print("page_cost: %02dmin%02dsec" % (minute, second))
    # Combine everything into a table and save it
    csv_name = ["name", "BUFF price", "steam price", "steam 24hour sold qty", "steam sellprice", "percentage"]
    csv_data = zip(name_list, price_list, price_steam_list, sell_num_list, soldprice, percentage)
    items_information = pd.DataFrame(columns=csv_name, data=csv_data)
    items_information.to_csv("items_information.csv")


if __name__ == "__main__":
    # Run main() when the script is executed directly
    main()

Don't forget time.sleep(): without a pause between Steam requests, the responses soon come back as null.
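That advice can be wrapped in a small helper. A sketch only: fetch_with_backoff is a hypothetical name, the doubling back-off is an added assumption rather than part of the original script, and the article does not state Steam's actual rate limits, so the delays are guesses to tune. Passing fetch and sleep in as parameters lets the logic be tried without touching the network.

```python
import time


def fetch_with_backoff(fetch, url, delay=5.0, retries=3, sleep=time.sleep):
    """Return fetch(url) as text, sleeping before every call.

    fetch: any function returning the response body as a string, e.g.
           lambda u: requests.get(u, headers=headers).text
    An empty or 'null' body is treated as throttled: wait twice as long and retry.
    Returns None if every attempt comes back throttled.
    """
    for attempt in range(retries):
        sleep(delay * 2 ** attempt)  # 5 s, 10 s, 20 s with the defaults
        text = fetch(url)
        if text and text.strip() != 'null':
            return text
    return None
```

In the script above it would replace the requests.get(...).text plus time.sleep(5) pair inside the Steam loop.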