Python Exercise Book: One Small Program a Day (10)

Published: 2023-09-05 19:35:18

Problem 0009: Given an HTML file, find the links it contains.

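Solution: this problem can be solved the same way as the previous one, by parsing the page with BeautifulSoup and collecting the href of every link in the resulting soup.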

# encoding: utf-8
import urllib.request

import requests
from bs4 import BeautifulSoup


def get_page(url):
    # Fetch the page with requests and parse it with the lxml parser
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    return soup


def get_page_urllib(url):
    # Alternative: fetch with the standard library and use the built-in parser
    resp = urllib.request.urlopen(url)
    soup = BeautifulSoup(resp.read(), "html.parser")
    return soup


def get_page_links(soup):
    # Hyperlinks are <a> tags; <link> tags only reference stylesheets, icons, etc.
    links = []
    page_links = soup.find_all("a", href=True)
    for link in page_links:
        links.append(link.get("href"))
    return links


soup = get_page("https://github.com/Yixiaohan/show-me-the-code")
links = get_page_links(soup)
print(links)
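
The problem statement speaks of an HTML file rather than a live URL, so the same idea can also be tried offline. The following is only a minimal sketch under some assumptions: it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, and the names LinkCollector, extract_links, and SAMPLE_HTML, as well as the example.com base URL, are hypothetical and not part of the original solution. Relative hrefs are resolved against the base URL with urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors count as hyperlinks; <link> tags are skipped
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the base URL, if one was given
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the hrefs of all <a> tags in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample document standing in for the contents of an HTML file
SAMPLE_HTML = """
<html><head><link rel="stylesheet" href="/style.css"></head>
<body>
  <a href="https://example.com/docs">Docs</a>
  <a href="/about">About</a>
  <a name="top">No href here</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML, base_url="https://example.com")
print(links)
```

To run this against a real file, the string passed to extract_links could come from reading the file first, for example the contents of a local page.html (a placeholder name). With BeautifulSoup installed, the get_page_links approach from the solution should work equally well on a soup built from that string.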