Unable to restrict my script to parsing a specific part of a webpage (Python)

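I've written a Python script to scrape the Plot description from a webpage. The description is spread across several p tags, and the page contains other p tags that I don't want to scrape. Once my script has finished parsing the Plot description, it should stop. However, my script below collects every p tag from the Plot section through to the end of the page.

How can I restrict my script so that it parses only the Plot description?

This is what I've written: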

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Alien_(film)"

with requests.Session() as s:
    s.headers={"User-Agent":"Mozilla/5.0"}
    res = s.get(url)
    soup = BeautifulSoup(res.text,"lxml")
    plot = [item.text for item in soup.select_one("#Plot").find_parent().find_next_siblings("p")]
    print(plot)

If you aren't required to use BeautifulSoup, you can try the following to get the text content you need:

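import requests
from lxml import html

url = "https://en.wikipedia.org/wiki/Alien_(film)"

with requests.Session() as s:
    s.headers={"User-Agent":"Mozilla/5.0"}
    res = s.get(url)
    source = html.fromstring(res.content)
    # Keep only the <p> elements whose nearest preceding <h2> contains <span>Plot</span>
    plot = [item.text_content() for item in source.xpath('//p[preceding::h2[1][span="Plot"]]')]
    print(plot)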
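
To see how the XPath predicate behaves without a network request, here is a minimal sketch that runs the same expression against a small inline document. The SAMPLE_HTML string is made up for illustration and only mimics the h2 > span layout the answer assumes; the live page's markup may differ.

```python
from lxml import html

# Hypothetical markup for illustration only (not copied from the real page)
SAMPLE_HTML = """
<div>
<p>Intro paragraph.</p>
<h2><span id="Plot">Plot</span></h2>
<p>First plot paragraph.</p>
<p>Second plot paragraph.</p>
<h2><span id="Cast">Cast</span></h2>
<p>Cast paragraph.</p>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# preceding::h2[1] is the closest <h2> before each <p>;
# [span="Plot"] keeps the <p> only if that heading holds a <span> whose text is "Plot"
paragraphs = tree.xpath('//p[preceding::h2[1][span="Plot"]]')
plot = [p.text_content().strip() for p in paragraphs]
print(plot)
```

Because the nearest preceding h2 for the last paragraph is the "Cast" heading, that paragraph is filtered out, and the intro paragraph is skipped because no h2 precedes it.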

You can select only the paragraphs that come before the next heading, for example:

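import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Alien_(film)"

with requests.Session() as s:
    s.headers={"User-Agent":"Mozilla/5.0"}
    res = s.get(url)
    soup = BeautifulSoup(res.text,"lxml")

    # Every tag that follows the Plot heading at the same level
    plot_start = soup.select_one("#Plot").find_parent().find_next_siblings()
    plot = []
    for item in plot_start:
        # Collect text until the next <h2>, which marks the start of the next section
        if item.name != 'h2':
            plot.append(item.text)
        else:
            break
    print(plot)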
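
The same stop-at-the-next-heading idea can be tried offline. The sketch below is only an illustration: SAMPLE_HTML and the helper name section_paragraphs are hypothetical, it assumes the h2 > span#Plot layout used above, and it uses the built-in "html.parser" so that lxml isn't needed. Unlike the loop above, it also keeps only p tags.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not copied from the real page)
SAMPLE_HTML = """
<div>
<p>Intro paragraph.</p>
<h2><span id="Plot">Plot</span></h2>
<p>First plot paragraph.</p>
<p>Second plot paragraph.</p>
<h2><span id="Cast">Cast</span></h2>
<p>Cast paragraph.</p>
</div>
"""


def section_paragraphs(markup, section_id):
    """Return the text of the <p> tags between a section heading and the next <h2>."""
    soup = BeautifulSoup(markup, "html.parser")
    heading = soup.select_one(f"#{section_id}").find_parent("h2")
    siblings = heading.find_next_siblings()
    # takewhile stops at the first <h2>, so later sections are never visited
    in_section = takewhile(lambda tag: tag.name != "h2", siblings)
    return [tag.get_text(strip=True) for tag in in_section if tag.name == "p"]


plot = section_paragraphs(SAMPLE_HTML, "Plot")
print(plot)
```

Using takewhile keeps the early exit of the explicit loop, while the final filter on tag.name drops any non-paragraph siblings (tables, figures and so on) that might sit inside the section.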
