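Problem description
I am trying to parse some LinkedIn data, and inside a for loop I want to get the text within this span. So, for the markup below, the result should be the string "2 shared connections".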
<span class="search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1">
2 shared connections
</span>
This is the XPath:
//*[@id="ember4490"]/span
So far, I can correctly select the span with the following code:
mutual_conns_with_text = div.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'})
However, the above selects the entire span element, not just its text. The code below raises an exception:
mutual_conns_with_text = div.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'}).getText()
Exception:
AttributeError: 'NoneType' object has no attribute 'getText'
Answer 1
You can simply ask for the text attribute of the span element:
>>> import bs4
>>> HTML = '''\
... <span class="search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1">
... 2 shared connection
... </span>'''
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> mutual_conns_with_text = soup.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'})
>>> mutual_conns_with_text.text
'\n\t2 shared connection\n'