Problem description
I am trying to scrape a product table with Selenium.
Here is my sample table:
<div class="article">
<table style="width: 100%">
<tbody><tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900101" class="changeable">
<span>Product 1 </span>
</a>
</td>
<td class="trenner_lu">
11.11.1999
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 1</a>
</td>
<td class="trenner_lu">
1999$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900347" class="changeable">
<span>Product 2 </span>
</a>
</td>
<td class="trenner_lu">
1.12.1944
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
</td>
<td class="trenner_lu">
1234$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12908635" class="changeable">
<img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">
<span>Product 1 </span>
<img src="/Content/images/icons/photo.png" alt="Foto">
</a>
</td>
<td class="trenner_lu">
05.12.1950
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 4</a>
</td>
<td class="trenner_lu">
131282$
</td>
</tr>
</tbody></table>
</div>
I am trying to scrape each column like this:
List<WebElement> links = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> prodNames = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> group = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
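However, as you can see, one of my td elements contains two links. Because of that, my WebElement lists end up with different lengths (four group links for three rows), which makes them hard to merge back together.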
The output I want is a list of rows like this:
[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944, Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 4, 131282$]
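A minimal sketch of one way to keep the values aligned: iterate over the rows and evaluate every query relative to the current row, so each row yields exactly one record no matter how many group links it contains. It is written in Java to match the question, but so that it runs without a browser it uses the JDK's javax.xml XPath on a simplified, well-formed copy of the table (hrefs, classes and images removed) as a stand-in for Selenium. The `RowScraper` class and its `XHTML` sample are hypothetical. With Selenium the same idea would be `row.findElements(By.xpath("./td"))` for the cells and `cells.get(3).findElements(By.tagName("a"))` for the group links.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RowScraper {
    // Hypothetical, well-formed copy of the question's table (attributes and images dropped).
    static final String XHTML = "<table><tbody>"
        + "<tr><td/><td><a><span>Product 1 </span></a></td><td> 11.11.1999 </td><td><a>Group 1</a></td><td> 1999$ </td></tr>"
        + "<tr><td/><td><a><span>Product 2 </span></a></td><td> 1.12.1944 </td><td><a>Group 2</a></td><td> 1234$ </td></tr>"
        + "<tr><td/><td><a><span>Product 1 </span></a></td><td> 05.12.1950 </td><td><a>Group 2</a>,<a>Group 4</a></td><td> 131282$ </td></tr>"
        + "</tbody></table>";

    // Returns one [name, date, groups, price] record per table row.
    static List<List<String>> scrape(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate("//table/tbody/tr", doc, XPathConstants.NODESET);
            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // These expressions have no leading "/", so they are resolved from the current row.
                String name = xp.evaluate("normalize-space(td[2])", row);
                String date = xp.evaluate("normalize-space(td[3])", row);
                String price = xp.evaluate("normalize-space(td[5])", row);
                // Collect every group link in this row's 4th cell, however many there are.
                NodeList links = (NodeList) xp.evaluate("td[4]/a", row, XPathConstants.NODESET);
                List<String> groups = new ArrayList<>();
                for (int j = 0; j < links.getLength(); j++) {
                    groups.add(links.item(j).getTextContent().trim());
                }
                // Joining with a space is an arbitrary choice that keeps one field per column.
                result.add(List.of(name, date, String.join(" ", groups), price));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse table", e);
        }
    }

    public static void main(String[] args) {
        for (List<String> record : scrape(XHTML)) {
            System.out.println(record);
        }
    }
}
```

Since each record is built inside the row loop, a row with two group links simply produces a longer `groups` list for that one record instead of shifting every later column.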
Any suggestions on how to scrape a table like this more efficiently?
Thanks in advance for your answers!
Answer 1
Think of everything you interact with as an object:
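import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
class Table {
    // XPath positions are 1-based: tr[row], td[col]
    private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";
    private final WebDriver driver;
    Table(WebDriver driver) { this.driver = driver; } // the driver is passed in rather than assumed to be a global
    public String getTableCellText(int row, int col) {
        WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
        return cell.getText();
    }
}
You can then use it wherever you need it:
Table t = new Table(driver);
System.out.println(t.getTableCellText(3, 5)); // prints 131282$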
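To check the 1-based indices behind `getTableCellText(3, 5)` without starting a browser, here is a sketch that formats the same `TABLE_CELL` template and evaluates it with the JDK's javax.xml XPath against a simplified, well-formed copy of the table. `CellLocatorCheck` and its sample markup are hypothetical stand-ins for the live page. Selenium's `getText()` returns the visible text without leading or trailing whitespace, which `trim()` imitates here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class CellLocatorCheck {
    // Same locator template as the answer's Table class; tr[...] and td[...] count from 1.
    static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";

    // Hypothetical, well-formed copy of the question's table (links and attributes dropped).
    static final String XHTML = "<table><tbody>"
        + "<tr><td/><td>Product 1</td><td>11.11.1999</td><td>Group 1</td><td>1999$</td></tr>"
        + "<tr><td/><td>Product 2</td><td>1.12.1944</td><td>Group 2</td><td>1234$</td></tr>"
        + "<tr><td/><td>Product 1</td><td>05.12.1950</td><td>Group 2,Group 4</td><td>131282$</td></tr>"
        + "</tbody></table>";

    // Returns the trimmed text of the cell at (row, col), both 1-based.
    static String cellText(int row, int col) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XHTML.getBytes(StandardCharsets.UTF_8)));
            // evaluate(String, Object) returns the string-value of the first matching node.
            return XPathFactory.newInstance().newXPath()
                .evaluate(String.format(TABLE_CELL, row, col), doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate cell locator", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.format(TABLE_CELL, 3, 5)); // //table/tbody/tr[3]/td[5]
        System.out.println(cellText(3, 5));                  // 131282$
    }
}
```

Column 1 is the empty `trenner_u` cell, which is why the price ends up in column 5 rather than 4.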
Answer 2
You could iterate over each row instead, which makes it much clearer what you are doing. In Python:
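from selenium.webdriver.common.by import By

rows = driver.find_elements(By.XPATH, "//*[@id=\"home\"]/div[3]/table/tbody/tr")
for row in rows:
    # "./td" is relative to this row; "//td" would match every td on the page
    cells = row.find_elements(By.XPATH, "./td")
    product_name = cells[1].text
    date = cells[2].text
    groups = [a.text for a in cells[3].find_elements(By.TAG_NAME, "a")]
    price = cells[4].text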
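Whether the cell query inside the loop is written `//td` or `./td` matters. An XPath that starts with `//` is resolved from the document root even when it is called on a row element, so it returns the cells of every row; `./td` returns only the children of that row. Selenium's Javadoc for `WebElement.findElement` makes the same point. The sketch below shows the difference with the JDK's javax.xml XPath on a hypothetical two-row sample (`RelativeXPathDemo` is an illustrative name), in Java for consistency with the rest of the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RelativeXPathDemo {
    // Hypothetical two-row, well-formed stand-in for the question's table.
    static final String XHTML = "<table><tbody>"
        + "<tr><td/><td>Product 1</td><td>11.11.1999</td><td>Group 1</td><td>1999$</td></tr>"
        + "<tr><td/><td>Product 2</td><td>1.12.1944</td><td>Group 2</td><td>1234$</td></tr>"
        + "</tbody></table>";

    // Counts the nodes matched by expr when the first row is used as the context node.
    static int countFromFirstRow(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XHTML.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node firstRow = (Node) xp.evaluate("(//tr)[1]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xp.evaluate(expr, firstRow, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countFromFirstRow("//td")); // 10: all cells in the document
        System.out.println(countFromFirstRow("./td")); // 5: only the cells of the first row
    }
}
```

With `//td`, `cells[1]` would always be the second cell of the first row, so every iteration of the loop would read the same product name.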