I have the following structure (page source scraped from the web):
<h1>2014</h1>
<table>
<tr>
<td>
<strong>November</strong>
<a href="a.html">a</a>
</td>
<td></td>
</tr>
</table>
<h1>2013</h1>
<table>
<tr>
<td>
<strong>October</strong>
<a href="b.html">b</a>
</td>
<td>
<strong>September</strong>
<a href="c.html">c</a>
</td>
</tr>
</table>
<h1>2012</h1>
<table>
<tr>
<td>
<strong>August</strong>
<a href="d.html">d</a>
</td>
<td>
<strong>July</strong>
<a href="e.html">e</a>
</td>
<td>
<strong>June</strong>
<a href="f.html">f</a>
</td>
</tr>
</table>
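I want to use a regular expression to produce the output below. Each table may contain a different number of td cells, so the solution has to handle a variable number of them: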
2014 November a.html
2013 October b.html
2013 September c.html
2012 August d.html
2012 July e.html
2012 June f.html
Any pointers would be much appreciated!
------Solution----------------------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The scraped page collapsed onto one line; the patterns assume no line breaks between tags
        String s = "<h1>2014</h1><table><tr><td><strong>November</strong><a href=\"a.html\">a</a></td><td></td></tr></table><h1>2013</h1><table><tr><td><strong>October</strong><a href=\"b.html\">b</a></td><td><strong>September</strong><a href=\"c.html\">c</a></td></tr></table><h1>2012</h1><table><tr><td><strong>August</strong><a href=\"d.html\">d</a></td><td><strong>July</strong><a href=\"e.html\">e</a></td><td><strong>June</strong><a href=\"f.html\">f</a></td></tr></table>";
        // Outer pattern: one match per year heading plus the table that follows it
        Matcher m = Pattern.compile("<h1>(.*?)</h1><table>(.*?)</table>").matcher(s);
        // Inner pattern: month name and link target; [^<]* accepts link text of any length
        Pattern cell = Pattern.compile("<strong>(.*?)</strong>.*?<a href=\"(.*?)\">[^<]*</a>");
        while (m.find()) {
            Matcher subM = cell.matcher(m.group(2));
            while (subM.find()) {
                System.out.println(m.group(1) + " " + subM.group(1) + " " + subM.group(2));
            }
        }
    }
}
------Solution----------------------
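For HTML I'd recommend jsoup, which is more precise than regex. Java's regex engine doesn't support balancing groups, so it is a poor fit for parsing a format like HTML. What you posted is only part of the page; if nothing elsewhere in the page interferes, the regex from the reply above will work.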
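One thing to watch if you stay with regex: the page as posted has a line break after every tag, while the regex reply works on a string with the tags run together. Below is a minimal sketch of a whitespace-tolerant variant using only java.util.regex. The class name ArchiveLinks and the method extract are made up for illustration, and it assumes the page really has the h1 / table / strong / a layout shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for the example.
class ArchiveLinks {
    // DOTALL lets '.' cross line breaks, so one year's <table> may span many lines.
    private static final Pattern YEAR_BLOCK = Pattern.compile(
            "<h1>\\s*(.*?)\\s*</h1>\\s*<table>(.*?)</table>", Pattern.DOTALL);

    // [^<]* and [^"]* cannot run past a tag or a quote, and only whitespace is
    // allowed between </strong> and <a, so a month is never paired with a link
    // from a different cell.
    private static final Pattern MONTH_LINK = Pattern.compile(
            "<strong>\\s*([^<]*?)\\s*</strong>\\s*<a\\s+href=\"([^\"]*)\"");

    // Returns one "year month href" line per linked month, in page order.
    static List<String> extract(String html) {
        List<String> rows = new ArrayList<>();
        Matcher year = YEAR_BLOCK.matcher(html);
        while (year.find()) {
            Matcher link = MONTH_LINK.matcher(year.group(2));
            while (link.find()) {
                rows.add(year.group(1) + " " + link.group(1) + " " + link.group(2));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // First two years of the sample, with the original line breaks kept.
        String html = String.join("\n",
                "<h1>2014</h1>", "<table>", "<tr>", "<td>",
                "<strong>November</strong>", "<a href=\"a.html\">a</a>",
                "</td>", "<td></td>", "</tr>", "</table>",
                "<h1>2013</h1>", "<table>", "<tr>", "<td>",
                "<strong>October</strong>", "<a href=\"b.html\">b</a>",
                "</td>", "<td>",
                "<strong>September</strong>", "<a href=\"c.html\">c</a>",
                "</td>", "</tr>", "</table>");
        extract(html).forEach(System.out::println);
    }
}
```

The empty td in the 2014 table simply produces no match, so a variable number of cells needs no special handling. This still breaks on nested tables or extra attributes on the tags, which is where a real HTML parser earns its keep.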