
Scraping specific data from a web page: how to do it

Popularity: 7911   Published: 2013-02-25
Scraping specific data from a web page
HTML code
<tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
  <td class="domainname">
    <div class="domainurl">
      <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
    </div>
  </td>
  <td>b</td>
  <td>c</td>
  <td>d</td>
  <td>d</td>
</tr>
<tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
  <td class="domainname">
    <div class="domainurl">
      <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
    </div>
  </td>
  <td>b</td>
  <td>c</td>
  <td>d</td>
  <td>d</td>
</tr>
<tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
  <td class="domainname">
    <div class="domainurl">
      <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
    </div>
  </td>
  <td>b</td>
  <td>c</td>
  <td>d</td>
  <td>e</td>
</tr>
<tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
  <td class="domainname">
    <div class="domainurl">
      <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
    </div>
  </td>
  <td>b</td>
  <td>c</td>
  <td>d</td>
  <td>e</td>
</tr>


How can I extract each group of data inside the td cells separately?

------Solution--------------------------------------------------------
C# code
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

string url = "http://del.chinaz.com/";
WebRequest request = WebRequest.Create(url); // build the request for the URL
string tempStr;
// Fetch the page and read the whole body as UTF-8; dispose the response and reader when done.
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("utf-8")))
{
    tempStr = reader.ReadToEnd();
}

// Match each <tr id="tr_domains..."> row: group "a" is the link text,
// groups "b" through "e" are the four <td> cells that follow it.
string pattern = @"(?i)<tr[^>]*?id=(['""]?)tr_domains[^'""]*?\1[^>]*?>[\s\S]*?<a[^>]*?id=(['""]?)domain[^'""]*?\2";
pattern += @"[^>]*?>(?<a>[^>]*?)</a>[\s\S]*?<td[^>]*?>(?<b>[\s\S]*?)</td>\s*?<td[^>]*?>(?<c>[\s\S]*?)</td>\s*?";
pattern += @"<td[^>]*?>(?<d>[\s\S]*?)</td>\s*?<td[^>]*?>(?<e>[\s\S]*?)</td>\s*?";

// Loop over every matched row
foreach (Match m in Regex.Matches(tempStr, pattern))
{
    string a = m.Groups["a"].Value;  // e.g. 1dq.net
    string b = m.Groups["b"].Value;  // e.g. 3
    string c = m.Groups["c"].Value;  // e.g. net
    string d = m.Groups["d"].Value;  // e.g. 2012-08-26
    string e1 = m.Groups["e"].Value; // e.g. "Delete
}
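
To check how the named groups line up with the cells without requesting the live site, here is a minimal offline sketch that runs the same pattern against one row taken from the question. The names `DomainRowDemo`, `ParseRows`, and `SampleHtml` are made up for illustration, and the mouseover/mouseout attributes are trimmed from the sample row for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class DomainRowDemo
{
    // The pattern from the answer, unchanged. The two unnamed groups capture the
    // optional quote around each id value so that \1 and \2 match the closing quote.
    public const string Pattern =
        @"(?i)<tr[^>]*?id=(['""]?)tr_domains[^'""]*?\1[^>]*?>[\s\S]*?<a[^>]*?id=(['""]?)domain[^'""]*?\2" +
        @"[^>]*?>(?<a>[^>]*?)</a>[\s\S]*?<td[^>]*?>(?<b>[\s\S]*?)</td>\s*?<td[^>]*?>(?<c>[\s\S]*?)</td>\s*?" +
        @"<td[^>]*?>(?<d>[\s\S]*?)</td>\s*?<td[^>]*?>(?<e>[\s\S]*?)</td>\s*?";

    // One row from the question, with some event attributes removed.
    public const string SampleHtml = @"
<tr id=""tr_domains_16969543"" style=""cursor:auto;"" onclick=""selRow(this);"">
  <td class=""domainname"">
    <div class=""domainurl"">
      <a href=""http://whois.chinaz.com/30n.net"" id=""domain_1"" target=""_blank"" title=""查看"">a</a>
    </div>
  </td>
  <td>b</td>
  <td>c</td>
  <td>d</td>
  <td>e</td>
</tr>";

    // Returns one string[5] per matched row: link text, then the four cells.
    public static List<string[]> ParseRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            rows.Add(new[]
            {
                m.Groups["a"].Value.Trim(),
                m.Groups["b"].Value.Trim(),
                m.Groups["c"].Value.Trim(),
                m.Groups["d"].Value.Trim(),
                m.Groups["e"].Value.Trim(),
            });
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string[] row in ParseRows(SampleHtml))
        {
            Console.WriteLine(string.Join(" | ", row)); // prints: a | b | c | d | e
        }
    }
}
```

Keeping the matching in a method that takes the HTML as a string makes it easy to test against saved pages. A regex like this depends on the exact order of the cells, so it will likely need adjusting if the site changes its markup.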