
How to extract web page content with a regular expression

Popularity: 3668   Published: 2013-02-25 00:00:00.0
How do I extract web page content using a regular expression?
The markup is as follows:
<div id="title" class="blog_tit_cont">
<strong>

<span >


<span>[转]</span>
为了练好口语,你敢不敢每天读一遍,坚持一个月? 
</span>


</strong>
<span id="pubTime" class="c_tx3">
<script type="text/javascript">
var pubtime = g_oBlogData.data.pubtime;
var pubDate = new Date(pubtime * 1000);
document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());
</script>
</span>
<span id="readNum" class="c_tx3"> </span>
<span id="quoteInfo" class="c_tx3"> </span>
</div>




How can I extract the content of the <strong> element inside the div? Detailed source code would be appreciated.

------Solution--------------------------------------------------------
(?is)<strong>(?<strong>(.*))</strong>
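
A minimal sketch of how a pattern like this might be applied from C#, assuming the page markup is already held in a string. The class name `StrongExtractor`, the method `ExtractStrong`, and the shortened sample markup are illustrative and not from the original post. One deliberate difference from the pattern as posted: the quantifier is lazy (`.*?`), so that on a page with several `<strong>` elements the match stops at the first `</strong>` rather than the last.

```csharp
using System;
using System.Text.RegularExpressions;

class StrongExtractor
{
    // Returns the inner markup of the first <strong> element, or null if none is found.
    public static string ExtractStrong(string html)
    {
        // (?is): ignore case, and let '.' match newlines as well.
        // The named group "strong" captures everything between the tags.
        Match m = Regex.Match(html, @"(?is)<strong>(?<strong>.*?)</strong>");
        return m.Success ? m.Groups["strong"].Value : null;
    }

    static void Main()
    {
        // Hypothetical input, shortened from the markup in the question.
        string html = "<div id=\"title\"><strong>\n<span>[转]</span> title text\n</strong></div>";
        string inner = ExtractStrong(html);
        Console.WriteLine(inner == null ? "(no match)" : inner.Trim());
    }
}
```

Reading the capture through `m.Groups["strong"]` rather than `m.Value` leaves out the surrounding `<strong>` tags themselves.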
------Solution--------------------------------------------------------
try...

C# code
Regex reg = new Regex(@"(?is)<div[^>]*>(?:(?!</?div).)*(<strong[^>]*>.*?</strong>)");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups[1].Value + "\n";
}
------Solution--------------------------------------------------------

C# code
static void Main(string[] args)
{
    string str = @"<div id=""title"" class=""blog_tit_cont""><strong><span ><span>[转]</span>    为了练好口语,你敢不敢每天读一遍,坚持一个月?  </span></strong><span id=""pubTime"" class=""c_tx3""><script type=""text/javascript"">var pubtime = g_oBlogData.data.pubtime;var pubDate = new Date(pubtime * 1000);document.write(pubDate.getFullYear() + ""."" + (pubDate.getMonth() + 1) + ""."" + pubDate.getDate());</script></span><span id=""readNum"" class=""c_tx3""> </span><span id=""quoteInfo"" class=""c_tx3""> </span></div>";
    Regex re = new Regex(@"(?is)(?<=<div id=""title""[^>]+>\s*<strong>).*?(?=</strong>)", RegexOptions.None);
    Console.WriteLine(re.Match(str).Value);  // re.Match(str).Value is the text you want
    Console.ReadLine();
}
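
The value matched by the lookaround pattern above still contains the inner `<span>` tags. If only the visible title text is wanted, one possible follow-up step is to strip the tags afterwards. This is a rough sketch, assuming the fragment contains simple tags only; the class `TitleText`, the helper `StripTags`, and the sample fragment are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

class TitleText
{
    // Removes anything that looks like a tag, then collapses runs of whitespace.
    public static string StripTags(string fragment)
    {
        string noTags = Regex.Replace(fragment, @"<[^>]+>", "");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }

    static void Main()
    {
        // Hypothetical fragment shaped like the content of the <strong> element.
        string fragment = "<span ><span>[转]</span>    sample title  </span>";
        Console.WriteLine(StripTags(fragment));  // prints "[转] sample title"
    }
}
```

A tag-stripping regex like this is only suitable for small, predictable fragments; for arbitrary pages an HTML parser is usually the safer choice.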
------Solution--------------------------------------------------------
Create a file named 1.txt on the C: drive
C# code
<div id="title" class="blog_tit_cont"><strong><span ><span>[转]</span>    为了练好口语,你敢不敢每天读一遍,坚持一个月?  </span></strong><span id="pubTime" class="c_tx3"><script type="text/javascript">var pubtime = g_oBlogData.data.pubtime;var pubDate = new Date(pubtime * 1000);document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());</script></span><span id="readNum" class="c_tx3"> </span><span id="quoteInfo" class="c_tx3"> </span></div>