How to extract web page content with a regular expression
The HTML is as follows:
<div id="title" class="blog_tit_cont">
<strong>
<span >
<span>[转]</span>
为了练好口语,你敢不敢每天读一遍,坚持一个月?
</span>
</strong>
<span id="pubTime" class="c_tx3">
<script type="text/javascript">
var pubtime = g_oBlogData.data.pubtime;
var pubDate = new Date(pubtime * 1000);
document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());
</script>
</span>
<span id="readNum" class="c_tx3"> </span>
<span id="quoteInfo" class="c_tx3"> </span>
</div>
How do I extract the content of the strong element under the div? Detailed source code would be appreciated.
------Solution--------------------------------------------------------
(?is)<strong>(?<strong>.*?)</strong>
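The pattern above puts the content in a named group called strong. A minimal sketch of reading that group from C# follows, using a lazy .*? so the match ends at the first </strong>. The class name StrongExtractor, the method name ExtractStrong and the shortened sample string are assumptions made for illustration only, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrongExtractor
{
    // Hypothetical helper: returns the inner HTML of the first <strong> element,
    // or an empty string when no <strong> element is found.
    public static string ExtractStrong(string html)
    {
        // (?i) ignores case, (?s) lets '.' match newlines as well.
        // .*? is lazy, so the match stops at the first </strong>.
        Match m = Regex.Match(html, @"(?is)<strong>(?<strong>.*?)</strong>");
        return m.Success ? m.Groups["strong"].Value : string.Empty;
    }

    public static void Main()
    {
        // Shortened sample input, assumed for this sketch.
        string html = "<div id=\"title\"><strong><span>[转]</span> sample title</strong></div>";
        Console.WriteLine(ExtractStrong(html));
    }
}
```

Note that the group holds the inner HTML, so nested span tags are still part of the result.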
------Solution--------------------------------------------------------
try...
- C# code
Regex reg = new Regex(@"(?is)<div[^>]*>(?:(?!</?div).)*(<strong[^>]*>.*?</strong>)");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups[1].Value + "\n";
}
------Solution--------------------------------------------------------
- C# code
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Sample HTML from the question, as a verbatim string ("" is an escaped quote).
        string str = @"<div id=""title"" class=""blog_tit_cont""><strong><span ><span>[转]</span> 为了练好口语,你敢不敢每天读一遍,坚持一个月? </span></strong><span id=""pubTime"" class=""c_tx3""><script type=""text/javascript"">var pubtime = g_oBlogData.data.pubtime;var pubDate = new Date(pubtime * 1000);document.write(pubDate.getFullYear() + ""."" + (pubDate.getMonth() + 1) + ""."" + pubDate.getDate());</script></span><span id=""readNum"" class=""c_tx3""> </span><span id=""quoteInfo"" class=""c_tx3""> </span></div>";

        // The lookbehind anchors on <div id="title" ...><strong>; the lookahead stops before </strong>.
        Regex re = new Regex(@"(?is)(?<=<div id=""title""[^>]+>\s*<strong>).*?(?=</strong>)", RegexOptions.None);

        // re.Match(str).Value is the content you want.
        Console.WriteLine(re.Match(str).Value);
        Console.ReadLine();
    }
}
------Solution--------------------------------------------------------
Create a file named 1.txt on drive C: containing:
- C# code
<div id="title" class="blog_tit_cont"><strong><span ><span>[转]</span> 为了练好口语,你敢不敢每天读一遍,坚持一个月? </span></strong><span id="pubTime" class="c_tx3"><script type="text/javascript">var pubtime = g_oBlogData.data.pubtime;var pubDate = new Date(pubtime * 1000);document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());</script></span><span id="readNum" class="c_tx3"> </span><span id="quoteInfo" class="c_tx3"> </span></div>
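The answer breaks off after the sample file, so the code that was meant to read it is not shown. What follows is only a sketch of one way such a file might be read and searched, not the original poster's code. It assumes the path C:\1.txt from the instruction above, assumes the file is saved as UTF-8, and reuses the lookbehind pattern from the previous solution. The names TitleFileReader and ExtractTitleFromFile are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class TitleFileReader
{
    // Hypothetical helper: reads an HTML file and returns the inner HTML of the
    // <strong> element that directly follows <div id="title" ...>,
    // or an empty string when nothing matches.
    public static string ExtractTitleFromFile(string path)
    {
        // File.ReadAllText defaults to UTF-8; a file saved in another encoding
        // would need the overload that takes an Encoding argument.
        string html = File.ReadAllText(path);

        // Same pattern as the previous solution: lookbehind for the opening tags,
        // lazy match, lookahead for the closing </strong>.
        Regex re = new Regex(@"(?is)(?<=<div id=""title""[^>]+>\s*<strong>).*?(?=</strong>)");
        Match m = re.Match(html);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Assumed path, taken from the instruction to create 1.txt on drive C:.
        Console.WriteLine(ExtractTitleFromFile(@"C:\1.txt"));
    }
}
```

Because the pattern requires at least one more attribute character after id="title", it fits the sample div, which also carries a class attribute. A div with only the id attribute would not match.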