Urgent: URL regex matching error (waiting online)

Post time: 2020-9-25 12:30:01
I have an HTML document like this:
<a href=""> </a>
<a href="http://classad.163.com/html/area/110/index.html">Shaan</a>
<a href="http://classad.163.com/html/area/341/index.html">Ning</a>
<a href="http://classad.163.com/html/area/577/index.html">Hidden</a>
<a href="http://classad.163.com/html/area/325/index.html">Yichang</a>
<a href="http://classad.163.com/html/area/31/index.html">Zhengzhou</a>
<a href=""> </a>
<a href="http://classad.163.com/html/area/254/index.html">Nantong</a>
<a href=""> </a>
<a href="http://classad.163.com/html/area/313/index.html">Jingmen</a>
<a href=""> </a>
<a href="http://classad.163.com/html/area/81/index.html">Rizhao</a>
<a href="http://classad.163.com/html/area/36/index.html">Luoyang</a>
<a href="http://classad.163.com/html/area/393/index.html"> Chaoyang District</a>
<a href=""> </a>
I use this regular expression:
<a\s+href\s*=\s*["|']?(?<uri>[^'"> ]*)["|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>
In a regex test tool it matches everything, that is, all 15 links. But the code I wrote only gets 11 matches. My code is:
// requires: using System.Text.RegularExpressions;
string htmlRegexEpression = @"<a\s+href\s*=\s*[""|']?(?<uri>[^'""> ]*)[""|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>";
Regex linksExpression = new Regex(htmlRegexEpression,
    RegexOptions.Multiline | RegexOptions.IgnoreCase);
MatchCollection Matchs = linksExpression.Matches(PageHtmlContent);

Did I convert the pattern into htmlRegexEpression incorrectly? Could an expert please help?
Many thanks!
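On the conversion question, one quick check is to compare the verbatim string against the same pattern written with ordinary escapes. This is only a sketch: it assumes the intended pattern has no literal spaces outside the character class, and the names VerbatimPatternCheck, SamePattern and CountMatches are made up for illustration. In a verbatim @"..." string only the double quote has to be doubled, so if the two constants compare equal the conversion itself is not the problem. Note also that RegexOptions.Multiline only changes the meaning of ^ and $, which this pattern does not use, so that option should not affect the count.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimPatternCheck
{
    // The pattern as a regular C# string literal: backslashes and quotes are escaped.
    const string Escaped =
        "<a\\s+href\\s*=\\s*[\"|']?(?<uri>[^'\"> ]*)[\"|']?[^<>]*>\\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\\s*</a>";

    // The same pattern as a verbatim literal: only " is doubled, backslashes stay as-is.
    const string Verbatim =
        @"<a\s+href\s*=\s*[""|']?(?<uri>[^'""> ]*)[""|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>";

    // True when both literals produce the identical pattern text.
    public static bool SamePattern()
    {
        return Escaped == Verbatim;
    }

    // Counts matches using the same options as in the question.
    public static int CountMatches(string html)
    {
        var regex = new Regex(Verbatim, RegexOptions.Multiline | RegexOptions.IgnoreCase);
        return regex.Matches(html).Count;
    }
}
```

A side note on ["|']: inside a character class the | is a literal pipe character, not "or", so ["'] would express the same intent.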

Post time: 2020-9-26 10:15:01
I tried it, and all 15 can be matched. This is the test code I used:
MatchCollection mc = Regex.Matches(yourStr, @"<a\s+href\s*=\s*[""|']?(?<uri>[^'""> ]*)[""|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>");
foreach (Match m in mc)
{
    // Groups[0] is the whole match, i.e. the complete <a ...>...</a> element
    richTextBox1.Text += m.Groups[0].Value + "\n";
}

I'm going to bed now and haven't read it too carefully, but it shouldn't need to be as complicated as what you wrote. I'll take another look tomorrow.
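Since the same pattern gives 15 here and 11 in the original code, it may be worth ruling out a difference in the input text. This is an assumption, not something confirmed in this thread: if PageHtmlContent has no line break or other text between two anchors, the (<[^<>]+>)* parts of the pattern can swallow a closing </a> and the next opening <a ...> as if they were inner tags, so two anchors come back as one match. The sketch below only demonstrates that behaviour; AdjacentAnchorCheck and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class AdjacentAnchorCheck
{
    // Same pattern as in the posts above.
    const string Pattern =
        @"<a\s+href\s*=\s*[""|']?(?<uri>[^'""> ]*)[""|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>";

    // Number of matches found in the given HTML text.
    public static int CountMatches(string html)
    {
        return Regex.Matches(html, Pattern, RegexOptions.IgnoreCase).Count;
    }

    // Text of the first match, to see how far a single match extends.
    public static string FirstMatch(string html)
    {
        return Regex.Match(html, Pattern, RegexOptions.IgnoreCase).Value;
    }
}
```

For an empty anchor followed by a normal one, CountMatches returns 2 when a "\n" separates them and 1 when they are directly adjacent; in the second case FirstMatch returns both anchors together. Comparing the layout of the sample with the real page content would show whether this applies.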

Post time: 2020-9-26 18:30:01
After looking at it again, the code you posted should be fine. You need to check the code that processes Matchs after you get it. If you filter on uri and title, both are zero-length strings for <a href=""> </a>, but in that case you would end up with 10 results rather than 11. Otherwise, please post your processing code.
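To make the 15 versus 10 distinction concrete, here is a minimal sketch of the kind of post-processing described above. It is not the original poster's code, which was never shown; LinkFilter, CountAll and NonEmptyLinks are hypothetical names, and skipping matches with an empty uri or title is an assumed filter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkFilter
{
    // Same pattern as in the posts above.
    static readonly Regex LinkRegex = new Regex(
        @"<a\s+href\s*=\s*[""|']?(?<uri>[^'""> ]*)[""|']?[^<>]*>\s*(<[^<>]+>)*(?<title>[^<>]*)(<[^<>]+>)*\s*</a>",
        RegexOptions.IgnoreCase);

    // Raw number of matches, including anchors such as <a href=""> </a>.
    public static int CountAll(string html)
    {
        return LinkRegex.Matches(html).Count;
    }

    // Keeps only matches where both named groups are non-empty.
    public static List<KeyValuePair<string, string>> NonEmptyLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkRegex.Matches(html))
        {
            string uri = m.Groups["uri"].Value;
            string title = m.Groups["title"].Value.Trim();
            if (uri.Length == 0 || title.Length == 0)
            {
                continue; // assumed filter: drop empty placeholder anchors
            }
            links.Add(new KeyValuePair<string, string>(uri, title));
        }
        return links;
    }
}
```

On the sample from the first post, with one anchor per line, CountAll should give 15 and NonEmptyLinks should return 10 entries, which matches the numbers in this reply. Neither figure explains a count of 11.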