
Regular expression

Post time: 2020-1-30 22:40:01
        // Create a request for the URL.
        string strUrl = "http://www.google.cn";
        WebRequest request = WebRequest.Create(strUrl);
        // If required by the server, set the credentials.
        request.Credentials = CredentialCache.DefaultCredentials;

        // Get the response.
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        // Display the status.
        //Response.Write(response.StatusCode);
        // Get the stream containing content returned by the server.
        Stream dataStream = response.GetResponseStream();
        // Open the stream using a StreamReader for easy access.
        StreamReader reader = new StreamReader(dataStream, System.Text.Encoding.Default);
        // Read the content.
        string responseFromServer = reader.ReadToEnd();

        // Extract the href/src/action values; the named group "link" holds the address.
        string pattern = @"<(((a|link).*href)|((img|script).*src)|(form.*action))\s*=\s*[""']?(?<link>[^'""\s]*)";
        Regex reg = new Regex(pattern, RegexOptions.IgnoreCase);
        for (Match m = reg.Match(responseFromServer); m.Success; m = m.NextMatch())
        {
            Response.Write(m.Groups["link"].Value + "<br>");
        }

        // Display the content.
        Response.Write(responseFromServer);

        // Clean up the streams and the response.
        reader.Close();
        dataStream.Close();
        response.Close();

============================================================================
The regular expression extracts addresses that are wrapped in quotes, but why can't it extract addresses that are not quoted?
Please take a look. Thank you in advance.

Post time: 2020-3-14 11:45:01
Your regular expression
string pattern = @"<(((a|link).*href)|((img|script).*src)|(form.*action))\s*=\s*[""']?(?<link>[^'""\s]*)";
should already be able to match unquoted addresses, but the captured value will include extra, unwanted text such as ">".
This is because (?<link>[^'""\s]*) does not set a tight enough boundary for an address without quotes: the capture only stops at a quote or at whitespace. You can try the following instead:
string pattern = @"<(((a|link).*href)|((img|script).*src)|(form.*action))\s*=\s*[""']?(?<link>[^'""\s>]*)";
Here, the characters that can end an unquoted address go inside the last []. Ordinary characters such as ">" can be added as needed for your specific situation.
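To see the effect of adding ">" to the last character class, here is a minimal, self-contained sketch. The class name LinkExtractor, the method ExtractLinks, and the sample HTML are hypothetical and only for illustration; the pattern itself is the one suggested in this reply.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Pattern from the reply: '>' is excluded from the captured value,
    // so an unquoted address ends at the closing bracket of the tag.
    private const string Pattern =
        @"<(((a|link).*href)|((img|script).*src)|(form.*action))\s*=\s*[""']?(?<link>[^'""\s>]*)";

    // Hypothetical helper: returns every value captured by the "link" group.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var reg = new Regex(Pattern, RegexOptions.IgnoreCase);
        for (Match m = reg.Match(html); m.Success; m = m.NextMatch())
        {
            links.Add(m.Groups["link"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample input: one unquoted, one double-quoted, one single-quoted value.
        string html = "<a href=page1.html>one</a>\n"
                    + "<img src=\"logo.png\">\n"
                    + "<form action='submit.aspx'>";

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // page1.html
        // logo.png
        // submit.aspx
    }
}
```

With the original character class [^'""\s], the first capture on the same input would be "page1.html>one</a>", because nothing stops the match until the line break. That is the extra text described above.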