当前位置: 代码迷 >> Web前端 >> 关于java模拟ie 访问web网站的解决办法
  详细解决方案

关于java模拟ie 访问web网站的解决办法

热度:165   发布时间:2012-10-14 14:55:07.0
关于java模拟ie 访问web网站的解决方法
在用Java的HttpURLConnection 来下载网页,发现访问google的网站时,会被google拒绝掉。

       try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            HttpURLConnection.setFollowRedirects(true);

            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));

            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
         }
        }



经过一段时间的研究和查找资料,发现是由于上面的代码缺少了一些必要的信息导致,增加更加详细的属性

            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

完整代码如下:
   public static void DownLoadPages(String urlStr, String outPath)
    {
        int chByte = 0;
        URL url = null;
        HttpURLConnection httpConn = null;
        InputStream in = null;
        FileOutputStream out = null;

        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            HttpURLConnection.setFollowRedirects(true);
            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
           
            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));

            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                out.close();
                in.close();
                httpConn.disconnect();
            }
            catch (Exception ex)
            {
                ex.printStackTrace();
            }
        }
    }

此外,还有第二种方法可以访问Google的网站,就是用apache的一个工具HttpClient 模仿一个浏览器来访问Google

        Document document = null;
        HttpClient httpClient = new HttpClient();
       
        GetMethod getMethod = new GetMethod(url);
        getMethod.setFollowRedirects(true);
        int statusCode = httpClient.executeMethod(getMethod);
       
        if (statusCode == HttpStatus.SC_OK)
        {
            InputStream in = getMethod.getResponseBodyAsStream();
            InputSource is = new InputSource(in);

            DOMParser domParser = new DOMParser();   //nekoHtml 将取得的网页转换成dom
            domParser.parse(is);
            document = domParser.getDocument();
           
            System.out.println(getMethod.getURI());
           
        }
        return document;

推荐使用第一种方式,使用HttpConnection 比较轻量级,速度也比第二种HttpClient 的快。
  相关解决方案