Accessing a website from Java by imitating IE

When downloading web pages with Java's HttpURLConnection, I found that requests to Google's site were rejected by Google.

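        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            // The static setFollowRedirects() does not affect a connection
            // that is already open, so set the flag on this instance.
            httpConn.setInstanceFollowRedirects(true);

            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));

            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }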

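After some research and digging through references, I found that the request built by the code above is missing information the server expects. Setting the request properties explicitly fixes it: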

            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
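
To see what the server actually receives, here is a minimal sketch that is not part of the original code. It starts a throwaway local server with the JDK's built-in com.sun.net.httpserver.HttpServer and echoes back the User-Agent header. The class name UserAgentEcho and the method fetchUserAgent are hypothetical names chosen for illustration. Without an explicit setRequestProperty call, HttpURLConnection normally identifies itself as Java/<version>, which is presumably what the site objects to.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class UserAgentEcho
{
    // Sends one GET to a throwaway local server and returns the User-Agent
    // header that server received. Pass null to keep Java's default agent.
    static String fetchUserAgent(String userAgent)
    {
        HttpServer server = null;
        try
        {
            // Port 0 asks the OS for any free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange ->
            {
                String seen = exchange.getRequestHeaders().getFirst("User-Agent");
                byte[] body = String.valueOf(seen).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();

            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            if (userAgent != null)
            {
                conn.setRequestProperty("User-Agent", userAgent);
            }

            try (InputStream in = conn.getInputStream())
            {
                // Read the echoed header value from the response body.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) != -1)
                {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            finally
            {
                conn.disconnect();
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        finally
        {
            if (server != null)
            {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args)
    {
        System.out.println("default:  " + fetchUserAgent(null));
        System.out.println("override: " + fetchUserAgent("Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)"));
    }
}
```

Running main prints the agent string seen by the server in both cases, which makes it easy to confirm that the override really replaces the default header.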

The complete code is as follows:
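    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public static void DownLoadPages(String urlStr, String outPath)
    {
        HttpURLConnection httpConn = null;

        try
        {
            URL url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            // Follow redirects on this connection. The static
            // setFollowRedirects() does not affect an already opened connection.
            httpConn.setInstanceFollowRedirects(true);
            httpConn.setRequestMethod("GET");
            // Identify as IE 6 instead of Java's default agent string.
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

            // logger.info(httpConn.getResponseMessage());
            // try-with-resources closes both streams even if copying fails,
            // and avoids a NullPointerException when a stream was never opened.
            try (InputStream in = httpConn.getInputStream();
                 OutputStream out = new FileOutputStream(outPath))
            {
                // Copy in blocks rather than one byte at a time.
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1)
                {
                    out.write(buffer, 0, n);
                }
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            if (httpConn != null)
            {
                httpConn.disconnect();
            }
        }
    }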
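
When the server refuses the request, DownLoadPages only prints a stack trace, because getInputStream() throws an IOException for 4xx and 5xx responses. One possible refinement, sketched here under stated assumptions, is to call getResponseCode() first and decide what to do with the result. The class StatusCheck, the method statusFor, and the local server that answers 403 to any agent containing "Java/" are all stand-ins invented for this demonstration; the source does not say how Google actually decides.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical demo class, not part of the original code.
final class StatusCheck
{
    // Returns the HTTP status a picky local server gives to the given agent.
    // Pass null to send Java's default User-Agent.
    static int statusFor(String userAgent)
    {
        HttpServer server = null;
        try
        {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange ->
            {
                // Assumed policy for the demo: refuse anything that looks like Java.
                String ua = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean rejected = ua == null || ua.contains("Java/");
                // A length of -1 means the response has no body.
                exchange.sendResponseHeaders(rejected ? 403 : 200, -1);
                exchange.close();
            });
            server.start();

            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            if (userAgent != null)
            {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            try
            {
                // getResponseCode() reports 4xx/5xx as a number instead of
                // throwing the way getInputStream() does.
                return conn.getResponseCode();
            }
            finally
            {
                conn.disconnect();
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        finally
        {
            if (server != null)
            {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args)
    {
        System.out.println("default agent: " + statusFor(null));
        System.out.println("IE agent:      " + statusFor("Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)"));
    }
}
```

In a real downloader the same check could go right before getInputStream(), logging the status code (and, if useful, the body from getErrorStream()) when it is not HttpURLConnection.HTTP_OK.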

In addition, there is a second way to access Google's site: use Apache's HttpClient library to imitate a browser.

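        // Uses the legacy Commons HttpClient API and NekoHTML. Imports needed:
        //   java.io.InputStream
        //   org.apache.commons.httpclient.HttpClient
        //   org.apache.commons.httpclient.HttpStatus
        //   org.apache.commons.httpclient.methods.GetMethod
        //   org.cyberneko.html.parsers.DOMParser
        //   org.w3c.dom.Document
        //   org.xml.sax.InputSource
        Document document = null;
        HttpClient httpClient = new HttpClient();

        GetMethod getMethod = new GetMethod(url);
        getMethod.setFollowRedirects(true);
        try
        {
            int statusCode = httpClient.executeMethod(getMethod);

            if (statusCode == HttpStatus.SC_OK)
            {
                InputStream in = getMethod.getResponseBodyAsStream();
                InputSource is = new InputSource(in);

                DOMParser domParser = new DOMParser();   // NekoHTML converts the fetched page into a DOM
                domParser.parse(is);
                document = domParser.getDocument();

                System.out.println(getMethod.getURI());
            }
        }
        finally
        {
            // Release the connection once the body has been parsed.
            getMethod.releaseConnection();
        }
        return document;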

I recommend the first approach: HttpURLConnection is more lightweight, and it is also faster than the second approach, HttpClient.