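Java crawlers vs. Python crawlers

A comparison of Java crawlers and Python crawlers:

Writing a crawler in Python takes less code: the syntax is simpler and the result is more concise. Java's syntax is stricter than Python's (explicit types, mandatory handling of checked exceptions), so the same task takes more lines and the code is more complex.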

Two examples follow.

URL request:

The Java version:

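import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Place this method inside a class of your own.
public String call(String url) {
    StringBuilder content = new StringBuilder();
    BufferedReader in = null;
    try {
        URL realUrl = new URL(url);
        URLConnection connection = realUrl.openConnection();
        connection.connect();
        // The response body is read as GBK-encoded text.
        in = new BufferedReader(new InputStreamReader(connection.getInputStream(), "gbk"));
        String line;
        while ((line = in.readLine()) != null) {
            content.append(line).append("\n");
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Always close the reader, even if the request failed.
        try {
            if (in != null) {
                in.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
    return content.toString();
}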

The Python version:

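# coding=utf-8
# Python 2 code: urllib2 and the print statement do not exist in Python 3.
# chardet is a third-party package.
import chardet
import urllib2

url = "http://www.baidu.com"
data = urllib2.urlopen(url).read()
# Guess the page encoding from the raw bytes.
charset = chardet.detect(data)
code = charset["encoding"]
# Decode with the detected encoding, skip undecodable bytes, re-encode as UTF-8.
content = data.decode(code, "ignore").encode("utf8")
print content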
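The Python example above targets Python 2. For readers on Python 3, a rough equivalent might look like the sketch below. It is not the author's code: it swaps chardet for the charset declared in the Content-Type header (standard library only), and the helper names `decode_body` and `fetch` are made up for illustration.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw bytes to text, skipping bytes that cannot be decoded."""
    try:
        return raw.decode(charset or fallback, errors="ignore")
    except LookupError:
        # Unknown charset name: fall back to the default encoding.
        return raw.decode(fallback, errors="ignore")


def fetch(url, timeout=10):
    """Download a page and return its text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset from the Content-Type header, or None if the server sent none.
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)


# Example (needs network access):
# print(fetch("http://www.baidu.com"))
```

Pages that declare no charset in the header fall back to UTF-8 here, so a GBK page without a header declaration would still need byte-level detection such as chardet.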

Regular expressions:

The Java version:

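import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Place this method inside a class of your own.
public String call(String content) throws Exception {
    // Match every "content":"..." pair; the lazy .*? stops at the next quote.
    Pattern p = Pattern.compile("\"content\":\".*?\"");
    Matcher match = p.matcher(content);
    StringBuilder sb = new StringBuilder();
    String tmp;
    while (match.find()) {
        tmp = match.group();
        tmp = tmp.replaceAll("\"", "");     // drop the quotes
        tmp = tmp.replace("content:", "");  // drop the key
        tmp = tmp.replaceAll("<.*>", "");   // drop embedded HTML tags (greedy match)
        sb.append(tmp).append("\n");
    }
    return sb.toString();
}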

The Python version:

import re

pattern = re.compile(regex)    # regex: placeholder for the pattern string
group = pattern.findall(text)  # text: placeholder for the string to search
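The Python snippet above leaves the pattern and the input as placeholders. To make the comparison with the Java method concrete, here is a minimal sketch that extracts the same "content":"..." values. The function name `extract_comments` and the sample string are invented for illustration, and the tag pattern is deliberately narrower than the Java version's greedy `<.*>`: it removes each tag individually and keeps the text between tags.

```python
import re

# Capture only the value of each "content":"..." pair, so the quotes and
# the key never need to be stripped afterwards.
CONTENT_RE = re.compile(r'"content":"(.*?)"')
# Matches one HTML tag at a time.
TAG_RE = re.compile(r"<[^>]*>")


def extract_comments(text):
    """Return the content values, tags removed, one per line."""
    values = CONTENT_RE.findall(text)
    return "\n".join(TAG_RE.sub("", value) for value in values)


sample = '{"content":"good <b>value</b>"},{"content":"fast delivery"}'
print(extract_comments(sample))
```

With a capture group, `findall` returns just the captured values, which replaces the three `replace` calls in the Java method.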