Python 3 Notes: Fetching Web Pages with urllib

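In Python 2.x, web requests went through urllib2. In Python 3.x, that functionality lives in the urllib package (mainly urllib.request and urllib.error). The two work in much the same way.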

Running the following code directly under Python 3 fails:

import urllib2.request

response = urllib2.urlopen("http://www.google.com")
html = response.read()
print(html)

Error message (Python 3.5 and earlier report the same failure as ImportError):

ModuleNotFoundError: No module named 'urllib2'

Fix:

from urllib.request import urlopen

response = urlopen("http://www.google.com/")
html = response.read()  # read() returns the page body as bytes
print(html)
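If the same script has to run under both Python 2 and Python 3, one common pattern is to try the Python 3 import first and fall back to urllib2. This is only a minimal sketch of that idea, not something the fix above requires:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: urllib2 exposes a function with the same name
    from urllib2 import urlopen

# Either way, the rest of the script can call urlopen(url) unchanged.
print(callable(urlopen))
```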

Basic Python 3 usage:

import urllib.request  # builds and sends requests
import urllib.parse  # URL-encodes parameters (used in Ex 2)

Fetching a web page, Ex 1:

x = urllib.request.urlopen('https://www.google.com')
print(x.read())

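Fetching a web page, Ex 2:

url = 'https://www.google.com'
values = {'s': 'basic',
          'submit': 'search'}  # parameter names and values
data = urllib.parse.urlencode(values)  # serialize as a URL-encoded string
data = data.encode('utf-8')  # encode the string to bytes, as request data must be bytes
req = urllib.request.Request(url, data)  # build the request; passing data makes it a POST
resp = urllib.request.urlopen(req)  # send the request and open the response
respData = resp.read()
print(respData)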
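To see what urlencode produces and how the data argument changes the request method, the request objects can be inspected without sending anything. A small sketch using the same parameters as Ex 2; the variable names get_req and post_req are only for illustration:

```python
import urllib.parse
import urllib.request

values = {'s': 'basic', 'submit': 'search'}
query = urllib.parse.urlencode(values)
print(query)  # s=basic&submit=search

# Without data, the parameters go in the URL and the request is a GET.
get_req = urllib.request.Request('https://www.google.com/?' + query)

# With data (bytes), the parameters go in the body and the request is a POST.
post_req = urllib.request.Request('https://www.google.com',
                                  data=query.encode('utf-8'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Nothing is sent over the network here; urlopen(get_req) or urlopen(post_req) would perform the actual request.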

Fetching a web page, Ex 3:

try:
    url = 'https://www.google.com.tw/search?q=python'
    x = urllib.request.urlopen(url)
    print('example-3', x.read())
except Exception as e:
    print('example-3', str(e))

# => fails with HTTP Error 403: Forbidden
# (most likely the server rejects urllib's default Python-urllib User-Agent)
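Catching Exception works, but urllib raises more specific errors that tell an HTTP error status apart from a connection problem. A minimal sketch follows; the helper name fetch and its return shape are assumptions made for this example:

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, body) on success, or (code_or_None, message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403.
        return e.code, str(e)
    except urllib.error.URLError as e:
        # No usable response at all (DNS failure, refused connection, ...).
        return None, str(e.reason)


status, result = fetch('https://www.google.com.tw/search?q=python')
print('example-3', status)
```

HTTPError is a subclass of URLError, so its except clause has to come first.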


Fetching a web page, Ex 4: setting custom headers

try:
    url = 'https://www.google.com.tw/search?q=python'
    headers = {}
    # present a browser User-Agent instead of urllib's default
    headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read().decode('utf-8')  # decode the response bytes to str
    with open('withHeaders.txt', 'w', encoding='utf8') as saveFile:
        saveFile.write(respData)
except Exception as e:
    print(str(e))
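Ex 4 assumes the page is UTF-8. When that is not guaranteed, the charset declared in the response's Content-Type header can be used instead, since resp.headers is an email.message.Message subclass. A sketch under that assumption; decode_body is a hypothetical helper, and the Message object below only stands in for resp.headers:

```python
from email.message import Message


def decode_body(body, headers, default='utf-8'):
    """Decode bytes using the charset from the headers, else the default."""
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors='replace')


# Stand-in for resp.headers, so the example runs without a network call.
fake_headers = Message()
fake_headers['Content-Type'] = 'text/html; charset=big5'
print(decode_body('中文'.encode('big5'), fake_headers))
```

With a real response this would be called as decode_body(resp.read(), resp.headers).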

Fetching a web page, Ex 5: encode the body as JSON and set the right headers:

import json

# conditionsSetURL must be set to the URL of the target endpoint before running
newConditions = {"con1":40, "con2":20, "con3":99, "con4":40, "password":"1234"}
params = json.dumps(newConditions).encode('utf8')  # serialize to JSON, then encode to bytes
req = urllib.request.Request(conditionsSetURL, data=params,
                             headers={'content-type': 'application/json'})
response = urllib.request.urlopen(req)
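The JSON request can also be wrapped in a small function and checked before it is sent. In this sketch, build_json_request and the example.com URL are placeholders invented for illustration; they are not part of the original example:

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Build a POST request whose body is the payload serialized as JSON."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url, data=body,
        headers={'Content-Type': 'application/json'})


# Placeholder URL; replace with the real endpoint.
req = build_json_request('http://example.com/api/conditions',
                         {"con1": 40, "con2": 20})
print(req.get_method())                # POST, because data is set
print(req.get_header('Content-type'))  # Request stores header names capitalized
print(json.loads(req.data))            # round-trips back to the original dict
```

If the server replies with JSON, json.loads(response.read().decode('utf-8')) turns the reply back into Python objects.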

Fixing urllib errors with self-signed SSL certificates:

<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>

Solution 1: pass an unverified context to a single call

import ssl
import urllib.request
context = ssl._create_unverified_context()  # skips certificate verification
print(urllib.request.urlopen("https://www.google.com/", context=context).read())

Solution 2: disable verification for every HTTPS request in the process

import ssl
import urllib.request
ssl._create_default_https_context = ssl._create_unverified_context
print(urllib.request.urlopen("https://www.google.com/").read())
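Both solutions rely on underscore-prefixed (private) functions in the ssl module. Roughly the same effect can be had with the public API, which also makes it possible to trust one specific self-signed certificate instead of turning verification off. A sketch only; make_context is a name chosen for this example:

```python
import ssl


def make_context(verify=True, cafile=None):
    """Build an SSLContext for urlopen(..., context=...)."""
    if not verify:
        ctx = ssl.create_default_context()
        # check_hostname must be off before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # With cafile set, the given certificate file is trusted in addition
    # to normal verification; with cafile=None the system defaults apply.
    return ssl.create_default_context(cafile=cafile)


unverified = make_context(verify=False)
print(unverified.verify_mode == ssl.CERT_NONE)
```

It is used the same way as in Solution 1: urllib.request.urlopen(url, context=make_context(verify=False)), or with cafile pointing at the server's self-signed certificate file to keep verification on.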
