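Python 2.x uses urllib2, while Python 3.x uses urllib (split into submodules such as urllib.request, urllib.parse, and urllib.error). The two offer very similar functionality.
Running the following code directly under Python 3 fails: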
import urllib2
response = urllib2.urlopen("http://www.google.com")
html = response.read()
print(html)
Error message (Python 3.6+ reports ModuleNotFoundError, a subclass of ImportError):
ModuleNotFoundError: No module named 'urllib2'
Fix:
from urllib.request import urlopen
response = urlopen("http://www.google.com/")
html = response.read()  # read() returns the page body as bytes
print(html)
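If the same script has to run under both Python 2 and Python 3, one common pattern is to try the Python 3 import first and fall back to urllib2. This is only a minimal sketch of that idea, not something the code above requires:

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: urlopen lives in urllib2
    from urllib2 import urlopen

# Shows which module the name was actually imported from
print(urlopen.__module__)
```

The rest of the script can then call urlopen() without caring which interpreter it runs on.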
Basic Python 3 setup:
import urllib.request  # for building and sending requests
import urllib.parse    # for URL-encoding parameters
Fetching a page, Ex 1:
x = urllib.request.urlopen('https://www.google.com')
print(x.read())
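Fetching a page, Ex 2: send parameters as request data
url = 'https://www.google.com'
values = {'s': 'basic', 'submit': 'search'}  # parameters and their values
data = urllib.parse.urlencode(values)  # convert the dict into a URL-encoded string
data = data.encode('utf-8')  # encode the string into UTF-8 bytes
req = urllib.request.Request(url, data)  # build the request (passing data makes it a POST)
resp = urllib.request.urlopen(req)  # open the page
respData = resp.read()
print(respData)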
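To see what urlencode() produces and how the data argument changes the request method, the pieces can be inspected without sending anything over the network. A small sketch (the names post_req and get_req are only for illustration):

```python
import urllib.parse
import urllib.request

values = {'s': 'basic', 'submit': 'search'}
query = urllib.parse.urlencode(values)
print(query)  # s=basic&submit=search

# With a data argument, the Request is sent as POST
post_req = urllib.request.Request('https://www.google.com', data=query.encode('utf-8'))
print(post_req.get_method())  # POST

# Without data, the parameters go into the URL and the Request is a GET
get_req = urllib.request.Request('https://www.google.com/?' + query)
print(get_req.get_method())  # GET
```

If the server expects the parameters in the query string, the GET form is probably the one you want.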
Fetching a page, Ex 3:
try:
    url = 'https://www.google.com.tw/search?q=python'
    x = urllib.request.urlopen(url)
    print('example-3', x.read())
except Exception as e:
    print('example-3', str(e))
# => fails with "HTTP Error 403: Forbidden"
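Catching Exception works, but urllib raises more specific error types that tell you whether the server answered with an error status or the request never got a response at all. A minimal sketch, where fetch_summary is a hypothetical helper name and the URL is assumed to be http or https:

```python
import urllib.error
import urllib.request


def fetch_summary(url, timeout=10):
    """Return a one-line description of what happened when fetching url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return 'OK {}: {} bytes'.format(resp.status, len(body))
    except urllib.error.HTTPError as e:
        # The server replied, but with an error status such as 403 or 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return 'HTTP error {}: {}'.format(e.code, e.reason)
    except urllib.error.URLError as e:
        # No usable HTTP response: DNS failure, refused connection, bad scheme...
        return 'URL error: {}'.format(e.reason)


print(fetch_summary('https://www.google.com.tw/search?q=python'))
```

With the search URL above, the HTTPError branch is the one that would report the 403.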
Fetching a page, Ex 4: modify the headers
try:
    url = 'https://www.google.com.tw/search?q=python'
    headers = {}
    headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read().decode('utf-8')  # decode the response bytes into text
    with open('withHeaders.txt', 'w', encoding='utf8') as saveFile:
        saveFile.write(respData)
except Exception as e:
    print(str(e))
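Decoding with a hard-coded 'utf-8' can fail on pages served in another encoding. One possible refinement is to read the charset from the response's Content-Type header and fall back to UTF-8 only when none is declared. A sketch, where fetch_text is a hypothetical helper name:

```python
import urllib.request


def fetch_text(url, headers=None, default_charset='utf-8'):
    """Fetch url and decode the body using the charset the response declares."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        # get_content_charset() returns None when the header names no charset
        charset = resp.headers.get_content_charset() or default_charset
        return resp.read().decode(charset, errors='replace')


# A data: URL is used here so the example runs without network access
print(fetch_text('data:text/plain;charset=utf-8,hello%20world'))
```

For a real page, pass the same headers dict as in Ex 4, for example fetch_text(url, headers=headers).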
Fetching a page, Ex 5: encode the payload as JSON and set the right headers:
import json
newConditions = {"con1":40, "con2":20, "con3":99, "con4":40, "password":"1234"}
params = json.dumps(newConditions).encode('utf8')
# conditionsSetURL is the target endpoint; it is not defined in this snippet
req = urllib.request.Request(conditionsSetURL, data=params,
                             headers={'content-type': 'application/json'})
response = urllib.request.urlopen(req)
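Before sending a JSON request, it can help to check that the method, body, and headers look the way you expect. The sketch below wraps the steps of Ex 5 in a hypothetical helper, build_json_request, and uses a placeholder URL (https://example.com/conditions) that is an assumption, not the real endpoint:

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Build (but do not send) a POST request carrying payload as JSON."""
    body = json.dumps(payload).encode('utf8')
    return urllib.request.Request(
        url, data=body, headers={'Content-Type': 'application/json'})


# Placeholder URL for illustration only
req = build_json_request('https://example.com/conditions', {'con1': 40})
print(req.get_method())                # POST
print(req.data)                        # b'{"con1": 40}'
print(req.get_header('Content-type'))  # application/json
```

Note that Request stores header names capitalized as 'Content-type', which is why get_header() is called with that spelling. Passing req to urllib.request.urlopen() would then send it.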
Working around urllib failures on self-signed SSL certificates:
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>
Fix 1: pass an unverified context to a single call
import ssl
import urllib.request
context = ssl._create_unverified_context()  # skips certificate verification
print(urllib.request.urlopen("https://www.google.com/", context=context).read())
Fix 2: replace the default HTTPS context for the whole process
import ssl
import urllib.request
ssl._create_default_https_context = ssl._create_unverified_context  # affects every later HTTPS request
print(urllib.request.urlopen("https://www.google.com/").read())
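Both fixes rely on ssl._create_unverified_context, which is a private function. Roughly the same effect can be had with the public ssl API, and if you have a copy of the self-signed certificate you can trust just that file instead of turning verification off. A sketch, where make_context is a hypothetical helper and the cafile path is whatever certificate file you supply:

```python
import ssl


def make_context(cafile=None):
    """Return an SSLContext for urlopen().

    With cafile: verify the server against that certificate file only.
    Without cafile: disable verification entirely (same effect as the fixes above).
    """
    if cafile is not None:
        return ssl.create_default_context(cafile=cafile)
    ctx = ssl.create_default_context()
    # check_hostname must be switched off before verify_mode can be CERT_NONE
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


context = make_context()
print(context.verify_mode == ssl.CERT_NONE)
```

The result is used the same way as in Fix 1: urllib.request.urlopen(url, context=context).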