nodriver is a library for web automation, web scraping, and bots. Download:
https://github.com/ultrafunkamsterdam/nodriver
There are many ways to get the current URL in nodriver. One of them is as follows:
async def nodriver_current_url(tab):
    is_quit_bot = False
    # Error messages that mean the browser connection is gone.
    exit_bot_error_strings = [
        "server rejected WebSocket connection: HTTP 500",
        "[Errno 61] Connect call failed ('127.0.0.1',",
        "[WinError 1225] ",
    ]
    url = ""
    if tab:
        url_dict = {}
        try:
            # Ask the page for its URL through JavaScript.
            url_dict = await tab.js_dumps('window.location.href')
        except Exception as exc:
            print(exc)
            str_exc = ""
            try:
                str_exc = str(exc)
            except Exception:
                pass
            if len(str_exc) > 0:
                for each_error_string in exit_bot_error_strings:
                    if each_error_string in str_exc:
                        #print('quit bot by error:', each_error_string, driver)
                        is_quit_bot = True
        # The result maps each character index to a dict whose "0" entry
        # holds one character; join them back into the URL string.
        url_array = []
        if url_dict:
            for k in url_dict:
                if k.isnumeric():
                    if "0" in url_dict[k]:
                        url_array.append(url_dict[k]["0"])
            url = ''.join(url_array)
    return url, is_quit_bot
This approach uses JavaScript to read window.location.href.
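The reassembly step at the end of that function can be tried without a browser. The following is only a minimal sketch: it assumes the value returned by js_dumps is a dict keyed by character index, where each value is a dict whose "0" entry holds one character. That shape is inferred from the loop in the function above, not from nodriver's documentation. The helper name join_url_chars and the sample dict are hypothetical.

```python
def join_url_chars(url_dict):
    """Rebuild a string from a {index: {"0": char}} mapping (assumed shape)."""
    url_array = []
    for k in url_dict:
        # Only numeric keys carry characters; any other key is skipped.
        if k.isnumeric() and "0" in url_dict[k]:
            url_array.append(url_dict[k]["0"])
    return "".join(url_array)


# Hypothetical sample imitating the assumed shape for the string "http".
sample = {
    "0": {"0": "h"},
    "1": {"0": "t"},
    "2": {"0": "t"},
    "3": {"0": "p"},
    "length": {},  # non-numeric key, ignored by the helper
}

print(join_url_chars(sample))
```

Like the original loop, this relies on the dict keeping its keys in index order.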
Another approach:
async def nodriver_current_url(driver, tab):
    # Error messages that mean the browser connection is gone.
    exit_bot_error_strings = [
        "server rejected WebSocket connection: HTTP 500",
        "[Errno 61] Connect call failed ('127.0.0.1',",
        "[WinError 1225] ",
    ]
    # return value
    url = ""
    is_quit_bot = False
    last_active_tab = None
    driver_info = await driver._get_targets()
    # Guard against tab being None before touching tab.target.
    if tab is None or tab.target not in driver_info:
        print("tab may have been closed by the user, or a confirm dialog popped up.")
        tab = None
        # Refresh the browser's list of targets.
        await driver
        try:
            # Keep the last tab that shows a web page or a blank page.
            for i, each_tab in enumerate(driver):
                target_info = each_tab.target.to_json()
                target_url = ""
                if target_info:
                    if "url" in target_info:
                        target_url = target_info["url"]
                if len(target_url) > 4:
                    if target_url[:4] == "http" or target_url == "about:blank":
                        print("found tab url:", target_url)
                        last_active_tab = each_tab
        except Exception as exc:
            print(exc)
            if str(exc) == "list index out of range":
                print("Browser closed, start to exit bot.")
                is_quit_bot = True
                tab = None
                last_active_tab = None
    if last_active_tab is not None:
        tab = last_active_tab
    if tab:
        try:
            # Read the URL from the target info instead of running JavaScript.
            target_info = tab.target.to_json()
            if target_info:
                if "url" in target_info:
                    url = target_info["url"]
            #url = await tab.evaluate('window.location.href')
        except Exception as exc:
            print(exc)
            str_exc = ""
            try:
                str_exc = str(exc)
            except Exception:
                pass
            if len(str_exc) > 0:
                if str_exc == "server rejected WebSocket connection: HTTP 404":
                    print("nodriver is not ready yet. Please wait until this message stops appearing before you start.")
                for each_error_string in exit_bot_error_strings:
                    if each_error_string in str_exc:
                        #print('quit bot by error:', each_error_string, driver)
                        is_quit_bot = True
    return url, is_quit_bot, last_active_tab
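Compared with the previous approach, this one also returns the last active tab, and it reads the URL not from 'window.location.href' but directly from the target info held in tab.target. In theory this should be slightly more efficient, because it does not go through the higher-level, more complex js_dumps call and only needs await driver.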
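The tab-selection rule used in that loop can also be illustrated without a browser. This is only a sketch under stated assumptions: plain dicts stand in for the result of each_tab.target.to_json(), and the helper name pick_last_web_target and the sample URLs are hypothetical.

```python
def pick_last_web_target(targets):
    """Return the last target dict whose URL is http(s) or about:blank, else None."""
    last = None
    for info in targets:
        target_url = info.get("url", "")
        # Same filter as the loop in the function: longer than 4 characters,
        # and either starting with "http" or exactly "about:blank".
        if len(target_url) > 4:
            if target_url[:4] == "http" or target_url == "about:blank":
                last = info
    return last


# Hypothetical target dicts standing in for each_tab.target.to_json().
targets = [
    {"url": "chrome://newtab/"},
    {"url": "https://example.com/a"},
    {"url": "chrome-extension://abc/popup.html"},
    {"url": "https://example.com/b"},
]

chosen = pick_last_web_target(targets)
print(chosen["url"] if chosen else "")
```

Because later matches overwrite earlier ones, the most recently listed web page wins, which is how the function above ends up with its last_active_tab.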