Take full page screenshot in python without page breaking

2020-10-13

Last Tuesday everything still worked normally. The environment was macOS 10.15, Python 3.8.5, Selenium 3.141.0, Chrome 85, and ChromeDriver 85.

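This week, apparently because of the upgrade to Chrome 86 + ChromeDriver 86, taking a screenshot of a web page with Python + Selenium stopped working properly: neither set_window_size() nor options.add_argument("--window-size=?,?") manages to apply the requested height.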

The expected height is 16,000 px, but whether the size is set manually with set_window_size or with add_argument, the image saved by driver.save_screenshot(file_path) is only 8600 px tall.
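
To confirm that a saved screenshot really is shorter than requested, the height can be read straight from the PNG header. The following is a minimal sketch using only the standard library; png_size and is_truncated are hypothetical helper names for illustration and are not part of Selenium.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # A PNG starts with an 8-byte signature, followed by the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then width and height as
    # big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_truncated(data: bytes, expected_height: int) -> bool:
    """True when the screenshot is shorter than the height that was requested."""
    return png_size(data)[1] < expected_height
```

It could be used as is_truncated(driver.get_screenshot_as_png(), 16000), which avoids opening every output file by hand.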

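This is probably caused by a change in Chrome or ChromeDriver. Since everything has already been upgraded to the latest version, the only option was to look for an alternative approach.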

element = driver.find_element_by_tag_name('body')
element_png = element.screenshot_as_png
with open("test2.png", "wb") as file:
    file.write(element_png)

API documentation:
http://selenium-python.readthedocs.io/api.html

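Using find_element_by_tag_name('body') does let the height exceed 8000 px, and the image has the correct height of 16,000 px, but the content is wrong, which is really strange. With or without --headless (windowless mode), the result is the same: the upper half of the page is repeated in the lower half.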

The code Max is currently using, and the one that shows the problem, is the following:

def save_screenshot(driver, path):
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    driver.find_element_by_tag_name('body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

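The code above contains a commented-out line, driver.save_screenshot(). Reportedly the save_screenshot() approach works on Python 3.6 or earlier, but even after switching to Python 2 the correct height still could not be obtained.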

A solution suggested by another user:

The first attempt failed because Selenium takes screenshots only of the view port. You can get the full page or specific element screenshot using request to the driver.command_executor. Using take_element_screenshot on <body> will produce the same results as take_full_page_screenshot

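import base64
import json
from io import BytesIO

from PIL import Image

# The functions below are written as methods of a class that keeps the WebDriver in self.__driver.
def __take_screenshot(self, clip=None):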

    def send(cmd, params):
        url = f'{self.__driver.command_executor._url}/session/{self.__driver.session_id}/chromium/send_command_and_get_result'
        body = json.dumps({'cmd': cmd, 'params': params})
        return self.__driver.command_executor._request('POST', url, body).get('value')

    script = '({width: Math.max(window.innerWidth, document.body.scrollWidth, document.documentElement.scrollWidth)|0,' \
             'height: Math.max(innerHeight, document.body.scrollHeight, document.documentElement.scrollHeight)|0,' \
             'deviceScaleFactor: window.devicePixelRatio || 1, mobile: typeof window.orientation !== "undefined"})'

    data = {'format': 'png', 'fromSurface': True}
    if clip:
        data['clip'] = clip

    response = send('Runtime.evaluate', {'returnByValue': True, 'expression': script})
    send('Emulation.setDeviceMetricsOverride', response['result']['value'])
    screenshot = send('Page.captureScreenshot', data)
    send('Emulation.clearDeviceMetricsOverride', {})

    to_save = base64.b64decode(screenshot['data']) # same object as returned by driver.get_screenshot_as_png()
    im = Image.open(BytesIO(to_save))
    im.save('screenshot.png')

def take_element_screenshot(self, element):

    clip = self.__driver.execute_script('rect = arguments[0].getBoundingClientRect();'
                                        'docRect = arguments[0].ownerDocument.documentElement.getBoundingClientRect();'
                                        'return {x: rect.left - docRect.left, y: rect.top - docRect.top, width: rect.width, height: rect.height, scale: 1};', element)

    self.__take_screenshot(clip)
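
The same sequence of DevTools commands can also be sketched as a standalone function. This is only a sketch under assumptions: it assumes the driver is a Chrome WebDriver that exposes execute_cdp_cmd (present, as far as I know, in Selenium 3.141.0 and later), it uses Page.getLayoutMetrics instead of the JavaScript expression above to obtain the page size, and full_page_png is a hypothetical name. It has not been verified against Chrome 86.

```python
import base64


def full_page_png(driver) -> bytes:
    """Capture the whole page as PNG bytes via Chrome DevTools commands."""
    # Ask the browser for the size of the full document, not just the viewport.
    metrics = driver.execute_cdp_cmd("Page.getLayoutMetrics", {})
    size = metrics["contentSize"]
    # Temporarily emulate a device as large as the document.
    driver.execute_cdp_cmd("Emulation.setDeviceMetricsOverride", {
        "width": int(size["width"]),
        "height": int(size["height"]),
        "deviceScaleFactor": 1,
        "mobile": False,
    })
    try:
        shot = driver.execute_cdp_cmd(
            "Page.captureScreenshot", {"format": "png", "fromSurface": True})
    finally:
        # Always restore the real viewport, even if the capture fails.
        driver.execute_cdp_cmd("Emulation.clearDeviceMetricsOverride", {})
    # The screenshot comes back base64-encoded.
    return base64.b64decode(shot["data"])
```

The returned bytes can be written directly to a file opened in "wb" mode, so Pillow is not needed just to save the image.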

Max的程式語言筆記
