Screenshot Enricher#

Module type: enricher

Captures screenshots and optionally saves web pages as PDFs using a WebDriver.

Features#

  • Takes screenshots of web pages, with configurable width, height, and timeout settings.

  • Optionally saves pages as PDFs, with additional configuration for PDF printing options.

  • Skips URLs detected as authentication walls, so no screenshot is taken for them.

  • Plugs into the metadata enrichment pipeline, adding the screenshots and PDFs as media.
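The features above can be sketched with Selenium's Python bindings. This is a minimal illustration under stated assumptions, not the module's actual implementation: `ScreenshotSettings`, `make_chrome_driver`, and `capture` are hypothetical names, the output file names are arbitrary, and the authentication-wall check, `http_proxy`, and `print_options` handling are left out. Only the option names and default values come from this page.

```python
import base64
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ScreenshotSettings:
    # Defaults mirror the values documented for this module.
    width: int = 1280
    height: int = 720
    timeout: int = 60
    sleep_before_screenshot: int = 4
    save_to_pdf: bool = False
    print_options: dict = field(default_factory=dict)  # not applied in this sketch


def make_chrome_driver():
    """Start headless Chrome; needs selenium and ChromeDriver installed."""
    from selenium import webdriver  # imported lazily so the rest runs without it

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def capture(url, settings, out_dir, driver_factory=make_chrome_driver, sleep=time.sleep):
    """Write a screenshot (and optionally a PDF) for `url`; return the paths."""
    out_dir = Path(out_dir)
    driver = driver_factory()
    written = []
    try:
        driver.set_window_size(settings.width, settings.height)
        driver.set_page_load_timeout(settings.timeout)
        driver.get(url)
        sleep(settings.sleep_before_screenshot)  # give the page time to render

        png = out_dir / "screenshot.png"
        driver.save_screenshot(str(png))
        written.append(png)

        if settings.save_to_pdf:
            # print_page() returns the PDF as a base64-encoded string.
            pdf = out_dir / "page.pdf"
            pdf.write_bytes(base64.b64decode(driver.print_page()))
            written.append(pdf)
    finally:
        driver.quit()  # always release the browser, even on errors
    return written
```

Passing the driver in through `driver_factory` keeps the browser-dependent part replaceable, so the flow can be exercised with a stand-in driver when no browser is available.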

Notes#

  • Requires a WebDriver (e.g., ChromeDriver) installed and accessible via the system’s PATH.
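A quick way to confirm the requirement above is to look the driver binary up on the PATH before running the enricher. The sketch below uses only the standard library; `find_webdriver` is a hypothetical helper, and `geckodriver` is included as an assumed alternative since this page names only ChromeDriver as an example.

```python
import shutil


def find_webdriver(candidates=("chromedriver", "geckodriver")):
    """Return (name, path) of the first driver binary found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_webdriver()
    print(found or "no WebDriver binary found on PATH")
```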

Configuration Options#

YAML#

screenshot_enricher:
  width: 1280
  height: 720
  timeout: 60
  sleep_before_screenshot: 4
  http_proxy: ''
  save_to_pdf: false
  print_options: {}
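Since every option is optional, a configuration presumably only needs to list the values that differ from the defaults shown above. The following sketch illustrates that overlay in plain Python; `DEFAULTS` copies the documented values, while `resolve_settings` and its unknown-key check are hypothetical and not taken from the tool itself.

```python
import copy

# Defaults as documented for screenshot_enricher.
DEFAULTS = {
    "width": 1280,
    "height": 720,
    "timeout": 60,
    "sleep_before_screenshot": 4,
    "http_proxy": "",
    "save_to_pdf": False,
    "print_options": {},
}


def resolve_settings(overrides):
    """Overlay user-supplied values on the defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown screenshot_enricher option(s): {sorted(unknown)}")
    # Deep-copy so callers cannot mutate the shared defaults by accident.
    return {**copy.deepcopy(DEFAULTS), **overrides}


settings = resolve_settings({"width": 1920, "height": 1080, "save_to_pdf": True})
print(settings["width"], settings["timeout"], settings["save_to_pdf"])  # 1920 60 True
```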

Command Line#

| Option | Description | Default | Type |
|---|---|---|---|
| screenshot_enricher.width | Optional. Width of the screenshots. | 1280 | string |
| screenshot_enricher.height | Optional. Height of the screenshots. | 720 | string |
| screenshot_enricher.timeout | Optional. Timeout for taking the screenshot. | 60 | string |
| screenshot_enricher.sleep_before_screenshot | Optional. Seconds to wait for the page to load before taking the screenshot. | 4 | string |
| screenshot_enricher.http_proxy | Optional. HTTP proxy to use for the WebDriver, e.g. http://proxy-user:password@proxy-ip:port | '' | string |
| screenshot_enricher.save_to_pdf | Optional. Save the page as a PDF along with the screenshot. PDF saving options can be adjusted with the ‘print_options’ parameter. | False | string |
| screenshot_enricher.print_options | Optional. Options to pass to the PDF printer. | {} | string |
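The command-line option names are the YAML keys joined to the module name with a dot. A small sketch of that mapping follows; `to_dotted_options` is a hypothetical helper, and how the tool itself parses these options is not described on this page.

```python
def to_dotted_options(config):
    """Flatten {module: {option: value}} into {"module.option": value}."""
    dotted = {}
    for module, options in config.items():
        # Only one level is flattened, so dict values such as
        # print_options stay intact.
        for option, value in options.items():
            dotted[f"{module}.{option}"] = value
    return dotted


config = {
    "screenshot_enricher": {"width": 1280, "save_to_pdf": False, "print_options": {}}
}
for name, value in to_dotted_options(config).items():
    print(name, "=", value)
```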
