Screenshot Enricher
Module type: enricher
Captures screenshots and optionally saves web pages as PDFs using a WebDriver.
Features
- Takes screenshots of web pages. The width, height, timeout, and wait before capture are configurable.
- Optionally saves pages as PDFs. PDF output can be tuned through the `print_options` setting.
- Skips URLs detected as authentication walls (pages that require a login).
- Runs as an enricher step in the archiving pipeline and attaches the screenshot and PDF to the item's metadata as media.
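The features above roughly correspond to the per-URL flow below. This is a minimal sketch, not the module's actual code. It assumes a Selenium-style driver exposing `set_window_size`, `set_page_load_timeout`, `get`, `save_screenshot` and `print_page` (all real Selenium `WebDriver` methods), and it assumes `timeout` maps to the page-load timeout. The function `capture` and the `is_auth_wall` predicate are hypothetical stand-ins.

```python
import base64
import os
import time
from typing import Callable


def capture(
    url: str,
    driver,  # assumed: a Selenium WebDriver, or any object with the same methods
    out_dir: str,
    *,
    width: int = 1280,
    height: int = 1024,
    timeout: int = 60,
    sleep_before_screenshot: int = 4,
    save_to_pdf: bool = False,
    print_options=None,  # assumed: a selenium PrintOptions instance (building one is not shown)
    is_auth_wall: Callable[[str], bool] = lambda url: False,  # hypothetical detection rule
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Return the paths of the media files produced for `url`."""
    # Skip pages detected as login/authentication walls; nothing is captured.
    if is_auth_wall(url):
        return []

    driver.set_window_size(width, height)
    # Assumption: `timeout` bounds how long the page load may take (seconds).
    driver.set_page_load_timeout(timeout)
    driver.get(url)
    # Give client-side scripts time to render before capturing.
    sleep(sleep_before_screenshot)

    media: list[str] = []
    png_path = os.path.join(out_dir, "screenshot.png")
    # save_screenshot returns False if the image could not be written.
    if driver.save_screenshot(png_path):
        media.append(png_path)

    if save_to_pdf:
        # print_page returns the PDF as a base64-encoded string.
        pdf_path = os.path.join(out_dir, "page.pdf")
        with open(pdf_path, "wb") as f:
            f.write(base64.b64decode(driver.print_page(print_options)))
        media.append(pdf_path)

    return media
```

Injecting `sleep` lets tests skip the real wait, and injecting `is_auth_wall` keeps the detection rule, which this page does not specify, out of the sketch.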
Notes
Requires a WebDriver (e.g., ChromeDriver) to be installed and available on the system PATH.
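One way to check the PATH requirement before running the pipeline is Python's `shutil.which`, which searches the directories in `PATH` for an executable. This is a hedged sketch: `chromedriver` is only the example driver named above, and `find_webdriver` is a hypothetical helper, not part of the module.

```python
import shutil
from typing import Optional


def find_webdriver(name: str = "chromedriver", path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the WebDriver executable, or None if it is not found.

    `path` overrides the search path; by default the PATH environment variable is used.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_webdriver()
    if location is None:
        raise SystemExit("chromedriver not found on PATH")
    print(f"Using WebDriver at {location}")
```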
Configuration Options
YAML
```yaml
# steps configuration
steps:
  ...
  enrichers:
    - screenshot_enricher
  ...

# module configuration
...

screenshot_enricher:
  width: 1280
  height: 1024
  timeout: 60
  sleep_before_screenshot: 4
  http_proxy: ''
  save_to_pdf: false
  print_options: {}
```
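Every option is optional, so keys omitted from `screenshot_enricher:` fall back to the defaults shown above. A minimal sketch of that merge, assuming the YAML has already been parsed into a plain dict. `ScreenshotConfig` and `load_config` are hypothetical illustrations, and rejecting unknown keys is a choice made here to catch typos; the real module may handle them differently.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ScreenshotConfig:
    # Defaults mirror the module configuration shown above.
    width: int = 1280
    height: int = 1024
    timeout: int = 60
    sleep_before_screenshot: int = 4
    http_proxy: str = ""
    save_to_pdf: bool = False
    print_options: dict = field(default_factory=dict)


def load_config(section: dict) -> ScreenshotConfig:
    """Build a config from the `screenshot_enricher` section; missing keys keep their defaults."""
    known = {f.name for f in fields(ScreenshotConfig)}
    unknown = set(section) - known
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise ValueError(f"unknown screenshot_enricher options: {sorted(unknown)}")
    return ScreenshotConfig(**section)
```

For example, `load_config({"width": 1920, "save_to_pdf": True})` keeps `height=1024` and the other defaults.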
Command Line
| Option | Description | Default | Type |
|---|---|---|---|
| `width` | Optional. Width of the screenshots, in pixels. | 1280 | int |
| `height` | Optional. Height of the screenshots, in pixels. | 1024 | int |
| `timeout` | Optional. Timeout for taking the screenshot, in seconds. | 60 | int |
| `sleep_before_screenshot` | Optional. Seconds to wait for the page to load before taking the screenshot. | 4 | int |
| `http_proxy` | Optional. HTTP proxy to use for the WebDriver, e.g. `http://proxy-user:password@proxy-ip:port`. | `''` (empty) | string |
| `save_to_pdf` | Optional. Save the page as a PDF along with the screenshot. PDF output can be adjusted with the `print_options` parameter. | False | bool |
| `print_options` | Optional. Options to pass to the PDF printer, in JSON format. See https://www.selenium.dev/documentation/webdriver/interactions/print_page/ for more information. | {} | json_loader |
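The `http_proxy` and `print_options` options take structured strings. The sketch below shows one way to validate them before use: `http_proxy` is split with `urllib.parse.urlsplit`, and `print_options` is accepted either as a dict or as a JSON object string, roughly what the `json_loader` type suggests. Both helper names are hypothetical. The keys in the test (`orientation`, `scale`) are Selenium `PrintOptions` attribute names chosen for illustration; the linked Selenium page lists the full set.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def parse_print_options(value) -> dict:
    """Accept a dict or a JSON object string (as passed on the command line)."""
    if isinstance(value, dict):
        return value
    parsed = json.loads(value)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(parsed, dict):
        raise ValueError('print_options must be a JSON object, e.g. {"orientation": "landscape"}')
    return parsed


def parse_proxy(proxy: str) -> Optional[dict]:
    """Split http://proxy-user:password@proxy-ip:port into parts; '' means no proxy."""
    if not proxy:
        return None
    parts = urlsplit(proxy)
    # .port raises ValueError by itself if the port is not a valid number.
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
        raise ValueError(f"expected http://[user:password@]host:port, got {proxy!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,  # None when no credentials are given
        "password": parts.password,
    }
```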