wacz_extractor_enricher.wacz_extractor_enricher#

Module Contents#

class wacz_extractor_enricher.wacz_extractor_enricher.WaczExtractorEnricher#

Bases: auto_archiver.core.Enricher, auto_archiver.core.Extractor

Uses webrecorder/browsertrix-crawler to generate a .WACZ archive of the URL. If used with [profiles](webrecorder/browsertrix-crawler), it can become quite powerful for archiving private content. When used as an archiver, it will extract the media from the .WACZ archive so it can be enriched.

setup() → None#

cleanup() → None#

Called when extractors are done or when an error occurs; cleans up any resources.

download(item: auto_archiver.core.Metadata) → auto_archiver.core.Metadata#

Downloads the media from the given URL and returns a Metadata object with the downloaded media.

If the URL is not supported or the download fails, this method should return False.
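The return convention described here (a Metadata object on success, False on failure) can be sketched from the caller's side. This is only an illustration: `StubMetadata`, `StubExtractor`, and `try_download` are hypothetical stand-ins, not the real auto_archiver classes, and the URL-scheme check is an arbitrary rule chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StubMetadata:
    """Hypothetical stand-in for auto_archiver.core.Metadata."""
    url: str
    media: list = field(default_factory=list)


class StubExtractor:
    """Hypothetical extractor that follows the documented return convention."""

    def download(self, item):
        # Assumption for illustration: only http(s) URLs count as supported.
        if not item.url.startswith(("http://", "https://")):
            return False
        # Pretend a download happened and record a placeholder filename.
        item.media.append("archive.wacz")
        return item


def try_download(extractor, item):
    """Return the populated metadata, or None when download() reports failure."""
    result = extractor.download(item)
    # Compare with False explicitly so only the failure sentinel is rejected.
    if result is False:
        return None
    return result
```

Checking `result is False` rather than relying on truthiness avoids depending on how a metadata object evaluates in a boolean context.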

enrich(to_enrich: auto_archiver.core.Metadata) → bool#

Enriches a Metadata object with additional information or context.

Takes the metadata object to enrich as an argument and modifies it in place. Note that the signature above declares a bool return value rather than None.

extract_media_from_wacz(to_enrich: auto_archiver.core.Metadata, wacz_filename: str) → None#

Receives a .wacz archive and extracts all relevant media from it, adding them to to_enrich.
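For orientation, a .wacz file is a ZIP container, and the WACZ format keeps its WARC data under an `archive/` directory. The sketch below covers only the first step such an extraction needs, locating the WARC files inside the container, using the standard library. It is not this module's implementation: `list_warc_members` and `build_demo_wacz` are hypothetical helpers, and parsing the WARC records to pull out media is left out.

```python
import io
import zipfile


def list_warc_members(wacz_file):
    """Return the names of WARC files stored under archive/ in a WACZ (ZIP) file."""
    with zipfile.ZipFile(wacz_file) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz"))
        )


def build_demo_wacz():
    """Build a tiny in-memory ZIP laid out like a WACZ, with placeholder bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("datapackage.json", "{}")
        zf.writestr("archive/data.warc.gz", b"placeholder")
        zf.writestr("pages/pages.jsonl", "")
    buf.seek(0)
    return buf
```

`list_warc_members` accepts either a path or a file-like object, so the same function works on a .wacz file on disk and on the in-memory demo archive.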