wacz_extractor_enricher.wacz_extractor_enricher
Module Contents
- class wacz_extractor_enricher.wacz_extractor_enricher.WaczExtractorEnricher
Bases: auto_archiver.core.Enricher, auto_archiver.core.Extractor
Uses webrecorder/browsertrix-crawler to generate a .WACZ archive of the URL. If used with [profiles](webrecorder/browsertrix-crawler), it can archive private content that is only visible to a logged-in browser session. When used as an archiver, it extracts the media from the .WACZ archive so that it can be enriched.
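browsertrix-crawler is normally run from its Docker image. As a rough sketch of the kind of invocation involved, the hypothetical helpers below only build the command line and the expected output path; `build_crawl_command`, `expected_wacz_path` and the directory layout are assumptions for illustration, not this module's code. The crawler flags used (`crawl`, `--url`, `--generateWACZ`, `--collection`, `--profile`) follow the browsertrix-crawler README; check the crawler's documentation for the version you run.

```python
from pathlib import Path
from typing import List, Optional

# Public Docker image of the crawler.
CRAWLER_IMAGE = "webrecorder/browsertrix-crawler"


def build_crawl_command(
    url: str,
    crawls_dir: str,
    collection: str,
    profile: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a `docker run` command that crawls one URL into a WACZ."""
    cmd = [
        "docker", "run", "--rm",
        # Mount the host directory where crawl output (including the WACZ) is written.
        "-v", f"{crawls_dir}:/crawls/",
        CRAWLER_IMAGE, "crawl",
        "--url", url,
        "--generateWACZ",
        "--collection", collection,
    ]
    if profile is not None:
        # Path to a browser profile tarball as seen *inside* the container,
        # e.g. a file placed under the mounted /crawls/ directory.
        cmd += ["--profile", profile]
    return cmd


def expected_wacz_path(crawls_dir: str, collection: str) -> Path:
    """Assumed output location: <crawls_dir>/collections/<collection>/<collection>.wacz."""
    return Path(crawls_dir) / "collections" / collection / f"{collection}.wacz"
```

To actually run the crawl, the list can be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate makes the sketch testable without Docker.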
- setup() -> None
- cleanup() -> None
Called when extractors are done, or upon errors, to clean up any resources.
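The setup()/cleanup() pair implies a lifecycle in which cleanup runs whether processing succeeds or fails. A minimal sketch of that contract, assuming the resource is a temporary working directory; `TempWorkspaceStep` and `run_with_cleanup` are hypothetical names, not part of auto_archiver.

```python
import os
import shutil
import tempfile
from typing import Callable, Optional


class TempWorkspaceStep:
    """Hypothetical step with the same setup()/cleanup() lifecycle shape."""

    def __init__(self) -> None:
        self.workdir: Optional[str] = None

    def setup(self) -> None:
        # Acquire resources once, before any URL is processed.
        self.workdir = tempfile.mkdtemp(prefix="wacz-")

    def cleanup(self) -> None:
        # Release resources; a no-op if setup() never ran or cleanup already happened.
        if self.workdir is not None and os.path.isdir(self.workdir):
            shutil.rmtree(self.workdir, ignore_errors=True)
        self.workdir = None


def run_with_cleanup(step: TempWorkspaceStep, work: Callable[[str], None]) -> None:
    """Run `work` inside the step's workspace, calling cleanup() even if it raises."""
    step.setup()
    try:
        work(step.workdir)
    finally:
        step.cleanup()
```

The `try`/`finally` is what covers the "upon errors" case: an exception from `work` still propagates to the caller, but only after the workspace has been removed.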
- download(item: auto_archiver.core.Metadata) -> auto_archiver.core.Metadata
Downloads the media from the given URL and returns a Metadata object with the downloaded media.
If the URL is not supported or the download fails, this method should return False.
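Because download() returns either a Metadata object or False, a caller can try several extractors in turn and stop at the first success. The sketch below illustrates that contract only; `Metadata` here is a hypothetical stand-in dataclass (not the real `auto_archiver.core.Metadata` API), and `first_successful_download` is an assumed name, not the orchestrator's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class Metadata:
    """Hypothetical stand-in for auto_archiver.core.Metadata (not its real API)."""
    url: str
    media: List[str] = field(default_factory=list)


# An extractor's download() either returns Metadata or False.
Download = Callable[[Metadata], Union[Metadata, bool]]


def first_successful_download(item: Metadata, downloads: List[Download]) -> Optional[Metadata]:
    """Try each download() in turn and return the first non-False result."""
    for download in downloads:
        result = download(item)
        if result is False:
            # URL not supported, or the download failed: try the next extractor.
            continue
        return result
    return None
```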
- enrich(to_enrich: auto_archiver.core.Metadata) -> bool
Enriches a Metadata object with additional information or context.
Takes the metadata object to enrich as an argument and modifies it in place, returning None.
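"Modifies it in place" means the enricher adds to the object it receives instead of building a new one, so the caller's reference sees the changes. A minimal sketch following the docstring's in-place description, using a hypothetical `Metadata` stand-in and an assumed helper name `add_wacz_media` (neither is the library's real API).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    """Hypothetical stand-in for auto_archiver.core.Metadata (not its real API)."""
    url: str
    media: List[str] = field(default_factory=list)


def add_wacz_media(to_enrich: Metadata, wacz_filename: str) -> None:
    """Hypothetical enrichment step that mutates the object passed in."""
    # No new Metadata is created; the caller's object gains the entry.
    to_enrich.media.append(wacz_filename)
```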
- extract_media_from_wacz(to_enrich: auto_archiver.core.Metadata, wacz_filename: str) -> None
Receives the path to a .wacz archive, extracts all relevant media from it, and adds them to to_enrich.
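A WACZ file is a ZIP package whose captured traffic lives in WARC files under `archive/`. As a stdlib-only sketch of what "extracting relevant media" can involve, the code below opens the ZIP, walks the WARC `response` records and keeps those whose HTTP `Content-Type` is image, video or audio. This is not the module's implementation: `find_media_in_wacz`, `iter_warc_records`, `split_http_response` and `MEDIA_PREFIXES` are hypothetical names, and the parser assumes plain (non-chunked, non-compressed) HTTP bodies and no folded headers. A dedicated WARC library would be more robust in practice.

```python
import contextlib
import gzip
import zipfile
from typing import BinaryIO, Dict, Iterator, List, Tuple, Union

MEDIA_PREFIXES = ("image/", "video/", "audio/")


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (lower-cased WARC headers, content block) for each record."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank CRLFs that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected WARC line: {line!r}")
        headers: Dict[str, str] = {}
        while True:
            raw = stream.readline()
            if not raw.strip():
                break  # blank line (or EOF) ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers.get("content-length", "0")))


def split_http_response(block: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a raw HTTP response into (lower-cased headers, body)."""
    head, sep, body = block.partition(b"\r\n\r\n")
    if not sep:
        return {}, b""
    headers: Dict[str, str] = {}
    for raw in head.decode("iso-8859-1").split("\r\n")[1:]:  # skip the status line
        name, _, value = raw.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def find_media_in_wacz(wacz: Union[str, BinaryIO]) -> List[Tuple[str, str, bytes]]:
    """Return (target URI, content type, body) for media responses in a WACZ."""
    found = []
    with zipfile.ZipFile(wacz) as zf:
        for name in zf.namelist():
            # WACZ stores its WARC files under archive/.
            if not (name.startswith("archive/") and name.endswith((".warc", ".warc.gz"))):
                continue
            with zf.open(name) as raw:
                gz = name.endswith(".gz")
                opener = gzip.open(raw, "rb") if gz else contextlib.nullcontext(raw)
                with opener as stream:
                    for headers, block in iter_warc_records(stream):
                        if headers.get("warc-type") != "response":
                            continue
                        http_headers, body = split_http_response(block)
                        ctype = http_headers.get("content-type", "").split(";")[0].strip()
                        if ctype.startswith(MEDIA_PREFIXES):
                            found.append((headers.get("warc-target-uri", ""), ctype, body))
    return found
```

Each returned body could then be written to disk and attached to `to_enrich`; how the real method names, stores and deduplicates those files is not shown here.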