wacz_enricher
=============

.. py:module:: wacz_enricher


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/wacz_enricher/wacz_enricher/index


Package Contents
----------------

.. py:class:: WaczExtractorEnricher

   Bases: :py:obj:`auto_archiver.core.Enricher`, :py:obj:`auto_archiver.core.Extractor`


   Uses `browsertrix-crawler <https://github.com/webrecorder/browsertrix-crawler>`_ to generate a .WACZ archive of the URL.
   When used with `browser profiles <https://github.com/webrecorder/browsertrix-crawler#creating-and-using-browser-profiles>`_,
   it can also archive private content.
   When used as an extractor, it extracts the media from the .WACZ archive so it can be enriched.


   .. py:method:: setup() -> None


   .. py:method:: cleanup() -> None

      Called when the extractors are done or when an error occurs. Cleans up any resources.


   .. py:method:: download(item: auto_archiver.core.Metadata) -> auto_archiver.core.Metadata

      Downloads the media from the given URL and returns a Metadata object with the downloaded media.

      If the URL is not supported or the download fails, this method returns ``False`` instead of a Metadata object.


   .. py:method:: enrich(to_enrich: auto_archiver.core.Metadata) -> bool

      Enriches a Metadata object with additional information or context.

      Takes the Metadata object to enrich as an argument and modifies it in place. The signature declares a ``bool`` return value.


   .. py:method:: extract_media_from_wacz(to_enrich: auto_archiver.core.Metadata, wacz_filename: str) -> None

      Receives a .wacz archive and extracts all relevant media from it, adding each item to ``to_enrich``.
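
      The extraction logic itself is not shown on this page. As a rough illustration only, the sketch below
      assumes the standard WACZ layout, where the file is a ZIP container that stores its WARC data under
      ``archive/``. The helper name ``list_warc_members`` is hypothetical and is not part of this module.

      .. code-block:: python

         import zipfile


         def list_warc_members(wacz_path):
             """Return the sorted names of WARC files stored under archive/ in a WACZ.

             Hypothetical helper for illustration; not part of wacz_enricher.
             ``wacz_path`` may be a filesystem path or a binary file-like object.
             """
             with zipfile.ZipFile(wacz_path) as zf:
                 return sorted(
                     name
                     for name in zf.namelist()
                     # WACZ keeps raw capture data as WARC files in archive/
                     if name.startswith("archive/")
                     and name.endswith((".warc", ".warc.gz"))
                 )

      A real extractor would then read the response records from each listed WARC file and decide which
      of them count as relevant media.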