core.extractor
==============

.. py:module:: core.extractor

.. autoapi-nested-parse::

   The `extractor` module defines the base functionality for implementing extractors in the media archiving framework.
   The `Extractor` base class provides common utility methods and a standard interface for extractors.

   Factory method to initialize an extractor instance based on its name.






Module Contents
---------------

.. py:class:: Extractor

   Bases: :py:obj:`auto_archiver.core.BaseModule`


   Base class for implementing extractors in the media archiving framework.
   Subclasses must implement the `download` method to define platform-specific behavior.


   .. py:attribute:: valid_url
      :type:  re.Pattern
      :value: None



   .. py:method:: cleanup() -> None

      Called when extractors are done, or when an error occurs, to clean up any resources.



   .. py:method:: sanitize_url(url: str) -> str

      Cleans unnecessary URL parameters or unfurls redirect links.
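
      As an illustration of the kind of cleaning an override might perform, the sketch below strips common tracking parameters from a URL. The helper name `strip_tracking_params` and the `TRACKING_PARAMS` set are hypothetical and not part of the framework; only the standard library is used.

      .. code-block:: python

         from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

         # Hypothetical list of parameters to drop; not defined by the framework.
         TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


         def strip_tracking_params(url: str) -> str:
             """Return the URL with known tracking query parameters removed."""
             parts = urlsplit(url)
             kept = [
                 (key, value)
                 for key, value in parse_qsl(parts.query, keep_blank_values=True)
                 if key not in TRACKING_PARAMS
             ]
             # Rebuild the URL with only the parameters that were kept.
             return urlunsplit(parts._replace(query=urlencode(kept)))

      A subclass could call such a helper from its own `sanitize_url` override; whether the base implementation does anything similar is not stated here.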



   .. py:method:: match_link(url: str) -> re.Match

      Returns a match object if the given URL matches the `valid_url` pattern, or False/None if it does not.

      Normally used in the `suitable` method to check if the URL is supported by this extractor.
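
      The relationship between `valid_url`, `match_link` and `suitable` can be sketched with a stand-in class. `ExampleExtractor` below is hypothetical and does not inherit from the real `Extractor`; it also assumes the pattern is applied with `re.Pattern.match`, which this page does not confirm.

      .. code-block:: python

         import re


         class ExampleExtractor:
             """Stand-in class; mirrors only valid_url, match_link and suitable."""

             # Hypothetical pattern for an imaginary site.
             valid_url = re.compile(r"https?://(?:www\.)?example\.com/video/(\d+)")

             def match_link(self, url: str):
                 # Assumption: matching is anchored at the start of the URL.
                 return self.valid_url.match(url)

             def suitable(self, url: str) -> bool:
                 # Convert the match object (or None) into a plain boolean.
                 return self.match_link(url) is not None

      With this sketch, capture groups from the match object (here the numeric video ID) remain available to the caller of `match_link`.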




   .. py:method:: suitable(url: str) -> bool

      Returns True if this extractor can handle the given URL.

      Should be overridden by subclasses.




   .. py:method:: download_from_url(url: str, to_filename: str = None, verbose=True, try_best_quality=False) -> str

      Downloads a URL to the provided filename, or to a filename inferred from the URL, and returns the local filename.

      Warning: if `try_best_quality` is True, a successful download returns a tuple of `(filename, best_quality_url)` rather than a plain filename.
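
      Because the return value can be either a string or a tuple, callers may want to normalize it before use. The helper below is a hypothetical sketch, not part of the framework, and assumes only the two shapes described above.

      .. code-block:: python

         from typing import Optional, Tuple, Union


         def unpack_download_result(
             result: Union[str, Tuple[str, str]],
         ) -> Tuple[str, Optional[str]]:
             """Normalize a result to (filename, best_quality_url or None)."""
             if isinstance(result, tuple):
                 # Shape returned when try_best_quality is True and the download succeeded.
                 filename, best_quality_url = result
                 return filename, best_quality_url
             # Plain filename: no best-quality URL is available.
             return result, None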



   .. py:method:: download(item: auto_archiver.core.Metadata) -> auto_archiver.core.Metadata | False
      :abstractmethod:


      Downloads the media from the given URL and returns a Metadata object with the downloaded media.

      If the URL is not supported or the download fails, this method should return False.
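
      The contract above (return the populated item on success, False otherwise) can be sketched with stand-ins. `FakeMetadata` and `ExampleExtractor` are hypothetical and do not use the real `auto_archiver.core` classes, whose attributes are not described on this page.

      .. code-block:: python

         import re


         class FakeMetadata:
             """Stand-in for Metadata; holds only a URL and a status string."""

             def __init__(self, url: str):
                 self.url = url
                 self.status = None


         class ExampleExtractor:
             """Stand-in for an Extractor subclass; imports nothing from the framework."""

             valid_url = re.compile(r"https?://example\.com/video/\d+")

             def download(self, item: FakeMetadata):
                 if not self.valid_url.match(item.url):
                     return False  # unsupported URL
                 # A real extractor would fetch and attach media here;
                 # this sketch only records that the item was handled.
                 item.status = "example: success"
                 return item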




