core.extractor
The extractor module defines the base functionality for implementing extractors in the media archiving framework. Its Extractor class provides common utility methods and a standard interface for extractors.
Factory method to initialize an extractor instance based on its name.
Module Contents
- class core.extractor.Extractor
Bases: auto_archiver.core.BaseModule
Base class for implementing extractors in the media archiving framework. Subclasses must implement the download method to define platform-specific behavior.
- valid_url: re.Pattern = None
- cleanup() → None
Called when the extractor is done or when an error occurs. Cleans up any resources the extractor holds.
- sanitize_url(url: str) → str
Cleans a URL by removing unnecessary parameters or by unfurling redirect links.
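As a rough illustration of the kind of cleanup sanitize_url might perform, the sketch below removes tracking parameters with the standard library only. The helper name strip_tracking_params and the TRACKING_PARAMS set are assumptions made for this example; they are not part of the framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters to drop; a real extractor may use another.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def strip_tracking_params(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params("https://example.com/post/1?id=7&utm_source=feed"))
# prints https://example.com/post/1?id=7
```

Unfurling redirect links would need a network request and is left out of this sketch.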
- match_link(url: str) → re.Match
Returns a match object if the given URL matches the valid_url pattern, or a falsy value (False or None) if it does not.
Normally used in the suitable method to check whether the URL is supported by this extractor.
- suitable(url: str) → bool
Returns True if this extractor can handle the given URL.
Should be overridden by subclasses.
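The relationship between valid_url, match_link, and suitable can be sketched as follows. SketchExtractor is a self-contained stand-in, not the framework class, and the choice to accept every URL when no pattern is set is an assumption made for this example.

```python
import re
from typing import Optional


class SketchExtractor:
    """Stand-in for Extractor, written only for this example."""

    valid_url: Optional[re.Pattern] = None

    def match_link(self, url: str) -> Optional[re.Match]:
        # Without a pattern there is nothing to match against.
        if self.valid_url is None:
            return None
        return self.valid_url.match(url)

    def suitable(self, url: str) -> bool:
        # Assumption: an extractor with no pattern accepts every URL.
        if self.valid_url is None:
            return True
        return self.match_link(url) is not None


class ExampleVideoExtractor(SketchExtractor):
    # Hypothetical site pattern with a named group for the video id.
    valid_url = re.compile(r"https?://(?:www\.)?example\.com/video/(?P<id>\d+)")


extractor = ExampleVideoExtractor()
print(extractor.suitable("https://example.com/video/42"))  # prints True
print(extractor.suitable("https://other.org/video/42"))    # prints False
```

Keeping the pattern as a class attribute lets a subclass declare the URLs it supports without overriding match_link.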
- download_from_url(url: str, to_filename: str = None, verbose=True, try_best_quality=False) → str
Downloads a URL to the provided filename, or to a filename inferred from the URL, and returns the local filename.
Warning: if try_best_quality is True and the download succeeds, the method returns a tuple of (filename, best_quality_url) instead of a string.
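Because the return type depends on try_best_quality, callers may want to normalize the result before using it. The helper below is a hypothetical sketch, not part of the framework, and it assumes that a failed download yields a falsy value such as None.

```python
from typing import Optional, Tuple, Union

DownloadResult = Union[str, Tuple[str, str], None]


def unpack_download_result(result: DownloadResult) -> Tuple[Optional[str], Optional[str]]:
    """Normalize a download_from_url result to (filename, best_quality_url)."""
    if isinstance(result, tuple):
        # The try_best_quality=True case: (filename, best_quality_url).
        filename, best_quality_url = result
        return filename, best_quality_url
    # Plain filename, or a falsy value when the download failed (assumption).
    return (result or None), None


print(unpack_download_result("photo.jpg"))
# prints ('photo.jpg', None)
print(unpack_download_result(("photo.jpg", "https://example.com/photo_orig.jpg")))
# prints ('photo.jpg', 'https://example.com/photo_orig.jpg')
```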
- abstract download(item: auto_archiver.core.Metadata) → auto_archiver.core.Metadata | False
Downloads the media for the given item's URL and returns a Metadata object containing the downloaded media.
If the URL is not supported or the download fails, this method should return False.
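A minimal sketch of the download contract, assuming simplified stand-ins: FakeMetadata replaces auto_archiver.core.Metadata and BaseSketch replaces Extractor so that the example runs without the framework installed. Both names, and the example URL pattern, are hypothetical.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for auto_archiver.core.Metadata."""

    url: str
    media: List[str] = field(default_factory=list)


class BaseSketch(ABC):
    """Stand-in for Extractor that only declares the abstract method."""

    @abstractmethod
    def download(self, item: FakeMetadata) -> Union[FakeMetadata, bool]:
        ...


class ExampleImageExtractor(BaseSketch):
    valid_url = re.compile(r"https?://example\.com/img/(?P<name>[\w-]+\.jpg)$")

    def download(self, item: FakeMetadata) -> Union[FakeMetadata, bool]:
        match = self.valid_url.match(item.url)
        if match is None:
            return False  # unsupported URL, as the contract requires
        # A real extractor would fetch the file here; this only records its name.
        item.media.append(match.group("name"))
        return item


result = ExampleImageExtractor().download(FakeMetadata("https://example.com/img/cat.jpg"))
print(result.media)  # prints ['cat.jpg']
```

Returning False rather than raising lets a caller try the next extractor when one cannot handle the URL.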