core.extractor#

The extractor module defines the base functionality for implementing extractors in the media archiving framework. It provides common utility methods and a standard interface for extractors.

Factory method to initialize an extractor instance based on its name.

Module Contents#

class core.extractor.Extractor#

Bases: auto_archiver.core.BaseModule

Base class for implementing extractors in the media archiving framework. Subclasses must implement the download method to define platform-specific behavior.

valid_url: re.Pattern = None#

cleanup() → None#

Called when extractors are done or when an error occurs. Cleans up any resources the extractor holds.
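As a rough illustration of the cleanup contract, the sketch below uses a hypothetical stand-in class (TempDirExtractorSketch, not the real Extractor) that owns a scratch directory. The tmp_dir attribute and the try/finally call site are assumptions made for this example; the framework decides when cleanup is actually invoked.

```python
import os
import shutil
import tempfile


class TempDirExtractorSketch:
    """Hypothetical stand-in; not the real core.extractor.Extractor."""

    def __init__(self) -> None:
        # Assumed resource: a scratch directory for downloaded files.
        self.tmp_dir = tempfile.mkdtemp(prefix="extractor_")

    def cleanup(self) -> None:
        # Remove the scratch directory. ignore_errors makes a second
        # call harmless if the directory is already gone.
        shutil.rmtree(self.tmp_dir, ignore_errors=True)


extractor = TempDirExtractorSketch()
tmp_dir = extractor.tmp_dir
try:
    # Simulated work that leaves a file behind.
    with open(os.path.join(tmp_dir, "partial.bin"), "wb") as handle:
        handle.write(b"example bytes")
finally:
    # Runs on success and on error, mirroring "done, or upon errors".
    extractor.cleanup()
```

Keeping cleanup idempotent is a defensive choice here, since it may run both after normal completion and after a failure.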

sanitize_url(url: str) → str#

Cleans unnecessary parameters from a URL or unfurls redirect links.
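One way a subclass might clean URL parameters is sketched below as a standalone function. The TRACKING_PARAMS set is an assumption chosen for illustration; the source does not say which parameters the framework removes, and this sketch does not attempt to unfurl redirects.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters to drop; adjust per platform.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def sanitize_url(url: str) -> str:
    """Return the URL with the assumed tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = sanitize_url(
    "https://example.com/post/1?id=7&utm_source=newsletter&fbclid=abc"
)
```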

match_link(url: str) → re.Match#

Returns a match object if the given URL matches the valid_url pattern, or False/None if it does not.

Normally used in the suitable method to check if the URL is supported by this extractor.

suitable(url: str) → bool#

Returns True if this extractor can handle the given URL.

Should be overridden by subclasses.
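The relationship between valid_url, match_link and suitable can be sketched as follows. ExtractorSketch is a hypothetical stand-in for the documented base class, its default behaviors are assumptions, and the example.com pattern is made up for illustration.

```python
import re
from typing import Optional


class ExtractorSketch:
    """Hypothetical stand-in; not the real core.extractor.Extractor."""

    valid_url: Optional[re.Pattern] = None

    def match_link(self, url: str) -> Optional[re.Match]:
        # Assumed behavior: no pattern means no match.
        if self.valid_url is None:
            return None
        return self.valid_url.match(url)

    def suitable(self, url: str) -> bool:
        # Assumed default: accept every URL unless a subclass narrows it.
        return True


class ExampleVideoExtractor(ExtractorSketch):
    # Made-up pattern for a fictional site, with a named group for the ID.
    valid_url = re.compile(
        r"https?://(?:www\.)?example\.com/video/(?P<id>\d+)"
    )

    def suitable(self, url: str) -> bool:
        # Delegate the decision to the pattern check.
        return self.match_link(url) is not None


extractor = ExampleVideoExtractor()
match = extractor.match_link("https://example.com/video/12345")
video_id = match.group("id") if match else None
```

Using a named group in valid_url lets the same match object serve both the suitability check and later ID extraction.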

download_from_url(url: str, to_filename: str = None, verbose=True, try_best_quality=False) → str#

Downloads a URL to the provided filename, or to a filename inferred from the URL, and returns the local filename.

Warning: if try_best_quality is True, a successful download returns a tuple of (filename, best_quality_url) instead of a single string.
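Because the return shape depends on try_best_quality, callers may want to normalize it before use. The helper below is a hypothetical sketch, not part of the module; it assumes a failed download yields a falsy value such as None, which the source does not state.

```python
from typing import Optional, Tuple


def unpack_download_result(result) -> Tuple[Optional[str], Optional[str]]:
    """Normalize a download_from_url result to (filename, best_quality_url).

    Hypothetical helper: accepts either a plain filename or the
    (filename, best_quality_url) tuple described in the warning.
    """
    if isinstance(result, tuple):
        filename, best_quality_url = result
        return filename, best_quality_url
    # Plain filename, or a falsy value assumed to signal failure.
    return (result or None), None


plain = unpack_download_result("photo.jpg")
with_best = unpack_download_result(
    ("photo.jpg", "https://example.com/photo_orig.jpg")
)
failed = unpack_download_result(None)
```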

abstract download(item: auto_archiver.core.Metadata) → auto_archiver.core.Metadata | False#

Downloads the media from the given URL and returns a Metadata object with the downloaded media.

If the URL is not supported or the download fails, this method should return False.
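A minimal sketch of the download contract follows, using stand-in classes so that it runs without the framework installed. MetadataSketch, its url and media attributes, and the fetch helper are all hypothetical; the real auto_archiver.core.Metadata interface may differ.

```python
import re
from typing import List, Optional, Union


class MetadataSketch:
    """Hypothetical stand-in for auto_archiver.core.Metadata."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.media: List[str] = []


class ExampleImageExtractor:
    # Made-up pattern for a fictional site.
    valid_url = re.compile(r"https?://example\.com/image/(?P<id>\w+)")

    def fetch(self, url: str) -> Optional[str]:
        # Hypothetical fetch step: a real extractor would download here.
        match = self.valid_url.match(url)
        return f"{match.group('id')}.jpg" if match else None

    def download(self, item: MetadataSketch) -> Union[MetadataSketch, bool]:
        if not self.valid_url.match(item.url):
            return False  # URL not supported by this extractor
        filename = self.fetch(item.url)
        if not filename:
            return False  # download failed
        item.media.append(filename)
        return item


extractor = ExampleImageExtractor()
result = extractor.download(MetadataSketch("https://example.com/image/abc"))
rejected = extractor.download(MetadataSketch("https://other.example/x"))
```

Since the method returns either an object or False, callers would typically test the result with `if result is False` or a plain truthiness check before reading from it.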