core.orchestrator

Orchestrates all archiving steps: feeding items, archiving them with specific archivers, enrichment, storage, formatting, database operations, and cleanup.

Module Contents

core.orchestrator.DEFAULT_CONFIG_FILE = 'orchestration.yaml'
class core.orchestrator.JsonParseAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: argparse.Action

Information about how to convert command line strings to Python objects.

Action objects are used by an ArgumentParser to represent the information needed to parse a single argument from one or more strings from the command line. The keyword arguments to the Action constructor are also all attributes of Action instances.

Keyword Arguments:
  • option_strings – A list of command-line option strings which should be associated with this action.

  • dest – The name of the attribute to hold the created object(s).

  • nargs – The number of command-line arguments that should be consumed. By default, one argument will be consumed and a single value will be produced. Other values include:

    • N (an integer) consumes N arguments (and produces a list)

    • '?' consumes zero or one arguments

    • '*' consumes zero or more arguments (and produces a list)

    • '+' consumes one or more arguments (and produces a list)

    Note that the difference between the default and nargs=1 is that with the default, a single value will be produced, while with nargs=1, a list containing a single value will be produced.

  • const – The value to be produced if the option is specified and the option uses an action that takes no values.

  • default – The value to be produced if the option is not specified.

  • type – A callable that accepts a single string argument, and returns the converted value. The standard Python types str, int, float, and complex are useful examples of such callables. If None, str is used.

  • choices – A container of values that should be allowed. If not None, after a command-line argument has been converted to the appropriate type, an exception will be raised if it is not a member of this collection.

  • required – True if the action must always be specified at the command line. This is only meaningful for optional command-line arguments.

  • help – The help string describing the argument.

  • metavar – The name to be used for the option's argument with the help string. If None, the 'dest' value will be used as the name.
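The docstring above is inherited from argparse.Action and does not describe what this class does. The class name suggests that it parses an option value as JSON, but the implementation is not shown on this page. The following is only a minimal sketch of that idea; `JsonArgAction` and the `--options` flag are hypothetical names, not part of core.orchestrator.

```python
import argparse
import json


class JsonArgAction(argparse.Action):
    """Hypothetical stand-in: parse the raw option value as JSON."""

    def __call__(self, parser, namespace, values, option_string=None):
        try:
            parsed = json.loads(values)
        except json.JSONDecodeError as err:
            # parser.error() prints a usage message and exits with status 2.
            parser.error(f"{option_string}: invalid JSON ({err})")
        setattr(namespace, self.dest, parsed)


parser = argparse.ArgumentParser()
parser.add_argument("--options", action=JsonArgAction, default={})
args = parser.parse_args(["--options", '{"retries": 3}'])
print(args.options["retries"])
```

Subclassing argparse.Action and overriding `__call__` is the standard argparse extension point, so the parsed object lands on the namespace under `dest` like any other option.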

class core.orchestrator.AuthenticationJsonParseAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: JsonParseAction

Information about how to convert command line strings to Python objects.

Action objects are used by an ArgumentParser to represent the information needed to parse a single argument from one or more strings from the command line. The keyword arguments to the Action constructor are also all attributes of Action instances.

Keyword Arguments:
  • option_strings – A list of command-line option strings which should be associated with this action.

  • dest – The name of the attribute to hold the created object(s).

  • nargs – The number of command-line arguments that should be consumed. By default, one argument will be consumed and a single value will be produced. Other values include:

    • N (an integer) consumes N arguments (and produces a list)

    • '?' consumes zero or one arguments

    • '*' consumes zero or more arguments (and produces a list)

    • '+' consumes one or more arguments (and produces a list)

    Note that the difference between the default and nargs=1 is that with the default, a single value will be produced, while with nargs=1, a list containing a single value will be produced.

  • const – The value to be produced if the option is specified and the option uses an action that takes no values.

  • default – The value to be produced if the option is not specified.

  • type – A callable that accepts a single string argument, and returns the converted value. The standard Python types str, int, float, and complex are useful examples of such callables. If None, str is used.

  • choices – A container of values that should be allowed. If not None, after a command-line argument has been converted to the appropriate type, an exception will be raised if it is not a member of this collection.

  • required – True if the action must always be specified at the command line. This is only meaningful for optional command-line arguments.

  • help – The help string describing the argument.

  • metavar – The name to be used for the option's argument with the help string. If None, the 'dest' value will be used as the name.

class core.orchestrator.UniqueAppendAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: argparse.Action

Information about how to convert command line strings to Python objects.

Action objects are used by an ArgumentParser to represent the information needed to parse a single argument from one or more strings from the command line. The keyword arguments to the Action constructor are also all attributes of Action instances.

Keyword Arguments:
  • option_strings – A list of command-line option strings which should be associated with this action.

  • dest – The name of the attribute to hold the created object(s).

  • nargs – The number of command-line arguments that should be consumed. By default, one argument will be consumed and a single value will be produced. Other values include:

    • N (an integer) consumes N arguments (and produces a list)

    • '?' consumes zero or one arguments

    • '*' consumes zero or more arguments (and produces a list)

    • '+' consumes one or more arguments (and produces a list)

    Note that the difference between the default and nargs=1 is that with the default, a single value will be produced, while with nargs=1, a list containing a single value will be produced.

  • const – The value to be produced if the option is specified and the option uses an action that takes no values.

  • default – The value to be produced if the option is not specified.

  • type – A callable that accepts a single string argument, and returns the converted value. The standard Python types str, int, float, and complex are useful examples of such callables. If None, str is used.

  • choices – A container of values that should be allowed. If not None, after a command-line argument has been converted to the appropriate type, an exception will be raised if it is not a member of this collection.

  • required – True if the action must always be specified at the command line. This is only meaningful for optional command-line arguments.

  • help – The help string describing the argument.

  • metavar – The name to be used for the option's argument with the help string. If None, the 'dest' value will be used as the name.
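As with the other actions, the docstring above is the generic argparse.Action text. Judging by the name alone, this action appends values while skipping duplicates. The actual behavior is not documented here, so the following is a sketch under that assumption; `UniqueAppend` and `--feeders` are hypothetical names chosen for illustration.

```python
import argparse


class UniqueAppend(argparse.Action):
    """Hypothetical stand-in: append values, skipping ones already present."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Copy first so a shared default list is never mutated in place.
        current = list(getattr(namespace, self.dest, None) or [])
        for value in values if isinstance(values, list) else [values]:
            if value not in current:
                current.append(value)
        setattr(namespace, self.dest, current)


parser = argparse.ArgumentParser()
parser.add_argument("--feeders", nargs="+", action=UniqueAppend, default=[])
args = parser.parse_args(["--feeders", "cli", "csv", "--feeders", "cli"])
print(args.feeders)
```

Unlike the built-in `append` action, repeating the flag with an already-seen value leaves the list unchanged, and the original order of first appearance is preserved.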

class core.orchestrator.ArchivingOrchestrator
feeders: List[Type[core.Feeder]]
extractors: List[Type[core.Extractor]]
enrichers: List[Type[core.Enricher]]
databases: List[Type[core.Database]]
storages: List[Type[core.Storage]]
formatters: List[Type[core.Formatter]]
setup_basic_parser()
setup_complete_parser(basic_config: dict, yaml_config: dict, unused_args: list[str]) → None
add_additional_args(parser: argparse.ArgumentParser = None)
add_module_args(modules: list[core.module.LazyBaseModule] = None, parser: argparse.ArgumentParser = None) → None
show_help(basic_config: dict)
setup_logging()
install_modules(modules_by_type)

Traverses all modules in ‘steps’ and loads them into the orchestrator, storing them in the orchestrator’s attributes (self.feeders, self.extractors etc.). If no modules of a certain type are loaded, the program will exit with an error message.

load_config(config_file: str) → dict
run(args: list) → None
cleanup() → None
feed() → Generator[core.metadata.Metadata]
feed_item(item: core.metadata.Metadata) → core.metadata.Metadata
Takes one item (URL) to archive and calls self.archive. Additionally, it:
  • catches keyboard interruptions to do a clean exit

  • catches any unexpected error, logs it, and does a clean exit
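The two behaviors listed above can be sketched as a thin wrapper around the archive call. This is an illustration only, not the orchestrator's code: `feed_item_sketch`, its `archive` and `cleanup` callables, and the exit codes 1 and 2 are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("feed_item_sketch")


def feed_item_sketch(item, archive, cleanup):
    """Run archive(item); on interrupt or unexpected error, clean up and exit."""
    try:
        return archive(item)
    except KeyboardInterrupt:
        logger.warning("interrupted by user, cleaning up")
        cleanup()
        raise SystemExit(1)  # exit code is an assumption
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("unexpected error while archiving %r", item)
        cleanup()
        raise SystemExit(2)  # exit code is an assumption


calls = []


def failing_archive(item):
    raise ValueError("boom")


try:
    feed_item_sketch("https://example.com", failing_archive, lambda: calls.append("cleanup"))
except SystemExit as exc:
    exit_code = exc.code
```

KeyboardInterrupt does not derive from Exception, so it needs its own `except` clause; the order of the two clauses does not matter.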

archive(result: core.metadata.Metadata) → core.metadata.Metadata | None

Runs the archiving process for a single URL:

  1. Each archiver can sanitize its own URLs
  2. Check for cached results in Databases, and signal start to the databases
  3. Call Archivers until one succeeds
  4. Call Enrichers
  5. Store all downloaded/generated media
  6. Call selected Formatter and store formatted if needed
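The control flow of these six steps can be sketched with plain callables. Everything in this example is hypothetical: `archive_sketch`, its parameters, and the dict used as a result are stand-ins for the real module and Metadata types. The sketch also assumes that a cache hit returns early and that the method returns None when no archiver succeeds, and it omits the "signal start" notification from step 2.

```python
from typing import Callable, Iterable, Optional


def archive_sketch(
    url: str,
    sanitizers: Iterable[Callable[[str], str]],
    cached: Callable[[str], Optional[dict]],
    extractors: Iterable[Callable[[str], Optional[dict]]],
    enrichers: Iterable[Callable[[dict], None]],
    store: Callable[[dict], None],
    formatter: Callable[[dict], str],
) -> Optional[dict]:
    # 1. Let each sanitizer rewrite the URL in turn.
    for sanitize in sanitizers:
        url = sanitize(url)
    # 2. Return early on a cache hit (assumed behavior).
    hit = cached(url)
    if hit is not None:
        return hit
    # 3. Try archivers in order and stop at the first success.
    result = None
    for extract in extractors:
        result = extract(url)
        if result is not None:
            break
    if result is None:
        return None
    # 4. Enrichers mutate the result in place.
    for enrich in enrichers:
        enrich(result)
    # 5. Store media, then 6. format and keep the formatted output.
    store(result)
    result["formatted"] = formatter(result)
    return result


result = archive_sketch(
    "https://example.com/?utm=1",
    sanitizers=[lambda u: u.split("?")[0]],
    cached=lambda u: None,
    extractors=[lambda u: None, lambda u: {"url": u, "media": []}],
    enrichers=[lambda r: r.update(title="Example")],
    store=lambda r: None,
    formatter=lambda r: f"<h1>{r['title']}</h1>",
)
print(result["url"], result["formatted"])
```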

assert_valid_url(url: str) → bool

Blocks localhost, private, reserved, and link-local IPs and all non-http/https schemes.
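A check of this kind can be built from the standard library's urllib.parse and ipaddress modules. The sketch below is not the method's implementation: `is_safe_url` is a hypothetical name, it returns False where the real method may raise instead, and it does not resolve DNS names, so a hostname that points at a private address would pass.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Hypothetical sketch: allow only http/https URLs to public hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased, with port and brackets stripped
    if not host or host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does not resolve DNS names.
        return True
    return not (ip.is_loopback or ip.is_private or ip.is_reserved or ip.is_link_local)


for candidate in ("https://example.com/page", "ftp://example.com", "http://10.0.0.5"):
    print(candidate, is_safe_url(candidate))
```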

property all_modules: List[Type[core.module.BaseModule]]