Creating Your Own Modules
Modules extend Auto Archiver to process different websites or media, and/or to transform the data in a way that suits your needs. In most cases, the Core Modules should be sufficient for everyday use, but the most common use cases for making your own Modules include:
Extracting data from a website which doesn’t work with the current core extractors.
Enriching or altering the data before saving with additional information that the core enrichers do not offer.
Storing your data in a different format/location from what the core storage providers offer.
Setting up the folder structure
First, decide what type of module you wish to create. The Module Documentation page lists the available types. (Note: a module can be more than one type; more on that below.)
Create a new Python package (a folder) with the name of your module (in this tutorial, we’ll call it awesome_extractor).
Create the __manifest__.py and the awesome_extractor.py files in this folder.
When done, you should have a module structure as follows:
.
├── awesome_extractor
│ ├── __manifest__.py
│ └── awesome_extractor.py
Check out the core modules in the Auto Archiver repository for examples of the folder structure for real-world modules.
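The folder layout above can also be created with a few lines of Python. This is only a convenience sketch: scaffold_module is a hypothetical helper written for this tutorial, not part of Auto Archiver, and it simply creates the package folder with two empty files.

```python
from pathlib import Path


def scaffold_module(root: Path, name: str) -> list[Path]:
    """Create <root>/<name>/ containing an empty manifest and module file.

    Hypothetical helper for this tutorial; Auto Archiver does not ship it.
    """
    package_dir = root / name
    package_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for filename in ("__manifest__.py", f"{name}.py"):
        path = package_dir / filename
        path.touch(exist_ok=True)  # existing files are left untouched
        created.append(path)
    return created
```

Calling scaffold_module(Path("."), "awesome_extractor") from the directory where you keep your modules should produce the two files shown in the tree above.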
Populating the Manifest File
The manifest file is where you define the core information of your module. It is a Python dict; here’s an example file:
{
    # Display Name of your module
    "name": "Example Module",
    # The author of your module (optional)
    "author": "John Doe",
    # Optional version number, for your own versioning purposes
    "version": 2.0,
    # The type of the module, must be one (or more) of the built in module types
    "type": ["extractor", "feeder", "formatter", "storage", "enricher", "database"],
    # a boolean indicating whether or not a module requires additional user setup before it can be used
    # for example: adding API keys, installing additional software etc.
    "requires_setup": False,
    # a dictionary of dependencies for this module, that must be installed before the module is loaded.
    # Can be python dependencies (external packages, or other auto-archiver modules), or you can
    # provide external bin dependencies (e.g. ffmpeg, docker etc.)
    "dependencies": {
        "python": ["loguru"],
        "bin": ["bash"],
    },
    # configurations that this module takes. These are argparse-compliant dictionaries, that are
    # used to create command line arguments when the programme is run.
    # The full name of the config option will become: `module_name.config_name`
    "configs": {
        "csv_file": {"default": "db.csv", "help": "CSV file name"},
        "required_field": {"required": True, "help": "required field in the CSV file"},
    },
    # A description of the module, used for documentation
    "description": "This is an example module",
}
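To make the "argparse-compliant" remark in the manifest comments concrete, here is a minimal sketch of how such config dictionaries could be turned into command line options named module_name.config_name. It uses only the standard argparse module and is not Auto Archiver’s actual loading code; build_parser and the identifier example_module are assumptions made for illustration.

```python
import argparse

# Copy of the "configs" entry from the example manifest.
CONFIGS = {
    "csv_file": {"default": "db.csv", "help": "CSV file name"},
    "required_field": {"required": True, "help": "required field in the CSV file"},
}


def build_parser(module_name: str, configs: dict) -> argparse.ArgumentParser:
    """Hypothetical helper: register one --module_name.config_name option per config."""
    parser = argparse.ArgumentParser(prog="example")
    for config_name, options in configs.items():
        # Each options dict is passed through unchanged as add_argument keywords,
        # which is what "argparse-compliant" means in this sketch.
        parser.add_argument(f"--{module_name}.{config_name}", **options)
    return parser


# "example_module" is an assumed module identifier, not taken from the manifest.
parser = build_parser("example_module", CONFIGS)
args = parser.parse_args(["--example_module.required_field", "id"])

# Option names contain dots, so read the parsed values through vars().
values = vars(args)
print(values["example_module.csv_file"])
print(values["example_module.required_field"])
```

In this sketch, csv_file falls back to its default when omitted, while leaving out required_field makes argparse exit with a usage error.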
Creating the Python Code
The next step is to create your module code. First, create a class that subclasses the relevant base module type from auto_archiver.core. Here’s an example class for the awesome_extractor module, which is an extractor:
from auto_archiver.core import Extractor, Metadata


class AwesomeExtractor(Extractor):
    # Returns the populated Metadata object, or False
    def download(self, item: Metadata) -> Metadata | bool:
        url = item.get_url()
        # download the content and create the metadata object
        metadata = ...
        return metadata
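The return contract of download can be tried out without installing Auto Archiver. The sketch below is self-contained and makes several assumptions: the Metadata and Extractor classes are simplified stand-ins (only get_url comes from the example above), the data field and the awesome.example host are invented for illustration, and returning False is assumed to mean that the extractor could not handle the URL.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Metadata:
    """Stand-in for auto_archiver.core.Metadata (assumption, heavily simplified)."""
    url: str
    data: dict = field(default_factory=dict)  # hypothetical storage for this sketch

    def get_url(self) -> str:
        return self.url


class Extractor:
    """Stand-in for auto_archiver.core.Extractor (assumption)."""

    def download(self, item: Metadata) -> "Metadata | bool":
        raise NotImplementedError


class AwesomeExtractor(Extractor):
    HOST = "awesome.example"  # hypothetical site this extractor understands

    def download(self, item: Metadata) -> "Metadata | bool":
        parsed = urlparse(item.get_url())
        if parsed.hostname != self.HOST:
            # Assumed meaning of False: this extractor cannot handle the URL.
            return False
        # A real extractor would fetch and parse the page here.
        item.data["path"] = parsed.path
        return item


extractor = AwesomeExtractor()
handled = extractor.download(Metadata("https://awesome.example/post/1"))
skipped = extractor.download(Metadata("https://other.example/post/1"))
```

Here handled is the Metadata object with the path recorded, and skipped is False. In a real module you would import Extractor and Metadata from auto_archiver.core instead of defining the stand-ins.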