Module Documentation

These pages describe the core modules that come with Auto Archiver and provide its main functionality for archiving web content. There are six core module types:

  1. Feeders - these ‘feed’ information (the URLs) from various sources to the Auto Archiver for processing

  2. Extractors - these ‘extract’ the page data for a given URL that is fed in by a feeder

  3. Enrichers - these ‘enrich’ the data extracted in the previous step with additional information

  4. Storage - these ‘store’ the data in a persistent location (on disk, Google Drive, S3, etc.)

  5. Databases - these record the status of the archiving process, for example in the console, a CSV file or a Google Sheet

  6. Formatters - these ‘format’ the archived data into a final output, such as an HTML page
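Taken together, the module types form a pipeline: a feeder supplies URLs, an extractor fetches each page, enrichers add information to the result, storages persist it, and databases record the outcome. The Python sketch below illustrates only that data flow under a deliberately simplified model in which every step is a plain function. The names `ArchiveResult` and `run_pipeline`, the status strings, and the rule that the first successful extractor wins are all hypothetical assumptions of this sketch; it is not Auto Archiver's actual API or configuration format, and formatters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ArchiveResult:
    """Hypothetical record of everything gathered for one URL (not Auto Archiver's class)."""
    url: str
    status: str = "nothing archived"
    data: Dict[str, object] = field(default_factory=dict)
    stored_at: List[str] = field(default_factory=list)


def run_pipeline(
    feeder: Callable[[], Iterable[str]],
    extractors: List[Callable[[str], Optional[Dict[str, object]]]],
    enrichers: List[Callable[[ArchiveResult], None]],
    storages: List[Callable[[ArchiveResult], str]],
    databases: List[Callable[[ArchiveResult], None]],
) -> List[ArchiveResult]:
    """Hypothetical orchestrator: push every fed URL through each step in turn."""
    results = []
    for url in feeder():  # Feeder: supplies the URLs to process
        result = ArchiveResult(url)
        # Extractors: tried in order; in this sketch the first one returning data is used.
        for extract in extractors:
            extracted = extract(url)
            if extracted is not None:
                result.data.update(extracted)
                result.status = "archived"
                break
        # This sketch only enriches and stores results that something extracted.
        if result.status == "archived":
            for enrich in enrichers:  # Enrichers: add information in place
                enrich(result)
            for store in storages:  # Storages: persist and report a location
                result.stored_at.append(store(result))
        # Databases: record the outcome, whether or not extraction succeeded.
        for record in databases:
            record(result)
        results.append(result)
    return results


# Usage with toy stand-ins for real modules.
log = []
results = run_pipeline(
    feeder=lambda: ["https://example.com", "https://unreachable.invalid"],
    extractors=[lambda url: {"title": "Example"} if "example" in url else None],
    enrichers=[lambda r: r.data.update(hash="abc123")],
    storages=[lambda r: "local/" + r.url.split("//")[1]],
    databases=[lambda r: log.append((r.url, r.status))],
)
for r in results:
    print(r.url, r.status, r.data, r.stored_at)
```

Modelling each step as an interchangeable callable reflects the idea behind the module types: a feeder, extractor or storage can be swapped for another without changing the rest of the pipeline.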

Feeder Modules

Command Line Feeder

Atlos Feeder Database Storage

CSV Feeder

Google Sheets Feeder Database

Extractor Modules

Antibot Extractor/Enricher

Generic Extractor

Telegram Extractor

Instagram API Extractor

Instagram Extractor

Instagram Telegram Bot Extractor

Telethon Extractor

Twitter API Extractor

WACZ Enricher (and Extractor)

Wayback Machine Enricher (and Extractor)

Enricher Modules

Antibot Extractor/Enricher

Ghost Archive Enricher

Hash Enricher

Archive Metadata Enricher

PDQ Hash Enricher

SSL Certificate Enricher

Thumbnail Enricher

JSON Enricher

Media Metadata Enricher

OpenTimestamps Enricher

Timestamping Enricher

WACZ Enricher (and Extractor)

Wayback Machine Enricher (and Extractor)

Whisper Enricher

Database Modules

Console Database

CSV Database

Auto Archiver API Database

Atlos Feeder Database Storage

Google Sheets Feeder Database

Storage Modules

Local Storage

Atlos Feeder Database Storage

Google Drive Storage

S3 Storage

Formatter Modules

HTML Formatter

Mute Formatter