Module Documentation

These pages describe the core modules that ship with auto-archiver and provide its main functionality for archiving web pages. There are five core module types:

  1. Feeders - these ‘feed’ information (the URLs) from various sources to the auto-archiver for processing

  2. Extractors - these ‘extract’ the page data for a given URL that is fed in by a feeder

  3. Enrichers - these ‘enrich’ the data extracted in the previous step with additional information

  4. Storage - these ‘store’ the data in a persistent location (on disk, Google Drive, etc.)

  5. Databases - these ‘store’ the status of the entire archiving process in a log file or database
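
The way these five module types hand work to one another can be sketched roughly as follows. This is a minimal illustration only, not auto-archiver's actual API: the `Item` class, the function names, and the rule that the first successful extractor wins are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Item:
    """Hypothetical record passed between stages (not an auto-archiver type)."""
    url: str
    data: dict = field(default_factory=dict)
    status: str = "pending"


def list_feeder(urls: Iterable[str]) -> Iterator[Item]:
    # Feeder: yields one item per URL from some source (here, a plain list).
    for url in urls:
        yield Item(url=url)


def dummy_extractor(item: Item) -> bool:
    # Extractor: fills in page data for the URL; returns True on success.
    item.data["content"] = f"<contents of {item.url}>"
    return True


def length_enricher(item: Item) -> None:
    # Enricher: adds information derived from what was extracted.
    item.data["content_length"] = len(item.data["content"])


def run_pipeline(
    urls: Iterable[str],
    extractors: List[Callable[[Item], bool]],
    enrichers: List[Callable[[Item], None]],
    storage: dict,
    database: list,
) -> None:
    for item in list_feeder(urls):
        # Assumption: extractors are tried in order until one succeeds.
        if any(extract(item) for extract in extractors):
            for enrich in enrichers:
                enrich(item)
            # Storage: persist the archived data (a dict stands in for disk).
            storage[item.url] = item.data
            item.status = "archived"
        else:
            item.status = "nothing archived"
        # Database: record the outcome of the run for this URL.
        database.append((item.url, item.status))


storage, database = {}, []
run_pipeline(
    ["https://example.com"], [dummy_extractor], [length_enricher], storage, database
)
```

In this sketch the storage and database stages are plain Python containers so that the flow of data is easy to follow; in practice each stage is a configurable module of the corresponding type listed below.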

Feeder Modules

Atlos Feeder

CSV Feeder

Google Sheets Feeder

Extractor Modules

Generic Extractor

Telegram Extractor

Instagram API Extractor

Instagram Extractor

Instagram Telegram Bot Extractor

Telethon Extractor

Twitter API Extractor

VKontakte Extractor

WACZ Enricher

Wayback Machine Enricher

Enricher Modules

Hash Enricher

Archive Metadata Enricher

PDQ Hash Enricher

SSL Certificate Enricher

Thumbnail Enricher

Media Metadata Enricher

Screenshot Enricher

Timestamping Enricher

WACZ Enricher

Wayback Machine Enricher

Whisper Enricher

Database Modules

Console Database

CSV Database

Auto-Archiver API Database

Atlos Database

Google Sheets Database

Storage Modules

Local Storage

Atlos Storage

Google Drive Storage

S3 Storage

Formatter Modules

HTML Formatter

Mute Formatter