Google Sheets Database

Module type: database

GsheetsDatabase: Handles integration with Google Sheets for tracking archival tasks.

Features

  • Updates a Google Sheet with the status of each archived URL (in progress, success, or failure) and the archiving method used.

  • Saves metadata such as title, text, timestamp, hashes, screenshots, and media URLs to designated columns.

  • Formats media-specific metadata, such as thumbnails and PDQ hashes, for the sheet.

  • Skips redundant updates for empty or invalid data fields.
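The skipping behaviour in the last feature can be pictured with a minimal sketch. This is not the module's actual code: the COLUMNS mapping, the build_cell_updates function, and the column letters are hypothetical names chosen for illustration, and the real column layout comes from the sheet configuration.

```python
from typing import Any

# Hypothetical mapping from metadata field to sheet column letter (assumption).
COLUMNS = {"status": "B", "title": "C", "timestamp": "D", "screenshot": "E"}


def build_cell_updates(row: int, metadata: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (cell, value) pairs for one row, skipping empty or unmapped fields."""
    updates = []
    for field, value in metadata.items():
        column = COLUMNS.get(field)
        if column is None:
            continue  # no designated column for this field
        text = "" if value is None else str(value).strip()
        if not text:
            continue  # empty data: writing it would be a redundant update
        updates.append((f"{column}{row}", text))
    return updates


example = {
    "status": "success",
    "title": "   ",      # whitespace only, skipped
    "timestamp": None,   # missing, skipped
    "screenshot": "https://example.com/shot.png",
    "pdq": "abc",        # no column in the hypothetical mapping, skipped
}
print(build_cell_updates(7, example))
```

Collecting the surviving pairs into one list also makes it possible to send them to the sheet in a single batched request rather than one call per cell.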

Notes

  • Currently works only with metadata provided by GsheetFeeder.

  • Requires configuration of a linked Google Sheet and appropriate API credentials.

Configuration Options

YAML

gsheet_db:
  allow_worksheets: !!set {}
  block_worksheets: !!set {}
  use_sheet_names_in_stored_paths: true

Command Line:

| Option | Description | Default | Type |
| --- | --- | --- | --- |
| gsheet_db.allow_worksheets | Optional. (CSV) Only worksheets whose names are listed here are processed (overrides block_worksheets). Leave empty to allow all worksheets. | set() | string |
| gsheet_db.block_worksheets | Optional. (CSV) Worksheets listed here are explicitly excluded from processing. | set() | string |
| gsheet_db.use_sheet_names_in_stored_paths | Optional. If True, the stored file paths will include ‘workbook_name/worksheet_name/…’. | True | bool |
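One way to read the three options is sketched below. The function names (parse_csv_set, should_process, stored_path) are hypothetical and are not part of the module. The precedence rule follows the option description literally (a non-empty allow list overrides the block list); the actual module may treat a name that appears in both lists differently, and may also normalise sheet names before using them in paths.

```python
def parse_csv_set(value: str) -> set[str]:
    """Turn a comma-separated option string into a set of worksheet names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def should_process(worksheet: str, allow: set[str], block: set[str]) -> bool:
    """Assumed semantics: a non-empty allow set takes precedence over block."""
    if allow:
        return worksheet in allow
    return worksheet not in block


def stored_path(filename: str, workbook: str, worksheet: str,
                use_sheet_names: bool = True) -> str:
    """Prefix the stored file with workbook and worksheet names when enabled."""
    if use_sheet_names:
        return f"{workbook}/{worksheet}/{filename}"
    return filename


allow = parse_csv_set("Sheet1, Sheet2")
block = parse_csv_set("")
print(should_process("Sheet1", allow, block))
print(should_process("Archive", allow, block))
print(stored_path("page.html", "My Workbook", "Sheet1"))
```

With both sets left empty, as in the defaults, every worksheet passes the filter.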

API Reference