
# Google Sheets Feeder
```{admonition} Module type

<span style='color: #FFA500'>[feeder](/core_modules.md#feeder-modules)</span>
```

`GsheetsFeeder` is a Google Sheets-based feeder for the Auto Archiver.

It reads rows from a Google Sheet, filters them according to the configuration options below, and converts each remaining row into a `Metadata` object.

### Features
- Validates the sheet structure and filters rows based on input configurations.
- Processes only worksheets allowed by the `allow_worksheets` and `block_worksheets` configurations.
- Ensures only rows with valid URLs and unprocessed statuses are included for archival.
- Supports organizing stored files into folder paths based on sheet and worksheet names.
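
The row filter in the list above can be illustrated with a minimal sketch. It is not the feeder's actual implementation: the helper names `is_valid_url` and `rows_to_archive` are hypothetical, "valid URL" is assumed to mean an absolute `http(s)` URL, and "unprocessed" is assumed to mean an empty status cell. The column names come from the default `columns` mapping.

```python
from urllib.parse import urlparse

# Column headers taken from the default `columns` mapping (url and status).
URL_COLUMN = "link"
STATUS_COLUMN = "archive status"


def is_valid_url(value: str) -> bool:
    """Assumed definition of 'valid': an absolute http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def rows_to_archive(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep rows that have a valid URL and an empty (unprocessed) status cell."""
    selected = []
    for row in rows:
        url = row.get(URL_COLUMN, "")
        status = row.get(STATUS_COLUMN, "")
        if is_valid_url(url) and not status.strip():
            selected.append(row)
    return selected


# Hypothetical worksheet rows, keyed by header name.
rows = [
    {"link": "https://example.com/post/1", "archive status": ""},
    {"link": "https://example.com/post/2", "archive status": "archived"},
    {"link": "not a url", "archive status": ""},
]
pending = rows_to_archive(rows)  # only the first row qualifies
```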

### Notes
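- Requires a Google Service Account JSON file for authentication. Suggested location is `secrets/gsheets_service_account.json`; if you use it, set `service_account` to that path, since the default is `secrets/service_account.json`.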
- Create the sheet using the template provided in the docs.


## Configuration Options

### YAML
```{code} yaml
gsheet_feeder:
  sheet:
  sheet_id:
  header: 1
  service_account: secrets/service_account.json
  columns:
    url: link
    status: archive status
    folder: destination folder
    archive: archive location
    date: archive date
    thumbnail: thumbnail
    timestamp: upload timestamp
    title: upload title
    text: text content
    screenshot: screenshot
    hash: hash
    pdq_hash: perceptual hashes
    wacz: wacz
    replaywebpage: replaywebpage
  allow_worksheets: !!set {}
  block_worksheets: !!set {}
  use_sheet_names_in_stored_paths: true

```

### Command Line
| Option | Description | Default | Type |
| --- | --- | --- | --- |
| `gsheet_feeder.sheet` | Optional. Name of the sheet to archive. | None | string |
| `gsheet_feeder.sheet_id` | Optional. ID of the sheet to archive (alternative to the sheet name). | None | string |
| `gsheet_feeder.header` | Optional. Index of the header row (starts at 1). | 1 | int |
| `gsheet_feeder.service_account` | Optional. Path to the service account JSON file. | secrets/service_account.json | string |
| `gsheet_feeder.columns` | Optional. Names of the columns in the Google Sheet (stringified JSON object). | {'url': 'link', 'status': 'archive status', 'folder': 'destination folder', 'archive': 'archive location', 'date': 'archive date', 'thumbnail': 'thumbnail', 'timestamp': 'upload timestamp', 'title': 'upload title', 'text': 'text content', 'screenshot': 'screenshot', 'hash': 'hash', 'pdq_hash': 'perceptual hashes', 'wacz': 'wacz', 'replaywebpage': 'replaywebpage'} | auto_archiver.utils.json_loader |
| `gsheet_feeder.allow_worksheets` | Optional. (CSV) Only worksheets whose names are listed here are processed (overrides `block_worksheets`). Leave empty to allow all worksheets. | set() | string |
| `gsheet_feeder.block_worksheets` | Optional. (CSV) Worksheets to explicitly exclude from processing. | set() | string |
| `gsheet_feeder.use_sheet_names_in_stored_paths` | Optional. If True, the stored file paths include 'workbook_name/worksheet_name/...'. | True | bool |
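
As an illustration of how the worksheet and path options might combine, here is a minimal sketch. The function names `worksheet_is_allowed` and `stored_path_prefix` are hypothetical and not part of the module's API. Letting a non-empty allow list take precedence over the block list is one reading of "overrides" and is an assumption, as is using the raw names in the path (the feeder may normalise them); consult the API reference for the actual behaviour.

```python
def worksheet_is_allowed(
    name: str, allow_worksheets: set[str], block_worksheets: set[str]
) -> bool:
    """Decide whether a worksheet is processed (assumed precedence rules)."""
    if allow_worksheets:
        # Assumption: a non-empty allow list wins over the block list.
        return name in allow_worksheets
    # Empty allow list: everything is allowed unless explicitly blocked.
    return name not in block_worksheets


def stored_path_prefix(
    workbook_name: str,
    worksheet_name: str,
    use_sheet_names_in_stored_paths: bool = True,
) -> str:
    """Build the 'workbook_name/worksheet_name' folder prefix, or '' if disabled."""
    if not use_sheet_names_in_stored_paths:
        return ""
    return f"{workbook_name}/{worksheet_name}"


# Hypothetical worksheet names for demonstration.
worksheets = ["2023", "2024", "drafts"]
processed = [w for w in worksheets if worksheet_is_allowed(w, set(), {"drafts"})]
prefix = stored_path_prefix("My Archive", "2024")
```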

[API Reference](../../../autoapi/gsheet_feeder/index)
