Logging in to sites#
This how-to guide shows how to use various authentication methods to log in to a site you are trying to archive. This is useful for websites that require users to be logged in to browse them, or for sites that restrict bots.
In this How-To, we will authenticate on Twitter/X.com using cookies, and on XXXX using username/password.
Authenticating on XXXX site with username/password#
Note
This section is still under construction 🚧
Proof of Origin Tokens#
YouTube uses Proof of Origin Tokens (POT) as part of its bot detection system to verify that requests originate from valid clients. If a token is missing or invalid, some videos may return errors like “Sign in to confirm you’re not a bot.”
yt-dlp provides a detailed guide to POTs.
How Auto Archiver Uses POT#
This feature is enabled for the Generic Archiver via two yt-dlp plugins:
Client-side plugin: yt-dlp-get-pot
Detects when a token is required and requests one from a provider.
Provider plugin: bgutil-ytdlp-pot-provider
Includes both a Python plugin and a Node.js server or script to generate the token.
These are installed in our Poetry environment.
Integration Methods#
Docker (Recommended):
When running the Auto Archiver using the Docker image, we use the Node.js token generation script. This is to avoid managing a separate server process, and is handled automatically inside the Docker container when needed.
The script is already included in the Docker image. If you need to disable it, set the config option bguils_po_token_method under the generic_extractor section of your orchestration.yaml config file to “disabled”:
generic_extractor:
  bguils_po_token_method: "disabled"
PyPI / Local:
When using the Auto Archiver PyPI package, or running locally, your system needs additional dependencies to run the token generation script: either Docker, or Node.js and Yarn.
See the bgutil-ytdlp-pot-provider documentation for more details.
⚠️WARNING⚠️: This will add the server scripts to the home directory of wherever this is running.
You can set the config option bguils_po_token_method under the generic_extractor section of your orchestration.yaml config file to “script” to enable the token generation script process locally.
Alternatively, you can run the bgutil-ytdlp-pot-provider server separately using their Docker image or Node.js server.
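Before setting the option to “script” on a local install, it can help to confirm that the requirements listed above are present. The following is a minimal sketch, not part of Auto Archiver: the function name is hypothetical, and the executable names `docker`, `node` and `yarn` are assumptions based on the usual command names for those tools.

```python
import shutil
from typing import Callable, Optional

# Type of a lookup function such as shutil.which: name -> path or None.
Which = Callable[[str], Optional[str]]


def pot_script_prerequisites_met(which: Which = shutil.which) -> bool:
    """Return True if Docker, or both Node.js and Yarn, are found on PATH.

    The executable names below are assumptions (the common command names),
    not values taken from the Auto Archiver or bgutil documentation.
    """
    has_docker = which("docker") is not None
    has_node_and_yarn = which("node") is not None and which("yarn") is not None
    return has_docker or has_node_and_yarn
```

Calling `pot_script_prerequisites_met()` with no arguments checks the current PATH. The `which` parameter exists only so the lookup can be replaced in tests.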
Notes#
The token generation script is only triggered when needed by yt-dlp, so it should have no effect unless YouTube requests a POT.
If you’re running the Auto Archiver in Docker, this is set up automatically.
If you’re running locally, you’ll need to run the setup script manually or enable the feature in your config.
You can set up both the server and the script, and the plugin will fall back from one to the other if needed. This is recommended for robustness!
Configurations:#
Configurations Summary#
| Option | Behavior | Docker Default? |
|---|---|---|
| (default value) | Docker: Automatically downloads and uses the token generation script. Local: Does nothing; assumes a separate server is running externally. | ✅ Yes |
| "script" | Explicitly downloads and uses the token generation script, even locally. | ❌ No |
| "disabled" | Disables token generation completely. | ❌ No |
Example configuration:
generic_extractor:
  # ...
  bguils_po_token_method: "script"
  # For debugging add the verbose flag here:
  ytdlp_args: "--no-abort-on-error --abort-on-error --verbose"
Advanced Configuration:
If you change the default port of the bgutil-ytdlp-pot-provider server, you can pass the updated values using our extractor_args option for the generic extractor.
generic_extractor:
  ytdlp_args: "--no-abort-on-error --abort-on-error --verbose"
  ytdlp_update_interval: 5
  bguils_po_token_method: "script"
  extractor_args:
    youtube:
      getpot_bgutil_baseurl: "http://127.0.0.1:8080"
      player_client: web,tv
For more details on these options, see the bgutil-ytdlp-pot-provider documentation.
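If you run the provider server separately or on a custom port, a quick reachability check can rule out connection problems before you start an archiving run. The sketch below is a hypothetical helper, not part of Auto Archiver. It requests the `/ping` path that appears in the plugin's warning message shown under "Checking the logs", and it assumes that a healthy server answers that request with HTTP 200.

```python
import http.client
import urllib.error
import urllib.request


def pot_server_reachable(base_url: str = "http://127.0.0.1:4416",
                         timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/ping answers with HTTP 200.

    Assumption: a 200 status on /ping means the provider server is up.
    The response body is not inspected.
    """
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or malformed URL.
        return False
```

For the configuration above you would call `pot_server_reachable("http://127.0.0.1:8080")`, passing the same value as getpot_bgutil_baseurl.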
Checking the logs#
To verify that the POT process is working, look for the following lines in your log after adding the config option:
[GetPOT] BgUtilScript: Generating POT via script: /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js
[debug] [GetPOT] BgUtilScript: Executing command to get POT via script: /Users/you/.nvm/versions/node/v20.18.0/bin/node /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js -v ymCMy8OflKM
[debug] [GetPOT] BgUtilScript: stdout:
{"poToken":"MlMxojNFhEJvUzGeHEkVRSK_luXtwcDnwSNIOgaUutqB7t99nmlNvtWgYayboopG6ZopZgmQ-6PJCWEMHv89MIiFGGlJRY25Fkwzxmia_8uYgf5AWf==","generatedAt":"2025-03-26T10:45:26.156Z","visitIdentifier":"ymCMy8OflKM"}
[debug] [GetPOT] Fetching gvs PO Token for tv client
If the script cannot be found, or the server cannot be reached, you’ll see warnings like these:
[debug] [GetPOT] Fetching player PO Token for tv client
WARNING: [GetPOT] BgUtilScript: Script path doesn't exist: /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js. Please make sure the script has been transpiled correctly.
WARNING: [GetPOT] BgUtilHTTP: Error reaching GET http://127.0.0.1:4416/ping (caused by TransportError). Please make sure that the server is reachable at http://127.0.0.1:4416.
[debug] [GetPOT] No player PO Token provider available for tv client
In this case, check that the script has been transpiled correctly and is available at the path specified in the log, or that the server is running and reachable.
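When the verbose log is long, the relevant lines can be picked out automatically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the matching relies only on the substrings visible in the sample logs above (`"poToken"`, `WARNING: [GetPOT]`, and `No ... PO Token provider available`), which may change between plugin versions.

```python
def summarize_getpot_log(log_text: str) -> dict:
    """Classify yt-dlp verbose output by the [GetPOT] lines it contains.

    The substrings matched here are copied from the sample logs on this
    page; they are an assumption about the plugin's log format in general.
    """
    lines = log_text.splitlines()
    # Warnings emitted by the GetPOT plugin (script missing, server unreachable).
    warnings = [line for line in lines if line.startswith("WARNING: [GetPOT]")]
    # The script prints a JSON object containing "poToken" on success.
    token_generated = any('"poToken"' in line for line in lines)
    # Final line of the failure case: no provider could supply a token.
    no_provider = any(
        "[GetPOT] No" in line and "PO Token provider available" in line
        for line in lines
    )
    return {
        "token_generated": token_generated,
        "no_provider": no_provider,
        "warnings": warnings,
    }
```

A result with `token_generated` set to True and an empty `warnings` list corresponds to the first log sample. A result with `no_provider` set to True corresponds to the second.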
