antibot_extractor_enricher.dropin
=================================

.. py:module:: antibot_extractor_enricher.dropin




Module Contents
---------------

.. py:class:: Dropin(sb: seleniumbase.SB, extractor: auto_archiver.core.Extractor)

   A class to handle drop-in functionality for the antibot extractor enricher module.
   This class is designed to be a base class for drop-ins that can handle specific websites.


   .. py:method:: documentation() -> Mapping[str, str]
      :staticmethod:


      Each Dropin should document itself through this method.
      The returned dictionary can include:

      - 'name': A string representing the name of the dropin.
      - 'description': A string describing the functionality of the dropin.
      - 'site': A string representing the site this dropin is for.
      - 'authentication': A dictionary with an authentication example for the site.
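
      The keys listed above can be illustrated with a minimal sketch. The site,
      the wording, and the shape of the ``authentication`` value are hypothetical
      placeholders, not taken from any real dropin:

      .. code-block:: python

         from typing import Any, Mapping


         def example_documentation() -> Mapping[str, Any]:
             # Every value below is an illustrative placeholder for a made-up site.
             return {
                 "name": "Example Dropin",
                 "description": "Handles posts on a hypothetical site.",
                 "site": "example.com",
                 # Assumed shape: one entry per site with placeholder credentials.
                 "authentication": {
                     "example.com": {"username": "user", "password": "secret"},
                 },
             }
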




   .. py:attribute:: sb
      :type:  seleniumbase.SB


   .. py:attribute:: extractor
      :type:  auto_archiver.core.Extractor


   .. py:method:: suitable(url: str) -> bool
      :staticmethod:
      :abstractmethod:


      Check if the URL is suitable for processing with this dropin.

      :param url: The URL to check.
      :return: True if the URL is suitable for processing, False otherwise.
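
      One possible way to implement this check is to compare the URL's host
      against the site the dropin targets. The sketch below is written as a
      plain function, and ``example.com`` is a hypothetical target site:

      .. code-block:: python

         from urllib.parse import urlparse


         def suitable(url: str) -> bool:
             # Hypothetical rule: accept example.com and any of its subdomains.
             host = (urlparse(url).hostname or "").lower()
             return host == "example.com" or host.endswith(".example.com")

      Matching on the parsed host rather than on a substring of the URL avoids
      accepting look-alike domains such as ``notexample.com``.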



   .. py:method:: sanitize_url(url: str) -> str
      :staticmethod:


      Used to clean URLs before processing them.
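
      What counts as cleaning depends on the site. As a sketch under the
      assumption that it means dropping ``utm_`` tracking parameters and the
      fragment (an assumption, not something this module prescribes), a plain
      function could look like this:

      .. code-block:: python

         from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

         # Hypothetical list of query-parameter prefixes to discard.
         TRACKING_PREFIXES = ("utm_",)


         def sanitize_url(url: str) -> str:
             parts = urlsplit(url)
             # Keep every query parameter that is not a tracking parameter.
             kept = [
                 (key, value)
                 for key, value in parse_qsl(parts.query, keep_blank_values=True)
                 if not key.startswith(TRACKING_PREFIXES)
             ]
             # Rebuild the URL with the filtered query and without the fragment.
             return urlunsplit(
                 (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
             )
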



   .. py:method:: images_selectors() -> str
      :staticmethod:


      CSS selector to find images in the HTML page.



   .. py:method:: video_selectors() -> str
      :staticmethod:


      CSS selector to find videos in the HTML page.



   .. py:method:: js_for_image_css_selectors() -> str

      A configurable JS script that receives a CSS selector from the dropin itself and returns an array of Image elements matching that selector.

      You can override this instead of ``images_selectors`` for more control over scraped images.



   .. py:method:: js_for_video_css_selectors() -> str

      A configurable JS script that receives a CSS selector from the dropin itself and returns an array of Video elements matching that selector.

      You can override this instead of ``video_selectors`` for more control over scraped videos.
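
      The relationship between a selector and the script that consumes it can
      be sketched as follows. The helper name ``js_for_css_selector``, the
      example selector, and the exact JavaScript are assumptions for
      illustration; the script actually used by this module may differ:

      .. code-block:: python

         import json


         def js_for_css_selector(selector: str) -> str:
             # json.dumps produces a double-quoted, escaped string literal that
             # is also valid JavaScript, so quotes in the selector stay safe.
             quoted = json.dumps(selector)
             # The script returns an array of all elements matching the selector.
             return f"return Array.from(document.querySelectorAll({quoted}));"


         # Hypothetical selector for post images on a made-up site.
         script = js_for_css_selector("img.post-image")
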



   .. py:method:: open_page(url) -> bool
      :abstractmethod:


      Make sure the page is opened, even if it requires authentication, captcha solving, etc.

      :param url: The URL to open.
      :return: True if success, False otherwise.
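
      A rough sketch of the success and failure contract, written as a plain
      function that takes the browser as an argument. ``FakeBrowser`` is a
      stand-in so the example runs without a real browser; it mirrors only the
      SeleniumBase method names ``open`` and ``is_element_visible``. The
      selectors ``form#login`` and ``article`` are hypothetical:

      .. code-block:: python

         class FakeBrowser:
             """Stand-in for the SeleniumBase driver, for illustration only."""

             def __init__(self, visible_selectors):
                 self.visible = set(visible_selectors)
                 self.opened = []

             def open(self, url):
                 self.opened.append(url)

             def is_element_visible(self, selector):
                 return selector in self.visible


         def open_page(sb, url: str) -> bool:
             sb.open(url)
             # Hypothetical check: a visible login form means access was blocked.
             if sb.is_element_visible("form#login"):
                 return False
             # Hypothetical check: the post body must be visible to count as success.
             return sb.is_element_visible("article")
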



   .. py:method:: add_extra_media(to_enrich: auto_archiver.core.Metadata) -> tuple[int, int]

      Extract image and/or video data from the currently open post with SeleniumBase. Media is added to the ``to_enrich`` Metadata object.

      :return: A tuple (number of Images added, number of Videos added).
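
      The return convention can be sketched without a browser. ``FakeMetadata``
      is a hypothetical stand-in and does not reflect the real
      ``auto_archiver.core.Metadata`` API; the extra ``image_urls`` and
      ``video_urls`` parameters are also assumptions, standing in for whatever
      the dropin scrapes from the open page:

      .. code-block:: python

         from dataclasses import dataclass, field


         @dataclass
         class FakeMetadata:
             """Hypothetical stand-in that just collects media URLs."""

             media: list = field(default_factory=list)


         def add_extra_media(to_enrich, image_urls, video_urls) -> tuple[int, int]:
             seen = set(to_enrich.media)
             counts = {"image": 0, "video": 0}
             for kind, urls in (("image", image_urls), ("video", video_urls)):
                 for url in urls:
                     # Skip URLs that were already added, so counts reflect new media.
                     if url in seen:
                         continue
                     seen.add(url)
                     to_enrich.media.append(url)
                     counts[kind] += 1
             return counts["image"], counts["video"]
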



