Webeater

Quick out of the box web content extractor in Python aimed at AI agents.


Webeater GitHub repo:
https://github.com/tiagrib/webeater

Over the past few months I’ve been spending some time exploring neurosymbolic approaches to AI agents that I could run locally (which I’ll save for a future post).
One of the tools I immediately needed was web search, as a simple method call straight from Python.
It was quite easy to search Google or Wikipedia, or even to extract content from Wikipedia directly through its API.
It turns out, however, that extracting live content from arbitrary websites was neither as simple nor as efficient.

cut to the chase

After searching for Python libraries that could just do it for me, I found myself having to code my own solution using Selenium and BeautifulSoup.
I’ve split that code out, cleaned it up slightly, and published it as Webeater.

Webeater is a web content extraction tool designed to fetch and process web pages easily and quickly from Python.
It is made for developers and researchers who need to extract structured data from web pages in order to create datasets or feed them directly to AI agents and LLMs.
The tool goes straight to the point, focusing on extracting text and structured data from web pages, while providing some additional configuration options and hints for better effectiveness.

It’s still at an early stage, so it will surely not cover every edge case or complex scenario.
I welcome contributions and feedback to help improve its capabilities.

Features

  • Fetches web pages and extracts text content into Markdown format.
  • Returns clean, plain text or a JSON object optionally containing lists of images and links found on the page.
  • Handles JavaScript-heavy pages using Selenium and BeautifulSoup.
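
To give a rough idea of what "plain text or a JSON object with images and links" can look like, here is a minimal sketch. It is not Webeater's code or API: `PageExtractor` and `extract_page` are hypothetical names, and it uses Python's standard `html.parser` instead of Selenium and BeautifulSoup so that it runs without extra dependencies. It assumes the HTML has already been fetched and rendered (for example by a browser driver), and it skips the Markdown conversion entirely.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, links, and images from HTML."""

    # Content inside these tags is never shown to the reader, so it is skipped.
    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.links = []
        self.images = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped tags, with whitespace collapsed.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(" ".join(data.split()))


def extract_page(html, as_json=False):
    """Return plain text, or a JSON string with text, links, and images."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.text_parts)
    if not as_json:
        return text
    return json.dumps({"text": text, "links": parser.links, "images": parser.images})


sample_html = """
<html><head><style>p {color: red}</style></head>
<body><h1>Hello</h1><p>Read <a href="https://example.com">this</a>.</p>
<img src="/logo.png"><script>var x = 1;</script></body></html>
"""
print(extract_page(sample_html))
print(extract_page(sample_html, as_json=True))
```

On a JavaScript-heavy page, the `html` argument would have to be the rendered source from a real browser session rather than the raw server response, which is the part Selenium takes care of.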
Tiago Ribeiro
AI Technology & Product Consulting

Eclectic scientist and engineer striving to breathe the Illusion of Life into autonomous characters