Lightfeed Extractor

Robust Web Data Extractor Using LLMs and Browser Automation

Lightfeed Extractor is a TypeScript library built for robust web data extraction using LLMs and Playwright. Use natural language prompts to navigate web pages and extract structured data. Get complete, accurate results with great token efficiency, which is critical for production data pipelines.

🤖 Browser Automation in Stealth Mode - Launch Playwright browsers locally or in serverless clouds, or connect to a remote browser server. Avoid detection with built-in anti-bot patches and proxy configuration for reliable web scraping.
🧭 AI Browser Navigation - Pair with @lightfeed/browser-agent to navigate pages using natural language commands before extracting structured data.
🧹 LLM-ready Markdown - Convert HTML to LLM-ready markdown, with options to extract only the main content and to clean URLs by removing tracking parameters.
⚡️ LLM Extraction - Use LLMs in JSON mode to extract structured data according to an input Zod schema. Token usage limits and tracking are included.
🛠️ JSON Recovery - Sanitize and recover failed JSON output. This makes complex schema extraction much more robust, especially with deeply nested objects and arrays.
🔗 URL Validation - Handle relative URLs, remove invalid ones, and repair markdown-escaped links.

Tip: Building retail competitor intelligence at scale? Go to app.lightfeed.ai, our full platform for tracking competitor pricing, sales, promotions, and SEO across 1,000+ retail chains, and get started for free. For generic web data pipelines with AI enrichment and workflow automation, check out lightfeed.ai.

Install the extractor:

    npm install @lightfeed/extractor

Then install the LLM provider you want to use:

    # OpenAI
    npm install @langchain/openai
    # Google Gemini
    npm install @langchain/google-genai
    # Anthropic
    npm install @langchain/anthropic
    # Ollama (local models)
    npm install @langchain/ollama

@langchain/core will be installed automatically as a peer dependency.
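The "clean URLs by removing tracking parameters" option in the feature list can be pictured with a small standalone helper. This is a sketch under stated assumptions, not the @lightfeed/extractor API: the names `cleanUrl`, `parseAbsoluteUrl` and `TRACKING_PARAMS` are hypothetical, and the parameter list (any `utm_*` key plus a few click identifiers) is an assumption rather than the library's actual list. It uses only the standard WHATWG `URL` class.

```typescript
// Hypothetical list of tracking parameters (an assumption, not the library's list).
// Any parameter whose name starts with "utm_" is also treated as tracking.
const TRACKING_PARAMS: readonly string[] = ["fbclid", "gclid", "msclkid", "mc_cid", "mc_eid"];

// Parses an absolute URL, returning null instead of throwing on bad input.
function parseAbsoluteUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Hypothetical helper: returns the URL with tracking query parameters removed.
// Strings that do not parse as absolute URLs are returned unchanged.
function cleanUrl(raw: string): string {
  const url = parseAbsoluteUrl(raw);
  if (url === null) return raw;
  // Snapshot the keys first, because deleting while iterating can skip entries.
  for (const key of Array.from(url.searchParams.keys())) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.includes(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(cleanUrl("https://shop.example.com/item?id=42&utm_source=news&fbclid=abc"));
// prints https://shop.example.com/item?id=42
```

Matching is case-insensitive on the parameter name, so `UTM_Medium` is removed as well, while meaningful parameters such as `id` survive and keep links stable for deduplication.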
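To make the JSON Recovery idea concrete, here is a minimal sketch of two common recovery steps: extracting a JSON payload from noisy model output, then sanitizing it so that invalid array items are dropped instead of failing the whole result. It is illustrative only and not the library's implementation: `recoverJson`, `sanitizeProducts`, `isProduct` and the `Product` shape are hypothetical, and a hand-written type guard stands in for the Zod schema validation the library uses.

```typescript
// Hypothetical sketch of JSON recovery; not the library's implementation.
// Returns the parsed value, or null when nothing parseable is found.
function recoverJson(raw: string): unknown {
  // Strip markdown code fences (three backticks, optionally tagged "json").
  let text = raw.replace(/`{3}(?:json)?/gi, "").trim();
  // Keep only the span from the first opening bracket to the last closing one,
  // discarding any prose the model added around the JSON.
  const start = text.search(/[{[]/);
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  if (start === -1 || end < start) return null;
  text = text.slice(start, end + 1);
  // Remove trailing commas before } or ]. Naive: this can also alter string values.
  text = text.replace(/,\s*([}\]])/g, "$1");
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Hypothetical target shape for this example (stands in for a Zod schema).
interface Product {
  name: string;
  price: number;
}

function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.price === "number";
}

// Sanitize: keep the array items that match the shape instead of rejecting everything.
function sanitizeProducts(data: unknown): Product[] {
  if (typeof data !== "object" || data === null) return [];
  const items = (data as Record<string, unknown>).products;
  return Array.isArray(items) ? items.filter(isProduct) : [];
}

const llmOutput =
  'Here is the data: {"products": [{"name": "Shoe", "price": 59.99}, {"name": "Hat", "price": "n/a"},]} Hope this helps!';
console.log(JSON.stringify(sanitizeProducts(recoverJson(llmOutput))));
// prints [{"name":"Shoe","price":59.99}]
```

Dropping only the offending item trades completeness for availability: one malformed entry in a long, deeply nested list no longer discards every valid entry next to it.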
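The URL Validation steps (resolve relative URLs, remove invalid ones, repair markdown-escaped links) can be sketched with the standard `URL` class as well. The helper names `normalizeLink` and `parseWithBase` are hypothetical and this does not describe the library's API; the choice to keep only `http:` and `https:` links is also an assumption of this sketch.

```typescript
// Parses input relative to base, returning null instead of throwing.
function parseWithBase(input: string, base: string): URL | null {
  try {
    return new URL(input, base);
  } catch {
    return null;
  }
}

// Hypothetical helper sketching URL validation; not the library's API.
// Returns an absolute http(s) URL, or null when the link should be dropped.
function normalizeLink(raw: string, baseUrl: string): string | null {
  // Undo markdown escaping such as "\_" or "\(" added by HTML-to-markdown converters.
  const unescaped = raw.trim().replace(/\\([\\`*_{}\[\]()#+\-.!~])/g, "$1");
  // Relative links are resolved against the URL of the page they came from.
  const resolved = parseWithBase(unescaped, baseUrl);
  if (resolved === null) return null; // unparseable, e.g. a host containing spaces
  // Drop non-web schemes such as javascript:, mailto: and data:.
  if (resolved.protocol !== "http:" && resolved.protocol !== "https:") return null;
  return resolved.toString();
}

const base = "https://shop.example.com/catalog/";
const links = ["../about", "/products/shoe\\_1", "javascript:void(0)", "https://exa mple.com/"];
const valid = links
  .map((link) => normalizeLink(link, base))
  .filter((u): u is string => u !== null);
console.log(valid.length); // prints 2
```

Resolving against the source page URL rather than the site root matters for links like `../about`, which only have a meaning relative to the page that contained them.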
E-commerce Product Extraction...