How to Use rs-trafilatura with Firecrawl

Source: DEV Community
Firecrawl is an API service for scraping web pages. It handles JavaScript rendering, anti-bot bypass, and rate limiting: you send it a URL, it gives you back the page content.

By default, Firecrawl returns Markdown. But if you request the raw HTML, you can run rs-trafilatura on it for page-type-aware extraction with quality scoring. This is useful when you need structured metadata (title, author, date, page type) or when you want to know how confident the extraction is.

Install

    pip install rs-trafilatura firecrawl

You also need a Firecrawl API key from firecrawl.dev.

Basic Usage

    from firecrawl import FirecrawlApp
    from rs_trafilatura.firecrawl import extract_firecrawl_result

    app = FirecrawlApp(api_key="fc-your-api-key")

    # Request HTML format (required for rs-trafilatura)
    result = app.scrape("https://example.com/blog/post", formats=["html"])

    # Extract with rs-trafilatura
    extracted = extract_firecrawl_result(result)

    print(f"Title: {extracted.title}")
    print(f"Author: {extracted.author}")
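Since rs-trafilatura needs the raw HTML, it can help to check that a scrape result actually carries it before extracting. The helper below is a minimal sketch, not part of either library: the name html_from_scrape_result is hypothetical, and it assumes the result is either a dict with an "html" key or an object with an html attribute, which may differ between Firecrawl SDK versions. It uses stand-in objects so it runs without an API key.

```python
from types import SimpleNamespace


def html_from_scrape_result(result):
    """Return the raw HTML from a scrape result, or raise if it is missing.

    Assumption: `result` is a dict with an "html" key or an object with an
    `html` attribute. Check what your SDK version actually returns.
    """
    if isinstance(result, dict):
        html = result.get("html")
    else:
        html = getattr(result, "html", None)
    if not html or not html.strip():
        raise ValueError('no HTML in scrape result; was formats=["html"] requested?')
    return html


# Stand-in results (hypothetical shapes) instead of real API responses.
with_html = {"html": "<html><body><p>Hello</p></body></html>"}
markdown_only = SimpleNamespace(markdown="# Post", html=None)

print(html_from_scrape_result(with_html))
try:
    html_from_scrape_result(markdown_only)
except ValueError as exc:
    print(f"skipped: {exc}")
```

Failing early here gives a clearer error than passing a Markdown-only result to the extractor, which is the likely outcome if formats=["html"] is left out of the scrape call.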