The Codeforces Scraper is an automated tool that extracts competitive programming problem descriptions from the Codeforces platform. After fetching the raw HTML or text, it uses the Google Gemini API to parse the unstructured content into a structured format such as JSON or Markdown. This makes it easier to index, store, or migrate competitive programming problems into custom databases or study platforms.
Fetches problem statements, input/output constraints, and sample test cases directly from Codeforces URLs.
Uses the Gemini API to interpret the scraped text and separate the description from the constraints and examples.
Converts raw, unstructured webpage data into machine-readable formats for downstream integration.
A web crawler navigates to the target Codeforces problem URL and extracts the raw DOM elements containing the problem statement.
The raw text is passed to the Gemini API with a prompt that asks the model to identify and label specific fields, such as the description, constraints, and examples.
The response from Gemini is validated and serialized into a structured format, ready to be saved or pushed to a database.
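The validation and serialization step can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names (`title`, `statement`, `input_spec`, `output_spec`, `examples`), the `Problem` dataclass, and the `parse_problem` helper are all hypothetical, and the sketch assumes the prompt asks the model to reply with a single JSON object. It also assumes the reply may arrive wrapped in a Markdown code fence, which is stripped before parsing. The Gemini call itself is left out so the example runs offline.

```python
import json
from dataclasses import dataclass, asdict

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

# Hypothetical schema: these field names are assumptions, not taken from the project.
TEXT_FIELDS = ("title", "statement", "input_spec", "output_spec")


@dataclass
class Problem:
    title: str
    statement: str
    input_spec: str
    output_spec: str
    examples: list  # list of {"input": str, "output": str} dicts


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence if the reply has one."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def parse_problem(model_reply: str) -> Problem:
    """Validate a model reply and convert it into a Problem.

    Raises ValueError if the reply is not valid JSON or a field is missing
    or malformed (json.JSONDecodeError is a subclass of ValueError).
    """
    data = json.loads(strip_code_fence(model_reply))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    for name in TEXT_FIELDS:
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {name!r} is missing or empty")

    examples = data.get("examples")
    if not isinstance(examples, list) or not examples:
        raise ValueError("field 'examples' must be a non-empty list")
    for example in examples:
        if not isinstance(example, dict) or not {"input", "output"} <= example.keys():
            raise ValueError("each example needs 'input' and 'output' keys")

    return Problem(examples=examples, **{name: data[name] for name in TEXT_FIELDS})
```

Once validated, `json.dumps(asdict(problem))` yields a string that can be written to a file or handed to a database layer. Rejecting malformed replies at this point keeps bad model output from reaching storage.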
Implement a queue system to scrape and structure all problems from a specific contest in one go.
Expand the AI prompt to automatically generate boilerplate test-runner code for instant local testing.
Add formatting options to render the structured JSON data as Markdown files or PDF documents.
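As a rough idea of what the Markdown export could look like, the sketch below renders one structured problem as a Markdown document. The function name `to_markdown` and the dictionary keys are hypothetical and only illustrate the idea. PDF output is not covered here.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def to_markdown(problem: dict) -> str:
    """Render a structured problem (hypothetical schema) as Markdown text."""
    parts = [
        f"# {problem['title']}",
        problem["statement"],
        "## Input",
        problem["input_spec"],
        "## Output",
        problem["output_spec"],
    ]
    # Each sample test gets its own heading with fenced input and output.
    for index, example in enumerate(problem["examples"], start=1):
        parts.append(f"## Example {index}")
        parts.append(f"Input:\n{FENCE}\n{example['input'].rstrip()}\n{FENCE}")
        parts.append(f"Output:\n{FENCE}\n{example['output'].rstrip()}\n{FENCE}")
    return "\n\n".join(parts) + "\n"
```

The returned string can be written straight to a `.md` file. Fencing the sample input and output preserves their whitespace, which matters for test data.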