dev_stories

A Spring Boot application for scraping and converting HTML data into an Excel spreadsheet.

This project demonstrates a complete pipeline of working with web data using Java and Spring Boot. It fetches raw HTML from a given URL, extracts relevant information, maps it to structured Java objects (POJOs), and writes the results into an .xls Excel file.

⚙️ Tech Stack

Spring Boot – main application framework
Jsoup – HTML parsing
Apache POI – Excel file generation
Spring Configuration – flexible setup via application.properties

🚀 Features

Fetch HTML content – makes a GET request to the target URL and retrieves the HTML response.
Parse HTML with Jsoup – builds a Jsoup Document object and navigates the DOM.
Map to POJOs – converts the extracted data into plain Java classes for easy manipulation.
Export to Excel (XLS) – generates a formatted .xls spreadsheet using Apache POI.
Configurable via properties file – keywords and the target URL are defined in application.properties.
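
The parse and map steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class names ParseSketch and Item, the two-column table layout, and the "table tr" selector are all assumptions made for the example. It parses an HTML string that has already been fetched; against a live page, Jsoup.connect(url).get() returns the same kind of Document.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical sketch: names and page structure are assumptions, not the project's code.
final class ParseSketch {

    // Simple POJO holding one scraped table row.
    static final class Item {
        final String name;
        final String price;

        Item(String name, String price) {
            this.name = name;
            this.price = price;
        }
    }

    // Parses the HTML and maps every <tr> that has at least two <td> cells to an Item.
    static List<Item> parseItems(String html) {
        Document doc = Jsoup.parse(html);
        List<Item> items = new ArrayList<>();
        for (Element row : doc.select("table tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 2) {
                continue; // header rows (<th>) and incomplete rows are skipped
            }
            items.add(new Item(cells.get(0).text(), cells.get(1).text()));
        }
        return items;
    }
}
```

Keeping the parsing in a method that takes a plain String makes it possible to exercise the mapping logic without any network access.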

🛠️ How to Run

Clone the repository:

git clone https://github.com/maltsev-dev/html_scraper_template
cd html_scraper_template

Configure the app in src/main/resources/application.properties:

target.url=https://example.com
scraper.keywords=Item,Price,Details
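
To show how these two keys could be read, here is a small dependency-free sketch using java.util.Properties. The class name ScraperConfigSketch is hypothetical, and a Spring Boot application would more typically bind the values with @Value or @ConfigurationProperties; how the project itself reads them is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the repository.
final class ScraperConfigSketch {

    final String targetUrl;
    final List<String> keywords;

    private ScraperConfigSketch(String targetUrl, List<String> keywords) {
        this.targetUrl = targetUrl;
        this.keywords = keywords;
    }

    // Parses properties-format text such as the contents of application.properties.
    static ScraperConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("target.url", "");
        // Split the comma-separated keyword list, trimming blanks and dropping empty entries.
        List<String> keywords = Arrays.stream(props.getProperty("scraper.keywords", "").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        return new ScraperConfigSketch(url, keywords);
    }
}
```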

Run the application:

./mvnw spring-boot:run

The output Excel file (output.xls) will be generated in the project root.
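
For orientation, writing an .xls workbook with Apache POI might look roughly like the sketch below. The class name XlsExportSketch, the sheet name "Items", and the header-plus-rows layout are assumptions for illustration and may differ from what the project does; cell formatting is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical sketch: names and layout are assumptions, not the project's code.
final class XlsExportSketch {

    // Builds a one-sheet .xls workbook in memory and returns its bytes.
    static byte[] toXls(List<String> header, List<List<String>> rows) {
        try (Workbook workbook = new HSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            Sheet sheet = workbook.createSheet("Items");
            writeRow(sheet.createRow(0), header);
            for (int i = 0; i < rows.size(); i++) {
                writeRow(sheet.createRow(i + 1), rows.get(i)); // data starts below the header
            }
            workbook.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fills one row with string cells, left to right.
    private static void writeRow(Row row, List<String> values) {
        for (int col = 0; col < values.size(); col++) {
            row.createCell(col).setCellValue(values.get(col));
        }
    }
}
```

The returned bytes could then be saved with something like Files.write(Path.of("output.xls"), bytes).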

✅ Benefits

Fully based on the Java ecosystem
Requires no browser automation or WebDriver
Easily customizable for different scraping needs
Simple to set up and extend

📄 License

This project is licensed under the MIT License.
