📄 HTML Scraper
A Spring Boot application for scraping and converting HTML data into an Excel spreadsheet.
This project demonstrates a complete pipeline for working with web data using Java and Spring Boot. It fetches raw HTML from a given URL, extracts relevant information, maps it to structured Java objects (POJOs), and writes the results into an .xls Excel file.
⚙️ Tech Stack
- Spring Boot – main application framework
- Jsoup – HTML parsing
- Apache POI – Excel file generation
- Spring Configuration – flexible setup via application.properties
🚀 Features
- Fetch HTML content: makes a GET request to the target URL and retrieves the HTML response.
- Parse HTML with Jsoup: uses Jsoup to create a Document object and navigate the DOM.
- Map to POJO objects: extracted data is converted into Java classes for easy manipulation.
- Export to Excel (XLS): generates a formatted .xls spreadsheet using Apache POI.
- Configurable via properties file: keywords and target URL can be defined in application.properties.
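As a rough illustration of the "Map to POJO objects" step, the sketch below turns rows of already-extracted cell text into simple Java objects. It uses only the JDK so it can be run on its own; in the real pipeline the cell text would presumably come from Jsoup elements. The names ScrapedItem and RowMapper, the three fields, and the rule of skipping short rows are all assumptions made for this example and are not taken from the project's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: RowMapper, ScrapedItem and its fields are
// illustrative and are not taken from the project's source.
class RowMapper {

    // Plain immutable POJO holding one extracted row.
    static final class ScrapedItem {
        private final String item;
        private final String price;
        private final String details;

        ScrapedItem(String item, String price, String details) {
            this.item = item;
            this.price = price;
            this.details = details;
        }

        String getItem() { return item; }
        String getPrice() { return price; }
        String getDetails() { return details; }

        @Override
        public String toString() {
            return "ScrapedItem{item=" + item + ", price=" + price + ", details=" + details + "}";
        }
    }

    // Maps rows of cell text to POJOs. Rows with fewer than three cells
    // are skipped rather than padded (an assumption for this sketch).
    static List<ScrapedItem> toItems(List<List<String>> rows) {
        List<ScrapedItem> items = new ArrayList<>();
        for (List<String> cells : rows) {
            if (cells.size() < 3) {
                continue;
            }
            items.add(new ScrapedItem(
                    cells.get(0).trim(),
                    cells.get(1).trim(),
                    cells.get(2).trim()));
        }
        return items;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of(" Widget ", "9.99", "Blue"),
                List.of("incomplete row"));
        System.out.println(toItems(rows));
    }
}
```

Keeping the mapping separate from fetching and parsing means it can be exercised with hand-written rows, without any network access.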
🛠️ How to Run
- Clone the repository:
git clone https://github.com/maltsev-dev/html_scraper_template
cd html_scraper_template
- Configure the app in src/main/resources/application.properties:
target.url=https://example.com
scraper.keywords=Item,Price,Details
- Run the application:
./mvnw spring-boot:run
- The output Excel file (output.xls) will be generated in the project root.
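To make the two configuration keys more concrete, here is a minimal sketch of how target.url and scraper.keywords could be read and split. It uses plain java.util.Properties so it runs without Spring; the project itself most likely binds these values through Spring's own configuration support, and the class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: class and method names are hypothetical, and the
// project may bind these keys through Spring instead.
class ScraperConfigSketch {

    // Parses text written in the same syntax as application.properties.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Splits the comma-separated scraper.keywords value, dropping blanks.
    static List<String> keywords(Properties props) {
        List<String> result = new ArrayList<>();
        for (String part : props.getProperty("scraper.keywords", "").split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = load(
                "target.url=https://example.com\n"
                + "scraper.keywords=Item,Price,Details\n");
        System.out.println(props.getProperty("target.url"));
        System.out.println(keywords(props));
    }
}
```

Trimming each keyword and dropping empty entries keeps the list usable even if the property is written with spaces or a trailing comma.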
✅ Benefits
- Fully based on the Java ecosystem
- Requires no browser automation or WebDriver
- Easily customizable for different scraping needs
- Simple to set up and extend
📄 License
This project is licensed under the MIT License.