
MLH-Web-Scraper

04/2024 - 04/2024

Tech Stack:

https://avatars.githubusercontent.com/u/7894478?v=4
https://cdn.jsdelivr.net/gh/devicons/devicon@latest/icons/go/go-original.svg

Description: A scraper that collects all MLH hackathon events


This is my first mini project written in Go: a simple web scraper. It scrapes all the hackathon events from the official MLH events page and writes them to a CSV file. I added a lot of comments to help me understand what is going on behind the scenes, because I am still very new to Go.

In my first few attempts I had a hard time scraping the site. The MLH website has some anti-scraping mechanisms enabled, so I kept receiving a 403 (Forbidden) response status code. Eventually I decided to route my requests through a free proxy server instead of requesting the MLH website directly. In other words, I sent a request to the proxy server containing the URL I wanted, the proxy server requested that page from the MLH website, and it then sent the returned HTML back to me.

I learned how I/O, URL query construction, Colly (the web scraping library), and other basic data structures work. Overall, it was a great learning experience!