By the end of this tutorial, you will have built a web scraper that scrapes the infinitely scrolling Growth Hacking subreddit on Reddit for content insights.
• Simple Scraper
• Reddit (Growth Hacking)
Table of Contents
1. Initial Scraper Setup
2. Running Your Scraper
3. Saving Your Crawler
4. Running Your Crawler
1. Initial Scraper Setup
Go to the Growth Hacking subreddit (reddit.com/r/growthhacking).
Open Simple Scraper and click the plus (+) sign.
First, scrape the post titles: click a title on the page. Every element that gets highlighted will be extracted. Name this field 'Title', then click the tick to save it for when you run the scraper.
Second, scrape the vote counts. Click the plus (+) sign again and select the votes on a post. Again, every element that gets highlighted will be extracted. Name this field 'Votes', then click the tick to save it for when you run the scraper.
Next, click the infinite loop button and scroll down the page. Scrolling lets the web scraper recognize that the page loads more posts as you scroll, i.e. that it scrolls infinitely.
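Simple Scraper does the scrolling for you, so no code is needed here. For readers curious about the general idea, the Python sketch below shows one common approach to scraping an infinitely scrolling page: keep scrolling and collecting items until a scroll turns up nothing new. It is a hypothetical illustration, not how Simple Scraper is actually implemented; `load_batch` stands in for "scroll, then read what's on the page", and `fake_page` simulates a page that reveals three made-up posts per scroll.

```python
from typing import Callable, List, Set


def scrape_infinite_scroll(load_batch: Callable[[int], List[str]], max_scrolls: int = 50) -> List[str]:
    """Collect items from an infinitely scrolling page (hypothetical sketch).

    `load_batch(scroll)` is an assumed callback that returns every item
    visible on the page after `scroll` scrolls.
    """
    collected: List[str] = []
    seen: Set[str] = set()
    for scroll in range(max_scrolls):
        found_new = False
        for item in load_batch(scroll):
            if item not in seen:  # skip posts we've already collected
                seen.add(item)
                collected.append(item)
                found_new = True
        if not found_new:
            # Scrolling revealed nothing new, so assume we've reached the end.
            break
    return collected


# Simulated page: seven made-up posts, three more revealed per scroll.
POSTS = [f"Post {i}" for i in range(1, 8)]


def fake_page(scroll: int) -> List[str]:
    return POSTS[: 3 * (scroll + 1)]


print(scrape_infinite_scroll(fake_page))
```

The `max_scrolls` cap is a safety limit so the loop stops even if new items keep appearing, and the `seen` set skips posts that are still on the page (or re-rendered) after a scroll.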
2. Running Your Scraper
To run your scraper, click 'View Results'.
Once the web scraper has run, Simple Scraper returns the selected data. You can view it as a table or as JSON, and download it as either a CSV or a JSON file.
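Simple Scraper does the extraction itself, but once you've downloaded the JSON you may want to rank posts by votes for content insights. Below is a minimal Python sketch that assumes the export is a list of objects keyed by the field names chosen above, 'Title' and 'Votes', with votes stored as text such as '87' or '1.2k'. The exact export format is an assumption here, so check it against your own download; the helper names `parse_votes` and `top_posts` and the sample records are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_votes(raw: str) -> int:
    """Convert a vote label such as '87', '1,024' or '1.2k' to an integer.

    Assumption: labels that aren't numbers (e.g. a placeholder like 'Vote')
    are treated as 0.
    """
    text = raw.strip().lower().replace(",", "")
    multiplier = 1
    if text.endswith("k"):
        multiplier = 1000
        text = text[:-1]
    try:
        return round(float(text) * multiplier)
    except (ValueError, OverflowError):
        return 0


def top_posts(records: List[Dict[str, Any]], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n highest-voted (title, votes) pairs.

    Assumes each exported record has 'Title' and 'Votes' keys.
    """
    pairs = [
        (str(r.get("Title") or ""), parse_votes(str(r.get("Votes") or "")))
        for r in records
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)  # highest votes first
    return pairs[:n]


# Made-up sample shaped like the assumed JSON export.
sample = '[{"Title": "Post A", "Votes": "87"}, {"Title": "Post B", "Votes": "1.2k"}, {"Title": "Post C", "Votes": "Vote"}]'
for title, votes in top_posts(json.loads(sample), n=2):
    print(votes, title)
```

To use it on a real download, load the file with `json.load` (for example, inside `with open(path, encoding="utf-8") as f:`) and pass the resulting list to `top_posts`.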
3. Saving Your Crawler
To save your web crawler, click 'Save Recipe'.
When saving your web crawler, you'll need to confirm its settings. The settings entered for this project are:
Once you've entered the settings, click 'Create Recipe'.
4. Running Your Crawler
Under 'My Recipes', click the recipe you saved.
To run your web crawler, click 'Run Recipe'.
Once the web crawler has run, Simple Scraper returns the selected data, and you can view the crawler's output on the 'Results' page.
You'll notice that Simple Scraper crawled the page four times and returned the selected data. As before, you can view the data as a table or as JSON, and download it.
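If you find the same post listed more than once in your downloaded results, you may want to drop the repeats before analyzing them. Below is a minimal Python sketch, assuming the CSV export has a header row with a 'Title' column (one of the field names used in this tutorial). The function name `dedupe_by_title` and the sample rows are hypothetical, and the export's exact layout is an assumption.

```python
import csv
import io
from typing import List, Set


def dedupe_by_title(csv_text: str) -> List[dict]:
    """Parse exported CSV text and keep only the first row for each title.

    Assumes a header row containing a 'Title' column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen_titles: Set[str] = set()
    unique_rows: List[dict] = []
    for row in reader:
        title = (row.get("Title") or "").strip()
        if title in seen_titles:
            continue  # a repeat of a post we've already kept
        seen_titles.add(title)
        unique_rows.append(row)
    return unique_rows


# Made-up sample shaped like the assumed CSV export.
sample_csv = "Title,Votes\nPost A,87\nPost B,1.2k\nPost A,87\n"
for row in dedupe_by_title(sample_csv):
    print(row["Title"], row["Votes"])
```

For a real file, read its text with `open(path, newline="", encoding="utf-8")` and pass it in. Keeping the first row per title is the simplest choice; you might instead keep the row with the highest vote count.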