Beginner’s Guide to Scraping Data (Uber Eats Example)

I’ve done a fair bit of web scraping throughout my lifetime, so I thought it’d be useful to explain how it works with a practical example. Let’s find the names of all the restaurants on Uber Eats in Burlington.
(Full source code is also linked at the bottom!)

Python has been the leading web scraping language for the better part of a decade. I’ll be using the Python BeautifulSoup4 library, which you can read about here. It’s super lightweight, versatile, and makes quick work of webpages with limited use of JavaScript and animation.

To install it — assuming you have pip on your machine — just run:

pip install beautifulsoup4

Next, you’ll want to import it into your program using:

from bs4 import BeautifulSoup

You’ll also need to import the following at the top of your program:

from urllib.request import Request, urlopen
import json
import ssl

Now that we have our libraries sorted out, we can get into the fun stuff. For the sake of this exercise, I’ll be referencing the Uber Eats page for restaurants in Burlington.

First, we’re going to need to retrieve the webpage contents. We can do this with the following lines of code:

url = ""  # our URL
# send a request to the page, using a Mozilla/5.0 User-Agent header
req = Request(url, headers={'User-Agent' : 'Mozilla/5.0'})
# open the page using our urlopen library
page = urlopen(req).read()
# use BeautifulSoup to parse the webpage
soup = BeautifulSoup(page, 'html.parser')

The above lines essentially tell our program where to look, request that webpage while presenting a Mozilla/5.0 User-Agent header (so the request looks like it came from a regular browser), open the page, and finally parse it using BeautifulSoup (which does most of the heavy lifting for us). Now we’re ready to grab our desired data.
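One note on the ssl import from earlier: on some machines, urlopen fails with a certificate verification error. A common workaround for a quick scraping exercise (though not something you’d want in production code) is to build an SSL context that skips verification and pass it to urlopen. A minimal sketch, with a placeholder URL standing in for the real page:

```python
import ssl
from urllib.request import Request, urlopen

# Build an SSL context that skips certificate verification -- a common
# workaround when urlopen raises CERTIFICATE_VERIFY_FAILED on some setups.
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

url = "https://www.example.com"  # placeholder; substitute the page you're scraping
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
# pass the context when opening the page:
# page = urlopen(req, context=context).read()
```

If your machine verifies certificates fine, you can skip the context entirely and call urlopen(req) as shown above.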

Web scraping is sort of like solving a puzzle. Start with the end piece of data you want to retrieve. In this case, we want to find all the restaurants in Burlington that are on Uber Eats. So, right click on the name of any restaurant and hit “Inspect”. Provided you’re working in a modern browser, the front-end source code should pop up, allowing you to see the tags of each element.

In this case, when I right click on “Taco Bell (777 Guelph Line)” and hit “Inspect”, it takes me to the line:

<h3 class="h3 c4 c5 ai">Taco Bell (777 Guelph Line)</h3>

This tells us that Uber Eats uses the <h3> tag to identify all of the names of the restaurants on the page. So, we’re going to need to find every <h3> tag on our page to get all the restaurant names. We can do this with the following snippet of code:

for x in soup.findAll('h3'): # loop through all restaurants
    print(x.text)

This is just a simple for loop in Python that iterates through the webpage content that our handy BeautifulSoup library parsed for us. The “findAll” method (spelled find_all in newer versions of BeautifulSoup) just creates a Python list of every element in our object “soup” that contains the <h3> tag. Inside the for loop we’re just printing each object x’s text field, which should result in the following output:

Taco Bell (777 Guelph Line)
McDonald's (Guelph Line-Burl)
Jack Astor's (3140 South Service Road)
Sotiris Greek Restaurant
Boston Pizza (Burlington South)
Burger King (3295 Fairview St.)
Hangry Piri Piri Chicken
Scaddabush (2429 Fairview St)
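The json import from the top of the program hasn’t been used yet; one natural use for it is saving the scraped names to a file instead of just printing them. A quick sketch (the list and filename below are my own placeholders — in the real program you’d append x.text inside the loop):

```python
import json

# Placeholder list; in the real program, build this by appending
# x.text inside the for loop instead of printing
restaurants = ["Taco Bell (777 Guelph Line)", "Sotiris Greek Restaurant"]

# Dump the names to a JSON file for later use
with open("restaurants.json", "w") as f:
    json.dump(restaurants, f, indent=2)
```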

And we’re done!
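To tie it all together, here’s a self-contained version of the parsing step that you can run without hitting the live site. The HTML fragment below is invented to mimic the structure we saw under “Inspect”:

```python
from bs4 import BeautifulSoup

# Invented fragment mimicking the structure we inspected on the real page
html = """
<div>
  <h3 class="h3 c4 c5 ai">Taco Bell (777 Guelph Line)</h3>
  <h3 class="h3 c4 c5 ai">Sotiris Greek Restaurant</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for x in soup.findAll("h3"):  # loop through all restaurant headings
    print(x.text)
```

Swap the fragment for the page you fetched with urlopen and the same loop prints the live restaurant names.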

Hopefully this provided a bit of help for anyone interested in web scraping. The libraries I used today are just one way of going about it. A common alternative, better suited to more complex, JavaScript-heavy webpages, is Selenium, which I’ll cover in more detail in another article. But for simple webpages without too much going on, BeautifulSoup should be a solid start.

You can see a more in-depth example of this where I gather more restaurant data on my GitHub here.