requests library to acquire HTML
BeautifulSoup library to parse and search HTML
.select() method
ResultSet class we can loop over
Tag object, which is the subsection of the HTML tree consisting of our matched tag and all its children
.attrs property
.text property
Examples:
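A minimal sketch of how these pieces fit together, assuming the bs4 package is installed. The HTML fragment and its id values are made up for illustration; on a real page the string would typically come from requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<ul>
  <li id="first">alpha</li>
  <li id="second">beta</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# .select() takes a CSS selector and returns the matches, which we can loop over.
matches = soup.select("li")

for tag in matches:
    # Each match is a Tag: .attrs is a dict of its attributes,
    # .text is the text of the tag and all its children.
    print(tag.name, tag.attrs, tag.text)

first = matches[0]
```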
Let's look at the schema.
If a recipe page provides a piece of information that the schema defines, it puts an itemprop="attributeName"
attribute on the tag enclosing that information.
<ul>
<li itemprop="recipeIngredient">3 garlic cloves minced</li>
<li itemprop="recipeIngredient">1 stalk celery, diced</li>
<li itemprop="recipeIngredient">1 carrot, diced</li>
<li itemprop="recipeIngredient">2 tbsp. of olive oil</li>
</ul>
This means we can call:
soup.select('[itemprop="recipeIngredient"]')
...to get all the ingredients on a recipe page.
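As a rough sketch, the call above could be run against the ingredient list shown earlier. The html string simply repeats that snippet, and the variable names are chosen here for illustration.

```python
from bs4 import BeautifulSoup

# The ingredient list from the example above.
html = """
<ul>
<li itemprop="recipeIngredient">3 garlic cloves minced</li>
<li itemprop="recipeIngredient">1 stalk celery, diced</li>
<li itemprop="recipeIngredient">1 carrot, diced</li>
<li itemprop="recipeIngredient">2 tbsp. of olive oil</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The attribute selector matches every tag whose itemprop is "recipeIngredient".
tags = soup.select('[itemprop="recipeIngredient"]')

# Pull the text out of each matched Tag.
ingredients = [tag.text for tag in tags]
print(ingredients)
```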
This simple recipe follows the schema.org standard.
Example Recipe Title
<h1 itemprop="name">Mom's World Famous Banana Bread</h1>
Example Recipe Ingredient List
<h2>Ingredients:</h2>
<ul>
<li itemprop="recipeIngredient">3 or 4 ripe bananas</li>
<li itemprop="recipeIngredient">1 egg</li>
<li itemprop="recipeIngredient">3/4 cup of sugar</li>
<li itemprop="recipeIngredient">1 1/2 cup of flour</li>
</ul>
Example Recipe Instructions
<div itemprop="recipeInstructions">
<span itemprop="step">Preheat the oven to 350 degrees.</span>
<span itemprop="step">Mix in the ingredients in a bowl.</span>
<span itemprop="step">Add the flour last.</span>
<span itemprop="step">Pour the mixture into a loaf pan and bake for one hour.</span>
</div>
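Putting the three fragments together, one possible way to pull the title, ingredients, and steps out of this recipe is sketched below. The html string just concatenates the snippets above; select_one() and the descendant selector used for the steps are choices made here for the sketch, not the only way to do it.

```python
from bs4 import BeautifulSoup

# The banana bread fragments from above, joined into one document.
html = """
<h1 itemprop="name">Mom's World Famous Banana Bread</h1>
<h2>Ingredients:</h2>
<ul>
<li itemprop="recipeIngredient">3 or 4 ripe bananas</li>
<li itemprop="recipeIngredient">1 egg</li>
<li itemprop="recipeIngredient">3/4 cup of sugar</li>
<li itemprop="recipeIngredient">1 1/2 cup of flour</li>
</ul>
<div itemprop="recipeInstructions">
<span itemprop="step">Preheat the oven to 350 degrees.</span>
<span itemprop="step">Mix in the ingredients in a bowl.</span>
<span itemprop="step">Add the flour last.</span>
<span itemprop="step">Pour the mixture into a loaf pan and bake for one hour.</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching Tag (or None if nothing matches).
title = soup.select_one('[itemprop="name"]').text

ingredients = [tag.text for tag in soup.select('[itemprop="recipeIngredient"]')]

# Only take steps that sit inside the recipeInstructions container.
steps = [
    tag.text
    for tag in soup.select('[itemprop="recipeInstructions"] [itemprop="step"]')
]

print(title)
print(ingredients)
print(steps)
```

A real page may omit some of these attributes, in which case select_one() returns None and the .text lookup would fail, so a check for None is worth adding before relying on it.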