MechanicalSoup: Web Scraping

MechanicalSoup is a Python library for automating interaction with websites.

It handles cookies and redirects, but it cannot execute JavaScript.

You can think of it as a basic web browsing library.

If your scraping requires complex interaction with websites that rely heavily on JavaScript, you should probably use a more advanced tool such as Selenium or Splash.

Installation

pip install MechanicalSoup

For more details, click here.

Basic Example

import mechanicalsoup

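# Identify as a regular desktop browser instead of the default library user agent.
user_agent_string = 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36'
# StatefulBrowser keeps track of cookies and the current page between requests.
browser = mechanicalsoup.StatefulBrowser(user_agent=user_agent_string)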

You can find examples of realistic user agent strings for different kinds of browsers and devices here.

Once you have created a browser instance, you can open a web page:

browser.open('https://ebootcamp.dev')

Once the page is open, you can get its HTML as a parsed BeautifulSoup object:

page = browser.get_current_page()

To get only the level 3 headings from the current page:

headings_list = browser.get_current_page().find_all('h3')

To print the text of these headings:

for heading in headings_list:
    print(heading.text)

We will add more advanced examples in the near future. You can download the code above from here.