A web crawler gives us an easy way to gather information from websites (Google runs an enormous crawler). And we can easily build our own.
Let's start with the simplest approach:
Say we want to gather information from a website. By hand, we would open a webpage, pick out what we want, and copy and paste it. A web crawler does the same thing. Let's look at a simple example:
# coding=utf-8
import urllib2

from bs4 import BeautifulSoup

kawaii = open('emotions.txt', 'w+')
url = "http://mengma.moe/"
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# The title text below is the site's own attribute; it says
# "hover over me and press Ctrl+C to copy".
for emotion in soup.find_all('input', title='鼠标浮动在我上面可以直接Ctrl+C复制噢'):
    moe = emotion.get('value').encode('utf-8')
    kawaii.write(moe + '\n')

kawaii.close()
The code uses "html = urllib2.urlopen(url).read()" to fetch the page, then uses "moe = emotion.get('value').encode('utf-8')" to pick out the values we want, and finally writes them to a file.
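To see the select-and-save pattern without hitting the network, here is a minimal sketch that runs BeautifulSoup over a hardcoded HTML snippet; the title text and values are made up for illustration, not taken from the real site:

```python
from bs4 import BeautifulSoup

# A hypothetical page snippet standing in for the downloaded HTML.
html = """
<html><body>
<input title="copy me" value="(=^.^=)">
<input title="copy me" value="(>_<)">
<input title="ignore" value="not an emoticon">
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all filters tags by name and attribute value, so only the
# inputs whose title matches are selected.
values = [tag.get("value") for tag in soup.find_all("input", title="copy me")]
print(values)
```

The same two steps apply to any page: download the HTML, then use attribute filters to keep only the tags you care about.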
With a crawler like this, you can gather information from millions of pages with just a little code.
If you are interested in web crawling, Scrapy would be a powerful friend: it is a fast, high-level web crawling and screen scraping framework, used to crawl websites and extract structured data from their pages.
Have a good weekend, and next time let's talk more about web crawlers.