Today, I'm going to talk to you about how to set up a Scrapy tunneling proxy. Maybe some of you aren't familiar with this, but trust me, mastering this skill is a major plus! Come along and learn it with me!
I. Choosing the right proxy service provider
Before we start, we need to choose a suitable proxy service provider. There are many providers on the market to choose from, such as the ipipgo proxy. Pick one according to your needs and budget. To avoid being detected by anti-crawler systems, it's worth buying a private high-anonymity proxy.
II. Installation of related dependency libraries
Before we can use the Scrapy tunneling proxy, we need to install a dependency library to make sure our code runs smoothly. Open your command-line tool and enter the following command to install it:
pip install scrapy-rotating-proxies
III. Configuring the tunneling proxy
After installing the dependency library, we need to configure Scrapy to enable the tunneling proxy. Open your Scrapy project, find the project's settings.py file, and add the following code to it:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
}

ROTATING_PROXY_LIST = [
    'http://proxy-ip-1:port',
    'http://proxy-ip-2:port',
    'http://proxy-ip-3:port',
    # ...
]
ROTATING_PROXY_PAGE_RETRY_TIMES = 5
In the code above, `ROTATING_PROXY_LIST` holds the IP addresses of the tunneling proxies we purchased; replace the placeholders with your own proxy addresses as appropriate. You can also customize other related settings, such as `ROTATING_PROXY_PAGE_RETRY_TIMES`, which caps how many times a page is retried through different proxies.
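scrapy-rotating-proxies accepts a few more optional settings beyond the ones above. The fragment below (also for settings.py) shows the ones I find most useful; the values shown are, to the best of my knowledge, the library's defaults, not tuned recommendations:

```python
# Optional scrapy-rotating-proxies settings (values shown are the library defaults).

# Load proxies from a file (one proxy per line) instead of ROTATING_PROXY_LIST.
# ROTATING_PROXY_LIST_PATH = '/path/to/proxies.txt'

# Stop the spider when no alive proxies remain (False keeps it running).
ROTATING_PROXY_CLOSE_SPIDER = False

# Base and cap (in seconds) of the exponential backoff applied to a dead proxy
# before it is re-checked.
ROTATING_PROXY_BACKOFF_BASE = 300
ROTATING_PROXY_BACKOFF_CAP = 3600
```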
IV. Using the tunneling proxy
Now that we have finished configuring the Scrapy tunneling proxy, the next step is to use it in our code. Here is some sample code for your reference:
import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        yield scrapy.Request(
            'https://www.example.com',
            callback=self.parse,
            meta={'proxy': 'http://proxy-ip'},
        )

    def parse(self, response):
        # Web page parsing logic
        pass
In the code above, we specify the proxy for a request through the `meta` parameter; you need to replace `http://proxy-ip` with the proxy IP address you purchased. Note that once `RotatingProxyMiddleware` is enabled, requests without an explicit `meta['proxy']` are automatically routed through a proxy picked from `ROTATING_PROXY_LIST`, so you can also simply omit `meta` and let the middleware rotate proxies for you.
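One practical gotcha: proxy entries must be full URLs including a scheme, while some providers hand out bare `host:port` strings. Here is a small hypothetical helper (`normalize_proxy` is my own name, not part of Scrapy or scrapy-rotating-proxies) for cleaning up a provider's list before putting it into settings.py:

```python
def normalize_proxy(entry: str, default_scheme: str = 'http') -> str:
    """Prepend a scheme to a bare host:port proxy entry if it lacks one."""
    entry = entry.strip()
    return entry if '://' in entry else f'{default_scheme}://{entry}'

# Example: normalize a raw provider list before assigning it in settings.py.
raw_proxies = ['1.2.3.4:8080', 'http://5.6.7.8:3128', ' 9.10.11.12:80 ']
ROTATING_PROXY_LIST = [normalize_proxy(p) for p in raw_proxies]
# -> ['http://1.2.3.4:8080', 'http://5.6.7.8:3128', 'http://9.10.11.12:80']
```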
V. Testing whether the proxy IP takes effect
Finally, we need to test our code to verify that the proxy IP actually takes effect.
Go to your Scrapy project folder on the command line and execute the following command:
scrapy crawl my_spider
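Beyond watching the crawl finish, you can confirm that traffic really leaves through the proxy by requesting an IP-echo endpoint such as httpbin.org/ip and checking that the reported address is the proxy's, not your own. Here's a minimal stdlib sketch, independent of Scrapy (it assumes your proxy accepts plain HTTP and that httpbin.org is reachable):

```python
import json
import urllib.request

def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)

def extract_origin(body: str) -> str:
    """Pull the reported client IP out of an httpbin.org/ip style JSON body."""
    return json.loads(body)['origin']

# Live check (needs a working proxy; replace the placeholder with your proxy URL):
# opener = build_proxy_opener('http://your-proxy-ip:port')
# with opener.open('http://httpbin.org/ip', timeout=10) as resp:
#     print(extract_origin(resp.read().decode()))  # should print the proxy's IP, not yours
```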
If there are no problems with your code and configuration, then congratulations! You have successfully set up a Scrapy tunneling proxy!
Summary
With the setup steps above, we can easily add tunneling-proxy support to a Scrapy project. This improves the reliability of our crawler and reduces the chance of being blocked by anti-crawler techniques. I hope today's sharing is helpful to you. Way to go, folks! I'm sure you can master this skill!