Using Advanced Search Operators
Google supports search operators that you can combine in a query. They let you customize your search and get more precise results. For example, the search query "site:*.ai AND inurl:/contact OR inurl:/contact-us" will search for websites ending with .ai whose URLs contain the /contact or /contact-us path.
You may check out Google Search Operators: The Complete List (44 Advanced Operators) for a list of more operators.
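If you need to generate such queries for several domains or paths, the string can be assembled programmatically. The following is a minimal sketch; `build_query` is a hypothetical helper and not part of Clauneck.

```ruby
# Hypothetical helper (not part of Clauneck): joins a site: filter with
# one or more inurl: filters into a single Google query string.
def build_query(site:, paths:)
  path_filter = paths.map { |path| "inurl:#{path}" }.join(" OR ")
  "site:#{site} AND #{path_filter}"
end

puts build_query(site: "*.ai", paths: ["/contact", "/contact-us"])
# prints: site:*.ai AND inurl:/contact OR inurl:/contact-us
```

The resulting string can be passed as the `q` value in the same way as a handwritten query.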
Using Proxies for Scraping in a Text Document
You can use your own proxies to scrape the web caches of the links you have acquired. Only HTTP proxies are accepted. List the proxies in a text document, one per line, in the following format:
http://username:password@ip:port
http://username:password@another-ip:another-port
or if they are public proxies:
http://ip:port
http://another-ip:another-port
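Before handing a proxy file to the tool, it can help to check that every line is an HTTP proxy URL. The sketch below is one way to do that with Ruby's standard `uri` library; `valid_http_proxy?` is a hypothetical helper, the addresses are placeholders, and this is not necessarily how Clauneck validates proxies internally.

```ruby
require "uri"

# Hypothetical helper (not part of Clauneck): returns true when a line
# parses as a URL with the http scheme and a non-empty host.
def valid_http_proxy?(line)
  uri = URI.parse(line.strip)
  uri.scheme == "http" && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

# Placeholder entries for illustration only.
lines = [
  "http://username:password@203.0.113.10:8080", # authenticated proxy
  "http://203.0.113.11:3128",                   # public proxy
  "socks5://203.0.113.12:1080"                  # not HTTP, so it is rejected
]

usable = lines.select { |line| valid_http_proxy?(line) }
puts usable
```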
You can add the --proxy option on the command line to use the file:
clauneck --api_key YOUR_SERPAPI_KEY --proxy proxies.txt --output results.csv --q "site:*.ai AND inurl:/contact OR inurl:/contact-us"
or use the rotating proxy link directly:
clauneck --api_key YOUR_SERPAPI_KEY --proxy "http://username:password@ip:port" --output results.csv --q "site:*.ai AND inurl:/contact OR inurl:/contact-us"
You may also use it in a script:
api_key = "<SerpApi API Key>" # Visit https://serpapi.com/users/sign_up to get free credits.
params = {
  "q": "site:*.ai AND inurl:/contact OR inurl:/contact-us"
}
proxy = "proxies.txt"
Clauneck.run(api_key: api_key, params: params, proxy: proxy)
or directly use the rotating proxy link:
api_key = "<SerpApi API Key>" # Visit https://serpapi.com/users/sign_up to get free credits.
params = {
  "q": "site:*.ai AND inurl:/contact OR inurl:/contact-us"
}
proxy = "http://username:password@ip:port"
Clauneck.run(api_key: api_key, params: params, proxy: proxy)
If no proxy is provided, the system IP address will be used. This can work for small-scale projects, but it is not recommended.
Using a Google Search URL to Scrape Links with SerpApi
Instead of providing search parameters, you can feed a Google Search URL directly, and SerpApi's Google Search API will collect the web cache links from it.
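A Google Search URL carries the same information as a set of search parameters, encoded in its query string. The sketch below shows that correspondence using Ruby's standard `uri` library; `search_params_from_url` is a hypothetical helper and the example URL is made up for illustration. It is not a description of how Clauneck processes the URL.

```ruby
require "uri"

# Hypothetical helper (not part of Clauneck): extracts the query-string
# parameters of a search URL as a Hash of decoded strings.
def search_params_from_url(url)
  query = URI.parse(url).query.to_s
  URI.decode_www_form(query).to_h
end

# Example URL made up for illustration.
url = "https://www.google.com/search?q=site%3A*.ai+inurl%3A%2Fcontact&num=100"
p search_params_from_url(url)
```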
Using URLs to Scrape in a Text Document
You can also use your own list of URLs to be scraped. Each URL should start with https://webcache.googleusercontent.com and be placed on its own line. For example:
https://webcache.googleusercontent.com/search?q=cache:LItv_3DO2N8J:https://serpapi.com/&cd=10&hl=en&ct=clnk&gl=cy
https://webcache.googleusercontent.com/search?q=cache:_gaXFsYVmCgJ:https://serpapi.com/search-api&cd=9&hl=en&ct=clnk&gl=cy
*Be careful when clicking these links!
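To check a URL list before using it, you can filter out any line that does not start with the expected prefix. This is a minimal sketch; `cache_urls` is a hypothetical helper and not part of Clauneck.

```ruby
# Hypothetical helper (not part of Clauneck): returns the lines of a text
# that start with the web cache prefix, with surrounding whitespace removed.
def cache_urls(text, prefix: "https://webcache.googleusercontent.com")
  text.each_line.map(&:strip).select { |line| line.start_with?(prefix) }
end

text = <<~URLS
  https://webcache.googleusercontent.com/search?q=cache:LItv_3DO2N8J:https://serpapi.com/&cd=10&hl=en&ct=clnk&gl=cy
  https://serpapi.com/
URLS

# Only the first line is kept; the second lacks the web cache prefix.
puts cache_urls(text)
```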