
How to Build a DeepResearch Agent with SERP API and Python

A step-by-step tutorial on building a DeepResearch agent with Python. Learn to use the SERPpost API for discovery and BeautifulSoup for URL extraction and data scraping.

David Park, Former Amazon Search Infrastructure Engineer

In our previous guide, we introduced the concept of DeepResearch. Now, it’s time to put theory into practice. This tutorial will walk you through building a functional DeepResearch agent from scratch using Python, the SERPpost API, and the BeautifulSoup library.

By the end of this guide, you will have a script that can take a search query, discover relevant URLs, and recursively scrape them for information—a powerful tool for market research, lead generation, or competitive analysis.

Prerequisites

Before we start, make sure you have the following:

  • Python 3 installed on your machine
  • A SERPpost API key (signing up gives you free credits)
  • Basic familiarity with Python and the command line

Step 1: Setting Up Your Project

First, let’s create a project directory and install the necessary libraries. We’ll need requests to make HTTP calls to the SERP API and target websites, and beautifulsoup4 to parse HTML and extract data.

mkdir deepresearch-agent
cd deepresearch-agent

pip install requests beautifulsoup4

Now, create a file named agent.py. This is where our code will live.

Step 2: The Core Components

Our DeepResearch agent will have three main functions, mirroring the core components we discussed previously:

  1. search_serp(query): To query the SERPpost API and get initial URLs.
  2. scrape_and_discover(url, depth): To scrape data from a URL and discover new URLs.
  3. run_deep_research(seed_query, max_depth): The main function to orchestrate the process.

Let’s start by importing our libraries and setting up the configuration in agent.py.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import time

# --- Configuration ---
API_KEY = 'YOUR_SERPPOST_API_KEY' # Replace with your key
BASE_URL = 'https://serppost.com/api/search'

# A set to store visited URLs to avoid redundant scraping
visited_urls = set()

⚠️ Important: Replace 'YOUR_SERPPOST_API_KEY' with your actual API key from the SERPpost dashboard.
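If you would rather not hardcode credentials in the script, one option is to read the key from an environment variable instead. A minimal sketch, assuming you export a variable named SERPPOST_API_KEY (the name is just a convention, not something the API requires):

import os

# Read the key from the environment; fall back to the placeholder if
# the variable is not set. SERPPOST_API_KEY is an arbitrary name.
API_KEY = os.environ.get('SERPPOST_API_KEY', 'YOUR_SERPPOST_API_KEY')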

Step 3: Querying the SERP API

This function will be responsible for the initial discovery phase. It takes a query, calls the SERPpost API, and returns a list of organic result URLs.

def search_serp(query):
    """Queries the SERPpost API and returns a list of organic search result URLs."""
    print(f"🔍 Starting SERP search for: '{query}'")
    params = {
        's': query,
        't': 'google', # You can switch to 'bing' as well
        'p': 1
    }
    headers = {
        'Authorization': f'Bearer {API_KEY}'
    }
    try:
        response = requests.get(BASE_URL, params=params, headers=headers)
        response.raise_for_status() # Raises an exception for 4XX/5XX errors
        
        data = response.json()
        urls = [result['link'] for result in data.get('organic_results', [])]
        print(f"✅ Found {len(urls)} initial URLs from SERP.")
        return urls
    except requests.exceptions.RequestException as e:
        print(f"❌ SERP API request failed: {e}")
        return []

This function constructs the API request, handles potential errors, and parses the JSON response to extract the URLs from the organic_results field.
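Note that the parsing above only relies on the response containing an organic_results array whose items include a link field. A trimmed, illustrative shape of the parsed JSON (any field beyond those two is an assumption for readability, not a guarantee about the API's response format):

# Illustrative structure only -- the real response will contain more fields.
data = {
    "organic_results": [
        {"title": "What is a SERP API?", "link": "https://example.com/serp-api"},
        {"title": "SERP API basics", "link": "https://example.com/basics"},
    ]
}
urls = [result['link'] for result in data.get('organic_results', [])]
# urls -> ['https://example.com/serp-api', 'https://example.com/basics']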

Step 4: Scraping and Discovering New URLs

This is the heart of our agent. This function will visit a URL, scrape its content, and find new links to explore. We’ll use BeautifulSoup for this.

def scrape_and_discover(url, depth):
    """Scrapes a given URL for data and discovers new URLs to crawl."""
    if url in visited_urls or depth <= 0:
        return [], []

    print(f"{'  ' * (3 - depth)}↳ Scraping URL (Depth {depth}): {url}")
    visited_urls.add(url)

    try:
        headers = {'User-Agent': 'DeepResearchAgent/1.0'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        
        # Prevent scraping non-html content
        if 'text/html' not in response.headers.get('Content-Type', ''):
            return [], []

        soup = BeautifulSoup(response.text, 'html.parser')

        # --- Data Extraction Logic ---
        # Example: Extract the page title and the first paragraph
        title = soup.title.get_text().strip() if soup.title else 'No Title'
        first_p = soup.find('p')
        first_paragraph = first_p.get_text().strip() if first_p else ''
        scraped_data = {'url': url, 'title': title, 'summary': first_paragraph[:200]}

        # --- URL Discovery Logic ---
        new_urls = []
        for a_tag in soup.find_all('a', href=True):
            link = a_tag['href']
            absolute_link = urljoin(url, link)
            
            # Basic filtering to avoid irrelevant links
            if urlparse(absolute_link).netloc == urlparse(url).netloc: # Only internal links
                if absolute_link not in visited_urls:
                    new_urls.append(absolute_link)
        
        return [scraped_data], new_urls

    except requests.exceptions.RequestException as e:
        print(f"❌ Failed to scrape {url}: {e}")
        return [], []

💡 Pro Tip: The URL discovery logic here is simple (internal links only). For a more advanced agent, you could add logic to follow external links to specific, trusted domains.
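As a starting point for that idea, here is a minimal sketch of an allowlist filter you could swap in for the netloc comparison above. The TRUSTED_DOMAINS entries are placeholders, not recommendations:

# Hypothetical allowlist of external domains worth following.
TRUSTED_DOMAINS = {'wikipedia.org', 'docs.python.org'}

def is_allowed(link, source_url):
    """Allow internal links and links to trusted external domains."""
    link_host = urlparse(link).netloc
    source_host = urlparse(source_url).netloc
    if link_host == source_host:
        return True
    return any(link_host == d or link_host.endswith('.' + d) for d in TRUSTED_DOMAINS)

Inside scrape_and_discover, you would then replace the urlparse(...).netloc == urlparse(url).netloc check with a call to is_allowed(absolute_link, url).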

Step 5: Orchestrating the DeepResearch Process

Finally, the main function ties everything together. It takes a seed query and a maximum depth, initiates the SERP search, and manages the recursive scraping process.

def run_deep_research(seed_query, max_depth=2):
    """Orchestrates the DeepResearch process."""
    initial_urls = search_serp(seed_query)
    if not initial_urls:
        print("No initial URLs found. Exiting.")
        return

    all_scraped_data = []
    queue = [(url, max_depth) for url in initial_urls]

    while queue:
        current_url, current_depth = queue.pop(0)
        
        scraped_data, new_urls = scrape_and_discover(current_url, current_depth)
        all_scraped_data.extend(scraped_data)
        
        if current_depth > 1:
            for new_url in new_urls:
                queue.append((new_url, current_depth - 1))
        
        # Respectful delay to avoid overwhelming servers
        time.sleep(1)

    print("\n--- DeepResearch Complete ---")
    print(f"Total pages scraped: {len(all_scraped_data)}")
    for item in all_scraped_data:
        print(f"- {item['title']} ({item['url']})")

# --- Entry Point ---
if __name__ == '__main__':
    seed_query = "what is SERP API"
    run_deep_research(seed_query, max_depth=2)
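Run the agent from inside your project directory:

python agent.py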

Conclusion

You’ve just built a basic but powerful DeepResearch agent! This script demonstrates how to combine a SERP API for discovery with a web scraper for recursive data extraction. From here, you can expand its capabilities significantly:

  • Advanced Data Extraction: Use more specific BeautifulSoup selectors to extract structured data like prices, reviews, or contact information.
  • Smarter Crawling: Implement logic to prioritize which links to follow based on keywords or other heuristics.
  • Data Storage: Save the all_scraped_data to a CSV file or a database for further analysis (see the sketch after this list).
  • Error Handling: Add more robust error handling and retry mechanisms.
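As a hint for the data-storage idea above, here is a minimal sketch that writes all_scraped_data to a CSV file using only the standard library (the results.csv filename is arbitrary):

import csv

def save_to_csv(scraped_data, path='results.csv'):
    """Write the list of scraped page dicts to a CSV file."""
    if not scraped_data:
        return
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['url', 'title', 'summary'])
        writer.writeheader()
        writer.writerows(scraped_data)

You could call save_to_csv(all_scraped_data) at the end of run_deep_research.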

Ready to scale up your data operations? A reliable, fast, and affordable SERP API is the first step.

Explore our API documentation to see all the features you can leverage for your DeepResearch projects.

Tags:

#DeepResearch #Python #SERP API #Web Scraping #Tutorial

Ready to try SERPpost?

Get started with 100 free credits. No credit card required.