Developer Guide

How to Give Your AI Agent Real-Time Search with a SERP API

By Serpent API Team · 9 min read

Every large language model has a knowledge cutoff date. Ask GPT-4 or Claude about something that happened last month and you are likely to get either an outdated answer or a polite admission of ignorance. For many AI applications — research assistants, customer support bots, competitive intelligence tools, autonomous coding agents — this is a fundamental limitation that cannot be patched with a bigger model or a better prompt.

The solution is to give your agent a real-time search tool. By integrating a SERP API like Serpent API into your LLM pipeline, you ground your model's responses in current web data. This guide covers three integration patterns: LangChain tool, OpenAI function calling, and a custom bare-metal implementation. All three use the same underlying API at $0.00005 per search.

Architecture Overview

Before diving into code, it helps to understand the flow. In a SERP-augmented agent, web search is a tool the model can invoke at will:

  1. User sends a message to the agent (e.g., "What are the top Python web frameworks in 2026?")
  2. LLM decides whether it needs current information to answer confidently
  3. LLM emits a tool call — either a function call (OpenAI) or a ReAct-style action (LangChain) — specifying the search query
  4. Your code intercepts the tool call, sends the query to Serpent API, and receives structured SERP results
  5. Results are injected back into the conversation as a tool response or observation
  6. LLM synthesizes a final answer using the retrieved snippets as grounding context
  7. User receives a current, cited response

This loop can iterate — the agent may perform multiple searches to gather information from different angles before producing a final answer. The SERP API's low cost makes multi-search workflows economically viable in a way that expensive API alternatives simply are not.
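The steps above can be sketched in a few lines of framework-agnostic Python. The two stubs here are hypothetical stand-ins for a real model client and a real Serpent API wrapper — swap them out for the implementations shown in the options below:

```python
def call_llm(messages):
    # Stub: pretend the model requests one search, then answers from it.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "query": messages[0]["content"]}
    return {"tool": None, "content": "Answer grounded in: " + messages[-1]["content"]}

def run_search(query):
    # Stub for the SERP API call (step 4).
    return f"[1] Top result for {query!r}"

def agent_loop(user_message, max_rounds=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = call_llm(messages)                  # steps 2-3: model may request a search
        if reply["tool"] is None:
            return reply["content"]                 # step 7: final grounded answer
        results = run_search(reply["query"])        # step 4: query the SERP API
        messages.append({"role": "tool", "content": results})  # step 5: inject results
    return "Search budget exhausted."

print(agent_loop("top Python web frameworks 2026"))
```

The `max_rounds` cap is the same safeguard the production sections below recommend: it bounds how many times the loop can iterate on a single user message.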


Option 1 — LangChain Tool Integration

LangChain is the most popular framework for building LLM agents. Integrating Serpent API as a LangChain tool takes about 20 lines of code:

pip install langchain langchain-openai openai requests

from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
import requests

SERPENT_KEY = "YOUR_SERPENT_API_KEY"

def serpent_search(query: str) -> str:
    """Search the web using Serpent API and return formatted results."""
    response = requests.get(
        "https://apiserpent.com/api/search",
        params={
            "q": query,
            "num": 5,
            "apiKey": SERPENT_KEY
        },
        timeout=10
    )
    data = response.json()
    results = data.get("results", {}).get("organic", [])

    if not results:
        return "No results found for this query."

    formatted = []
    for r in results[:5]:
        formatted.append(
            f"{r['position']}. {r['title']}\n"
            f"   URL: {r['url']}\n"
            f"   {r.get('snippet', 'No description available.')}"
        )
    return "\n\n".join(formatted)


# Define the LangChain tool
search_tool = Tool(
    name="web_search",
    description=(
        "Search the web for current information about any topic. "
        "Use this when you need up-to-date facts, recent events, "
        "current prices, or information that may have changed since your training. "
        "Input should be a clear, specific search query string."
    ),
    func=serpent_search
)

# Initialize the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = initialize_agent(
    tools=[search_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=5
)

# Run a query that requires current information
result = agent.run("What are the most popular Python web frameworks being used in production as of 2026?")
print(result)

Setting verbose=True lets you observe the agent's reasoning process — you will see it decide when to search, what to search for, and how it integrates the results. The max_iterations=5 limit prevents runaway searches on ambiguous queries.

Adding Multiple Tools

A more capable agent might combine web search with other tools. LangChain makes it trivial to add additional capabilities:

from langchain.tools import Tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Web search tool (current events)
web_search_tool = Tool(
    name="web_search",
    description="Search the web for current, real-time information. Best for recent news, prices, and events.",
    func=serpent_search
)

# Wikipedia tool (background knowledge)
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
wiki_tool = Tool(
    name="wikipedia",
    description="Look up background information, historical facts, and encyclopedic knowledge.",
    func=wiki.run
)

# Agent with both tools
agent = initialize_agent(
    tools=[web_search_tool, wiki_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

result = agent.run("Compare the current market share of React vs Vue.js and explain the historical context of each.")

The agent will automatically select the appropriate tool — Wikipedia for historical background, Serpent API for current market data.

Option 2 — OpenAI Function Calling

If you prefer to work directly with the OpenAI API without a framework, function calling provides a clean, structured way to give the model tool access. This approach gives you more control and is easier to debug in production:

import openai
import requests
import json

client = openai.OpenAI()  # Uses OPENAI_API_KEY env var

# Define the tool schema
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": (
            "Search the web for real-time information. Use this when the user asks about "
            "current events, recent data, prices, or anything that requires up-to-date knowledge."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to execute. Be specific and include relevant context."
                },
                "num": {
                    "type": "integer",
                    "description": "Number of results to return. Default is 5. Use more for broader research.",
                    "default": 5
                }
            },
            "required": ["query"]
        }
    }
}]


def execute_web_search(query: str, num: int = 5) -> str:
    """Execute a web search and return results as a JSON string."""
    response = requests.get(
        "https://apiserpent.com/api/search",
        params={"q": query, "num": num, "apiKey": "YOUR_SERPENT_KEY"},
        timeout=10
    )
    data = response.json()
    organic = data.get("results", {}).get("organic", [])[:num]
    # Return structured data the model can reason over
    return json.dumps([
        {
            "position": r["position"],
            "title": r["title"],
            "url": r["url"],
            "snippet": r.get("snippet", "")
        }
        for r in organic
    ])


def handle_tool_call(tool_name: str, args: dict) -> str:
    """Route tool calls to the appropriate function."""
    if tool_name == "web_search":
        return execute_web_search(
            query=args["query"],
            num=args.get("num", 5)
        )
    raise ValueError(f"Unknown tool: {tool_name}")


def agent_chat(user_message: str, model: str = "gpt-4o") -> str:
    """
    Run an agentic conversation loop with web search capability.

    Handles multiple rounds of tool calling until the model
    produces a final answer with no further tool calls.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful research assistant with access to real-time web search. "
                "Always search for current information when questions involve recent events, "
                "current statistics, or time-sensitive data. Cite your sources."
            )
        },
        {"role": "user", "content": user_message}
    ]

    while True:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        message = response.choices[0].message

        # If no tool calls, we have the final answer
        if not message.tool_calls:
            return message.content

        # Process all tool calls in this response
        messages.append(message)  # Add assistant message with tool calls

        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            print(f"[Tool call] {tool_call.function.name}({args})")

            result = handle_tool_call(tool_call.function.name, args)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

        # Continue the loop — model will process results and either
        # call more tools or produce a final answer


# Usage
answer = agent_chat("What are the best SERP APIs available in 2026 and how do their prices compare?")
print(answer)

The agent_chat function implements the full agentic loop: it sends the user message, checks for tool calls, executes them, feeds the results back, and repeats until the model produces a response without any tool calls. This pattern supports multi-step research where the model performs several searches to gather comprehensive information before synthesizing a final answer.

Option 3 — Custom Agent Tool

If you are building your own agent framework or using a different model provider, a simple class wrapper gives you a clean interface that works with any system:

import requests
import time
from typing import Optional

class SerpentSearchTool:
    """
    A self-contained web search tool for AI agents.
    Works with any LLM framework or custom agent loop.
    """

    name = "web_search"
    description = (
        "Search the web for current information. "
        "Input: a search query string. "
        "Output: formatted list of top search results with titles, URLs, and snippets."
    )

    def __init__(self, api_key: str, num_results: int = 5):
        self.api_key = api_key
        self.num_results = num_results
        self._last_call = 0.0
        self._min_interval = 0.5  # Max 2 requests per second

    def _rate_limit(self):
        elapsed = time.time() - self._last_call
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)
        self._last_call = time.time()

    def run(self, query: str, num: Optional[int] = None) -> str:
        """
        Execute a search and return formatted results as a string.

        Args:
            query: The search query.
            num: Optional override for number of results.

        Returns:
            Formatted string with search results for the LLM to consume.
        """
        self._rate_limit()
        n = num or self.num_results

        try:
            response = requests.get(
                "https://apiserpent.com/api/search",
                params={"q": query, "num": n, "apiKey": self.api_key},
                timeout=15
            )
            response.raise_for_status()
            data = response.json()
        except requests.exceptions.RequestException as e:
            return f"Search failed: {str(e)}"

        organic = data.get("results", {}).get("organic", [])
        if not organic:
            return f"No results found for: {query}"

        lines = [f"Search results for: {query}\n"]
        for r in organic:
            lines.append(f"[{r['position']}] {r['title']}")
            lines.append(f"    {r['url']}")
            if r.get('snippet'):
                lines.append(f"    {r['snippet']}")
            lines.append("")

        return "\n".join(lines)

    def as_langchain_tool(self):
        """Convert to a LangChain Tool object."""
        from langchain.tools import Tool
        return Tool(name=self.name, description=self.description, func=self.run)

    def as_openai_schema(self) -> dict:
        """Return OpenAI function calling schema."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"},
                        "num": {"type": "integer", "description": "Number of results (1-10)", "default": 5}
                    },
                    "required": ["query"]
                }
            }
        }


# Usage
tool = SerpentSearchTool(api_key="YOUR_KEY", num_results=5)

# Direct usage
results = tool.run("best AI coding assistants 2026")
print(results)

# Use with LangChain
langchain_tool = tool.as_langchain_tool()

# Use with OpenAI function calling
openai_schema = tool.as_openai_schema()

The as_langchain_tool() and as_openai_schema() methods make this class portable across different agent frameworks without duplicating logic.


Caching Search Results to Cut Costs

In agentic workflows, the same or similar queries can be issued multiple times across different conversation turns. Caching eliminates redundant API calls and can reduce your SERP API costs by 50% or more in high-traffic applications.
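A quick back-of-the-envelope calculation shows the impact (the traffic figures are illustrative; only the per-search price comes from the pricing above):

```python
PRICE_PER_SEARCH = 0.00005  # Serpent API per-search rate

def monthly_cost(searches_per_day, cache_hit_rate=0.0):
    """Estimated monthly SERP API spend after cache savings."""
    billable = searches_per_day * (1 - cache_hit_rate)
    return billable * PRICE_PER_SEARCH * 30

print(monthly_cost(100_000))       # no caching
print(monthly_cost(100_000, 0.5))  # a 50% hit rate halves the bill
```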

In-Memory Cache with TTL

import time
from typing import Any, Dict, Tuple

class TTLCache:
    """Simple in-memory cache with time-to-live expiry."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str):
        if key in self._cache:
            timestamp, value = self._cache[key]
            if time.time() - timestamp < self.ttl:
                return value
            del self._cache[key]
        return None

    def set(self, key: str, value: Any):
        self._cache[key] = (time.time(), value)

    def clear_expired(self):
        now = time.time()
        self._cache = {
            k: v for k, v in self._cache.items()
            if now - v[0] < self.ttl
        }


# Integrate cache into the search tool
cache = TTLCache(ttl_seconds=14400)  # 4-hour TTL

def cached_serpent_search(query: str) -> str:
    """Search with caching — avoids duplicate API calls."""
    cache_key = query.lower().strip()
    cached = cache.get(cache_key)

    if cached:
        print(f"[Cache hit] {query}")
        return cached

    print(f"[Cache miss] Fetching: {query}")
    result = serpent_search(query)  # Your existing search function
    cache.set(cache_key, result)
    return result

Redis Cache for Production

For distributed deployments or multi-process agents, use Redis to share the cache across instances:

import redis
import json
import hashlib

redis_client = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

def redis_cached_search(query: str, ttl: int = 14400) -> str:
    """Search with Redis caching for distributed agent deployments."""
    cache_key = f"serp:{hashlib.md5(query.lower().encode()).hexdigest()}"

    # Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return cached

    # Fetch and cache
    result = serpent_search(query)
    redis_client.setex(cache_key, ttl, result)
    return result

A 4-hour TTL is a good default for most SERP data. Search rankings do not change minute to minute, so results cached four hours ago are almost always accurate enough for agent responses.

Production Considerations

Limiting Search Scope

Agents can be overly enthusiastic about searching. Set max_iterations in LangChain agents or implement a search counter that caps the number of Serpent API calls per conversation turn. Three to five searches per user query is usually sufficient; more than that often indicates the agent is going in circles.
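One way to implement such a counter, as a small wrapper around any search function like the `serpent_search` defined earlier (the budget-exhausted message is our own convention, not part of the API):

```python
class SearchBudget:
    """Caps the number of SERP API calls per conversation turn."""

    def __init__(self, max_searches=5):
        self.max_searches = max_searches
        self.used = 0

    def search(self, query, search_fn):
        if self.used >= self.max_searches:
            # Return a sentinel the model can act on instead of raising,
            # so the conversation keeps going when the budget runs out.
            return "Search budget exhausted for this turn; answer with what you have."
        self.used += 1
        return search_fn(query)

    def reset(self):
        """Call at the start of each new user turn."""
        self.used = 0
```

Register `lambda q: budget.search(q, serpent_search)` as the tool function, and call `budget.reset()` whenever a new user message arrives.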

Error Handling and Fallbacks

Wrap all SERP API calls in try/except blocks and return a graceful fallback message when the search fails. The agent should be able to continue the conversation with its training data rather than crashing when search is unavailable:

def safe_search(query: str) -> str:
    try:
        return serpent_search(query)
    except Exception as e:
        # Log the error and return a fallback
        print(f"Search failed for '{query}': {e}")
        return "Web search is temporarily unavailable. I'll answer based on my training data, but note this information may not reflect the latest developments."

Cost Tracking

At $0.00005 per search, costs scale linearly with usage. Track how many searches each agent session consumes and set up billing alerts in your Serpent API dashboard. For multi-tenant applications, log per-user search counts so you can attribute costs accurately and implement per-user search quotas if needed.
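A minimal in-process tracker along these lines (the quota value is illustrative; a production deployment would back this with Redis or a database):

```python
from collections import defaultdict

PRICE_PER_SEARCH = 0.00005  # Serpent API per-search rate

class UsageTracker:
    """Per-user search counts for cost attribution and quotas."""

    def __init__(self, quota_per_user=1000):
        self.quota = quota_per_user
        self.counts = defaultdict(int)

    def allow(self, user_id):
        # Check the quota before issuing a search on this user's behalf.
        return self.counts[user_id] < self.quota

    def record(self, user_id):
        self.counts[user_id] += 1

    def cost(self, user_id):
        # Attributed spend for this user so far.
        return self.counts[user_id] * PRICE_PER_SEARCH
```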

Search Result Quality

Not all search results are equally useful for grounding LLM responses. Consider filtering out certain domains (e.g., paywalled content, social media, user forums) and prioritizing authoritative sources. Pass the siteFilter or domain exclusion parameters available in the Serpent API to refine result quality for your specific use case.
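If you prefer to filter client-side instead of (or in addition to) the API's parameters, a simple blocklist over the organic results works — the domain list here is purely illustrative:

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"pinterest.com", "quora.com", "reddit.com"}  # illustrative

def filter_results(organic_results, blocked=BLOCKED_DOMAINS):
    """Drop results whose domain (or parent domain) is on the blocklist."""
    kept = []
    for r in organic_results:
        domain = urlparse(r["url"]).netloc.lower()
        # Match the domain itself and any subdomain (e.g. www.quora.com).
        if any(domain == b or domain.endswith("." + b) for b in blocked):
            continue
        kept.append(r)
    return kept
```

Run the organic results through this before formatting them for the model, so low-quality snippets never enter the context window at all.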

For more on building with SERP APIs, read our SERP API pricing comparison and our guide on web scraping vs. SERP APIs to understand when each approach is appropriate.

Ready to Start Building?

Get started with Serpent API today. 100 free searches included, no credit card required.

Get Your Free API Key

Explore: AI Ranking API · SERP API · Pricing · Try in Playground