Developer Guide

How to Give Your AI Agent Real-Time Search with a SERP API

By Serpent API Team · 9 min read

Every large language model has a knowledge cutoff date. Ask GPT-4 or Claude about something that happened last month and you are likely to get either an outdated answer or a polite admission of ignorance. For many AI applications — research assistants, customer support bots, competitive intelligence tools, autonomous coding agents — this is a fundamental limitation that cannot be patched with a bigger model or a better prompt.

The solution is to give your agent a real-time search tool. By integrating a SERP API like Serpent API into your LLM pipeline, you ground your model's responses in current web data. This guide covers three integration patterns: LangChain tool, OpenAI function calling, and a custom bare-metal implementation. All three use the same underlying API at $0.00005 per search.

Architecture Overview

Before diving into code, it helps to understand the flow. In a SERP-augmented agent, web search is a tool the model can invoke at will:

  1. User sends a message to the agent (e.g., "What are the top Python web frameworks in 2026?")
  2. LLM decides whether it needs current information to answer confidently
  3. LLM emits a tool call — either a function call (OpenAI) or a ReAct-style action (LangChain) — specifying the search query
  4. Your code intercepts the tool call, sends the query to Serpent API, and receives structured SERP results
  5. Results are injected back into the conversation as a tool response or observation
  6. LLM synthesizes a final answer using the retrieved snippets as grounding context
  7. User receives a current, cited response

This loop can iterate — the agent may perform multiple searches to gather information from different angles before producing a final answer. The SERP API's low cost makes multi-search workflows economically viable in a way that expensive API alternatives simply are not.
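The steps above can be sketched in a few lines of framework-agnostic Python. The two stubs here are hypothetical stand-ins for a real model client and a real Serpent API wrapper — swap them out for the implementations shown in the options below:

```python
def call_llm(messages):
    # Stub: pretend the model requests one search, then answers from it.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "query": messages[0]["content"]}
    return {"tool": None, "content": "Answer grounded in: " + messages[-1]["content"]}

def run_search(query):
    # Stub for the SERP API call (step 4).
    return f"[1] Top result for {query!r}"

def agent_loop(user_message, max_rounds=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = call_llm(messages)                  # steps 2-3: model may request a search
        if reply["tool"] is None:
            return reply["content"]                 # step 7: final grounded answer
        results = run_search(reply["query"])        # step 4: query the SERP API
        messages.append({"role": "tool", "content": results})  # step 5: inject results
    return "Search budget exhausted."

print(agent_loop("top Python web frameworks 2026"))
```

The `max_rounds` cap is the same safeguard the production sections below recommend: it bounds how many times the loop can iterate on a single user message.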


Option 1 — LangChain Tool Integration

LangChain is the most popular framework for building LLM agents. Integrating Serpent API as a LangChain tool takes about 20 lines of code:

pip install langchain langchain-openai openai requests

from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
import requests

SERPENT_KEY = "YOUR_SERPENT_API_KEY"

def serpent_search(query: str) -> str:
    """Search the web using Serpent API and return formatted results."""
    response = requests.get(
        "https://apiserpent.com/api/search",
        params={
            "q": query,
            "num": 5,
            "apiKey": SERPENT_KEY
        },
        timeout=10
    )
    data = response.json()
    results = data.get("results", {}).get("organic", [])

    if not results:
        return "No results found for this query."

    formatted = []
    for r in results[:5]:
        formatted.append(
            f"{r['position']}. {r['title']}\n"
            f"   URL: {r['url']}\n"
            f"   {r.get('snippet', 'No description available.')}"
        )
    return "\n\n".join(formatted)


# Define the LangChain tool
search_tool = Tool(
    name="web_search",
    description=(
        "Search the web for current information about any topic. "
        "Use this when you need up-to-date facts, recent events, "
        "current prices, or information that may have changed since your training. "
        "Input should be a clear, specific search query string."
    ),
    func=serpent_search
)

# Initialize the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = initialize_agent(
    tools=[search_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=5
)

# Run a query that requires current information
result = agent.run("What are the most popular Python web frameworks being used in production as of 2026?")
print(result)

Setting verbose=True lets you observe the agent's reasoning process — you will see it decide when to search, what to search for, and how it integrates the results. The max_iterations=5 limit prevents runaway searches on ambiguous queries.

Adding Multiple Tools

A more capable agent might combine web search with other tools. LangChain makes it trivial to add additional capabilities:

from langchain.tools import Tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Web search tool (current events)
web_search_tool = Tool(
    name="web_search",
    description="Search the web for current, real-time information. Best for recent news, prices, and events.",
    func=serpent_search
)

# Wikipedia tool (background knowledge)
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
wiki_tool = Tool(
    name="wikipedia",
    description="Look up background information, historical facts, and encyclopedic knowledge.",
    func=wiki.run
)

# Agent with both tools
agent = initialize_agent(
    tools=[web_search_tool, wiki_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

result = agent.run("Compare the current market share of React vs Vue.js and explain the historical context of each.")

The agent will automatically select the appropriate tool — Wikipedia for historical background, Serpent API for current market data.

Option 2 — OpenAI Function Calling

If you prefer to work directly with the OpenAI API without a framework, function calling provides a clean, structured way to give the model tool access. This approach gives you more control and is easier to debug in production:

import openai
import requests
import json

client = openai.OpenAI()  # Uses OPENAI_API_KEY env var

# Define the tool schema
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": (
            "Search the web for real-time information. Use this when the user asks about "
            "current events, recent data, prices, or anything that requires up-to-date knowledge."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to execute. Be specific and include relevant context."
                },
                "num": {
                    "type": "integer",
                    "description": "Number of results to return. Default is 5. Use more for broader research.",
                    "default": 5
                }
            },
            "required": ["query"]
        }
    }
}]


def execute_web_search(query: str, num: int = 5) -> str:
    """Execute a web search and return results as a JSON string."""
    response = requests.get(
        "https://apiserpent.com/api/search",
        params={"q": query, "num": num, "apiKey": "YOUR_SERPENT_KEY"},
        timeout=10
    )
    data = response.json()
    organic = data.get("results", {}).get("organic", [])[:num]
    # Return structured data the model can reason over
    return json.dumps([
        {
            "position": r["position"],
            "title": r["title"],
            "url": r["url"],
            "snippet": r.get("snippet", "")
        }
        for r in organic
    ])


def handle_tool_call(tool_name: str, args: dict) -> str:
    """Route tool calls to the appropriate function."""
    if tool_name == "web_search":
        return execute_web_search(
            query=args["query"],
            num=args.get("num", 5)
        )
    raise ValueError(f"Unknown tool: {tool_name}")


def agent_chat(user_message: str, model: str = "gpt-4o") -> str:
    """
    Run an agentic conversation loop with web search capability.

    Handles multiple rounds of tool calling until the model
    produces a final answer with no further tool calls.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful research assistant with access to real-time web search. "
                "Always search for current information when questions involve recent events, "
                "current statistics, or time-sensitive data. Cite your sources."
            )
        },
        {"role": "user", "content": user_message}
    ]

    while True:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        message = response.choices[0].message

        # If no tool calls, we have the final answer
        if not message.tool_calls:
            return message.content

        # Process all tool calls in this response
        messages.append(message)  # Add assistant message with tool calls

        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            print(f"[Tool call] {tool_call.function.name}({args})")

            result = handle_tool_call(tool_call.function.name, args)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

        # Continue the loop — model will process results and either
        # call more tools or produce a final answer


# Usage
answer = agent_chat("What are the best SERP APIs available in 2026 and how do their prices compare?")
print(answer)

The agent_chat function implements the full agentic loop: it sends the user message, checks for tool calls, executes them, feeds the results back, and repeats until the model produces a response without any tool calls. This pattern supports multi-step research where the model performs several searches to gather comprehensive information before synthesizing a final answer.

Option 3 — Custom Agent Tool

If you are building your own agent framework or using a different model provider, a simple class wrapper gives you a clean interface that works with any system:

import requests
import time
from typing import Optional

class SerpentSearchTool:
    """
    A self-contained web search tool for AI agents.
    Works with any LLM framework or custom agent loop.
    """

    name = "web_search"
    description = (
        "Search the web for current information. "
        "Input: a search query string. "
        "Output: formatted list of top search results with titles, URLs, and snippets."
    )

    def __init__(self, api_key: str, num_results: int = 5):
        self.api_key = api_key
        self.num_results = num_results
        self._last_call = 0.0
        self._min_interval = 0.5  # Max 2 requests per second

    def _rate_limit(self):
        elapsed = time.time() - self._last_call
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)
        self._last_call = time.time()

    def run(self, query: str, num: Optional[int] = None) -> str:
        """
        Execute a search and return formatted results as a string.

        Args:
            query: The search query.
            num: Optional override for number of results.

        Returns:
            Formatted string with search results for the LLM to consume.
        """
        self._rate_limit()
        n = num or self.num_results

        try:
            response = requests.get(
                "https://apiserpent.com/api/search",
                params={"q": query, "num": n, "apiKey": self.api_key},
                timeout=15
            )
            response.raise_for_status()
            data = response.json()
        except requests.exceptions.RequestException as e:
            return f"Search failed: {str(e)}"

        organic = data.get("results", {}).get("organic", [])
        if not organic:
            return f"No results found for: {query}"

        lines = [f"Search results for: {query}\n"]
        for r in organic:
            lines.append(f"[{r['position']}] {r['title']}")
            lines.append(f"    {r['url']}")
            if r.get('snippet'):
                lines.append(f"    {r['snippet']}")
            lines.append("")

        return "\n".join(lines)

    def as_langchain_tool(self):
        """Convert to a LangChain Tool object."""
        from langchain.tools import Tool
        return Tool(name=self.name, description=self.description, func=self.run)

    def as_openai_schema(self) -> dict:
        """Return OpenAI function calling schema."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"},
                        "num": {"type": "integer", "description": "Number of results (1-10)", "default": 5}
                    },
                    "required": ["query"]
                }
            }
        }


# Usage
tool = SerpentSearchTool(api_key="YOUR_KEY", num_results=5)

# Direct usage
results = tool.run("best AI coding assistants 2026")
print(results)

# Use with LangChain
langchain_tool = tool.as_langchain_tool()

# Use with OpenAI function calling
openai_schema = tool.as_openai_schema()

The as_langchain_tool() and as_openai_schema() methods make this class portable across different agent frameworks without duplicating logic.


Caching Search Results to Cut Costs

In agentic workflows, the same or similar queries can be issued multiple times across different conversation turns. Caching eliminates redundant API calls and can reduce your SERP API costs by 50% or more in high-traffic applications.
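A quick back-of-the-envelope calculation shows the impact (the traffic figures are illustrative; only the per-search price comes from the pricing above):

```python
PRICE_PER_SEARCH = 0.00005  # Serpent API per-search rate

def monthly_cost(searches_per_day, cache_hit_rate=0.0):
    """Estimated monthly SERP API spend after cache savings."""
    billable = searches_per_day * (1 - cache_hit_rate)
    return billable * PRICE_PER_SEARCH * 30

print(monthly_cost(100_000))       # no caching
print(monthly_cost(100_000, 0.5))  # a 50% hit rate halves the bill
```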

In-Memory Cache with TTL

import time
from typing import Any, Dict, Tuple

class TTLCache:
    """Simple in-memory cache with time-to-live expiry."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str):
        if key in self._cache:
            timestamp, value = self._cache[key]
            if time.time() - timestamp < self.ttl:
                return value
            del self._cache[key]
        return None

    def set(self, key: str, value: Any):
        self._cache[key] = (time.time(), value)

    def clear_expired(self):
        now = time.time()
        self._cache = {
            k: v for k, v in self._cache.items()
            if now - v[0] < self.ttl
        }


# Integrate cache into the search tool
cache = TTLCache(ttl_seconds=14400)  # 4-hour TTL

def cached_serpent_search(query: str) -> str:
    """Search with caching — avoids duplicate API calls."""
    cache_key = query.lower().strip()
    cached = cache.get(cache_key)

    if cached:
        print(f"[Cache hit] {query}")
        return cached

    print(f"[Cache miss] Fetching: {query}")
    result = serpent_search(query)  # Your existing search function
    cache.set(cache_key, result)
    return result

Redis Cache for Production

For distributed deployments or multi-process agents, use Redis to share the cache across instances:

import redis
import json
import hashlib

redis_client = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

def redis_cached_search(query: str, ttl: int = 14400) -> str:
    """Search with Redis caching for distributed agent deployments."""
    cache_key = f"serp:{hashlib.md5(query.lower().encode()).hexdigest()}"

    # Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return cached

    # Fetch and cache
    result = serpent_search(query)
    redis_client.setex(cache_key, ttl, result)
    return result

A 4-hour TTL is a good default for most SERP data. Search rankings do not change minute to minute, so results cached four hours ago are almost always accurate enough for agent responses.

Production Considerations

Limiting Search Scope

Agents can be overly enthusiastic about searching. Set max_iterations in LangChain agents or implement a search counter that caps the number of Serpent API calls per conversation turn. Three to five searches per user query is usually sufficient; more than that often indicates the agent is going in circles.
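One way to implement such a counter, as a small wrapper around any search function like the `serpent_search` defined earlier (the budget-exhausted message is our own convention, not part of the API):

```python
class SearchBudget:
    """Caps the number of SERP API calls per conversation turn."""

    def __init__(self, max_searches=5):
        self.max_searches = max_searches
        self.used = 0

    def search(self, query, search_fn):
        if self.used >= self.max_searches:
            # Return a sentinel the model can act on instead of raising,
            # so the conversation keeps going when the budget runs out.
            return "Search budget exhausted for this turn; answer with what you have."
        self.used += 1
        return search_fn(query)

    def reset(self):
        """Call at the start of each new user turn."""
        self.used = 0
```

Register `lambda q: budget.search(q, serpent_search)` as the tool function, and call `budget.reset()` whenever a new user message arrives.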

Error Handling and Fallbacks

Wrap all SERP API calls in try/except blocks and return a graceful fallback message when the search fails. The agent should be able to continue the conversation with its training data rather than crashing when search is unavailable:

def safe_search(query: str) -> str:
    try:
        return serpent_search(query)
    except Exception as e:
        # Log the error and return a fallback
        print(f"Search failed for '{query}': {e}")
        return "Web search is temporarily unavailable. I'll answer based on my training data, but note this information may not reflect the latest developments."

Cost Tracking

At $0.00005 per search, costs scale linearly with usage. Track how many searches each agent session consumes and set up billing alerts in your Serpent API dashboard. For multi-tenant applications, log per-user search counts so you can attribute costs accurately and implement per-user search quotas if needed.
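A minimal in-process tracker along these lines (the quota value is illustrative; a production deployment would back this with Redis or a database):

```python
from collections import defaultdict

PRICE_PER_SEARCH = 0.00005  # Serpent API per-search rate

class UsageTracker:
    """Per-user search counts for cost attribution and quotas."""

    def __init__(self, quota_per_user=1000):
        self.quota = quota_per_user
        self.counts = defaultdict(int)

    def allow(self, user_id):
        # Check the quota before issuing a search on this user's behalf.
        return self.counts[user_id] < self.quota

    def record(self, user_id):
        self.counts[user_id] += 1

    def cost(self, user_id):
        # Attributed spend for this user so far.
        return self.counts[user_id] * PRICE_PER_SEARCH
```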

Search Result Quality

Not all search results are equally useful for grounding LLM responses. Consider filtering out certain domains (e.g., paywalled content, social media, user forums) and prioritizing authoritative sources. Pass the siteFilter or domain exclusion parameters available in the Serpent API to refine result quality for your specific use case.
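If you prefer to filter client-side instead of (or in addition to) the API's parameters, a simple blocklist over the organic results works — the domain list here is purely illustrative:

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"pinterest.com", "quora.com", "reddit.com"}  # illustrative

def filter_results(organic_results, blocked=BLOCKED_DOMAINS):
    """Drop results whose domain (or parent domain) is on the blocklist."""
    kept = []
    for r in organic_results:
        domain = urlparse(r["url"]).netloc.lower()
        # Match the domain itself and any subdomain (e.g. www.quora.com).
        if any(domain == b or domain.endswith("." + b) for b in blocked):
            continue
        kept.append(r)
    return kept
```

Run the organic results through this before formatting them for the model, so low-quality snippets never enter the context window at all.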

For more on building with SERP APIs, read our SERP API pricing comparison and our guide on web scraping vs. SERP APIs to understand when each approach is appropriate.

Ready to Start Building?

Get started with Serpent API today. 100 free searches included, no credit card required.

Get Your Free API Key

Explore: AI Ranking API · SERP API · Pricing · Try in Playground