Build an AI Citation Tracker in Python (2026 Tutorial)
The AI search era introduced a new SEO metric: citation rate — how often your domain appears as a source inside ChatGPT, Gemini, Claude, and Perplexity answers. If you are not tracking it, you cannot improve it. This tutorial walks you through building a 200-line Python script that queries all four major LLMs every week, stores citations in SQLite, and emails (or Slacks) you a diff report showing which keywords gained or lost citations.
You will use the Serpent AI Ranking API for the heavy lifting — one HTTP call per engine returns the citation URLs, positions, and frequency. Total cost to run on 100 keywords weekly: under $2/month. By the end of this tutorial you will have a working AI citation tracker running on your laptop or a cheap VPS, plus the foundation to extend it into a full brand visibility dashboard.
Why You Need a Citation Tracker in 2026
2026 data: AI Overviews appear on roughly 89 percent of brand-related Google queries. ChatGPT serves billions of weekly queries. The brand that gets cited inside those AI answers wins the user; the brand that does not gets skipped entirely. Citation rate is the metric that captures whether you are winning or losing this game.
Citation tracking solves three concrete problems:
- AEO performance measurement. When you add FAQPage schema or refine your homepage copy, citation rate is the lagging metric that tells you whether AI engines responded.
- Competitive intelligence. Watching competitors' citation share-of-voice reveals which of their pages AI engines trust most.
- Content gap discovery. When a query returns answers but your domain is never cited, you have either a content gap (missing page) or a markup gap (page exists but is not cited because of poor structure).
For the broader strategic picture see our AEO vs SEO 2026 playbook.
Architecture: 4 Engines, 1 Dashboard
The tracker has four moving parts:
- Watchlist. A YAML or JSON file with keywords and target domains.
- Fetcher. A Python module that calls the Serpent AI Ranking API for each keyword across each engine.
- Storage. A SQLite database with two tables: runs (one row per run) and citations (one row per citation).
- Reporter. A weekly diff that compares the latest run to the previous run and emits a Markdown report.
watchlist.yaml ───▶ fetcher.py ──▶ Serpent AI Ranking API
                        │          (Claude, ChatGPT, Gemini, Perplexity)
                        ▼
                   citations.db
                        │
                        ▼
                   reporter.py ──▶ weekly-report.md / Slack
Prerequisites
- Python 3.10+
- A Serpent API key (sign up at apiserpent.com/login.html — 10 free queries on every new account)
- Optional: a Slack incoming-webhook URL for alerts
Step 1 — Set up the project
mkdir ai-citation-tracker && cd ai-citation-tracker
python3 -m venv .venv
source .venv/bin/activate
pip install requests pyyaml python-dotenv tabulate
touch .env watchlist.yaml fetcher.py storage.py reporter.py main.py
Add your API key to .env:
SERPENT_API_KEY=sk_live_your_actual_key_here
SLACK_WEBHOOK_URL= # optional
Step 2 — Define the watchlist
Create watchlist.yaml. Group keywords by intent so the report is easy to read:
brand:
  - your brand name
  - your brand pricing
  - your brand vs alternatives
category:
  - cheapest serp api
  - best serp api 2026
  - serp api for python
domains:
  own:
    - apiserpent.com
  competitors: []   # add domains you want to monitor
Start with 30 to 50 keywords. You can scale up after the first run completes.
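Before the first full run, it is worth checking that the group flattening behaves as expected. A minimal sketch, using a dict literal in place of the parsed YAML (the structure mirrors watchlist.yaml above):

```python
# Stand-in for yaml.safe_load(open("watchlist.yaml")) -- same shape, fewer entries.
cfg = {
    "brand": ["your brand name", "your brand pricing"],
    "category": ["cheapest serp api", "best serp api 2026"],
    "domains": {"own": ["apiserpent.com"], "competitors": []},
}

# Flatten the keyword groups into one list, preserving group order.
keywords = [kw for group in ("brand", "category") for kw in cfg.get(group, [])]
our_domain = cfg["domains"]["own"][0]
```

Missing groups fall back to an empty list via `cfg.get(group, [])`, so you can delete a whole group from the YAML without touching the code.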
Step 3 — Query the AI Ranking API
Create fetcher.py. The Serpent AI Ranking API exposes four endpoints — one per engine — that all return the same JSON shape:
import os, requests, time
from typing import Iterable

BASE = "https://apiserpent.com/api/ai/rank"
ENGINES = ("claude", "chatgpt", "gemini", "perplexity")
HEADERS = {"X-API-Key": os.environ["SERPENT_API_KEY"]}

def query_engine(engine: str, keyword: str) -> dict:
    """Query one engine for one keyword. Returns the parsed JSON."""
    url = f"{BASE}/{engine}"
    resp = requests.get(url, headers=HEADERS, params={"q": keyword}, timeout=120)
    resp.raise_for_status()
    return resp.json()

def fetch_all(keywords: Iterable[str]) -> list[dict]:
    """Fetch every (engine, keyword) pair and return the raw responses."""
    out = []
    for kw in keywords:
        for engine in ENGINES:
            try:
                data = query_engine(engine, kw)
                out.append({"engine": engine, "keyword": kw, "response": data})
                time.sleep(0.5)  # gentle pacing between calls
            except requests.RequestException as e:
                print(f"[{engine}] {kw}: {e}")
    return out
Each response includes a citations array with the structure:
{
  "engine": "claude",
  "model": "claude-sonnet-4-6",
  "query": "cheapest serp api",
  "response_text": "Several SERP APIs compete on price ...",
  "citations": [
    {"position": 1, "url": "https://apiserpent.com/", "title": "Serpent API", "domain": "apiserpent.com"},
    {"position": 2, "url": "https://example.com/article", "title": "...", "domain": "example.com"}
  ]
}
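Working against that shape, pulling out where a given domain ranks in one answer is a one-liner. A sketch with a hand-made response dict; `cited_positions` is a throwaway helper, not part of the API:

```python
# Mirrors the JSON shape of one AI Ranking API response.
response = {
    "engine": "claude",
    "query": "cheapest serp api",
    "citations": [
        {"position": 1, "url": "https://apiserpent.com/", "domain": "apiserpent.com"},
        {"position": 2, "url": "https://example.com/article", "domain": "example.com"},
    ],
}

def cited_positions(response: dict, domain: str) -> list[int]:
    """Positions at which `domain` appears in the citations array (empty if uncited)."""
    return [c["position"] for c in response.get("citations", [])
            if c.get("domain") == domain]
```

An empty list back means the engine answered without citing you at all, which is exactly the content-gap signal described earlier.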
Step 4 — Store results in SQLite
SQLite is plenty for this volume. Create storage.py:
import sqlite3, datetime

DB_PATH = "citations.db"

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL,
    finished_at TEXT,
    keyword_count INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS citations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER NOT NULL,
    engine TEXT NOT NULL,
    keyword TEXT NOT NULL,
    position INTEGER,
    url TEXT NOT NULL,
    title TEXT,
    domain TEXT NOT NULL,
    FOREIGN KEY (run_id) REFERENCES runs(id)
);
CREATE INDEX IF NOT EXISTS idx_run ON citations(run_id);
CREATE INDEX IF NOT EXISTS idx_kw ON citations(keyword);
CREATE INDEX IF NOT EXISTS idx_dom ON citations(domain);
"""

def utcnow() -> str:
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    return datetime.datetime.now(datetime.timezone.utc).isoformat()

def get_conn():
    conn = sqlite3.connect(DB_PATH)
    conn.executescript(SCHEMA)
    return conn

def start_run(conn, keyword_count: int) -> int:
    cur = conn.execute(
        "INSERT INTO runs(started_at, keyword_count) VALUES(?, ?)",
        (utcnow(), keyword_count),
    )
    conn.commit()
    return cur.lastrowid

def save_citations(conn, run_id: int, fetched: list[dict]):
    rows = []
    for r in fetched:
        engine, keyword = r["engine"], r["keyword"]
        for c in r["response"].get("citations", []):
            rows.append((run_id, engine, keyword, c.get("position"),
                         c.get("url"), c.get("title"), c.get("domain") or ""))
    conn.executemany(
        "INSERT INTO citations(run_id, engine, keyword, position, url, title, domain) "
        "VALUES(?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

def finish_run(conn, run_id: int):
    conn.execute("UPDATE runs SET finished_at=? WHERE id=?",
                 (utcnow(), run_id))
    conn.commit()
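You can sanity-check the schema and insert logic before spending any API credits by running the same flow against an in-memory database with a fake response. A condensed sketch of the module above (`:memory:` stands in for citations.db; only the columns used here are kept):

```python
import sqlite3

SCHEMA = """
CREATE TABLE runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL,
    keyword_count INTEGER NOT NULL
);
CREATE TABLE citations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER NOT NULL,
    engine TEXT NOT NULL,
    keyword TEXT NOT NULL,
    position INTEGER,
    url TEXT NOT NULL,
    domain TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One run, one fake fetcher result.
run_id = conn.execute(
    "INSERT INTO runs(started_at, keyword_count) VALUES('2026-01-05T08:00:00', 1)"
).lastrowid

fake = {"engine": "claude", "keyword": "cheapest serp api",
        "response": {"citations": [
            {"position": 1, "url": "https://apiserpent.com/", "domain": "apiserpent.com"}]}}

for c in fake["response"]["citations"]:
    conn.execute(
        "INSERT INTO citations(run_id, engine, keyword, position, url, domain) "
        "VALUES(?, ?, ?, ?, ?, ?)",
        (run_id, fake["engine"], fake["keyword"], c["position"], c["url"], c["domain"]))

count = conn.execute(
    "SELECT COUNT(*) FROM citations WHERE run_id=?", (run_id,)).fetchone()[0]
```

If the count comes back as expected, the real module will behave the same against the on-disk file.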
Step 5 — Generate the weekly diff report
The reporter answers four questions: which keywords gained citations, which lost them, what your domain's share of voice is, and how the four engines compare. Create reporter.py:
import sqlite3
from collections import Counter
from tabulate import tabulate

def latest_two_runs(conn):
    cur = conn.execute(
        "SELECT id FROM runs WHERE finished_at IS NOT NULL ORDER BY id DESC LIMIT 2")
    ids = [r[0] for r in cur.fetchall()]
    if not ids:
        return None, None
    return (ids[1] if len(ids) == 2 else None), ids[0]

def domain_count(conn, run_id: int, domain: str) -> Counter:
    cur = conn.execute(
        "SELECT keyword, engine FROM citations WHERE run_id=? AND domain=?",
        (run_id, domain))
    return Counter(cur.fetchall())

def share_of_voice(conn, run_id: int, our_domain: str) -> float:
    cur = conn.execute("SELECT COUNT(*) FROM citations WHERE run_id=?", (run_id,))
    total = cur.fetchone()[0] or 1
    cur = conn.execute(
        "SELECT COUNT(*) FROM citations WHERE run_id=? AND domain=?",
        (run_id, our_domain))
    ours = cur.fetchone()[0]
    return round(100.0 * ours / total, 2)

def by_engine(conn, run_id: int, our_domain: str) -> dict[str, int]:
    cur = conn.execute(
        "SELECT engine, COUNT(*) FROM citations "
        "WHERE run_id=? AND domain=? GROUP BY engine", (run_id, our_domain))
    return dict(cur.fetchall())

def diff_report(our_domain: str) -> str:
    conn = sqlite3.connect("citations.db")
    prev, latest = latest_two_runs(conn)
    if latest is None:
        return "# Citation report\n\nNo completed runs yet."
    if prev is None:
        return f"# Citation report\n\nFirst run completed. Latest run id={latest}. Re-run next week to see diffs."
    prev_set = set(domain_count(conn, prev, our_domain))
    latest_set = set(domain_count(conn, latest, our_domain))
    gained = sorted(latest_set - prev_set)
    lost = sorted(prev_set - latest_set)
    sov_now = share_of_voice(conn, latest, our_domain)
    sov_prev = share_of_voice(conn, prev, our_domain)
    eng_now = by_engine(conn, latest, our_domain)
    lines = [
        f"# AI Citation Report — run {latest}",
        "",
        f"**Share of voice:** {sov_now}% (prev {sov_prev}%, delta {sov_now - sov_prev:+.2f})",
        "",
        "## Citations by engine (latest run)",
        "",
        tabulate([[e, eng_now.get(e, 0)] for e in ("claude", "chatgpt", "gemini", "perplexity")],
                 headers=["Engine", "Citations"], tablefmt="github"),
        "",
        f"## Gained ({len(gained)})",
        "",
        *([f"- `{kw}` — {engine}" for kw, engine in gained] or ["_None._"]),
        "",
        f"## Lost ({len(lost)})",
        "",
        *([f"- `{kw}` — {engine}" for kw, engine in lost] or ["_None._"]),
    ]
    return "\n".join(lines)
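The gained/lost logic is nothing more than a set difference over (keyword, engine) pairs. Here it is in isolation with hand-made data, so you can see exactly what the report will list:

```python
# Pairs where our domain was cited in each run (what domain_count keys look like).
prev_pairs = {("cheapest serp api", "claude"), ("best serp api 2026", "gemini")}
latest_pairs = {("cheapest serp api", "claude"), ("serp api for python", "chatgpt")}

gained = sorted(latest_pairs - prev_pairs)  # cited now, but not last week
lost = sorted(prev_pairs - latest_pairs)    # cited last week, but not now
```

A pair that appears in both runs is stable and shows up in neither list, which keeps the weekly report focused on what actually changed.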
Step 6 — Schedule with cron
Wire the pieces together in main.py:
import sys, datetime, pathlib

import yaml
from dotenv import load_dotenv
load_dotenv()

from fetcher import fetch_all
from storage import get_conn, start_run, save_citations, finish_run
from reporter import diff_report

def run():
    cfg = yaml.safe_load(open("watchlist.yaml"))
    keywords = sum((cfg.get(g, []) for g in ("brand", "category")), [])
    our_domain = cfg["domains"]["own"][0]
    print(f"Fetching {len(keywords)} keywords across 4 engines...")
    fetched = fetch_all(keywords)
    conn = get_conn()
    run_id = start_run(conn, len(keywords))
    save_citations(conn, run_id, fetched)
    finish_run(conn, run_id)
    report = diff_report(our_domain)
    pathlib.Path("reports").mkdir(exist_ok=True)
    out = f"reports/weekly-{datetime.date.today().isoformat()}.md"
    with open(out, "w") as f:
        f.write(report)
    print(f"Report written to {out}")

if __name__ == "__main__":
    sys.exit(run())
Schedule weekly on Monday morning:
mkdir reports
crontab -e
# add:
0 8 * * 1 cd /home/you/ai-citation-tracker && .venv/bin/python main.py >> cron.log 2>&1
Step 7 — Optional: Slack alerts
Append the report to a Slack channel:
import os, requests

def post_to_slack(report: str):
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        return
    # Truncate to stay safely under Slack's ~40,000-character message limit
    requests.post(url, json={"text": report[:38000]}, timeout=30)
Call post_to_slack(report) at the end of main.py's run() function.
Cost Analysis
The Serpent AI Ranking API is priced per request and includes the upstream LLM costs. Per-engine, per-query rates at the Scale tier ($500+ single deposit) are roughly $0.001 per call. Math for a typical setup:
- 50 keywords × 4 engines × 4 weeks = 800 calls/month
- 800 × $0.001 = $0.80/month
Or 200 keywords across 4 engines weekly = ~3,200 calls/month = ~$3.20. Even running daily on 50 keywords is roughly $5.60/month.
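The arithmetic generalizes to any cadence. A throwaway helper using the roughly $0.001-per-call Scale-tier rate quoted above (the rate and defaults are the tutorial's assumptions, not an official price list):

```python
def monthly_cost(keywords: int, engines: int = 4, runs_per_month: int = 4,
                 price_per_call: float = 0.001) -> float:
    """Estimated monthly spend in dollars at a flat per-call rate."""
    return round(keywords * engines * runs_per_month * price_per_call, 2)
```

So `monthly_cost(50)` reproduces the $0.80 figure, and bumping `runs_per_month` to 28 models the daily-cadence case.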
For comparison context: a single rank-tracker subscription that does not include AI citation tracking typically costs $30 to $200/month. See the full Serpent pricing page.
Extending the Tracker
Once the basic tracker is live, three high-impact extensions:
Add Google AI Overview tracking
Use the Google SERP API to fetch AI Overview source lists for the same keywords. Store them in a parallel table. Now you have visibility across five AI surfaces (Google AI Overview + 4 LLMs).
Add a competitor watchlist
Add domains under watchlist.yaml > domains > competitors. The reporter already supports any domain — just call share_of_voice(conn, run_id, competitor_domain) for each. You get a citation share-of-voice leaderboard.
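A full leaderboard is one Counter over the domain column. A sketch using a plain list of domains (in the real tracker this list would come from `SELECT domain FROM citations WHERE run_id=?`; the sample domains are made up):

```python
from collections import Counter

# Domain of every citation in the latest run.
domains = ["apiserpent.com", "example.com", "apiserpent.com", "rival.io"]

total = len(domains)
# (domain, share-of-voice %) sorted by citation count, descending.
leaderboard = [(d, round(100.0 * n / total, 2))
               for d, n in Counter(domains).most_common()]
```

Printed as a table, this gives you the same share-of-voice view as the per-domain function, but for every domain the engines cited, including competitors you never thought to watch.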
Build a small dashboard
SQLite can be queried directly by tools like Datasette, Metabase, or Superset. Point your tool at citations.db and you have a real-time dashboard with no extra ETL. Pair with our brand SERP monitoring guide for the broader visibility view.
Get the full code as a starter repo. Sign up for a free Serpent API key and you can run this tracker today — the Scale tier costs less than a single rank-tracker subscription per year. Get 10 free queries →
FAQ
What is an AI citation tracker?
A periodic script that queries ChatGPT, Gemini, Claude, and Perplexity for your target keywords, captures which URLs each LLM cites, stores the results, and reports changes over time. The AEO equivalent of a rank tracker.
How often should I run the tracker?
Weekly is enough for most brands. Daily for fast-moving topics or news. Serpent AI Ranking is priced per request so cadence is your choice.
How many keywords should I track?
Start with 50 to 100 strategic keywords: brand terms, product names, top-of-funnel category keywords, and bottom-of-funnel comparison queries.
Can I track competitors with the same setup?
Yes. The script accepts any list of domains. Add competitor domains under domains.competitors and you get share-of-voice for each.
What does it cost to run?
Each AI Ranking call queries one engine and costs roughly $0.001 at Scale. Tracking 100 keywords across 4 engines weekly is around $1.60/month.
Start Tracking Citations This Week
Serpent API gives you AI Ranking across ChatGPT, Gemini, Claude, and Perplexity in one API key, plus Google AI Overview source extraction in the SERP API. The cheapest Google SERP API in the world — from $0.03 per 10,000 pages, 10 free searches included.
Get Your Free API Key
Explore: AI Ranking API · SERP API · Pricing · Try in Playground

