Documentation

A2RAG is a decision layer that sits between your RAG retrieval and your users. It receives a query, the retrieved contexts, and a draft answer, and returns a routing decision: answer, clarify, or abstain.

Quickstart

Get from zero to your first decision in under 5 minutes.

# Install
pip install a2rag

# Run your first decision
from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

decision = client.decide(
    query="What is the refund window?",
    contexts=["Refunds within 14 days for unused items."],
    draft_answer="14 days.",
)

print(decision.action)       # "answer"
print(decision.confidence)   # 0.82
print(decision.should_answer) # True

Installation

pip install a2rag

The client package requires Python 3.8+ and has no dependencies beyond the standard library.

Authentication

All API requests require an API key passed as a header:

X-API-Key: a2rag_pilot_...

To get a key, request early access. Developer keys are issued automatically; pilot keys are issued within 24 hours.

POST /decide

The core endpoint. Evaluates a query against retrieved contexts and returns a routing decision.

POST https://api.a2rag.ai/decide
Content-Type: application/json
X-API-Key: your_key

Request body

Field             Type      Required  Description
query             string    required  The user's original question
contexts          string[]  required  Retrieved chunks from your RAG system
draft_answer      string    required  The LLM-generated draft answer
domain            string    optional  insurance · legal · hr · support · medical
tau_evidence      float     optional  Override evidence threshold (0.0–1.0)
tau_completeness  float     optional  Override completeness threshold
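For callers who are not using the Python client, the request described above can be sketched with the standard library alone. This is a minimal illustration, not the client's own implementation: `build_decide_request` is a hypothetical helper name, and the sketch only builds the request so that it runs without network access or a real key.

```python
import json
import urllib.request

API_URL = "https://api.a2rag.ai/decide"  # endpoint documented above


def build_decide_request(api_key, query, contexts, draft_answer):
    """Build (but do not send) a POST /decide request with the required fields."""
    body = {
        "query": query,
        "contexts": list(contexts),
        "draft_answer": draft_answer,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_decide_request(
    api_key="your_key",
    query="What is the refund window?",
    contexts=["Refunds within 14 days for unused items."],
    draft_answer="14 days.",
)

# To send it (requires a valid key and network access):
#     with urllib.request.urlopen(req) as resp:
#         decision = json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect and log before it leaves your system.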

Response

Field          Type                              Description
action         "answer" | "clarify" | "abstain"  The routing decision
confidence     float 0.0–1.0                     How confident A2RAG is in this decision
explanation    string                            Human-readable reason for the decision
clarification  string | null                     Follow-up question (only when action=clarify)
signals        object                            Evidence signals: coverage, retrieval_strength, consistency, confidence
decision_id    UUID                              Unique ID; use for feedback submission
latency_ms     integer                           Processing time in milliseconds

Example response:
{
  "decision_id": "5bbc7903-2250-4836-b0f5...",
  "action": "answer",
  "confidence": 0.82,
  "explanation": "Answer supported by evidence (91% of claims verified).",
  "clarification": null,
  "signals": {
    "coverage": 0.91,
    "retrieval_strength": 1.0,
    "consistency": 1.0,
    "confidence": 0.82
  },
  "latency_ms": 12
}
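If you call the endpoint over raw HTTP, you may want to validate the response before acting on it. The following is a minimal sketch under that assumption: `Decision` and `parse_decision` are hypothetical names for illustration and are not the objects returned by the a2rag client. The decision ID is left truncated, as in the example response.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

VALID_ACTIONS = ("answer", "clarify", "abstain")


@dataclass
class Decision:  # hypothetical container, not the a2rag client's own class
    decision_id: str
    action: str
    confidence: float
    explanation: str
    clarification: Optional[str]
    signals: Dict[str, float]
    latency_ms: int


def parse_decision(raw: str) -> Decision:
    """Parse a /decide response body and reject unknown actions."""
    data = json.loads(raw)
    if data["action"] not in VALID_ACTIONS:
        raise ValueError(f"unexpected action: {data['action']!r}")
    return Decision(
        decision_id=data["decision_id"],
        action=data["action"],
        confidence=float(data["confidence"]),
        explanation=data["explanation"],
        clarification=data.get("clarification"),
        signals=dict(data.get("signals", {})),
        latency_ms=int(data["latency_ms"]),
    )


raw_response = """
{
  "decision_id": "5bbc7903-...",
  "action": "answer",
  "confidence": 0.82,
  "explanation": "Answer supported by evidence (91% of claims verified).",
  "clarification": null,
  "signals": {"coverage": 0.91, "retrieval_strength": 1.0,
              "consistency": 1.0, "confidence": 0.82},
  "latency_ms": 12
}
"""
decision = parse_decision(raw_response)
```

Failing fast on an unknown `action` value means a future API change shows up as an error instead of silently falling through your routing logic.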

POST /feedback

Submit feedback on a decision. Feedback is used to improve calibration over time.

client.feedback(
    decision_id="5bbc7903-...",
    was_correct=True,   # or False
    comment="Correct — user confirmed refund"
)
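In an application, feedback usually arrives some time after the decision, so it can help to wrap the call in a small helper. The sketch below uses only the `client.feedback` keyword arguments shown above. `report_outcome` and `RecordingClient` are hypothetical names, and `RecordingClient` is a stand-in that lets the example run without network access.

```python
def report_outcome(client, decision_id, user_confirmed, note=""):
    """Forward a user's verdict on a decision to client.feedback()."""
    if not decision_id:
        raise ValueError("decision_id is required to submit feedback")
    return client.feedback(
        decision_id=decision_id,
        was_correct=bool(user_confirmed),
        comment=note,
    )


class RecordingClient:
    """Stand-in for A2RAGClient that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def feedback(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


fake = RecordingClient()
report_outcome(fake, "5bbc7903-...", user_confirmed=True, note="user confirmed refund")
```

With a real client, pass your `A2RAGClient` instance in place of `fake` and store the `decision_id` alongside the conversation so it is available when the user responds.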

GET /health

{
  "ok": true,
  "version": "8.0.0",
  "nli_loaded": true,
  "embed_loaded": true
}
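A health payload like the one above can drive a simple readiness check before sending traffic. The sketch below assumes that all three boolean flags must be true for the service to be usable; that interpretation is not stated in this documentation, and `is_ready` is a hypothetical helper name.

```python
def is_ready(health: dict) -> bool:
    """Return True only if the service and both models report as loaded."""
    required_flags = ("ok", "nli_loaded", "embed_loaded")
    return all(health.get(flag) is True for flag in required_flags)


healthy = {"ok": True, "version": "8.0.0", "nli_loaded": True, "embed_loaded": True}
warming_up = {"ok": True, "version": "8.0.0", "nli_loaded": False, "embed_loaded": True}

ready_now = is_ready(healthy)
ready_later = is_ready(warming_up)
```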

Actions

Action   Meaning                                      What to do
answer   Corpus supports the draft. Safe to show.     Display draft_answer to the user
clarify  Info exists but query is instance-specific.  Ask the user the question in decision.clarification
abstain  Topic not covered in corpus.                 Escalate, show a custom message, or call a webhook
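The three actions map naturally onto a small dispatch function. This is a sketch of one possible mapping, not prescribed behavior: `route`, the `"show"`/`"ask"`/`"escalate"` labels, and the fallback strings are all assumptions made for illustration.

```python
FALLBACK_QUESTION = "Could you share more details about your situation?"
ABSTAIN_MESSAGE = "I don't have enough information to answer that."


def route(action, draft_answer, clarification=None):
    """Map an A2RAG action to (what to do, text to present)."""
    if action == "answer":
        return ("show", draft_answer)
    if action == "clarify":
        # Fall back to a generic question if none was supplied.
        return ("ask", clarification or FALLBACK_QUESTION)
    if action == "abstain":
        return ("escalate", ABSTAIN_MESSAGE)
    raise ValueError(f"unknown action: {action!r}")


step, text = route("clarify", "14 days.", "Which order are you asking about?")
```

Raising on an unknown action, instead of defaulting to showing the draft, keeps the failure mode conservative.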

Evidence Signals

Signal              Range    Description
coverage            0.0–1.0  How much of the draft is supported by retrieved contexts
retrieval_strength  0.0–1.0  Average confidence of retrieved chunks
consistency         0.0–1.0  Whether contexts agree with each other (1.0 = consistent)
completeness        0.0–1.0  Whether query can be answered without instance-specific data
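When debugging a clarify or abstain decision, it can be useful to see which signal scored lowest. The sketch below is a logging aid only: `weakest_signal` is a hypothetical helper, the values are taken from the example response, and nothing here reflects how A2RAG itself combines signals.

```python
def weakest_signal(signals):
    """Return the (name, value) pair with the lowest score."""
    if not signals:
        raise ValueError("signals is empty")
    return min(signals.items(), key=lambda item: item[1])


signals = {
    "coverage": 0.91,
    "retrieval_strength": 1.0,
    "consistency": 1.0,
    "confidence": 0.82,
}
name, value = weakest_signal(signals)
```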

Domain Presets

Pass domain to use calibrated thresholds for your industry.

Value      Use case                    Behavior
insurance  Coverage, claims, policies  Conservative (high abstain rate)
legal      Contracts, compliance, NDA  Very conservative
medical    Clinical, drug, treatment   Very conservative
hr         Leave, benefits, policy     Moderate
support    Product, billing, SLA       Moderate
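The optional `domain`, `tau_evidence`, and `tau_completeness` fields can be validated locally before they are merged into a POST /decide body. A minimal sketch follows; `decide_options` is a hypothetical helper, and applying the 0.0–1.0 range to `tau_completeness` is an assumption, since this documentation states the range only for `tau_evidence`.

```python
DOMAINS = {"insurance", "legal", "hr", "support", "medical"}


def decide_options(domain=None, tau_evidence=None, tau_completeness=None):
    """Validate optional /decide fields and return only those that were set."""
    options = {}
    if domain is not None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        options["domain"] = domain
    thresholds = (
        ("tau_evidence", tau_evidence),
        ("tau_completeness", tau_completeness),  # range assumed, see lead-in
    )
    for name, value in thresholds:
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        options[name] = float(value)
    return options


options = decide_options(domain="legal", tau_evidence=0.9)

# Merge into the required fields to form the full request body.
body = {
    "query": "What is the refund window?",
    "contexts": ["Refunds within 14 days for unused items."],
    "draft_answer": "14 days.",
    **options,
}
```

Omitting unset fields, instead of sending nulls, leaves the server free to apply the preset thresholds for the chosen domain.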

Languages

A2RAG detects language automatically. No configuration needed.

Language          Model                   Accuracy
English           MS-MARCO cross-encoder  94%
Hebrew            multilingual-e5-base    85–100%
Arabic            multilingual-e5-base    ~85%
French / Spanish  MS-MARCO                ~90%

LangChain Integration

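from langchain.chains import RetrievalQA
from a2rag import A2RAGClient

# `llm` and `retriever` are your existing LangChain LLM and retriever objects.
client = A2RAGClient(api_key="your_key")
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # expose the chunks the draft was built from
)

def safe_answer(query):
    # Run retrieval and generation once, keeping the retrieved documents,
    # so A2RAG sees exactly the contexts the draft answer was based on.
    output = chain.invoke({"query": query})
    draft = output["result"]
    ctx_texts = [doc.page_content for doc in output["source_documents"]]

    # Ask A2RAG whether the draft is safe to show.
    decision = client.decide(
        query=query,
        contexts=ctx_texts,
        draft_answer=draft,
    )

    if decision.should_answer:
        return draft
    elif decision.should_clarify:
        return decision.clarification
    else:
        return "I don't have enough information to answer that."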

Error Codes

Code  Meaning                          Fix
401   Invalid or missing API key       Check the X-API-Key header
403   Key suspended or revoked         Contact support
422   Invalid request body             Check required fields: query, contexts, draft_answer
429   Monthly limit reached            Upgrade plan or request an extension
500   Internal server error            Retry. If it persists, contact support
503   Service temporarily unavailable  Check api.a2rag.ai/health and retry
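The table suggests retrying only on 500 and 503, since the other codes need a fix on the caller's side. One way to sketch that policy is shown below. `call_with_retry` is a hypothetical helper, `send` is assumed to be any callable returning a `(status_code, body)` pair, and the exponential backoff values are illustrative choices, not documented recommendations.

```python
import time

RETRYABLE = {500, 503}  # the codes the table says to retry


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, retrying only on retryable status codes."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))


# Simulated responses so the sketch runs without network access.
responses = iter([(503, None), (500, None), (200, {"action": "answer"})])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` makes the backoff testable without waiting; in production the default `time.sleep` applies.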