PHASE 5 AI Integration: The "Just Connect an API" That Took 700+ Lines of Code
This is the second-to-last part of my MVP. For now, I've only integrated the Groq API.
An API? Should be easy, then?
Not quite. At first, I thought: just connect the API, send a prompt, get a response, and boom, done! I was so naive.
Chapter 1: The Innocent Beginning
I started simple. Install the Groq Python SDK, grab my API key, write a basic function:
    import os
    from groq import Groq

    GROQ_API_KEY = os.environ["GROQ_API_KEY"]  # read the key from the environment
    client = Groq(api_key=GROQ_API_KEY)
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": "Tell me about NUST"}],
        model="llama-3.1-8b-instant",
    )
It worked! The AI responded! Ship it, right? Wrong.
Chapter 2: The Real-World Wake-Up Call
Then reality hit me in the face:
Problem #1: Students ask about specific universities, programs, and fees. The AI was making things up (hallucinating) because it didn't have access to my actual university database.
Problem #2: Users expected real-time streaming responses like ChatGPT, not waiting 10 seconds for a full response.
Problem #3: The AI had no memory. Ask about "NUST admission" in one message, then "What about the fee?", and it forgot which university we were talking about.
Problem #4: Sometimes the AI would try to "pretend" it was calling functions by writing <function=search_universities>{...}</function> in the text instead of using actual tool calls!
Chapter 3: Building the Beast (700 Lines Later)
So I rolled up my sleeves and built a custom AI agent system from scratch. No LangChain, no shortcuts, just pure engineering:
Function Calling (Tool Use)
Gave the AI two superpowers:
search_universities(): Query my Supabase database for real Pakistani university data
brave_search(): Search the web when local data isn't enough
    tools = [
        {
            "type": "function",
            "function": {
                "name": "search_universities",
                "description": "Search Pakistani universities by location, program, or fees",
                "parameters": UniversitySearchArgs.model_json_schema(),
            },
        }
    ]
Now when a student asks "Which universities in Islamabad offer CS?", the AI actually queries my database instead of guessing!
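Once the model asks for a tool, something still has to run it and hand the result back. Here is a minimal sketch of that dispatch step, assuming the OpenAI-compatible tool-call shape (`function.name` plus a JSON string in `function.arguments`). The in-memory `FAKE_DB`, the `run_tool_call` helper, and the `location` / `program` argument names are hypothetical stand-ins for the real Supabase query, not the production code:

```python
import json

# Hypothetical stand-in for the Supabase table.
FAKE_DB = [
    {"name": "NUST", "location": "Islamabad", "programs": ["CS", "EE"]},
    {"name": "LUMS", "location": "Lahore", "programs": ["CS", "BBA"]},
]


def search_universities(location=None, program=None):
    """Filter the fake table by optional location and program."""
    rows = FAKE_DB
    if location:
        rows = [r for r in rows if r["location"].lower() == location.lower()]
    if program:
        rows = [r for r in rows if program in r["programs"]]
    return rows


# Map the tool names the model may request to real Python callables.
TOOL_REGISTRY = {"search_universities": search_universities}


def run_tool_call(tool_call):
    """Execute one tool call dict and build the 'tool' message to send back."""
    name = tool_call["function"]["name"]
    try:
        args = json.loads(tool_call["function"]["arguments"] or "{}")
        result = TOOL_REGISTRY[name](**args)
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Report the failure to the model instead of crashing the request.
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

Returning errors as tool output (instead of raising) gives the model a chance to retry with better arguments or apologize to the user.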
Server-Sent Events (SSE) Streaming
Built a custom streaming system so users see responses word-by-word in real-time:
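    async def stream_generator():
        for chunk in response:  # `response` must be created with stream=True
            delta = chunk.choices[0].delta.content
            if delta:  # some chunks carry no text (content is None)
                yield f"data: {delta}\n\n"

    # Returned from inside the FastAPI endpoint
    return StreamingResponse(stream_generator(), media_type="text/event-stream")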
It felt like magic the first time I saw text appearing live on the frontend!
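One thing a raw f-string does not handle is a chunk that contains a newline: in SSE a blank line ends the event, so the text would be cut or split. A minimal sketch of one way around that is to JSON-encode each chunk and send an explicit end marker. The `sse_event` and `sse_stream` helpers, the `{"content": ...}` payload shape, and the `[DONE]` marker are assumptions for illustration, not the exact code from the project:

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a Server-Sent Event."""
    # json.dumps escapes newlines, so the payload always fits on one data: line.
    return f"data: {json.dumps(payload)}\n\n"


def sse_stream(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Turn model text deltas into SSE frames, skipping empty ones."""
    for delta in deltas:
        if delta:  # None or "" carries no text
            yield sse_event({"content": delta})
    # Hypothetical end-of-stream marker so the frontend knows when to stop.
    yield "data: [DONE]\n\n"
```

On the frontend, each `data:` line is then parsed with `JSON.parse` unless it equals `[DONE]`.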
Conversation Memory
Implemented a smart memory system:
Store every message in Supabase with conversation_id
Extract important context (student preferences, mentioned universities)
Summarize old conversations automatically
Load relevant context before each new message
Now the AI remembers: "Oh, you asked about engineering earlier!"
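The "load relevant context" step is the part that makes the memory visible to the model. A minimal sketch of the idea, assuming the context is one stored summary plus the last few messages. The in-memory dictionaries stand in for the Supabase tables, and `save_message`, `build_context`, and `MAX_RECENT` are hypothetical names:

```python
from collections import defaultdict

MAX_RECENT = 6  # hypothetical window size: how many recent messages to replay

# Hypothetical in-memory stand-ins for the Supabase tables.
messages_by_conversation = defaultdict(list)
summary_by_conversation = {}


def save_message(conversation_id, role, content):
    """Append one message to the conversation's history."""
    messages_by_conversation[conversation_id].append(
        {"role": role, "content": content}
    )


def build_context(conversation_id, system_prompt):
    """Assemble the message list sent to the model for the next turn."""
    context = [{"role": "system", "content": system_prompt}]
    summary = summary_by_conversation.get(conversation_id)
    if summary:
        # Older turns are represented by their summary, not replayed verbatim.
        context.append(
            {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
        )
    context.extend(messages_by_conversation[conversation_id][-MAX_RECENT:])
    return context
```

Capping the replayed history keeps the prompt size roughly constant no matter how long the conversation gets, while the summary carries the older details forward.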
The Hallucination Interceptor
This was the wildest bug I encountered. Sometimes Llama would write:
<function=search_universities>{"location":"Lahore"}</function>
Let me search for you...
Instead of actually calling the function! So I built a regex-based "hallucination detector":
    import re

    pattern = r"<function=([^>]+)>(.*?)</function>"
    if "<function" in content_text:
        matches = re.findall(pattern, content_text, flags=re.DOTALL)
        # Convert fake function calls into real tool calls
        mock_tool_calls = [...]
        # Strip the hallucination from the response
        cleaned_content = re.sub(pattern, "", content_text, flags=re.DOTALL).strip()
Crisis averted!
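To make the conversion step concrete, here is a self-contained sketch of how regex matches could be turned into tool-call dicts in the OpenAI-compatible shape. It is an illustration under assumptions, not the production code: the `intercept_fake_calls` name, the generated `call_...` ids, and the choice to skip calls whose arguments are not valid JSON are all hypothetical:

```python
import json
import re
import uuid

FAKE_CALL = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)


def intercept_fake_calls(content_text):
    """Return (cleaned_text, tool_calls) for text that may contain fake calls."""
    tool_calls = []
    for name, raw_args in FAKE_CALL.findall(content_text):
        try:
            json.loads(raw_args)  # only accept arguments that are valid JSON
        except json.JSONDecodeError:
            continue  # not converted, but still stripped from the text below
        tool_calls.append(
            {
                "id": f"call_{uuid.uuid4().hex[:8]}",  # made-up id for the mock call
                "type": "function",
                "function": {"name": name.strip(), "arguments": raw_args},
            }
        )
    cleaned = FAKE_CALL.sub("", content_text).strip()
    return cleaned, tool_calls
```

With the example above, the `<function=...>` tag disappears from the visible text and one `search_universities` call comes back, ready to be executed like a genuine tool call.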
Message Serialization
Groq's API is picky about message formats. One wrong field (like a Pydantic internal field leaking in) and boom: 400 Bad Request. Built a custom serializer to whitelist only valid fields:
    def serialize_messages_for_groq(messages):
        # Strip Pydantic internals, validate required fields,
        # ensure tool_calls have 'type', etc.
        ...
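A whitelist serializer along those lines can be quite small. The sketch below is an assumption-heavy illustration: the `ALLOWED_FIELDS` table is based on the OpenAI-compatible chat format rather than on Groq's authoritative field list, and `whitelist_messages` is a hypothetical name, so check the API reference before relying on it:

```python
# Assumed whitelist, based on the OpenAI-compatible chat format.
ALLOWED_FIELDS = {
    "system": {"role", "content"},
    "user": {"role", "content"},
    "assistant": {"role", "content", "tool_calls"},
    "tool": {"role", "content", "tool_call_id"},
}


def whitelist_messages(messages):
    """Return copies of the message dicts with only whitelisted fields."""
    clean = []
    for msg in messages:
        role = msg.get("role")
        if role not in ALLOWED_FIELDS:
            raise ValueError(f"unknown role: {role!r}")
        out = {k: v for k, v in msg.items() if k in ALLOWED_FIELDS[role]}
        if not out.get("tool_calls"):
            out.pop("tool_calls", None)  # drop a missing or empty tool_calls field
        else:
            # Give every tool call a 'type' without mutating the input dicts.
            out["tool_calls"] = [{"type": "function", **c} for c in out["tool_calls"]]
        if role == "tool" and "tool_call_id" not in out:
            raise ValueError("tool message is missing tool_call_id")
        clean.append(out)
    return clean
```

Failing loudly on an unknown role or a missing `tool_call_id` moves the error from a vague 400 response to a clear exception in your own code.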
Chapter 4: The Aha Moment
After 2 weeks of debugging, refactoring, and head-scratching, everything finally clicked:
User asks: "Tell me about COMSATS Islamabad"
AI decides: "I need to search the database"
Tool call triggered → Queries Supabase → Returns real data
AI gets results: "COMSATS Islamabad has 15 programs, fees range from..."
Streams response word-by-word to frontend
Saves conversation + updates memory
Next message: AI remembers we're talking about COMSATS!
The system was alive.
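Stripped of streaming and persistence, the flow above boils down to a small loop: call the model, run any tools it asks for, feed the results back, and stop when it answers in plain text. A minimal sketch, assuming message dicts in the OpenAI-compatible shape. `run_turn`, `MAX_TOOL_ROUNDS`, and the injected `call_model` / `run_tool` callables are hypothetical names, not the project's actual functions:

```python
import json

MAX_TOOL_ROUNDS = 3  # hypothetical cap so a confused model cannot loop forever


def run_turn(messages, call_model, run_tool):
    """Drive one user turn: call the model, run requested tools, repeat."""
    for _ in range(MAX_TOOL_ROUNDS):
        reply = call_model(messages)  # returns an assistant message dict
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # plain answer: the turn is finished
        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            result = run_tool(call["function"]["name"], args)
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": json.dumps(result),
                }
            )
    raise RuntimeError("model kept requesting tools without answering")
```

Passing `call_model` and `run_tool` in as arguments keeps the loop testable with fakes, without touching the network or the database.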
The Stack Breakdown:
| Component | Technology | Lines of Code |
|---|---|---|
| Core AI Client | Groq SDK (Llama 3.3 70B) | ~54 |
| Streaming Endpoint | FastAPI + SSE | ~450 |
| Conversation Memory | Supabase + Custom Logic | ~150 |
| Tool Definitions | JSON Schema | ~35 |
| Utilities | Custom Helpers | ~56 |
| Total | Pure Python | ~745 |
What I Learned:
"Just integrate an API" is a lie. Production AI systems need tools, memory, error handling, streaming, and hallucination detection.
LangChain wasn't needed. I almost added it, but building from scratch gave me way more control and understanding.
Groq is insanely fast (800+ tokens/sec), but you still need smart architecture to make it feel instant.
LLMs will surprise you, in both good (clever reasoning) and bad (creative hallucinations) ways.
The Result:
Students now have an AI counselor that:
Knows real Pakistani university data
Streams responses in real-time
Remembers conversation context
Can search the web for additional info
Doesn't make up fake statistics
And all from what started as: "I'll just call an API..."
Next Up: The final piece of the puzzle: frontend integration and deployment. Stay tuned!
Stats:
Model: Llama 3.3 70B via Groq
Response Speed: 800-1000 tokens/sec
Architecture: Custom RAG + Function Calling
Dependencies Avoided: LangChain (didn't need it!)
Coffee Consumed: Too much
What's your experience building with AI APIs? Did you also think it would be "just 5 lines of code"? Drop your stories in the comments!


