Live · Production · Sales Agent AI · US Client

BrokerSMS

Internal dealer tool (not publicly accessible)

Client: US Car Dealership

Year: 2025–Present

Role: Full Stack AI Engineer

Status: Live · Production

Active brokers: 150+

Messages processed: 20K+

Concurrent sessions: 50–100+

Cached response time: <100 ms

About the Project

BrokerSMS is a US-based company that connects independent auto brokers to dealership inventory, mostly Toyota and Lexus, and lets them quote, negotiate, and close deals for their retail customers. The pain point they brought to me was brutal. Their sales reps were getting hundreds of SMS messages a day from brokers asking about vehicle availability, lease terms, monthly payments, and out-the-door pricing. Each reply meant pulling lender rate sheets, running payment calculations by hand, and checking inventory spreadsheets. Brokers hated the wait. Reps burned out. Deals slipped.

The company wanted an AI agent that could take over the entire sales conversation. Not a chatbot template, but something that could actually compute a precise TFS lease at 36 months with $2,000 down and reply in under a second. I built it around a RAG pipeline: before every response, the agent retrieves live inventory, lender rate sheets, and dealer-specific promo rules from MongoDB, so GPT-4o is always grounded in real data instead of hallucinating prices.

It has been live for months now, serving 150+ active brokers, handling 50 to 100+ concurrent conversations per instance, and processing over 20,000 real sales messages end to end. Every time a broker texts "what's the payment on that 2024 RX350 with 10k down," the agent matches the request to the exact VIN in the inventory store, pulls the matching rate sheet, runs the real TFS, LFS, ALLY, and RIZE math, and answers in well under a second. Working with the BrokerSMS team on rate sheet edge cases and dealer-specific promo logic was the most commercially serious work I've done. You learn fast when a wrong answer costs someone a $45k deal.
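To make the "36 months with $2,000 down" example concrete, here is a minimal sketch of the textbook US lease calculation that such a quote starts from. It is not the production code or any lender's actual program. `lease_payment` and `LeaseQuote` are hypothetical names, the figures in the usage line are made up, and real lender rate sheets typically layer on rules (acquisition fees, credit tiers, state-specific tax treatment) that this sketch ignores. The money factor is the lease-rate convention lenders publish. Multiplying it by 2,400 gives an approximate APR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseQuote:
    depreciation: float
    rent_charge: float
    base_payment: float
    monthly_tax: float
    total_payment: float


def lease_payment(
    selling_price: float,
    msrp: float,
    residual_pct: float,
    money_factor: float,
    term_months: int,
    down_payment: float = 0.0,
    fees: float = 0.0,
    tax_rate: float = 0.0,
) -> LeaseQuote:
    """Textbook US lease payment (hypothetical helper, not any lender's exact rules).

    Tax is applied monthly to the base payment, as many states do; others differ.
    """
    # Adjusted capitalized cost: price plus capitalized fees, minus cash down (cap cost reduction).
    adjusted_cap_cost = selling_price + fees - down_payment
    # Residual value is a percentage of MSRP taken from the rate sheet for this term.
    residual_value = msrp * residual_pct
    # Depreciation fee: value the car is expected to lose, spread over the term.
    depreciation = (adjusted_cap_cost - residual_value) / term_months
    # Rent charge: money factor times (adjusted cap cost + residual value).
    rent_charge = (adjusted_cap_cost + residual_value) * money_factor
    base = depreciation + rent_charge
    tax = base * tax_rate
    return LeaseQuote(
        depreciation=round(depreciation, 2),
        rent_charge=round(rent_charge, 2),
        base_payment=round(base, 2),
        monthly_tax=round(tax, 2),
        total_payment=round(base + tax, 2),
    )


# Made-up inputs: $48k price, $50k MSRP, 60% residual, 0.00125 money factor, 36 months, $2k down.
q = lease_payment(48_000, 50_000, 0.60, 0.00125, 36, down_payment=2_000)
print(q.base_payment)  # 539.44
```

With these inputs the adjusted cap cost is $46,000 and the residual is $30,000, so depreciation is $16,000 / 36 ≈ $444.44 and the rent charge is $76,000 × 0.00125 = $95.00 per month, before tax.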

What I Built

▸RAG pipeline: retrieves inventory, rate sheets, and promo rules from MongoDB before every GPT-4o response
▸LangChain orchestration with intent classification, entity extraction, and structured output
▸Config-driven multi-brand inventory (Toyota + Lexus) with O(1) VIN/stock lookups
▸4 lender rate sheets (TFS, LFS, ALLY, RIZE) integrated with iterative convergence to match dealer pricing
▸Three-tier caching strategy: Redis 7 → Python dict → in-memory fallback
▸Redis-backed job queue for follow-up reminders
▸Twilio SMS delivery, Docker Compose deployment
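A read-through lookup in the order the page lists (Redis 7, then a Python dict, then an in-memory fallback) might look roughly like the sketch below. This is an assumption about the design, not the production code. `TieredCache`, `FakeRedis`, the key format, and the idea that the last tier recomputes from rate data already held in memory are all hypothetical. `FakeRedis` mimics only the `get(name)` and `set(name, value, ex=...)` calls that a real `redis.Redis` client provides, so the example runs without a server.

```python
import time
from typing import Callable, Optional


class FakeRedis:
    """Tiny stand-in for a redis-py client: mimics get(name) and set(name, value, ex=...)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[bytes, Optional[float]]] = {}

    def get(self, name: str) -> Optional[bytes]:
        item = self._data.get(name)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[name]  # expired, like a Redis key whose TTL ran out
            return None
        return value

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        expires_at = time.monotonic() + ex if ex else None
        self._data[name] = (value.encode(), expires_at)
        return True


class TieredCache:
    """Read-through cache: Redis (shared) -> process-local dict -> in-memory compute fallback."""

    def __init__(self, redis_client, compute: Callable[[str], str], ttl_seconds: int = 3600) -> None:
        self.redis = redis_client
        self.local: dict[str, str] = {}
        # Hypothetical last tier: recompute from rate tables already in memory (no LLM call).
        self.compute = compute
        self.ttl = ttl_seconds
        self.hits = {"redis": 0, "local": 0, "computed": 0}

    def get(self, key: str) -> str:
        # Tier 1: Redis, shared by every Gunicorn worker.
        try:
            raw = self.redis.get(key)
        except Exception:  # e.g. redis.exceptions.ConnectionError: degrade instead of failing the reply
            raw = None
        if raw is not None:
            self.hits["redis"] += 1
            value = raw.decode() if isinstance(raw, bytes) else raw
            self.local[key] = value  # keep a local copy in case Redis drops out later
            return value
        # Tier 2: plain dict inside this worker process.
        if key in self.local:
            self.hits["local"] += 1
            return self.local[key]
        # Tier 3: compute, then populate both tiers above.
        self.hits["computed"] += 1
        value = self.compute(key)
        self.local[key] = value
        try:
            self.redis.set(key, value, ex=self.ttl)
        except Exception:
            pass  # Redis still down; the local dict will serve the next request
        return value


cache = TieredCache(FakeRedis(), compute=lambda key: f"quote for {key}")
cache.get("lease:RX350:36:2000")  # computed, then written to Redis and the local dict
cache.get("lease:RX350:36:2000")  # served from Redis
print(cache.hits)  # {'redis': 1, 'local': 0, 'computed': 1}
```

In production the first argument would be a real `redis.Redis(...)` client. Putting Redis first lets every Gunicorn worker share one warm cache, while the local dict keeps answering if Redis becomes unreachable; none of the three tiers needs an LLM call.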

System Design

The backend is Python/Flask served by Gunicorn, with LangChain orchestrating a RAG pipeline around GPT-4o. On every inbound message, the agent retrieves the relevant inventory, rate sheets, and dealer promo rules from MongoDB before generating a response, so every answer is grounded in real data.

A config-driven multi-brand inventory engine (Toyota + Lexus) syncs from Google Sheets via the Google Sheets API into MongoDB, with O(1) VIN/stock lookups via hash indexing. The payment calculation engine integrates 4 lender rate sheets (TFS, LFS, ALLY, RIZE) and uses iterative convergence to match dealer pricing in real time. A three-tier caching strategy (Redis 7 → Python dict → in-memory fallback) serves pre-computed lease data with zero LLM calls and sub-100ms response times. Redis also backs session management and a lightweight job queue for follow-up reminders.

Twilio handles SMS delivery. The admin dashboard is a Next.js app with real-time chat monitoring, broker management, and broadcast messaging. Everything is deployed via Docker Compose on a Linux VPS.
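The page does not spell out what "iterative convergence" solves for. One plausible reading, sketched here purely as an assumption, is backing out the selling price that reproduces a dealer's quoted monthly payment. With the plain textbook formula the payment is linear in price and could be inverted directly; a bracketing solver such as bisection earns its keep once lender rules (price-dependent fees, taxed cap cost reductions, tier cutoffs) make that inversion messy. `monthly_payment`, `solve_price_for_payment`, and every default figure are hypothetical.

```python
from typing import Callable


def monthly_payment(price: float, msrp: float = 50_000.0, residual_pct: float = 0.60,
                    money_factor: float = 0.00125, term: int = 36,
                    down: float = 2_000.0, tax_rate: float = 0.0) -> float:
    """Hypothetical lease payment as a function of selling price (textbook formula)."""
    cap = price - down                     # adjusted cap cost after cash down
    residual = msrp * residual_pct         # residual value from the rate sheet
    base = (cap - residual) / term + (cap + residual) * money_factor
    return base * (1 + tax_rate)


def solve_price_for_payment(target: float, payment_fn: Callable[[float], float],
                            lo: float, hi: float, tol: float = 0.005,
                            max_iter: int = 100) -> float:
    """Bisection: find a price in [lo, hi] whose payment is within tol of target.

    Assumes payment_fn increases with price across the bracket.
    """
    if not payment_fn(lo) <= target <= payment_fn(hi):
        raise ValueError("target payment is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        diff = payment_fn(mid) - target
        if abs(diff) <= tol:
            return mid
        if diff < 0:
            lo = mid   # payment too low: the price must be higher
        else:
            hi = mid   # payment too high: the price must be lower
    return (lo + hi) / 2


# Which selling price yields a hypothetical advertised $499/month?
price = solve_price_for_payment(499.0, monthly_payment, lo=30_000, hi=70_000)
print(round(price, 2))
```

Bisection only needs the payment to rise with price across the bracket, so it tolerates piecewise rules that would break a closed-form inverse, and each iteration halves the bracket, which keeps the iteration count small and predictable.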

Tech Stack

Python
Flask
LangChain
GPT-4o
Twilio
Redis
MongoDB
Docker
Next.js
