
BrokerSMS is a US-based company that connects independent auto brokers to dealership inventory, mostly Toyota and Lexus, and lets them quote, negotiate, and close deals for their retail customers. The pain point they came to me with was brutal. Their sales reps were getting hundreds of SMS messages a day from brokers asking about vehicle availability, lease terms, monthly payments, and out-the-door pricing. Each reply required pulling lender rate sheets, running payment calculations by hand, and checking inventory spreadsheets. Brokers hated the wait. Reps burned out. Deals slipped.
The company wanted an AI agent that could take over the entire sales conversation. Not a chatbot template, but something that could actually compute a precise TFS lease at 36 months with $2,000 down and reply in under a second. I built that using a RAG pipeline: the agent retrieves live inventory, lender rate sheets, and dealer-specific promo rules from MongoDB before every response, so GPT-4o is always grounded in real data instead of hallucinating prices. It has been live for months now, serving 150+ active brokers and handling 50 to 100+ concurrent conversations per instance, and it has processed over 20,000 real sales messages end to end. Every time a broker texts "what's the payment on that 2024 RX350 with 10k down," the agent retrieves the exact VIN from the inventory store, pulls the matching rate sheet, runs the real TFS, LFS, ALLY, and RIZE math, and answers in well under a second. Working with the BrokerSMS team on rate sheet edge cases and dealer-specific promo logic was the most commercially serious work I've done. You learn fast when a wrong answer costs someone a $45k deal.
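For a sense of the math involved, here is a minimal sketch of the standard closed-end lease formula (a depreciation charge plus a rent charge). The function name and every input figure are illustrative assumptions, not values from a real TFS rate sheet, and the sketch leaves out taxes, fees, and dealer promo logic.

```python
def monthly_lease_payment(
    selling_price: float,
    down_payment: float,
    residual_value: float,
    money_factor: float,
    term_months: int,
) -> float:
    """Pre-tax monthly payment from the standard lease formula."""
    # Net capitalized cost: what is actually being financed.
    net_cap_cost = selling_price - down_payment
    # Depreciation: value lost over the term, spread evenly across months.
    depreciation = (net_cap_cost - residual_value) / term_months
    # Rent charge: the lender's finance charge, expressed via the money factor.
    rent_charge = (net_cap_cost + residual_value) * money_factor
    return depreciation + rent_charge


# Hypothetical deal: $48,000 selling price, $2,000 down,
# $30,000 residual, 0.0025 money factor, 36 months.
payment = monthly_lease_payment(48_000, 2_000, 30_000, 0.0025, 36)
print(f"${payment:.2f}/month before tax")  # $634.44/month before tax
```

In practice the residual and money factor would come from the lender's rate sheet for that model, trim, and term, which is why the retrieval step has to land on the right sheet before any arithmetic happens.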
The backend is Python/Flask with Gunicorn, using LangChain to orchestrate a RAG pipeline with GPT-4o. On every inbound message, the agent retrieves relevant inventory, rate sheets, and dealer promo rules from MongoDB before generating a response, so every answer is grounded in real data. A config-driven multi-brand inventory engine (Toyota + Lexus) syncs from the Google Sheets API into MongoDB with O(1) VIN/stock lookups via hash indexing. The payment calculation engine integrates four lender rate sheets (TFS, LFS, ALLY, RIZE) with iterative convergence to match dealer pricing in real time. A three-tier caching strategy (Redis 7 → Python dict → in-memory fallback) serves pre-computed lease data with zero LLM calls and sub-100ms response times. Redis also backs session management and a lightweight job queue for follow-up reminders. Twilio handles SMS delivery. The admin dashboard is a Next.js app with real-time chat monitoring, broker management, and broadcast messaging. Everything is deployed via Docker Compose on a Linux VPS.
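The tiered cache lookup can be sketched roughly as follows. This is a simplified illustration, not the production code: the `tiered_get` helper, the cache keys, and the quote strings are all hypothetical, and a stub function stands in for Redis so the example runs without a server. A real Redis client raises its own exception type, so the builtin `ConnectionError` here is only a placeholder.

```python
from typing import Callable, Optional, Sequence

# Each tier is just a function from key to cached value (or None on a miss).
Lookup = Callable[[str], Optional[str]]


def tiered_get(key: str, tiers: Sequence[Lookup]) -> Optional[str]:
    """Return the first hit, trying tiers in order; an unavailable tier is skipped."""
    for lookup in tiers:
        try:
            value = lookup(key)
        except ConnectionError:
            # Tier is down (for example, Redis unreachable): fall through.
            continue
        if value is not None:
            return value
    return None  # Miss in every tier: caller computes the quote the slow way.


def redis_down(key: str) -> Optional[str]:
    # Stand-in for a Redis GET against an unreachable server (assumption).
    raise ConnectionError("redis unavailable")


# Hypothetical pre-computed lease quotes held in process memory.
local_cache = {"lease:RX350:36": "precomputed quote A"}
fallback_table = {
    "lease:RX350:36": "precomputed quote A",
    "lease:ES350:36": "precomputed quote B",
}

tiers = [redis_down, local_cache.get, fallback_table.get]
hit = tiered_get("lease:ES350:36", tiers)   # found in the last tier
miss = tiered_get("lease:NX350:36", tiers)  # not cached anywhere
```

Treating every tier as a plain lookup function keeps the fallback order in one list, so a cache outage degrades latency instead of breaking replies.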