Sai Kumar Pakalapati
Gen AI Engineer | Python Developer
AI Engineer with 1+ years of professional experience developing practical AI-driven applications. Skilled in Python, FastAPI, LangChain, and containerized deployments with Docker. Experience includes building RAG pipelines, question generation systems, and real-time voice assistants. Focused on delivering scalable, maintainable, business-aligned AI solutions with measurable impact.
Technical Skills
Professional Experience
Marsh
AI Engineer
Nov 2025 – Present
Phoenix, AZ
- Building a RAG (Retrieval-Augmented Generation) application for internal use, giving employees fast, accurate access to company knowledge.
- Created a knowledge base from structured and unstructured data sources, improving team productivity and information retrieval.
- Designed and implemented an NLP-to-SQL pipeline, enabling conversational queries against internal databases.
- Developed an AI system for generating real-time financial summaries, supporting business decision-making and reporting.
- Integrated advanced NLP models for automated document analysis and insights extraction across financial reports.
- Collaborating with cross-department stakeholders to align AI solutions with business objectives, boosting innovation and efficiency.
Verizon
AI Engineer
Oct 2025 – Nov 2025
Irving, TX
- Created agentic playbooks for interns to reference and build agentic workflows, streamlining onboarding and implementation.
- Deployed internal agents on an agent harness, automating routine business tasks and data processing.
- Designed a modular framework for agentic workflows, enabling rapid prototyping and team collaboration.
- Automated recurring project management tasks with custom-built AI agents, reducing manual effort across teams.
- Led workshops that enabled engineering interns to build and deploy agentic systems quickly and effectively.
- Developed a monitoring tool to track agent performance and workflow success rates, optimizing internal operations.
Voxglobaltech
Gen AI Engineer
Jul 2025 – Oct 2025
Dallas, TX
- Built and maintained the open-source project Question Bank Generator, architecting a scalable pipeline for generating and validating 50,000+ MCQs using local LLMs via Ollama.
- Developed an end-to-end ingestion and processing workflow leveraging AWS S3 for PDF intake, semantic deduplication for uniqueness, and MongoDB for structured storage.
- Implemented an advanced validation pipeline using gpt-oss-20b to ensure factual accuracy, boosting correctness to >99%.
- Containerized the system using Docker and deployed services with FastAPI, ensuring scalability, maintainability, and cost efficiency.
- Led cross-functional efforts to integrate the question-bank system into LearnKidz Academy's platform, supporting seamless UI consumption through a React frontend.
Vensolutions
Python Developer
Jun 2024 – Jun 2025
Dallas, TX
- Developed Python-based internal tools and microservices with limited AI integration, focusing on productivity and workflow automation.
- Designed and implemented an internal Document RAG system for knowledge retrieval across company documentation, improving employee efficiency.
- Built a personal AI assistant prototype using FastAPI and LangChain concepts to support internal project management and task automation.
- Developed and deployed multiple FastAPI-based internal services to handle company-specific workflows and data pipelines, ensuring high performance and maintainability.
- Led the creation of the open-source project DocuQuery-AI, a conversational RAG engine enabling efficient private document search and summarization.
- Collaborated with cross-team stakeholders to align AI services with business objectives, improving internal document accessibility and productivity.
Key Projects
Question Bank Generator
2025
Engineered a cost-effective, end-to-end pipeline that processes PDFs from S3, generates MCQs using local LLMs (Ollama), applies semantic deduplication, and validates output with gpt-oss-20b to reach >99% accuracy. Multi-tenant input routing with pre-processing (OCR + page splitting) and retry-safe workers. Exposes REST endpoints for bulk uploads and a React UI for review/approval.
DocuQuery-AI
2025
Conversational RAG engine for private document search and summarization with unified ingestion and FAISS-based retrieval. Implements smart chunking, metadata filters, and response grounding with source citations. Role-based access with per-namespace indices and low-latency caches.
Real-Time Voice Agent
2025
Real-time voice assistant using Whisper for STT and Gemini for reasoning with sub-250 ms latency via async orchestration. Streaming transcription with partial results, interruption handling, and TTS fallback pathways. Connection health checks and auto-reconnect for stable sessions.
AutoPR-SummarizeCode
2025
GitHub App agent that parses PR diffs, runs LLM prompts, and posts structured summaries to reduce reviewer time by ~30%. Adds risk level, breaking-change flags, and test-impact notes. Supports label automation and configurable prompt presets per repo.
Personal AI Assistant Dashboard
2025
Web-based AI productivity assistant with modules for Chat, Summarization, Email, Code Helper, and Translation, using real-time WebSockets. Modular plugin architecture, persistent threads, and exportable transcripts. Rate-limit-safe queues with optimistic UI updates.
Education
Master of Science in Information Technology & Management
Belhaven University — Jackson, MS, USA
Bachelor of Technology
Jawaharlal Nehru Technological University — Hyderabad, India