Overview
R&D engineers at CoorsTek work with a mix of structured database records and hundreds of technical documents: test reports, material specs, and research notes. Finding information across both sources meant knowing where to look and how to query each one, a slow process that depended on institutional knowledge.
This project built an agentic chatbot that lets engineers ask questions in plain language and get answers drawn from both the structured database and the unstructured document corpus. The chatbot is deployed on Databricks and available company-wide. I pioneered the use of Databricks at CoorsTek through earlier experimentation in Internship 2, then took the system from zero to first deployment in 2 months during Internship 3.
What I Built
Data Ingestion Pipeline
Built ETL pipelines to ingest data from two sources: a structured R&D database and hundreds of technical documents. Structured records were cleaned, normalized, and indexed into a Databricks Vector Search index. Documents were chunked, embedded, and indexed separately, with metadata preserved so agents could filter by document type, date, or project.
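A minimal sketch of the document side of this pipeline, in plain Python, is below. It assumes fixed-size character chunks with overlap and the three metadata fields named above (document type, date, project); the names `Chunk`, `chunk_document`, and `filter_chunks`, the chunk sizes, and the example values are hypothetical. The embedding step is omitted, and the in-memory filter only stands in for the metadata filtering a vector index would apply at query time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    """One retrievable text chunk plus the metadata agents can filter on."""
    chunk_id: str
    text: str
    doc_type: str    # hypothetical values, e.g. "test_report", "material_spec"
    doc_date: date
    project: str


def chunk_document(doc_id: str, text: str, doc_type: str, doc_date: date,
                   project: str, chunk_size: int = 800,
                   overlap: int = 100) -> List[Chunk]:
    """Split text into character windows of `chunk_size` that overlap by
    `overlap` characters, copying the document's metadata onto every chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[Chunk] = []
    start = 0
    while start < len(text):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(f"{doc_id}-{len(chunks)}", piece,
                            doc_type, doc_date, project))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def filter_chunks(chunks: List[Chunk], doc_type: Optional[str] = None,
                  project: Optional[str] = None,
                  since: Optional[date] = None) -> List[Chunk]:
    """Keep chunks that match every filter that is set; None means 'any'."""
    return [
        c for c in chunks
        if (doc_type is None or c.doc_type == doc_type)
        and (project is None or c.project == project)
        and (since is None or c.doc_date >= since)
    ]


# Usage with a made-up 2,000-character document.
text = "".join(str(i % 10) for i in range(2000))
chunks = chunk_document("TR-001", text, "test_report", date(2024, 5, 1), "project-a")
print(len(chunks))  # 3 windows: [0:800], [700:1500], [1400:2000]
print(len(filter_chunks(chunks, doc_type="material_spec")))  # 0
```

Copying the parent document's metadata onto every chunk is what makes filtering possible: a chunk retrieved in isolation still carries the type, date, and project of the document it came from.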
Multi-Agent Framework
Developed a framework of AI agents using Databricks Agent Bricks to handle retrieval from both structured and unstructured sources. Each agent is responsible for a specific retrieval task; an orchestration layer routes user queries to the appropriate agent and synthesizes responses.
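The routing idea can be illustrated with a toy orchestrator. This is not the Agent Bricks API: the agents are placeholder functions, and a hypothetical keyword table stands in for the LLM-based routing a real orchestration layer would more likely use. `ROUTES`, `AGENTS`, `route`, `answer`, and the agent functions are illustrative names only.

```python
from typing import Callable, Dict, List

# Hypothetical agent type: takes a question, returns an answer string.
Agent = Callable[[str], str]


def structured_agent(question: str) -> str:
    # Placeholder for an agent that queries the structured R&D database.
    return f"[db] results for: {question}"


def document_agent(question: str) -> str:
    # Placeholder for an agent that searches the document index.
    return f"[docs] passages for: {question}"


# Hypothetical routing rules: keywords suggesting which source to use.
ROUTES: Dict[str, List[str]] = {
    "structured": ["measured", "value", "average", "how many", "batch"],
    "documents": ["report", "spec", "procedure", "why", "summary"],
}

AGENTS: Dict[str, Agent] = {
    "structured": structured_agent,
    "documents": document_agent,
}


def route(question: str) -> List[str]:
    """Return agents whose keywords appear in the question; if none match,
    fall back to querying every agent."""
    q = question.lower()
    matched = [name for name, words in ROUTES.items() if any(w in q for w in words)]
    return matched or list(AGENTS)


def answer(question: str) -> str:
    """Call each routed agent and join their outputs into one response."""
    return "\n".join(AGENTS[name](question) for name in route(question))


print(route("What is the average density for batch 12?"))  # ['structured']
print(answer("Summarize the sintering report"))
```

Falling back to every agent when no rule matches trades some latency for coverage: a question that spans both sources simply gets answers from both.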
Reliability Engineering
Improved the reliability of multiple agents to 95% or higher through a combination of data cleaning (removing duplicate chunks and fixing metadata inconsistencies) and prompt engineering (clearer tool descriptions, stricter output-formatting constraints, and few-shot examples). Established LLMOps evaluation procedures, which score agent responses against a labeled test set, as standard practice for future agent development at the company.
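The evaluation practice described above can be sketched as a small harness that runs an agent over a labeled test set and reports a pass rate. The scoring rule here (a response passes if it mentions every expected fact, case-insensitively) is an assumed stand-in for whatever scorer the real procedure used; `EvalCase`, `score_response`, and `evaluate` are hypothetical names, and the test cases are made up.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str
    expected_facts: List[str]  # facts a correct answer must mention


def score_response(response: str, case: EvalCase) -> bool:
    """Pass if every expected fact appears in the response (case-insensitive)."""
    text = response.lower()
    return all(fact.lower() in text for fact in case.expected_facts)


def evaluate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Run the agent on every labeled case and return the pass rate in [0, 1]."""
    if not cases:
        raise ValueError("test set is empty")
    passed = sum(score_response(agent(c.question), c) for c in cases)
    return passed / len(cases)


# Usage with a made-up test set and a canned "agent".
cases = [
    EvalCase("What value does spec A list?", ["42"]),
    EvalCase("Who authored report B?", ["smith"]),
]
canned = {
    cases[0].question: "Spec A lists a value of 42.",
    cases[1].question: "Report B was written by J. Doe.",
}
rate = evaluate(lambda q: canned[q], cases)
print(f"pass rate: {rate:.0%}")  # pass rate: 50%
```

Keeping the test set fixed lets a prompt or data-cleaning change be judged by whether the pass rate moves, rather than by spot-checking a few answers.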