Forensic AI: Debugging Hallucinations with Delta Time Travel
Every AI engineer eventually runs into the “Heisenbug.” It usually starts with an urgent ticket from Compliance: “Yesterday at 2:00 PM, the chatbot gave terrible financial advice to a VIP client.” You jump into the logs, find the user’s question, and run it through the system again. Perfect answer. You try again. Still perfect. You change a few settings. Perfect again.

So why can’t you reproduce the failure? Because the data moved. Between yesterday at 2:00 PM and today, the underlying knowledge base likely changed: a document was edited, a row was deleted, or the vector index was refreshed. The “world” the AI saw yesterday no longer exists, and if you can’t reproduce the state of that world, you can’t fix the bug.

This is why we need Forensic AI: the ability to freeze time, replay history, and debug incidents with evidence instead of guesswork. Here’s how to design reproducible retrieval-augmented generation (RAG) on Databricks using MLflow Tracing and Delta Lake Time Travel.