Building prompt-vett
I just started building a tool called prompt-vett that evaluates LLM system prompts. You paste a prompt and some test messages, hit Evaluate, and get back a structured report covering quality, consistency, injection resistance, and edge-case handling. The tool also returns a rewritten version of the prompt to work from, not just a list of things to fix.
This is my first build-in-public project, and I'm writing about it as I build for two reasons.
First, the code will be open source, but the decisions behind it are what I'm really documenting. If you're learning to build with AI, those decisions might be useful to you too.
Second, the tool came out of real experience. Building flo., the chatbot I shipped at flowsyn.ai, meant spending a lot of time getting the system prompt right: personality, scope, security, and tone. It was surprisingly involved. prompt-vett is the tool I wish I'd had then, and if you're building something with an LLM for the first time, it might help you too.
What's coming
Future posts will go deep on the decisions that shaped this build: how the architecture evolved, the pipeline design, and the security model. Most importantly, they'll cover how prompt-vett evaluates itself, because if I'm claiming to evaluate prompts, I'd better be able to evaluate my evaluator.
To follow along, bookmark this blog and star the repo at github.com/bobgaynor/prompt-vett.