05/26/2026
I recently wrote a blog post called "Crossing the Line" about something that has been bothering me: Can Agentic AI really be tested?
Traditional software crosses a line from testing to production. That line is called a release. Before release, we test. After release, users experience the real thing.
But Agentic AI seems very different.
Agentic AI is being "released" with limited (or no) pre-release testing because its behaviors can change, adapt, and emerge in ways we may not fully predict.
Then, once the agent is taking actions in the real world, we often aren't really testing anymore. We're monitoring. And monitoring is not the same as testing.
By the time an AI agent makes a bad decision, leaks information, takes an unexpected action, or creates business risk, the "test" has already become reality. This is not speculation. These things have already happened, some with major consequences.
So where is the line now?
Can we advance our testing approaches far enough to deal with Agentic AI, or are we accepting a world where production becomes the experiment? What defines a "defect" versus a "behavior"?
I explore this question in the newest episode of The Value of Testing podcast.
https://youtu.be/gZTvgCEj10s
I'd love to hear your thoughts: Can Agentic AI really be tested, or are we entering an era where we can only manage risk and monitor outcomes?
I received some great comments after the blog post, but I'm still gathering input and opinions on this question.
There is a line that exists not only in software, but also in other domains such as writing, music, and movies, just to name a few. That line is the release ...