You’re building a shopping-orchestration AI that fills carts and checks out across merchant platforms over UCP. It handles real money and real user accounts. We test the ways it goes wrong that a passing conformance run can’t catch: paying the wrong amount, getting phished, failing the checkout.
A conformance test reads the shape of the messages your agent sends. It can’t see your agent quietly paying a total that doesn’t add up, following a phishing link out of an error message, or trusting a store response it never verified. Those are behaviors — and they’re where real checkouts go wrong, with a real user’s card.
Each of the six failures below is a real behavior your agent must get right when it shops. Every test is proven to catch its own bug: it passes a known-good agent and fails the deliberately broken one. Watch all six live in the demo.
The line items and total don’t reconcile, but the agent completes the purchase anyway instead of stopping for the buyer.
The agent follows a decoy link hidden in an error message and hands the user’s login to an attacker’s server.
The agent skips verifying the store’s signed response, so a tampered or fake reply is accepted as real.
Missing PKCE or an unchecked issuer lets an attacker hijack the OAuth flow and capture the linked account.
The agent pays with a payment type the store never offered: an unauthorized instrument.
The agent sends fields it shouldn’t, forgets to identify itself, or never revokes access when the user unlinks.
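To make the first failure in this list concrete, here is a minimal sketch of the reconciliation check an agent could run before it pays. The field names (`unit_price`, `quantity`) and the helpers `totals_reconcile` and `decide_checkout` are hypothetical and chosen for illustration only; they are not taken from the UCP schema or from our test suite.

```python
from decimal import Decimal


def totals_reconcile(line_items, stated_total, tax=Decimal("0"), shipping=Decimal("0")):
    """Return True only if line items plus tax and shipping equal the stated total.

    Assumption: prices arrive as decimal strings; field names are illustrative.
    """
    subtotal = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in line_items),
        Decimal("0"),
    )
    # Decimal keeps the comparison exact, so there is no float rounding drift.
    return subtotal + tax + shipping == Decimal(stated_total)


def decide_checkout(line_items, stated_total, **fees):
    """Complete the purchase only when the numbers add up; otherwise stop for the buyer."""
    if totals_reconcile(line_items, stated_total, **fees):
        return "complete"
    return "escalate_to_buyer"


cart = [{"unit_price": "19.99", "quantity": 2}]
print(decide_checkout(cart, "44.98", shipping=Decimal("5.00")))
print(decide_checkout(cart, "49.98", shipping=Decimal("5.00")))
```

The first call reconciles (2 × 19.99 + 5.00 = 44.98), so the agent may proceed. The second does not, so the agent hands the decision back to the buyer instead of paying.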
Beyond these six, there are 34 more tests.
There’s no easy way to try a real UCP checkout end-to-end yet, so we host two stores: one that’s provably correct, and one that behaves badly on purpose. Your agent shops both, and we grade exactly how it behaves.
Point it at our verified merchant sandbox and let it run a full checkout — discovery, payment, the works.
The misbehaving store presents bad signatures, spliced login servers, mismatched totals, and phishing decoys: the things a real store might do wrong or an attacker might try.
The grade shows exactly which behaviors your agent got right and which ones would have cost a user real money: the findings conformance never surfaces.
The demo replays real recorded runs. Flip one flaw and watch exactly what the agent does wrong — and how it’s caught. No signup.
If you’re shipping an AI that checks out, or a platform that has to trust third-party agents, test what actually breaks before it touches a real card. The demo is free and open.