Running into something I can’t fully explain and curious if others have hit the same wall.
Quick context: I use a mix of AI copywriting tools depending on the project. ChatGPT for most blog and campaign work, Jasper when clients want faster turnaround on templated stuff. For a few months now I’ve been noticing a consistent pattern: pieces drafted in ChatGPT tend to score lower on detection runs. Pieces from Jasper score higher, even when I’m using nearly identical prompts and doing the same light editing pass on both.
I’m not talking about a small gap. Some Jasper drafts are coming back flagged at 80%+ and the ChatGPT equivalent of the same brief comes in under 30%. Same topic, same structure guidance, same tone direction.
My first assumption was that it’s a prompt issue. Maybe Jasper responds differently to the same instruction set? I tried tightening the tone & style control inputs, being more specific about voice, adding more context. Helped a little. Not enough.
Second thought was that Jasper leans harder on its training templates for certain content types, especially anything formulaic like listicles or product roundups. Detectors generally key on how statistically predictable the text is (low perplexity gets flagged), so template-heavy output would trip them more often. But I don’t have a way to verify that.
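If anyone wants to poke at the predictability theory themselves, here’s a rough perplexity probe I sketched out. GPT-2 is just a convenient open model to score with, not what any commercial detector actually runs, and the file names are obviously placeholders for your own drafts:

```python
# Rough perplexity probe. GPT-2 is a stand-in scorer; commercial
# detectors use their own (unknown) models, so treat this as a
# relative comparison between drafts, not a detection score.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Encode the draft and score how "expected" each token is.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    # Lower perplexity = more predictable text, which is roughly
    # the signal perplexity-based detectors lean on.
    return torch.exp(out.loss).item()

# Same brief, drafted in two different tools (placeholder file names):
print("jasper:", perplexity(open("jasper_draft.txt").read()))
print("chatgpt:", perplexity(open("chatgpt_draft.txt").read()))
```

If the Jasper drafts consistently come back with noticeably lower perplexity on the same brief, that would at least be consistent with the template theory.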
Has anyone done any kind of side-by-side testing across multiple tools for the same brief? Not looking for which tool is "best" in general. Specifically curious whether the underlying model matters as much as the prompting when it comes to how detectable the output is. And if different models respond differently to the same prompt-engineering approaches, that would actually change how I build my workflow. (Rough sketch of the kind of test I mean below.)
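Something like this is what I have in mind for a controlled run. The OpenAI call is the real v1 Python SDK; jasper_generate() and detector_score() are placeholders you’d swap for however you actually reach Jasper and whichever detector you use, since those APIs all differ and I’m not going to guess at them:

```python
# Side-by-side harness sketch: same brief into each tool, same
# detector on each output. Placeholders marked below.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BRIEF = "Write a 600-word listicle: 5 ways small teams can speed up content review."

def chatgpt_generate(brief: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": brief}],
    )
    return resp.choices[0].message.content

def jasper_generate(brief: str) -> str:
    # Placeholder: call Jasper however you normally do (API or copy/paste).
    raise NotImplementedError

def detector_score(text: str) -> float:
    # Placeholder: plug in your detector of choice here.
    raise NotImplementedError

for name, generate in [("chatgpt", chatgpt_generate), ("jasper", jasper_generate)]:
    draft = generate(BRIEF)
    print(name, detector_score(draft))
```

One caveat if you try this: generations are stochastic, so run several per brief and compare the spread, not single samples.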
Also genuinely curious whether switching to something like Claude or Writesonic for the same use case changes the picture at all. Input from anyone with hands-on experience comparing outputs across tools on this specific dimension would be really helpful.