Building out a content quality review workflow and trying to figure out how much weight to give detection scores as one of several signals.
Current thinking: detection scores alone are not a reliable basis for any high-stakes decision. The false positive rate on human-written content is too variable across tools, and a single score from a single tool tells you almost nothing about the actual provenance of a piece.
What I’m less sure about is how to extract any useful signal from them. A few questions I’m working through:
Is there a meaningful signal when multiple tools independently flag the same piece at high scores? Or do they share enough underlying methodology that they’d all be wrong in the same direction?
Is there a score threshold below which you can reasonably treat a piece as human-written for workflow purposes, even if it’s not a certainty?
How do you handle the asymmetry between false positives and false negatives in a professional context where both errors have real costs?
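For the asymmetry question, one framing that’s helped me is to make the costs explicit and sweep the threshold against a labeled calibration set, picking whatever minimizes expected cost. Everything below (the costs, scores, and labels) is hypothetical, just to show the shape of the calculation:

```python
# Hedged sketch: choose a score threshold that minimizes expected cost
# when false positives and false negatives are weighted differently.
# All numbers here are illustrative, not real calibration data.

# (score, is_human) pairs from a labeled calibration set.
labeled = [
    (0.10, True), (0.25, True), (0.45, True), (0.70, True),
    (0.55, False), (0.80, False), (0.90, False), (0.95, False),
]
COST_FP = 10.0  # cost of wrongly flagging a human writer
COST_FN = 2.0   # cost of letting generated copy slip through

def expected_cost(threshold):
    fp = sum(1 for s, human in labeled if human and s >= threshold)
    fn = sum(1 for s, human in labeled if not human and s < threshold)
    return COST_FP * fp + COST_FN * fn

# Sweep thresholds 0.00..1.00 and keep the cheapest.
cost, threshold = min((expected_cost(t / 100), t / 100) for t in range(101))
print(f"best threshold = {threshold:.2f}, cost = {cost:.1f}")
```

The point isn’t the specific numbers, it’s that once you write the costs down, “where should the threshold sit” stops being a gut call and becomes an empirical question about your own calibration set.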
We’ve already pressure-tested one tool against a sample of known human-written content from our team and got a 15-20% false positive rate on certain writers whose style is more structured and formal. That’s not usable for anything consequential.
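For anyone wanting to run the same pressure test, the per-writer breakdown is the part that surfaced our problem; the tool looked fine in aggregate. A minimal sketch, where the writer names, scores, and the 0.5 flagging threshold are all illustrative rather than our real data:

```python
# Hedged sketch of the pressure test described above: per-writer false
# positive rate on content we know is human-written. Illustrative data only.
from collections import defaultdict

FLAG_AT = 0.5  # hypothetical "flagged as AI" cutoff for this tool
# (writer, detector score) for documents we know are human-written.
samples = [
    ("casual_writer", 0.10), ("casual_writer", 0.20), ("casual_writer", 0.15),
    ("formal_writer", 0.60), ("formal_writer", 0.45), ("formal_writer", 0.70),
    ("formal_writer", 0.30), ("formal_writer", 0.55),
]

by_writer = defaultdict(list)
for writer, score in samples:
    by_writer[writer].append(score >= FLAG_AT)

for writer, flags in sorted(by_writer.items()):
    fpr = sum(flags) / len(flags)
    print(f"{writer}: FPR = {fpr:.0%} over {len(flags)} docs")
```

An aggregate FPR hides exactly the failure mode we hit: it averages a structured-formal writer’s terrible rate against everyone else’s clean one.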
Genuinely curious how others are building detection into workflows where the decisions matter — not academic curiosity, actual process design.