Does heavily editing AI output actually change the detection score or just the surface-level phrasing?

testing something and want to compare notes

i’ve been running experiments on my own drafts. i take AI output and edit it heavily: restructure paragraphs, change examples, rewrite the argument in places. then i run both the original and the edited version through detectors to see if the score changes.

my findings so far: surface edits (word swaps, minor rephrasing) move the score very little. structural changes (reordering paragraphs, changing the logical flow) move it more. substantially rewriting the opening moves it the most.
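
if anyone wants to compare notes on the same footing, here's a rough sketch of one way to put numbers on "how much did i actually edit" for each of those three edit types. big caveats: the function names are made up for illustration, the 50-word "opening" window is an arbitrary assumption, and it doesn't call any detector at all. it only uses python's standard difflib, so you'd still log the detector scores by hand next to these numbers.

```python
import difflib


def word_change_ratio(original: str, edited: str) -> float:
    """share of word-level content that differs (0.0 = identical text)."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 1.0 - matcher.ratio()


def paragraphs_reordered(original: str, edited: str) -> bool:
    """true if both versions have the same paragraphs in a different order."""
    a = [p.strip() for p in original.split("\n\n") if p.strip()]
    b = [p.strip() for p in edited.split("\n\n") if p.strip()]
    return a != b and sorted(a) == sorted(b)


def opening_change_ratio(original: str, edited: str, n_words: int = 50) -> float:
    """word_change_ratio restricted to the first n_words of each version.

    n_words=50 is an arbitrary guess at what counts as "the opening".
    """
    a = " ".join(original.split()[:n_words])
    b = " ".join(edited.split()[:n_words])
    return word_change_ratio(a, b)


if __name__ == "__main__":
    # placeholder strings, not real drafts
    before = "first paragraph here.\n\nsecond paragraph here."
    after = "second paragraph here.\n\nfirst paragraph here."
    print("overall change:", word_change_ratio(before, after))
    print("reordered only:", paragraphs_reordered(before, after))
    print("opening change:", opening_change_ratio(before, after, n_words=3))
```

the idea is to log these three numbers alongside the before/after detector scores for each draft, so "surface edit" vs "structural edit" vs "rewrote the opening" is something measured instead of eyeballed.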

anyone else tested this? curious if others have seen the same pattern or different results.