The znver1/znver2 overrides shouldn't need 2uops for the xmm case (but znver1 should double-pump for the ymm case). Found with the help of D138359