There’s a tradeoff here. If you want streaming output, then you lose the opportunity to clean it up in post processing such as removing filler words or removing stutters, etc., or any other AI based cleanup.
The MacOS built-in dictation streams in real time and also does some cleanup, but it does awkward things, like the streaming text shows up at the bottom of the screen. Also I don’t think it’s as accurate as Parakeet V3, and there’s a start up lag of 1-2 secs after hitting the dictation shortcut, which kills it for me.
I feel like this is a solvable problem. If you emit an errant word that should be replaced, why not correspondingly emit backspaces to just rewrite the word?
I feel like this is the best of both worlds.
Perhaps a little janky with backspaces, but still technically feasible.
The MacOS built-in dictation streams in real time and also does some cleanup, but it does awkward things, like the streaming text shows up at the bottom of the screen. Also I don’t think it’s as accurate as Parakeet V3, and there’s a start up lag of 1-2 secs after hitting the dictation shortcut, which kills it for me.