Now I'm still waiting for someone to succeed at a clean-room recreation of Majel Barrett's voice, so we can finally have computers sound like they always should have.
We could've been there a decade ago, but the high-quality audio samples, made officially and specifically with the possibility of this use in mind, got trapped somewhere between the estate, producers, and a commercial interest that called dibs and then procrastinated on the project instead.
I did this. She recorded clean (imo, i cleaned it up) audio for “Star Trek: The Next Generation Interactive Technical Manual” which is available on archive.org.
Nurse Chapel, and "Number One"* from the original series' original pilot, The Cage. Both of these characters are main cast in SNW, sadly no mind-swap plot with these two has happened yet.
* I don't think she had a full name at that point?
I just yeeted a bunch of extremely noisy fragments into elevenlabs, and it came out pretty good on their cheap $5 plan. If you're after this for your own amusement, let me know if you want a screencap, or a dump of the source files.
Obv no clean room reconstruction but good enough for personal use...
I have lots of super high quality, clean audio recordings from her ripped from an old video game that she did voice work for. I've tried various TTS models over the years with it. Getting the pitch and tune is easy, but getting the impersonal detached robot-y feeling is kinda tricky. But I haven't tried in the past 6 months, so maybe it's time to give it another shot.
the inflection and impersonal feel are definitely hard to get right. there are parameters in the elevenlabs API docs to make the voice more stable (= monotonous; see speak.sh in that repo) but still the voice cloner on my $5 plan doesn't really get it right.
nevertheless... i'm still having a lot of fun with this.
edit: if I am forced to rot my brain with the 10x productivity boosting slop gun, at least I'll do it grinning
> pod cleaned up. waiting on the behemoth to finish grinding through Italy.
< if only postgres had progress indicators
... then they coulda called it progresql
> lmaooo
> Bash(~/speak.sh "Joke detected. Humor subroutine engaged. Ha. Ha. Ha.")
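For anyone curious about the "more stable" knob mentioned above, here's a rough sketch of what a request with a high stability value could look like. This is not what speak.sh does (I'm only guessing at its contents). The endpoint path, the `xi-api-key` header and the `voice_settings` fields are my reading of the ElevenLabs text-to-speech docs, so check them against the current API reference. `build_tts_request`, the default values and the placeholder IDs are made up for illustration. It only builds the request and doesn't send anything.

```python
import json
import urllib.request

# Assumption: base URL as described in the ElevenLabs text-to-speech docs.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key, stability=0.9, similarity_boost=0.75):
    """Build (but do not send) a TTS request with a high stability setting.

    Higher stability should mean less variation between takes, i.e. a flatter,
    more monotonous read. The defaults here are illustrative guesses.
    """
    payload = {
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder voice ID and key; nothing is sent over the network here.
    req = build_tts_request(
        "Joke detected. Humor subroutine engaged. Ha. Ha. Ha.",
        voice_id="YOUR_VOICE_ID",
        api_key="YOUR_API_KEY",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

To actually get audio you'd pass the request to `urllib.request.urlopen` and write the response bytes to a file. Even with stability cranked up, expect the cloned voice to still drift away from the detached computer read.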
“Director John Badham states in the commentary that the actor voicing the raw content that was later modified for the computerized effect was John Wood (the Falken character), reading the script word-for-word in reverse order in order to portray a "flat quality" with limited inflection. That raw audio was then edited and re-assembled after being run through audio processing equipment to achieve the desired effect.”
Apparently John Wood read the lines in reverse word order to make the enunciation weird. If you train a model, feed it the lines you want in reverse word order, then split the generated audio on silence and put the pieces back in the original order, you should come close.
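A minimal sketch of that reverse / split / re-reverse idea, assuming the audio is just a list of float samples in [-1, 1]. The TTS step itself isn't shown, and the function names, `threshold` and `min_silence` values are all made up for illustration. Real audio would need thresholds tuned per recording.

```python
from typing import List, Sequence


def reverse_word_order(line: str) -> str:
    """Reverse the words of a line, e.g. for feeding to a TTS model."""
    return " ".join(reversed(line.split()))


def split_on_silence(
    samples: Sequence[float], threshold: float = 0.01, min_silence: int = 3
) -> List[List[float]]:
    """Split samples into chunks separated by >= min_silence quiet samples.

    Quiet runs shorter than min_silence are kept inside the current chunk,
    so brief pauses within a word do not cut it in two.
    """
    chunks: List[List[float]] = []
    current: List[float] = []
    pending_quiet: List[float] = []
    for s in samples:
        if abs(s) < threshold:
            if current:  # leading silence before any sound is ignored
                pending_quiet.append(s)
                if len(pending_quiet) >= min_silence:
                    chunks.append(current)
                    current, pending_quiet = [], []
        else:
            current.extend(pending_quiet)  # short pause belongs to the chunk
            pending_quiet = []
            current.append(s)
    if current:
        chunks.append(current)
    return chunks


def restore_order(chunks: List[List[float]]) -> List[List[float]]:
    """Put the spoken chunks back into the original word order."""
    return list(reversed(chunks))


if __name__ == "__main__":
    print(reverse_word_order("shall we play a game"))
    fake_audio = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.9, 0.0, 0.0, 0.0, 0.3]
    print(restore_order(split_on_silence(fake_audio)))
```

This only works if the model actually leaves a detectable pause between words. If it runs words together, the chunks won't line up one-to-one with the words and you'd have to fall back on some kind of alignment instead.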
Lol, yea, the scripts are beyond sketchy. This is the new vector: a cool idea presenting itself as "fun" (which it actually is). People who don't read the scripts, or who are just vibing, may not understand what they're installing. Even if this author isn't malicious, you cannot assume that will always be the case.
The author might not be malicious, but from going through some of the audio packs, they're really not quality-checking PRs. For instance, sc_medic/sounds/WhereDoesItHurt.mp3 sounds like two-and-a-half sounds stuck together ("Critical? You Rang? Please state the nat--", it cuts off right there, and doesn't include the phrase "Where does it hurt?").
I wouldn't use this repo outside of some kind of sandbox.
Plus, the fact that audio/video assets can carry RCE zero-days, which turn up quite often on some of these systems, should make someone immediately suspicious. It isn't hard to generate those assets on your own in a way you are comfortable with. I would never, ever, ever install this without forking my own assets and doing my own install, but not everyone is me.
I don't think using something fun as an attack vector is anything new at all. It's an easy way to get someone to let their guard down, because they want to play around and aren't thinking about how something silly could actually be out to get them.
At least until General Artificial Creativity (GAC) takes over. But don't worry, it won't kill humans for the greater good of more paperclips, but because it will be... creative.
So it will enslave us in tricky ways? Like maybe using ways to make technology super addictive, so our entire society changes, and writing algos to control our global discourse on important topics, and, uh, never mind.
Cheaper? I'm confused, how can it be cheaper than free? Most of what LLMs for code rely on is already open source. Also AFAICT (which is tricky since numbers aren't public) GenAI is one of the most expensive use cases and those companies (OpenAI, Anthropic, etc) are losing money.
Just as was foretold: an actual differentiator is creativity, not coding ability.