
Forums » Smalltalk » The Cruelest Thing my Mind Has Concocted

Hi, I'm Joot, and I love Vocaloid. Vocaloid has given us many amazing things, such as Hatsune Miku, Nyan Cat, most popular songs on TikTok, and my insane autistic obsession with it.

However, I am one of the fans who gets really, really into it, mostly on a different yet similar engine, UTAU (the one used in the aforementioned Nyan Cat). In UTAU, there are several methods of recording and stringing together the different phonemes required to synthesize the speech.

The oldest, and the one all others are adapted from, is CV, also known as Single Sound. This is very minimal, very quick, very easy to string, and only takes about 15 minutes to make (albeit longer if you want to record extra pitches to extend the range).

After that came VCV, another Japanese format, which records the lines in strings of syllables so that every vowel sound flows naturally and smoothly into the next sound. This quickly became the new standard and is very high quality and smooth, but it is a lot heavier in file size, much harder to use, and takes a lot longer to record, at almost an hour per pitch on average.

With VCV being such a beast, people were looking for something smoother than CV, but not as heavy as VCV. This resulted in the creation of CVVC, which could be adapted to most languages and was smoother on average than CV, but with more room for error than VCV, a lot harder to use and make than VCV, and generally less popular than either (although it's the standard for languages like English due to how our phonemes work). Anyone who genuinely likes using CVVC is either insane or lying to you.

These three methods soon settled into harmony as "The Only UTAU Recording Methods People Care About and Use™". More exist, of course, but they haven't caught on like these three have. However, I felt like I'd throw my hat into the ring as well.

See, I am an UTAU power user. My favorite voice for it has 14 pitches of VCV and takes up a gigabyte of hard drive space. I like big, beefy voicebanks that sound like a human singing in my ears as smoothly as possible. And so, I thought I'd offer a voicebank recording method that appeals to that demographic: CVCV Japanese, also known as Jootystyle Mindf*ck.
It's VCV, but instead of being strung from every vowel sound, it's strung from every consonant-vowel sound. This means that it has a dedicated transition for every single possible pair of Japanese characters. Now, how long do you think that would take to record? Two hours? Three?
No.
A single pitch of this recording format takes around 24 hours.
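
For anyone wondering why it balloons like that, here's a rough back-of-the-envelope sketch in Python. The syllable count of 100 is purely an illustrative assumption, not the size of any real reclist, and the function name is made up for this post. It also only counts the five Japanese vowels as VCV contexts, ignoring extras like "n" or starting from silence, so treat the output as a ballpark and not a real reclist size.

```python
# Rough comparison of how many recorded transitions each method needs.
# ASSUMPTIONS (not from any real reclist): num_syllables is illustrative,
# and VCV contexts are limited to the five Japanese vowels.
VOWELS = ["a", "i", "u", "e", "o"]


def transition_counts(num_syllables, num_vowel_contexts=len(VOWELS)):
    """Return an approximate sample count per recording method."""
    return {
        # CV: each syllable recorded once, on its own.
        "CV": num_syllables,
        # VCV: each syllable recorded after every preceding vowel.
        "VCV": num_vowel_contexts * num_syllables,
        # CVCV: each syllable recorded after every other full syllable.
        "CVCV": num_syllables * num_syllables,
    }


counts = transition_counts(num_syllables=100)
for method, total in counts.items():
    print(f"{method}: {total} samples")
```

The point is that VCV grows with the number of vowels, while CVCV grows with the square of the syllable count, which is where the jump from about an hour to about a day comes from.
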
Is this overkill when VCV is more than smooth and natural enough? Yes, definitely. Will this ever be made? Probably not. But hey, the recording list exists and is out there. And now you're forced to know about it.

