Comments:
Why it's useless for audiobooks: some things need to be pitched, and some things can have multiple interpretations. For instance, "Your majesty" comes out as a squeak. Certainly that could work in some situations, but not all. In conversation, some voices need their pitch changed to denote gender and emotional state. The tool needs meta tags to indicate this information. Pauses are inconsistent, and there's no way to adjust them; meta tags would be an easy implementation here. And there will certainly be no pregnant silences.
Can you make a guide on how to turn a male voice into an attractive female voice that also does a lot of purring, breathing, and meowing while reading the text? Ideally it would also do ASMR sounds like scratching, tapping, and smacking/chewing.
What's the latest and greatest for real-time voice changers? Is it still w-okada?
I trained the program on 20 hours of Arabic audio, and it took me a whole day of training, but the resulting model is very weak at pronouncing words and does not pronounce them well. I think it needs a larger number of epochs, because I only used 20 epochs on an 8 GB graphics card. I don't know whether there is another way to make it succeed.
Are there any language models I can train on AMD? I've got an RX 6800, but all the tutorials seem to require an Nvidia GPU... I have 16 GB of VRAM, come on :(
Is Kaggle's quota of 30 hours per week enough to train on a 25-hour dataset with two T4 GPUs?
Can you add to the F5 fork the ability to save training checkpoints to Google Drive or Yandex Disk? Kaggle's daily quota is 12 hours, so I need a way to resume training. Will you add this to the fork?
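A minimal sketch of a stop-gap in the meantime, assuming F5-TTS writes *.pt checkpoints under a ckpts/<project> folder and that BACKUP_DIR is a folder you sync to Google Drive or Yandex Disk yourself (with rclone, a Kaggle dataset, or similar); both paths below are hypothetical. Run it alongside training, then point the resumed run at the newest copied checkpoint.

# Periodically copy the newest checkpoint to a backup folder.
import shutil
import time
from pathlib import Path

CKPT_DIR = Path("ckpts/my_project")          # hypothetical project checkpoint folder
BACKUP_DIR = Path("/kaggle/working/backup")  # hypothetical folder you sync externally
BACKUP_DIR.mkdir(parents=True, exist_ok=True)

def backup_latest_checkpoint() -> None:
    """Copy the most recently written checkpoint into the backup folder."""
    checkpoints = sorted(CKPT_DIR.glob("*.pt"), key=lambda p: p.stat().st_mtime)
    if checkpoints:
        latest = checkpoints[-1]
        shutil.copy2(latest, BACKUP_DIR / latest.name)

if __name__ == "__main__":
    while True:
        backup_latest_checkpoint()
        time.sleep(15 * 60)  # check every 15 minutes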
How did you solve the memory (not GPU VRAM) problem? I have faced the same issue.
Ответитьi want
tts n stt in one api
free, no keys n
offline
Hey Jarod! Is there a way to use F5 TTS in an audio-to-audio type inference?
Does your dataset need audio from only one speaker? Or can I use a 100-hour dataset with 500 speakers: male, female, young, and old?
Will it work on the Intel Arc A750?
Hey, nice tutorial, but could you please also make some videos on how to tune a pretrained model to make it more accurate, and how to add a new model to the interface for further use? Thanks in advance.
Has anyone tried training with an 8 or 12 GB GPU?
Hi. Would this be able to train a model from scratch instead of fine-tuning the provided base models? Thanks.
Unfortunately, it does not support the Indonesian language.
Does this mean I can translate what a person says into another language?
Dude, can you please give me that Japanese-trained model for free? Please.
Why are you making the video in English with Japanese audio samples? It makes no sense, as we don't understand the full process. You could make it in Japanese and in English separately...
Good video.
How do I fix this? OSError: [WinError 126] The specified module could not be found. Error loading "D:\F5 TTS\F5-TTS\venv\Lib\site-packages\torch\lib\fbgemm.dll
My question is: I want to fine-tune a cloned voice to be used for TTS with Piper and Home Assistant. I don't need to fine-tune a full language. Is it the same process as the one explained here, or can it not be done because the models can't be exported for use within Piper? Thanks.
Can you tell me: when I have finished training all the models, can I exit the F5-TTS program? I will turn off the computer, and the next day I want to open F5-TTS again; what should I do?
Can you tell me: after using F5-TTS, I shut it down. The next day I want to open it again; what should I do? When I open it, I have to run everything from the beginning, and it loses my previous training data.
Error: No audio files found in the specified path: D:\F5-TTS\src\f5_tts\..\..\data\vn_thairadio_pinyin\wavs. I am getting this error; can you help me?
Can you make a video on how to add a new language? On Hugging Face there are already many fine-tuned models for other languages.
Sorry if this is a noob question, I am new to all of this, but why does my Gradio interface look different from yours? Mine is black and orange.
It won't transcribe for me. I get "transcribe complete samples : 0. error files : 148" with a 3-second WAV file in the wavs folder, and an empty metadata.csv. I reinstalled from scratch, including ffmpeg and adding it to the PATH. Same thing. If anyone knows what this is, please let me know.
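One way to narrow this down is to check whether the clips themselves decode. A minimal diagnostic sketch, assuming the soundfile package is installed; the wavs folder path is hypothetical and should point at your project's wavs directory.

# Sanity-check every WAV in the folder: does it decode, and what are its properties?
from pathlib import Path
import soundfile as sf

WAVS_DIR = Path("data/my_project/wavs")  # hypothetical path

for wav in sorted(WAVS_DIR.glob("*.wav")):
    try:
        info = sf.info(wav)
        print(f"{wav.name}: {info.samplerate} Hz, {info.duration:.2f} s, {info.channels} ch")
    except RuntimeError as err:
        print(f"{wav.name}: failed to decode ({err})")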
I would like your assistance, please. I am trying to build my own AI assistant, and for that I am using the Edge-TTS module to get real-time text-to-speech output, but then I came across F5-TTS. After successfully training the model on a custom voice, how can I deploy it and program it into my AI assistant so that it can speak basic prompts like "I am doing this task, sir" or "Consider this task done, sir" with the trained F5-TTS model in real time? I want to know how to deploy it.
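One low-effort way to get fixed phrases like these out in real time is to synthesize them once with the fine-tuned voice, cache the WAVs, and just replay them. A minimal sketch, assuming synthesize() is your own wrapper around F5-TTS inference and that the simpleaudio package handles playback; the function body and cache folder are hypothetical.

# Pre-generate fixed assistant lines with the fine-tuned voice and replay cached WAVs.
import hashlib
from pathlib import Path

import simpleaudio  # playback; assumes the package works on your platform

CACHE_DIR = Path("tts_cache")  # hypothetical cache folder
CACHE_DIR.mkdir(exist_ok=True)

def synthesize(text: str, out_path: Path) -> None:
    """Placeholder: call your fine-tuned F5-TTS model here and write a WAV to out_path."""
    raise NotImplementedError

def speak(text: str) -> None:
    """Play the cached WAV for this text, generating it on first use."""
    key = hashlib.sha1(text.encode("utf-8")).hexdigest()
    wav_path = CACHE_DIR / f"{key}.wav"
    if not wav_path.exists():
        synthesize(text, wav_path)
    simpleaudio.WaveObject.from_wave_file(str(wav_path)).play().wait_done()

speak("Consider this task done, sir.")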
Is there any TTS that can work with Latin American Spanish?
What should the minimum requirements be to use this? I'm on a laptop.
I want to ask: what are the requirements for this application to work?
Nice. Can a local installation be integrated with Python to synthesize text into audio? For instance, in Python, I want to provide a sample voice and raw text, and have the text synthesized using the sample voice and then saved as an MP3 file.
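A minimal sketch of that workflow, assuming the installed package exposes the f5_tts.api.F5TTS class with an infer() method taking ref_file, ref_text, gen_text, and file_wave arguments (check the api.py of your installed version, as the exact signature may differ), and that pydub plus ffmpeg are available for the MP3 step.

# Synthesize raw text in the sample voice, then convert the WAV to MP3.
from f5_tts.api import F5TTS    # assumption: matches the upstream api.py
from pydub import AudioSegment  # requires ffmpeg on the PATH

tts = F5TTS()  # loads the default pretrained model

tts.infer(
    ref_file="sample_voice.wav",  # hypothetical reference clip (under 15 s)
    ref_text="Transcript of the reference clip.",
    gen_text="This is the raw text to synthesize in the sample voice.",
    file_wave="output.wav",       # intermediate WAV
)

AudioSegment.from_wav("output.wav").export("output.mp3", format="mp3")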
Do you have any suggestions for fine-tuning parameters or fixes when the fine-tune produces "beep" noises? I am seeing frequency banding close to 0 when generating audio from the fine-tune and am not sure what could be causing it.
What I like is that the audio can be generated from Python. I'll give it a try, since my other alternative is not working great on a Mac, lol.
Can it change a recorded voice from a man to a woman?
Can this work on an RX 6600? If not, what is the equivalent?
Can you fine-tune with 2-3 minutes of audio data? I just want to train one specific voice, and I will use reference audio from the fine-tuning dataset.
I don't know what to say, but my device's specs are four or five times lower than yours. Training it on two hours of recordings for 300 epochs took me 40 hours. If I had trained it on 10 hours, it might have taken me a month.
The most impressive thing is just how well it does accents, notably the Australian accent. Every other voice cloner I've tried just ends up sounding American. This nails every accent.
So can I fine-tune an existing model to better sound like one person, e.g. fine-tune the default F5-TTS model by training it further on a specific speaker's data? Or can I only create a new model from scratch, like you did with that particular speaker's data?
Any suggestions for this error? Thanks in advance.
size mismatch for transformer.text_embed.text_embed.weight: copying a param with shape torch.Size([2546, 512]) from checkpoint, the shape in current model is torch.Size([2546, 100]).
size mismatch for transformer.input_embed.proj.weight: copying a param with shape torch.Size([1024, 712]) from checkpoint, the shape in current model is torch.Size([1024, 300]).
Keyboard interruption in main thread... closing server.
Firstly, thank you very much, great tutorial, masterful teacher!!! Everything works just as you teach it. Now, is there a tutorial showing how to use it to generate long texts? Happy New Year to all!!!
Hello, do you mean you used 10 hours and 180k without a pretrained model? At first you said 7,000 hours.
Hey Jarod. I realize each reference voice clip cannot be more than 15 seconds. However, how many 15-second clips of the same voice can be uploaded? And does having multiple reference voice clips improve the quality of the final synthesized voice?
Please, for the love of God, stop saying "so" at the start of every sentence.
Hey, after installing and running the inference using the given URL, I am wondering how to launch the inference again after closing the cmd window and the inference browser tab. There is no .bat file that I can see :(