Whisper finetuning common voice #1809
Hello,
Many thanks for this PR! You did a really great job. :-)
Could you please remove the file environment.yaml? I don't see any reason to keep it.
Please see all the issues mentioned in the review.
Concerning normalized_transcripts=True, have you trained your models with it? If so, it might be worth checking whether it impacted your results. Also, are you going to release the models on the Gdrive/HF?
Please fix the pre-commit.
Thanks again for your impressive work! :-)
Adel
All changes are applied and pre-commit is tested.
Adel-Moumen left a comment
Really neat what you are doing! Thanks again!
Please look at my comments, and could you please take a look at the pre-commit failures? Read the following tutorial https://speechbrain.readthedocs.io/en/latest/contributing.html so that you know how to solve the issues.
Thanks!
@Adel-Moumen pointed me to your current error log. They look identical, yet the error is raised by:

    if not (os.path.exists(file.strip())):
        print(
            "\tERROR: The file %s listed in %s does not exist!"
            % (file, recipe_csvfile)
        )

which suggests the file does not exist, yet the paths etc. seem to match. You can try this offline after you make a change; note that this will also consider files that are not versioned by git but are in your repo folders. I don't think my post is particularly helpful, other than to say you don't need to push that much here. I'm looking into it to see if I can find something more helpful...
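For anyone who wants to reproduce this kind of check offline, here is a minimal sketch. It is not the actual test code: the function name `find_missing_files` and the column names `Script_file` and `Data_prep_file` are assumptions made for illustration. It shows one relevant detail: `os.path.exists` follows symlinks, so a path can show up in a directory listing and still be reported as missing.

```python
import csv
import os
import tempfile


def find_missing_files(recipe_csvfile, fields):
    """Return (path, field) pairs for files listed in the CSV that do not exist.

    Hypothetical helper: `fields` names the CSV columns that hold file paths.
    """
    missing = []
    with open(recipe_csvfile, newline="") as f:
        for row in csv.DictReader(f):
            for field in fields:
                path = (row.get(field) or "").strip()
                # os.path.exists follows symlinks, so a dangling symlink is
                # reported as missing even if a directory listing shows it.
                if path and not os.path.exists(path):
                    missing.append((path, field))
    return missing


# Demo in a throwaway directory (column names are assumed, not the real schema).
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "train.py")
    open(existing, "w").close()
    absent = os.path.join(tmp, "prepare.py")  # deliberately never created

    recipe_csv = os.path.join(tmp, "recipes.csv")
    with open(recipe_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Script_file", "Data_prep_file"])
        writer.writerow([existing, absent])

    missing = find_missing_files(recipe_csv, ["Script_file", "Data_prep_file"])
    for path, field in missing:
        print("\tERROR: The file %s listed in %s does not exist!" % (path, recipe_csv))
```

Running it against a local checkout before pushing may save a few CI round trips.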
ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
SSL,CommonVoice,recipes/CommonVoice/self-supervised-learning/wav2vec2/train_hf_wav2vec2.py,recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml,recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py,recipes/CommonVoice/self-supervised-learning/wav2vec2/README.md,,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=2 --skip_prep=True --d_model=128 --wav2vec2_folder=tests/tmp/wav2vec2_checkpoint,
@poonehmousavi this looks good btw, the line of concern did not change ... interesting
Yes, that is what makes it more confusing.
wondering if the github workflow got stuck in an odd state
Additionally, I ran pytest on my local repo and did not get that error, and I had no uncommitted changes related to those files.
it's the workflow then...
could reproduce the error on my end
ll recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py
recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py -> '../../common_voice_prepare.py'$'\n'
This is an invalid symlink: the `$'\n'` in the listing shows that the link target ends with a newline character.
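A symlink like this is easy to miss because the link itself exists and only its target is wrong. Below is a minimal sketch of how one might detect it from Python, assuming a POSIX filesystem. The helper name `inspect_symlink` and the demo file names are hypothetical; `os.readlink` and `os.path.exists` are standard library calls.

```python
import os
import tempfile


def inspect_symlink(path):
    """Return (raw_target, resolves, has_stray_whitespace) for a symlink."""
    target = os.readlink(path)
    # os.path.exists follows the link; False means the target is missing.
    resolves = os.path.exists(path)
    # A target that changes under strip() carries leading/trailing whitespace,
    # for example a newline accidentally included when the link was created.
    has_stray_whitespace = target != target.strip()
    return target, resolves, has_stray_whitespace


# Demo: one valid link and one whose target has a trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "common_voice_prepare.py")
    open(real, "w").close()

    sub = os.path.join(tmp, "recipe")
    os.mkdir(sub)
    good = os.path.join(sub, "good_link.py")
    bad = os.path.join(sub, "bad_link.py")
    os.symlink("../common_voice_prepare.py", good)
    os.symlink("../common_voice_prepare.py\n", bad)  # stray newline in target

    good_info = inspect_symlink(good)
    bad_info = inspect_symlink(bad)
    print(repr(bad_info[0]), bad_info[1], bad_info[2])
```

Recreating the link with a clean target (without the newline) should make the existence check pass again.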
…oonehmousavi/speechbrain into whisper-finetunng-common-voice
LGTM! Many thanks for your great work. It has been a pleasure to review your PR.
Add Whisper fine-tuning recipes for Common Voice data for the following languages
Note: When using the Whisper large model, to improve memory usage during model recovery, you could use (see Avoid loading checkpoint parameters on the target device #1743)