Alignment with CTC ASR models powered by k2. #2772
Here are some examples of how to use the code for alignment. You can also find these examples in the comments and documentation in the code.
>>> import torch
>>> from speechbrain.pretrained import EncoderASR
>>> from speechbrain.k2_integration.align import CTCAligner
>>> asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-librispeech", savedir="pretrained_models/asr-wav2vec2-librispeech")
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> aligner = CTCAligner(model=asr_model, tokenizer=asr_model.tokenizer, device=device)
>>> audio_files = ["samples/audio_samples/example1.wav", "samples/audio_samples/example2.wav"]
>>> transcripts = ["HELLO WORLD", "THIS IS SPEECHBRAIN"]
>>> # align one audio file to tokens
>>> alignment = aligner.align_audio_to_tokens(audio_files[0], transcripts[0])
>>> # align one audio file to words
>>> alignment = aligner.align_audio_to_words(audio_files[0], transcripts[0], frame_shift=0.02)
>>> # align a batch of audio files to tokens
>>> alignments = aligner.align_batch_to_tokens(audio_files, transcripts)
>>> # align a batch of audio files to words
>>> alignments = aligner.align_batch_to_words(audio_files, transcripts, frame_shift=0.02)
>>> # align a csv file to tokens
>>> aligner.align_csv_to_tokens("samples/audio_samples/example.csv", "samples/audio_samples/example_token_alignment.txt")
>>> # align a csv file to words
>>> aligner.align_csv_to_words("samples/audio_samples/example.csv", "samples/audio_samples/example_word_alignment.csv", frame_shift=0.02)
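To illustrate what `frame_shift=0.02` does in the word-level calls above, here is a minimal sketch of turning a per-frame label sequence into time-stamped segments. It does not use the `CTCAligner` API: the input format (one label per frame), the `<blk>` blank symbol, and the helper name `frames_to_segments` are all assumptions made for illustration, and the frame shift is assumed to be in seconds.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.02, blank="<blk>"):
    """Collapse per-frame labels into (label, start_sec, end_sec) segments.

    Hypothetical helper: the real aligner's output format may differ.
    """
    segments = []
    frame = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != blank:
            # Frame index times frame shift gives the time in seconds.
            start = round(frame * frame_shift, 3)
            end = round((frame + length) * frame_shift, 3)
            segments.append((label, start, end))
        frame += length
    return segments


example = ["<blk>", "H", "H", "<blk>", "I", "I", "I"]
print(frames_to_segments(example))
```

Consecutive identical labels are merged into one segment, and blank frames are skipped, so a repeated token only yields two segments when a blank separates the two runs.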
Here are some examples of input and output files.
Input: dev-clean.csv (from LibriSpeech)
Output: dev-clean.token.ali.txt
Output: dev-clean.word.ali.csv
Thank you @ZhaoZeyu1995! Could you please address the failing tests?
No problem. I will deal with them soon.
Hello @mravanelli, I just solved the pre-commit issue.
@pplantinga I see this as a sort of integration. Should this be relocated to your integration folder?
Yes, as soon as the Integrations PR is merged this will have to be moved there. |
Unfortunately I'm having trouble installing k2 and can't test this right now.
#2782 is now merged, could you move these files into the |
pplantinga
left a comment
Hey @ZhaoZeyu1995 thank you for your contribution. This looks like a nice addition and it will be great to have k2 alignment in the toolkit. Everything looks put together in a sensible way and there's plenty of in-code documentation which is great. Actually having a tutorial of how to use this might be pretty nice too, but that can be put in a separate PR if needed.
I have some minor comments below, but I haven't actually been able to test this as I'm having trouble installing k2 on my system. I will ask if there's someone on the team who can test this.
pplantinga
left a comment
I was able to confirm that this works. I have updated the branch and it looks ready to merge to me.
…els based on k2.
What does this PR do?
Support alignment with CTC ASR models powered by k2, a Differentiable Weighted Finite-State Transducer toolkit.
Both token- and word-level alignments are supported.
Currently, training and decoding with k2 are supported for CTC ASR, but alignment is not.
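For readers unfamiliar with CTC alignment, the idea can be sketched without k2 as a Viterbi search over the transcript with blanks interleaved. This is not the implementation in this PR, which builds on k2's finite-state machinery; it is a plain-Python illustration, and the function name `ctc_forced_align`, the list-of-lists input, and the choice of blank id 0 are assumptions.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best per-frame label path that spells `tokens` under CTC.

    log_probs: T frames, each a list of V log-probabilities.
    tokens: target token ids, without blanks.
    """
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S, T = len(ext), len(log_probs)
    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip
            # the blank between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG:
                continue  # state not reachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best
    # A valid path ends in the trailing blank or in the last token.
    ends = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(ends, key=lambda p: score[T - 1][p])
    if score[T - 1][s] == NEG:
        raise ValueError("not enough frames to align the transcript")
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


hi, lo = math.log(0.8), math.log(0.1)
frames = [[lo, hi, lo], [lo, hi, lo], [hi, lo, lo], [lo, lo, hi]]
print(ctc_forced_align(frames, [1, 2]))
```

The returned list has one label per frame; grouping consecutive identical non-blank labels and multiplying frame indices by the frame shift gives token-level time stamps.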