Alignment with CTC ASR models powered by k2. #2772

Merged
pplantinga merged 14 commits into speechbrain:develop from ZhaoZeyu1995:k2align on Jul 17, 2025

@ZhaoZeyu1995 ZhaoZeyu1995 commented Nov 26, 2024

Contributor

…els based on k2.

What does this PR do?

Support alignment with CTC ASR models powered by k2, a Differentiable Weighted Finite-State Transducer toolkit.
Both token- and word-level alignments are supported.

Currently, training and decoding with k2 are supported for CTC ASR, but alignment is not.
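For readers new to the idea, CTC alignment finds the most likely frame-by-frame label sequence that collapses to a given transcript. The sketch below is a plain-Python Viterbi version of that idea on toy data. It is not the k2-based code in this PR, which works with finite-state graphs; the function name `ctc_forced_align` and the toy emissions are assumptions made for illustration only.

```python
import math


def ctc_forced_align(log_probs, targets, blank=0):
    """Return the best frame-level label path for `targets` under CTC.

    log_probs: list of T frames, each a list of per-label log-probabilities.
    targets:   non-empty list of label ids (no blanks).
    Illustrative sketch only; not the k2 implementation.
    """
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)
    NEG = -math.inf

    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first label.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip
            # the blank between two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A path must end on the last label or on the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG:
        raise ValueError("too few frames for this transcript")

    # Backtrack from the final state to recover one label per frame.
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


# Toy emissions over labels {0: blank, 1, 2}; values are illustrative only.
hi, lo = math.log(0.8), math.log(0.1)
frames = [
    [lo, hi, lo],
    [lo, hi, lo],
    [hi, lo, lo],
    [lo, lo, hi],
    [lo, lo, hi],
]
print(ctc_forced_align(frames, [1, 2]))  # [1, 1, 0, 2, 2]
```

Each entry of the returned path is the label assigned to one frame, with 0 marking blank frames.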

Dependencies

Fixes #<issue_number>

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@ZhaoZeyu1995 ZhaoZeyu1995 changed the title Add align.py in speechbrain/k2_integration fro alignment with CTC ASR models. Add align.py in speechbrain/k2_integration for alignment with CTC ASR models. Nov 26, 2024
@ZhaoZeyu1995 ZhaoZeyu1995 changed the title Add align.py in speechbrain/k2_integration for alignment with CTC ASR models. Alignment with CTC ASR models powered by k2. Nov 26, 2024
@ZhaoZeyu1995

ZhaoZeyu1995 commented Nov 26, 2024

Contributor Author

Here are some examples of how to use the code for alignment. You can also find these examples in the comments and documentation in the code.

    >>> import torch
    >>> from speechbrain.pretrained import EncoderASR
    >>> from speechbrain.k2_integration.align import CTCAligner
    >>> asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-librispeech", savedir="pretrained_models/asr-wav2vec2-librispeech")
    >>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    >>> aligner = CTCAligner(model=asr_model, tokenizer=asr_model.tokenizer, device=device)
    >>> audio_files = ["samples/audio_samples/example1.wav", "samples/audio_samples/example2.wav"]
    >>> transcripts = ["HELLO WORLD", "THIS IS SPEECHBRAIN"]
    >>> # align one audio file to tokens
    >>> alignment = aligner.align_audio_to_tokens(audio_files[0], transcripts[0])
    >>> # align one audio file to words
    >>> alignment = aligner.align_audio_to_words(audio_files[0], transcripts[0], frame_shift=0.02)
    >>> # align a batch of audio files to tokens
    >>> alignments = aligner.align_batch_to_tokens(audio_files, transcripts)
    >>> # align a batch of audio files to words
    >>> alignments = aligner.align_batch_to_words(audio_files, transcripts, frame_shift=0.02)
    >>> # align a csv file to tokens
    >>> aligner.align_csv_to_tokens("samples/audio_samples/example.csv", "samples/audio_samples/example_token_alignment.txt")
    >>> # align a csv file to words
    >>> aligner.align_csv_to_words("samples/audio_samples/example.csv", "samples/audio_samples/example_word_alignment.csv", frame_shift=0.02)
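To show what the `frame_shift=0.02` argument in the examples above is for, here is a small standalone sketch that turns a frame-level label path into time-stamped token segments and then merges them into word spans. The helper names (`path_to_segments`, `merge_into_words`), the tuple layout, and the `tokens_per_word` input are hypothetical; the thread does not show the output format that `CTCAligner` actually returns.

```python
from itertools import groupby


def path_to_segments(path, frame_shift=0.02, blank=0):
    """Collapse a frame-level path into (label, start_sec, end_sec) tuples.

    Hypothetical helper for illustration; blank runs are dropped.
    """
    segments = []
    frame = 0
    for label, run in groupby(path):
        n = len(list(run))
        if label != blank:
            start = round(frame * frame_shift, 3)
            end = round((frame + n) * frame_shift, 3)
            segments.append((label, start, end))
        frame += n
    return segments


def merge_into_words(segments, words, tokens_per_word):
    """Group consecutive token segments into (word, start_sec, end_sec).

    `tokens_per_word` is an assumed input: how many tokens each word has.
    """
    out = []
    i = 0
    for word, n in zip(words, tokens_per_word):
        chunk = segments[i:i + n]
        out.append((word, chunk[0][1], chunk[-1][2]))
        i += n
    return out


# Toy path: label ids are arbitrary, 0 is the blank.
path = [1, 1, 0, 2, 2, 0, 0, 3]
token_segments = path_to_segments(path, frame_shift=0.02)
word_segments = merge_into_words(token_segments, ["AB", "C"], [2, 1])
print(token_segments)
print(word_segments)
```

A word span here simply runs from the start of its first token to the end of its last token, so any blank frames between tokens of the same word are included in the word.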

@ZhaoZeyu1995

Contributor Author

Here are some examples of input and output files.

Input: dev-clean.csv (from LibriSpeech)

Output (token-level): dev-clean.token.ali.txt

Output (word-level): dev-clean.word.ali.csv

@ZhaoZeyu1995 ZhaoZeyu1995 marked this pull request as ready for review November 28, 2024 01:21
@mravanelli mravanelli added the enhancement New feature or request label Dec 4, 2024
@mravanelli

Collaborator

Thank you @ZhaoZeyu1995! Could you please address the failing tests?

@ZhaoZeyu1995

Contributor Author

No problem. I will deal with them soon.

@ZhaoZeyu1995

Contributor Author

Hello @mravanelli, I just fixed the pre-commit issue.

@TParcollet

Collaborator

@pplantinga I see this as a sort of integration. Should this be relocated to your integrations folder?

@pplantinga

Collaborator

> @pplantinga I see this as a sort of integration. Should this be relocated to your integrations folder?

Yes, as soon as the Integrations PR is merged this will have to be moved there.

@pplantinga

Collaborator

Unfortunately, I'm having trouble installing k2 and can't test this right now.

@pplantinga

Collaborator

#2782 is now merged. Could you move these files into the integrations folder?

@pplantinga pplantinga left a comment

Collaborator

Hey @ZhaoZeyu1995, thank you for your contribution. This looks like a nice addition, and it will be great to have k2 alignment in the toolkit. Everything is put together in a sensible way, and there's plenty of in-code documentation, which is great. A tutorial on how to use this would be nice too, but that can go in a separate PR if needed.

I have some minor comments below, but I haven't actually been able to test this as I'm having trouble installing k2 on my system. I will ask if there's someone on the team who can test this.

Comment thread speechbrain/k2_integration/align.py
Comment thread speechbrain/k2_integration/align.py
Comment thread speechbrain/k2_integration/align.py
Comment thread speechbrain/k2_integration/align.py
Comment thread speechbrain/integrations/k2_fsa/align.py Outdated

@pplantinga pplantinga left a comment

Collaborator

I was able to confirm that this works. I have updated the branch, and it looks ready to merge to me.

@pplantinga pplantinga merged commit 13cd75c into speechbrain:develop Jul 17, 2025
5 checks passed

Labels

enhancement New feature or request

4 participants