Alignment with CTC ASR models powered by k2. by ZhaoZeyu1995 · Pull Request #2772 · speechbrain/speechbrain

ZhaoZeyu1995 · 2024-11-26T03:20:19Z

…els based on k2.

What does this PR do?

Support alignment with CTC ASR models powered by k2, a Differentiable Weighted Finite-State Transducer toolkit.
Both token- and word-level alignments are supported.

Currently, training and decoding with k2 are supported for CTC ASR, but alignment is not.
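To make "token-level alignment" concrete, here is a minimal sketch of how a per-frame CTC label path can be collapsed into token segments. It is independent of k2 and of the code in this PR. The function name `frames_to_token_segments`, the blank index `0`, and the example path are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def frames_to_token_segments(frame_labels, blank=BLANK):
    """Collapse per-frame labels into (token, start_frame, end_frame) segments.

    Consecutive identical labels form one segment, blank frames are dropped,
    and end_frame is exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != blank:
            segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical best path over 10 frames: token 5, token 5 again, then token 7.
path = [0, 0, 5, 5, 0, 5, 7, 7, 7, 0]
segments = frames_to_token_segments(path)
print(segments)
```

The blank frame between the two runs of token 5 is what keeps them as separate segments. Without it, CTC would treat them as a single emission.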

Dependencies

k2. For installation, please refer to the official installation guide.
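Since k2 is a separate install, a caller may want to check for it before importing the alignment code. The following is a small sketch using only the standard library. The helper name `k2_available` is hypothetical and not part of this PR.

```python
import importlib.util


def k2_available():
    """Return True if a top-level package named k2 can be found."""
    return importlib.util.find_spec("k2") is not None


if not k2_available():
    print("k2 is not installed; see the official k2 installation guide.")
```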

Fixes #<issue_number>

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

ZhaoZeyu1995 · 2024-11-26T14:58:03Z

Here are some examples of how to use the code for alignment. You can also find these examples in the comments and docstrings in the code.

    >>> import torch
    >>> from speechbrain.pretrained import EncoderASR
    >>> from speechbrain.k2_integration.align import CTCAligner
    >>> asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-librispeech", savedir="pretrained_models/asr-wav2vec2-librispeech")
    >>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    >>> aligner = CTCAligner(model=asr_model, tokenizer=asr_model.tokenizer, device=device)
    >>> audio_files = ["samples/audio_samples/example1.wav", "samples/audio_samples/example2.wav"]
    >>> transcripts = ["HELLO WORLD", "THIS IS SPEECHBRAIN"]
    >>> # align one audio file to tokens
    >>> alignment = aligner.align_audio_to_tokens(audio_files[0], transcripts[0])
    >>> # align one audio file to words
    >>> alignment = aligner.align_audio_to_words(audio_files[0], transcripts[0], frame_shift=0.02)
    >>> # align a batch of audio files to tokens
    >>> alignments = aligner.align_batch_to_tokens(audio_files, transcripts)
    >>> # align a batch of audio files to words
    >>> alignments = aligner.align_batch_to_words(audio_files, transcripts, frame_shift=0.02)
    >>> # align a csv file to tokens
    >>> aligner.align_csv_to_tokens("samples/audio_samples/example.csv", "samples/audio_samples/example_token_alignment.txt")
    >>> # align a csv file to words
    >>> aligner.align_csv_to_words("samples/audio_samples/example.csv", "samples/audio_samples/example_word_alignment.csv", frame_shift=0.02)
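The `frame_shift=0.02` argument above suggests that word-level alignment converts frame indices into seconds. Below is a rough sketch of that idea, not the implementation in this PR. It assumes character-level tokens with a space as the word boundary. The function name `tokens_to_words` and the segment values are made up for illustration.

```python
def tokens_to_words(token_segments, frame_shift=0.02, boundary=" "):
    """Merge (token, start_frame, end_frame) segments into (word, start_s, end_s).

    Assumes character tokens and a boundary token separating words;
    end_frame is exclusive and times are rounded to milliseconds.
    """
    words = []
    chars, start, end = [], None, None

    def flush():
        # Emit the word collected so far, if any.
        if chars:
            words.append(
                ("".join(chars), round(start * frame_shift, 3), round(end * frame_shift, 3))
            )

    for token, seg_start, seg_end in token_segments:
        if token == boundary:
            flush()
            chars, start, end = [], None, None
            continue
        if start is None:
            start = seg_start
        chars.append(token)
        end = seg_end
    flush()
    return words


# Hypothetical token segments for the transcript "HI YOU".
token_segments = [
    ("H", 10, 12), ("I", 12, 15), (" ", 15, 16),
    ("Y", 20, 22), ("O", 22, 23), ("U", 23, 27),
]
print(tokens_to_words(token_segments))
```

With a 20 ms frame shift, frame 10 corresponds to 0.2 s, so the first word spans 0.2 s to 0.3 s in this example.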

ZhaoZeyu1995 · 2024-11-26T15:02:24Z

Here are some examples of input and output files.

Input: dev-clean.csv (from LibriSpeech)

Output (token-level): dev-clean.token.ali.txt

Output (word-level): dev-clean.word.ali.csv
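For readers unfamiliar with the input file, here is a sketch of reading a SpeechBrain-style CSV manifest into (ID, audio path, transcript) tuples. The column names `ID`, `wav`, and `wrd` are an assumption based on the usual LibriSpeech data preparation output, the example row is invented, and `read_manifest` is a hypothetical helper. The exact columns the aligner expects are not shown in this thread.

```python
import csv
import io


def read_manifest(handle, audio_key="wav", text_key="wrd"):
    """Return a list of (ID, audio path, transcript) from a CSV manifest."""
    reader = csv.DictReader(handle)
    return [(row["ID"], row[audio_key], row[text_key]) for row in reader]


# Invented example row; the header layout is assumed, not taken from this PR.
example = io.StringIO(
    "ID,duration,wav,spk_id,wrd\n"
    "utt1,2.5,/path/to/utt1.flac,spk1,HELLO WORLD\n"
)
print(read_manifest(example))
```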

mravanelli · 2024-12-04T20:30:17Z

Thank you @ZhaoZeyu1995! Could you please address the failing tests?

ZhaoZeyu1995 · 2024-12-10T17:28:31Z

No problem. I will deal with them soon.

ZhaoZeyu1995 · 2024-12-23T19:07:18Z

Hello @mravanelli, I just fixed the pre-commit issue.

TParcollet · 2025-01-15T09:37:11Z

@pplantinga I see this as a sort of integration. Should this be relocated in your integration folder?

pplantinga · 2025-05-21T17:35:17Z

> @pplantinga I see this as a sort of integration. Should this be relocated in your integration folder?

Yes, as soon as the Integrations PR is merged this will have to be moved there.

pplantinga · 2025-05-21T17:35:43Z

Unfortunately, I'm having trouble installing k2 and can't test this right now.

pplantinga · 2025-05-27T14:56:54Z

#2782 is now merged, could you move these files into the integrations folder?

pplantinga

Hey @ZhaoZeyu1995, thank you for your contribution. This looks like a nice addition, and it will be great to have k2 alignment in the toolkit. Everything is put together in a sensible way, and there's plenty of in-code documentation, which is great. A tutorial on how to use this would be nice too, but that can go in a separate PR if needed.

I have some minor comments below, but I haven't actually been able to test this as I'm having trouble installing k2 on my system. I will ask if there's someone on the team who can test this.

pplantinga

I was able to confirm that this works. I have updated the branch, and it looks ready to merge to me.

Add align.py in speechbrain/k2_integration fro alignment with CTC mod…

bc8703f

…els based on k2.

ZhaoZeyu1995 changed the title ~~Add align.py in speechbrain/k2_integration fro alignment with CTC ASR models.~~ Add align.py in speechbrain/k2_integration for alignment with CTC ASR models. Nov 26, 2024

ZhaoZeyu1995 changed the title ~~Add align.py in speechbrain/k2_integration for alignment with CTC ASR models.~~ Alignment with CTC ASR models powered by k2. Nov 26, 2024

ZhaoZeyu1995 marked this pull request as ready for review November 28, 2024 01:21

mravanelli added the enhancement New feature or request label Dec 4, 2024

mravanelli assigned ZhaoZeyu1995 Dec 4, 2024

Merge branch 'develop' into k2align

b8a6e9e

ZhaoZeyu1995 added 3 commits December 23, 2024 15:02

Merge branch 'speechbrain:develop' into k2align

70d6bff

Trying to solve the pre-commit problem.

742245d

Solved the pre-commit issue.

fe43a58

Merge branch 'develop' into k2align

b2ccfeb

Merge branch 'develop' into k2align

940e299

Merge branch 'develop' into k2align

c32aa39

Merge branch 'develop' into k2align

bdcaf24

pplantinga reviewed Jul 9, 2025

Comment threads (4): speechbrain/k2_integration/align.py

Move align file to integrations folder

1e10cc2

pplantinga reviewed Jul 9, 2025

Comment thread speechbrain/integrations/k2_fsa/align.py Outdated

pplantinga added 3 commits July 17, 2025 15:43

Add working example for doctest, add warning when alignment fails

a0ffe9c

Merge branch 'develop' into k2align

13a9668

Add tutorial for forced alignment

31ea9b8

pplantinga approved these changes Jul 17, 2025

Add tutorial to documentation pages

42a1c97

pplantinga merged commit 13cd75c into speechbrain:develop Jul 17, 2025
5 checks passed

4 participants
