SpeechLLM (with LLaMA) and Conformer recipe for speech translation on CoVoST (Code from Samsung AI Center Cambridge) by TParcollet · Pull Request #2865 · speechbrain/speechbrain

TParcollet · 2025-03-17T16:52:19Z

This PR introduces two speech translation recipes. The first trains a Conformer encoder-decoder from scratch with multi-task training (ASR+ST). The second fine-tunes an XLS-R encoder feeding an adapter placed in front of a LLaMA 3 7B model. I'd need some help to test these. Both recipes are basic things that can be found in the literature, nothing new, but they should serve as good entry points for paper baselines.

This is based on the new LLaMA interface of #2850

I have only tried En-De so far.

TParcollet · 2025-03-17T20:21:55Z

I could use a quick first review from someone willing to look briefly at the code (ignore llama.py, as it comes from another PR). Maybe @Adel-Moumen / @pplantinga?

@poonehmousavi this is an example of how to refactor MultiWOZ.


Adel-Moumen

First round (more details will come later, once I have had the chance to run the recipe).

How is the LLM processing the speech embeddings? I see in one docstring that the LLM takes [speech embds] + [prompt] + [translation]. In the vision-LLM literature, it's more common to do <image_start> <image embeddings> <image_end> + [prompt] + [translation], and I believe this makes it easier for the LLM to reason about what an image is thanks to the <image> token, i.e. you can ask it questions that refer to the <image> token. My question is more: how easy is it for us to modify the way the model processes the input? Can we modify it easily to accommodate this? (I understand the code and how you concat the embds, but I am wondering if we could think of ways to make this more beautiful and avoid having to copy and paste the recipe elsewhere; maybe a combine_embeddings function could be useful, idk.)
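To make the two input layouts in this comment concrete, here is a minimal, framework-agnostic sketch in plain Python (no tensors). It only plans which kind of item sits at each position of the LLM input. Everything in it is an assumption for illustration: `plan_llm_input`, `Slot`, and the `<speech_start>` / `<speech_end>` marker names are hypothetical and do not appear in the PR, and restricting the loss to the translation positions is a common convention rather than something the recipe is known to do.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical boundary markers; the PR does not define such tokens.
SPEECH_START = "<speech_start>"
SPEECH_END = "<speech_end>"


@dataclass
class Slot:
    """One position in the planned LLM input sequence."""

    kind: str  # "marker", "speech", "prompt", or "target"
    value: Optional[object] = None  # marker string, frame index, or token id


def plan_llm_input(
    n_speech_frames: int,
    prompt_ids: List[int],
    target_ids: List[int],
    use_markers: bool = False,
) -> Tuple[List[Slot], List[bool]]:
    """Plan the sequence [speech] + [prompt] + [translation].

    With use_markers=True the speech frames are wrapped in boundary
    markers, mirroring the <image_start> ... <image_end> layout.
    Returns the slots and a mask that is True on translation positions
    (assumed here to be the only positions contributing to the loss).
    """
    slots: List[Slot] = []
    if use_markers:
        slots.append(Slot("marker", SPEECH_START))
    slots.extend(Slot("speech", i) for i in range(n_speech_frames))
    if use_markers:
        slots.append(Slot("marker", SPEECH_END))
    slots.extend(Slot("prompt", t) for t in prompt_ids)
    slots.extend(Slot("target", t) for t in target_ids)
    loss_mask = [s.kind == "target" for s in slots]
    return slots, loss_mask


if __name__ == "__main__":
    slots, mask = plan_llm_input(3, [10, 11], [20], use_markers=True)
    print([s.kind for s in slots])
    print(mask)
```

A real `combine_embeddings` would then replace each slot with the matching embedding vector and concatenate along the time axis. Keeping the layout decision in one function like this is one way to switch between the two formats without copying the recipe.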

Also, I tend to think that some things could be improved. I understand the point of exposing the forward pass to make it easier to understand, but maybe here we want to hide some ugly tricks. For instance, we could just have a custom generate function in our LLM interface plus a custom GenerationConfig. We also need a pointer to the un-DDP-wrapped modules, to avoid having to check whether the module key is present, etc.
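As a rough illustration of the "custom generate function plus custom GenerationConfig" idea, the sketch below groups decoding settings in a dataclass and forwards them through a single `generate` method. `RecipeGenerationConfig` and `LLMInterface` are hypothetical names, not SpeechBrain classes, and `generate_fn` merely stands in for the underlying model's generate method. The keyword names (`inputs_embeds`, `attention_mask`, `max_new_tokens`, `num_beams`, `do_sample`) follow the Hugging Face `generate` convention, which is assumed to be what the wrapped model accepts.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


@dataclass
class RecipeGenerationConfig:
    """Hypothetical container for decoding settings."""

    max_new_tokens: int = 128
    num_beams: int = 1
    do_sample: bool = False


class LLMInterface:
    """Hypothetical wrapper that keeps decoding details out of the recipe."""

    def __init__(self, generate_fn: Callable[..., Any]):
        # generate_fn stands in for the wrapped model's generate method.
        self._generate_fn = generate_fn

    def generate(
        self,
        inputs_embeds: Any,
        attention_mask: Any,
        config: Optional[RecipeGenerationConfig] = None,
    ) -> Any:
        """Call the underlying generate function with the config's settings."""
        config = config or RecipeGenerationConfig()
        return self._generate_fn(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            **asdict(config),
        )


if __name__ == "__main__":
    # A stand-in that just echoes the keyword arguments it receives.
    def fake_generate(**kwargs):
        return kwargs

    llm = LLMInterface(fake_generate)
    print(llm.generate("embeds", "mask", RecipeGenerationConfig(num_beams=4)))
```

With something along these lines, the recipe would only call `llm.generate(...)`, and the decoding defaults would live in one place instead of being spread over the forward pass.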

Adel-Moumen · 2025-04-08T13:29:13Z

+        if hasattr(self.modules.lora_llm, "module"):
+            gen_func = self.modules.lora_llm.module.model.generate
+        else:
+            gen_func = self.modules.lora_llm.model.generate


Same as before, this is so ugly xD

TParcollet · 2025-07-15

Yes, this is absurdly ugly. I have absolutely no idea how we can fix it, despite spending a significant amount of time on the problem. We would have to entirely rethink our DDP structure, and even then I am not sure we could do anything about it.

Adel-Moumen · 2025-08-02T11:31:17Z

FYI, currently working on reviewing / making some edits. Then, will run tests and merge.

Adel-Moumen · 2025-08-04T15:35:50Z

Added a recipe test for the Conformer in the AST recipe. Couldn't do it for LLaMA since it relies on a large pretrained LLM.

Gonna fix the pre-commit checks (couldn't run them on CCA).

Adel-Moumen

LGTM!

We still have minor edits to do, such as the generation part, how we construct the prompt, etc., but I think we need to move on from this topic. I will try to get back to this one and refactor a bit.

Thanks Sensei.

TParcollet added 5 commits March 4, 2025 17:07

revamp llama

951141c

Update llama.py

d616ccb

remove llama from doctesting due to model access

559d1da

fix torch loading type

d6c80c4

covost prep

6446d4a

TParcollet added the "work in progress" (Not ready for merge) label Mar 17, 2025

TParcollet added 4 commits March 17, 2025 17:03

conformer recipe

b6e10ea

augmentation

865e5d5

add Llama recipe

83caf5a

remove augment

515b895

TParcollet added 8 commits March 17, 2025 20:33

add results

4a7d145

update results

e9b7bed

add sacrebleu

44e8aba

Update llama.py

d7228ef

Make tests pass

Merge branch 'develop' into ast_covost

a561671

precommit

100ca38

Merge branch 'develop' into ast_covost

315b307

better setup

0f2fe5d

Adel-Moumen self-requested a review April 8, 2025 13:00

Adel-Moumen requested changes Apr 8, 2025


Adel-Moumen reviewed Apr 10, 2025


Comment thread recipes/CoVoST/AST/hparams/w2v2_llama3.yaml

Adel-Moumen reviewed Apr 10, 2025


Comment thread recipes/CoVoST/AST/hparams/w2v2_llama3.yaml

Adel-Moumen reviewed Apr 10, 2025


Comment thread recipes/CoVoST/AST/hparams/w2v2_llama3.yaml Outdated

TParcollet added 5 commits July 15, 2025 15:36

Merge branch 'develop' into ast_covost

24cd464

fix comments

a675c77

remove layer norm

462fa0f

dev

e4ef431

make it work

f09dbf8

TParcollet removed the "work in progress" (Not ready for merge) label Jul 15, 2025

TParcollet added the "ready to review" (Waiting on reviewer to provide feedback) label Jul 15, 2025

Adel-Moumen added 5 commits August 4, 2025 10:45

editorial changes

815d4fd

add skip prep + doc formatting

d31265c

add sklip_prep + doc formatting

d3d0172

raise error instead of print

4cedc6a

recipe test for conformer recipe

f7fd91d

Adel-Moumen and others added 10 commits August 4, 2025 17:36

precommit

7ae6b35

Merge remote-tracking branch 'origin/develop' into ast_covost

292c6ac

remove unused args

2b5ad4f

add llama in test csv

e85e9a9

clean up yaml

d3e94c2

F

0e45f01

skip llama FU

b908912

add llama to skip in .run-load

83b0c30

add debug clis command

70348c8

Merge branch 'develop' into ast_covost

7b4cc8b

Adel-Moumen approved these changes Aug 4, 2025


Adel-Moumen merged commit 96b00ca into speechbrain:develop Aug 4, 2025
5 checks passed
