Skip to content

Attempt to fix warning calls with Logger#2932

Closed
Adel-Moumen wants to merge 10 commits into
speechbrain:developfrom
Adel-Moumen:fix_warning_calls
Closed

Attempt to fix warning calls with Logger#2932
Adel-Moumen wants to merge 10 commits into
speechbrain:developfrom
Adel-Moumen:fix_warning_calls

Conversation

@Adel-Moumen

@Adel-Moumen Adel-Moumen commented May 31, 2025

Copy link
Copy Markdown
Collaborator

What does this PR do?

Attempt to fix warnings #2046 and #2925.

Unfortunately, all my attempts to fix this problem failed. Each worker seems to call the warning_once method and trigger the message. Even when we try to cache the message. My understanding is that each worker is created a new logger, which doesn't share the same cached message. I decided to fallback to logger.debug to avoid having this messages. And now, thinking about it, it might make more sense. We should only inspect the different between output tokens and target vocab size if we are seeing NaNs.

If we agree to merge this PR, we will need to re-run #2925 to remove the outputs.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@Adel-Moumen Adel-Moumen marked this pull request as ready for review May 31, 2025 13:28
@Adel-Moumen Adel-Moumen requested a review from pplantinga May 31, 2025 13:28

@pplantinga pplantinga left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this using the speaker-id template, and the dataloader still seems to spit out multiple messages, and I wasn't able to figure out why.

Maybe its not worth fixing this bug, as it seems a bit complicated and maybe not worth it for the couple of errors we have in the dataloader?

)
from speechbrain.utils.logger import get_logger

warnings.simplefilter("once")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is something we should be changing globally, because this will change the warning behavior of everyone who imports speechbrain. With the lazy loading, it might not matter as much, but it would be a very weird and hard to debug error if someone's warning behavior changed from importing speechbrain and then further lazy loading the dataio encoder.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, the warnings module is never used below so it doesn't even have an effect.

)
else:
logger.warning_once(
logger.debug(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids spamming the console and pushes the spam to the log, but now doesn't print on the console at all, which I'm not sure we want. The underlying issue is still there, if I change this to "warning_once" again, then it still prints a bunch of times. It seems as though the multi-threading of the dataloader is creating new loggers each time.

This method is decorated with `functools.lru_cache(None)`, ensuring that the warning
message is logged only once regardless of how many times the method is called.
This method uses a set to track issued warnings, ensuring that the same warning is not logged multiple times.
This is a workaround for the fact that `functools.lru_cache(None)` is not thread-safe.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach still doesn't seem thread-safe, or at least when I tried it with the tutorial it still print multiple messages.

@Adel-Moumen Adel-Moumen closed this Jun 3, 2025
@Adel-Moumen

Copy link
Copy Markdown
Collaborator Author

I tried this using the speaker-id template, and the dataloader still seems to spit out multiple messages, and I wasn't able to figure out why.

Maybe its not worth fixing this bug, as it seems a bit complicated and maybe not worth it for the couple of errors we have in the dataloader?

Yeah. I don'tthink its worth it. I can try to revisit this PR latter on as I am bit surprised by how difficult it is to solve this problem haha.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants