Refactor README and Vicinity class to support any serializable item type#56
Merged
Conversation
- Updated README.md to clarify that items can be strings or other serializable objects. - Modified the Vicinity class to accept a broader range of item types by changing type hints from `str` to `Any` in several methods. - Enhanced the insert and delete methods to handle non-string tokens appropriately, ensuring that items can be checked and managed regardless of their type.
… and evaluating backends
Codecov Report✅ All modified and coverable lines are covered by tests.
🚀 New features to boost your workflow:
|
- Simplified the logic for checking and appending tokens in the insert method, ensuring that duplicate tokens are properly managed.
- Updated the `items` fixture to return a mix of dictionaries and strings based on index parity. - Modified `test_vicinity_insert_duplicate` to use the updated `items` fixture for inserting items. - Adjusted `test_vicinity_delete_and_query` to reference items by their indices instead of hardcoded values. - Enhanced the Vicinity class to streamline token management, ensuring proper handling of duplicates and improving error messaging for token deletions.
stephantul
approved these changes
Jan 20, 2025
stephantul
left a comment
Contributor
There was a problem hiding this comment.
This looks good, thanks! One improvement I see is that we can explicitly throw a ValueError on trying to save if the items are not JSON serializable. e.g., by catching a JsonEncodeError and giving a more informative warning.
Another improvement is the typing, but I'll do that in a follow-up because I actually don't know what the best way is 💀
Co-authored-by: Stephan Tulkens <stephantul@gmail.com>
…ling - Replaced the nested loop for checking duplicates with a single extend operation for tokens. - Improved efficiency by directly appending tokens to the items list, ensuring proper management of duplicates.
…ling - Replaced the nested loop for token matching with a more efficient list comprehension. - Enhanced error messaging to specify which tokens were not found in the vector space.
- Added a try-except block around the JSON serialization process to catch JSONEncodeError.
- Introduced a new pytest fixture `non_serializable_items` that generates a list of non-serializable objects for testing. - Added a test case `test_vicinity_save_and_load_non_serializable_items` to verify that saving a Vicinity instance with non-serializable items raises a JSONEncodeError. - Updated the Vicinity class documentation to specify that JSONEncodeError may be raised if items are not serializable.
stephantul
approved these changes
Jan 24, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I was playing around and realised the items don't necessarily need to be strings.
Update
strtoAnyin several methods.Question
Also, why do we use tokens as argument names for items in the insert and delete methods?
Example
Something like the following works now. Slower during insert and deletion but I would say the flexibility is worth it.