Add MRR #46


Merged
hertschuh merged 31 commits into keras-team:main on Apr 21, 2025

Conversation

@abheesht17 (Collaborator) commented on Apr 9, 2025:

TODO:

  • Add unit tests.
  • Add documentation.
  • Verify doc-strings.
  • Resolve some small TODOs in the code; these can be decided after code review.

@abheesht17 closed this on Apr 9, 2025
@abheesht17 reopened this on Apr 9, 2025
@abheesht17 changed the title from "Add MRR" to "Add MRR, MAP, DCG, nDCG" on Apr 10, 2025
@abheesht17 marked this pull request as ready for review on Apr 11, 2025 06:56
@abheesht17 requested a review from hertschuh on Apr 11, 2025 06:56
@abheesht17 (Collaborator, Author) commented:

MRR ready for review

@hertschuh (Collaborator) left a comment:

Thanks!

Partial review... to be continued

    Returns:
        List of sorted tensors (`tensors_to_sort`), sorted using `scores`.
    """
    # TODO: Consider exposing `shuffle_ties` to the user.
@hertschuh (Collaborator):
It looks like it's already done; remove the TODO for `shuffle_ties`.
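For context, a minimal sketch of what a sort-by-scores helper with tie shuffling might look like (a hypothetical standalone version, not the code in this diff):

    from keras import ops, random

    def sort_by_scores(scores, tensors_to_sort, shuffle_ties=False, seed=None):
        scores = ops.convert_to_tensor(scores)
        if shuffle_ties:
            # One common trick for random tie-breaking: add negligible noise
            # so that equal scores end up in a random relative order.
            scores = scores + random.uniform(
                ops.shape(scores), maxval=1e-6, seed=seed
            )
        # `argsort` sorts ascending, so negate for a descending sort.
        indices = ops.argsort(-scores, axis=-1)
        return [
            ops.take_along_axis(t, indices, axis=-1) for t in tensors_to_sort
        ]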

        shape `(list_size)` or `(batch_size, list_size)`. Defaults to
        `None`.
    """
    # TODO (abheesht): Should `y_true` be a dict, with `"mask"` as one key
@hertschuh (Collaborator):
You mean as an option? Right now, there's no way to pass a mask, right?

@abheesht17 (Collaborator, Author):
Right now, we use `sample_weight > 0` to construct the mask.

But for pairwise losses, that doesn't work, and we need to pass the mask explicitly. Should we do the same here?

@hertschuh (Collaborator):
Oh I see, yes, if it's not too much work.
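A minimal sketch of the masking convention described above (illustrative only; assumes `sample_weight` has the same shape as `y_true`):

    from keras import ops

    def mask_from_sample_weight(sample_weight):
        # Entries with a positive sample weight are valid items; zero-weight
        # entries (e.g. padding) are masked out.
        return ops.greater(ops.convert_to_tensor(sample_weight), 0.0)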

    if isinstance(y_pred, list):
        y_pred = ops.convert_to_tensor(y_pred)
    # `sample_weight` can be a scalar too.
    if isinstance(sample_weight, (list, float, int)):
@hertschuh (Collaborator):
Remove all three `if isinstance(...)` checks and just call `ops.convert_to_tensor`; it does the check for you.

@abheesht17 (Collaborator, Author):
Nice!
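The suggested simplification, sketched (surrounding names assumed from the snippet above): `ops.convert_to_tensor` is a no-op on tensors and also handles lists and scalars, so the `isinstance` guards are redundant.

    y_pred = ops.convert_to_tensor(y_pred)
    if sample_weight is not None:
        # Handles Python lists and scalars as well as tensors.
        sample_weight = ops.convert_to_tensor(sample_weight)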

    elif sample_weight_rank == 2:
        check_shapes_compatible(sample_weight_shape, y_true_shape)

    # Want to make sure `sample_weight` is of the same shape as
@hertschuh (Collaborator):
Meaning you should add a check here?

@abheesht17 (Collaborator, Author):
Nope, the multiplication below takes care of that. I'll amend the comment.
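Illustrating why no explicit check is needed (hypothetical shapes, not the real call site): the elementwise multiply enforces broadcast compatibility, so a mismatched `sample_weight` fails loudly anyway.

    from keras import ops

    weighted = ops.multiply(ops.ones((2, 5)), ops.ones((2, 5)))  # OK: shapes match.
    # ops.multiply(ops.ones((2, 5)), ops.ones((2, 4)))  # raises: incompatible shapes.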

@hertschuh (Collaborator) left a comment:

Thanks for this big PR. A lot of work obviously went into this!

More comments...



def default_rank_discount_fn(rank: types.Tensor) -> types.Tensor:
    return ops.divide(ops.log(2.0), ops.log1p(rank))
@hertschuh (Collaborator):
The docstring above says it's equivalent to `lambda rank: log2(rank + 1)`.

So that would mean reversing the arguments of the `divide`. Also, it would be clearer if written as `ops.log2(rank + 1.0)`.
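For context, the two readings side by side (a sketch; `ops.log2` assumed available, as in Keras 3):

    # What the code computes today: log(2) / log(rank + 1), i.e.
    # 1 / log2(rank + 1), the usual multiplicative DCG discount.
    def discount_as_implemented(rank):
        return ops.divide(ops.log(2.0), ops.log1p(rank))

    # What the docstring describes, in the clearer form suggested above
    # (the reciprocal of the implemented value).
    def discount_as_documented(rank):
        return ops.log2(rank + 1.0)

Whether the code or the docstring should change depends on whether the discount is meant to multiply or divide the gain.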



@keras_rs_export("keras_rs.metrics.nDCG")
class nDCG(RankingMetric):
@hertschuh (Collaborator):
The convention is to always capitalize the first letter of a class name, so it should be `NDCG`, even if it's written as nDCG in papers.

Also, the file should be `ndcg.py`.

I don't actually understand why it's written nDCG, since all the letters form an acronym anyway.

@abheesht17 changed the title from "Add MRR, MAP, DCG, nDCG" to "Add MRR" on Apr 17, 2025
    )

    final_string = "".join(processed_output).strip()
    final_string = " " * 4 + final_string
@hertschuh (Collaborator):
Replace the `" " * 4` with `base_indent_str`.
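Sketched, assuming `base_indent_str` is defined earlier in the function as the base indentation string:

    final_string = base_indent_str + final_string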

from typing import Any


def format_docstring(template: str, width: int = 80, **kwargs: Any) -> str:
@hertschuh (Collaborator):
Why is this needed?

@abheesht17 (Collaborator, Author):
The doc-string has "inline" variables like `concept_sentence`. When I do a plain `.format()`, the line runs past the width limit; and when `concept_sentence` contains a newline, the indentation of the next line gets messed up.
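An illustration of the problem (hypothetical values; only the `format_docstring` signature is taken from the snippet above):

    TEMPLATE = """Computes the metric.

        {concept_sentence}
    """

    # Plain `.format()` neither re-wraps nor re-indents the substituted text,
    # so a long or multi-line value overflows the 80-column limit and breaks
    # the indentation of the lines that follow.
    naive = TEMPLATE.format(concept_sentence="a long sentence " * 10)

    # The helper substitutes, then re-wraps to `width` while preserving the
    # base indentation.
    wrapped = format_docstring(
        TEMPLATE, width=80, concept_sentence="a long sentence " * 10
    )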

@abheesht17 mentioned this pull request on Apr 21, 2025
@hertschuh merged commit bee1477 into keras-team:main on Apr 21, 2025
5 checks passed