Add MRR #46
Conversation
MRR ready for review.
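For context, since MRR is the subject of this PR: Mean Reciprocal Rank averages, over lists, the reciprocal of the 1-based rank of the first relevant item. A minimal sketch using `keras.ops` (illustrative only, not the PR's implementation):

```python
from keras import ops


def mean_reciprocal_rank(y_true, y_pred):
    """y_true: binary relevance, y_pred: scores; both `(batch_size, list_size)` floats."""
    # Order the relevance labels by descending score.
    order = ops.flip(ops.argsort(y_pred, axis=-1), axis=-1)
    relevance = ops.take_along_axis(y_true, order, axis=-1)
    # 1-based rank of the first relevant item in each list.
    first_hit = ops.cast(ops.argmax(relevance, axis=-1), "float32")
    has_hit = ops.greater(ops.max(relevance, axis=-1), 0.0)
    # Lists with no relevant item contribute 0.
    reciprocal_rank = ops.where(has_hit, 1.0 / (first_hit + 1.0), 0.0)
    return ops.mean(reciprocal_rank)
```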
Thanks!
Partial review... to be continued
    Returns:
        List of sorted tensors (`tensors_to_sort`), sorted using `scores`.
    """
    # TODO: Consider exposing `shuffle_ties` to the user.
It looks like it's done; remove the TODO for `shuffle_ties`.
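For reference, the utility under discussion sorts a list of tensors by per-item scores. A minimal sketch of how `shuffle_ties` could be exposed (the function name and the noise-based tie-breaking are illustrative assumptions, not the actual keras_rs code):

```python
import keras
from keras import ops


def sort_by_scores(scores, tensors_to_sort, shuffle_ties=False, seed=None):
    """Sorts each tensor in `tensors_to_sort` by descending `scores`."""
    if shuffle_ties:
        # Illustrative tie-breaking: tiny random noise perturbs equal
        # scores so their relative order becomes random.
        noise = keras.random.uniform(ops.shape(scores), maxval=1e-6, seed=seed)
        scores = scores + noise
    indices = ops.flip(ops.argsort(scores, axis=-1), axis=-1)  # descending
    return [ops.take_along_axis(t, indices, axis=-1) for t in tensors_to_sort]
```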
        shape `(list_size)` or `(batch_size, list_size)`. Defaults to
        `None`.
    """
    # TODO (abheesht): Should `y_true` be a dict, with `"mask"` as one key
You mean as an option? Right now, there's no way to pass a mask, right?
Right now, we use `sample_weights > 0` to construct the mask. But for pairwise losses, that doesn't work, and we need to pass the mask explicitly. So, should we do the same here?
Oh I see, yes, if it's not too much work.
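To make the trade-off concrete, here is a sketch of the two masking routes being discussed (helper names like `unpack_y_true` are hypothetical, not the keras_rs API):

```python
from keras import ops


def mask_from_sample_weight(sample_weight):
    # Current scheme: items with zero weight are treated as padding.
    return ops.greater(sample_weight, 0.0)


def unpack_y_true(y_true):
    # Proposed option: accept `y_true` as a dict with an explicit `"mask"`
    # key, which pairwise losses need since weights alone can't express it.
    if isinstance(y_true, dict):
        return y_true["labels"], y_true.get("mask")
    return y_true, None
```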
    if isinstance(y_pred, list):
        y_pred = ops.convert_to_tensor(y_pred)
    # `sample_weight` can be a scalar too.
    if isinstance(sample_weight, (list, float, int)):
Remove all 3 `if isinstance(...)` checks and just call `ops.convert_to_tensor`; it does the `if` for you.
Nice!
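The simplification, sketched: `ops.convert_to_tensor` already accepts tensors, lists, and Python scalars, so the guards are redundant (the function name is just for illustration):

```python
from keras import ops


def standardize_inputs(y_true, y_pred, sample_weight=None):
    # No `isinstance` checks needed; `convert_to_tensor` is a no-op on
    # tensors and converts lists and scalars.
    y_true = ops.convert_to_tensor(y_true)
    y_pred = ops.convert_to_tensor(y_pred)
    if sample_weight is not None:
        sample_weight = ops.convert_to_tensor(sample_weight)
    return y_true, y_pred, sample_weight
```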
    elif sample_weight_rank == 2:
        check_shapes_compatible(sample_weight_shape, y_true_shape)

    # Want to make sure `sample_weight` is of the same shape as
Meaning you should add a check here?
Nope, the multiplication below takes care of that. I'll amend the comment.
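Why no explicit check is needed there, sketched with made-up shapes: the element-wise multiply broadcasts `sample_weight` against `y_true`, and genuinely incompatible shapes would fail at that point anyway:

```python
from keras import ops

y_true = ops.ones((2, 5))          # (batch_size, list_size)
sample_weight = ops.ones((2, 1))   # rank 2, broadcasts across list_size
# Shape (2, 5); truly incompatible shapes would raise here.
weighted = ops.multiply(y_true, sample_weight)
```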
Thanks for this big PR. A lot of work obviously went into this!
More comments...
def default_rank_discount_fn(rank: types.Tensor) -> types.Tensor:
    return ops.divide(ops.log(2.0), ops.log1p(rank))
The docstring above says it's equivalent to `lambda rank: log2(rank + 1)`. So that would mean reversing the arguments of the divide. Also, it would be clearer if written as `ops.log2(rank + 1.0)`.
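To spell out the mismatch (a sketch; which variant is intended depends on whether the docstring or the code is authoritative):

```python
from keras import ops


def discount_per_docstring(rank):
    # What the docstring describes: log2(rank + 1).
    return ops.log2(rank + 1.0)


def discount_per_code(rank):
    # What the snippet computes: log(2) / log(rank + 1) = 1 / log2(rank + 1),
    # i.e. the reciprocal of the docstring's formula.
    return ops.divide(ops.log(2.0), ops.log1p(rank))
```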
keras_rs/src/metrics/n_dcg.py (outdated)
@keras_rs_export("keras_rs.metrics.nDCG")
class nDCG(RankingMetric):
The convention is to always use an upper-case letter for the first letter of a class, so it should be `NDCG`, even if it's written as `nDCG` in papers. Also, the file should be `ndcg.py`. I don't actually understand why it's written `nDCG`, since all the letters form an acronym anyway.
    )

    final_string = "".join(processed_output).strip()
    final_string = " " * 4 + final_string
Replace `" " * 4` with `base_indent_str`.
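i.e., something like this (a sketch; it assumes `base_indent_str` and `processed_output` are already defined earlier in the function, as in the diff):

```python
base_indent_str = " " * 4               # assumed to exist in the real code
processed_output = ["example output "]  # placeholder content

final_string = "".join(processed_output).strip()
final_string = base_indent_str + final_string
```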
from typing import Any


def format_docstring(template: str, width: int = 80, **kwargs: Any) -> str:
Why is this needed?
The docstring has inline variables like `concept_sentence`. When I do a `.format()`, the line goes over the width limit; and when `concept_sentence` contains a newline, the next line's formatting gets messed up.
TODO:
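A minimal sketch of what such a helper might do, inferred from this discussion (not the actual implementation): substitute the template variables, then re-wrap any overlong line to `width`, preserving its indentation:

```python
import textwrap
from typing import Any


def format_docstring(template: str, width: int = 80, **kwargs: Any) -> str:
    filled = template.format(**kwargs)
    out = []
    for line in filled.splitlines():
        if len(line) <= width:
            out.append(line)
            continue
        # Re-wrap overlong lines, keeping the original indentation.
        indent = " " * (len(line) - len(line.lstrip()))
        out.append(
            textwrap.fill(
                line.strip(),
                width=width,
                initial_indent=indent,
                subsequent_indent=indent,
            )
        )
    return "\n".join(out)
```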