Add inputs vector to calculate metric method #16461
Conversation
The documentation is not available anymore as the PR was closed or merged.
Unfortunately, we can't change the `EvalPrediction` NamedTuple directly without breaking backward compatibility.
Ok, spent a bit of time on this, and to enable it, `EvalPrediction` would need to become a class like this:

from typing import Optional, Tuple, Union

import numpy as np


class EvalPrediction:
    """
    Evaluation output (always contains labels), to be used to compute metrics.

    Parameters:
        predictions (`np.ndarray`): Predictions of the model.
        label_ids (`np.ndarray`): Targets to be matched.
        inputs (`np.ndarray`, *optional*): Inputs of the model.
    """

    def __init__(
        self,
        predictions: Union[np.ndarray, Tuple[np.ndarray]],
        label_ids: Union[np.ndarray, Tuple[np.ndarray]],
        inputs: Optional[Union[np.ndarray, Tuple[np.ndarray]]] = None,
    ):
        self.predictions = predictions
        self.label_ids = label_ids
        self.inputs = inputs

    def __iter__(self):
        if self.inputs is not None:
            return iter((self.predictions, self.label_ids, self.inputs))
        else:
            return iter((self.predictions, self.label_ids))

    def __getitem__(self, idx):
        if idx < 0 or idx > 2:
            raise IndexError("tuple index out of range")
        if idx == 2 and self.inputs is None:
            raise IndexError("tuple index out of range")
        if idx == 0:
            return self.predictions
        elif idx == 1:
            return self.label_ids
        elif idx == 2:
            return self.inputs

Then we should add a flag so the user can choose whether or not they want the inputs included for metrics (default `False`). Can you include them in your PR?
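As a quick sanity check, here is a minimal usage sketch of the class above (illustrative arrays only, not code from the PR), showing that the existing two-value unpacking keeps working as long as `inputs` is left unset:

preds = np.array([[0.1, 0.9], [0.8, 0.2]])
labels = np.array([1, 0])
input_ids = np.array([[101, 2023], [101, 2003]])

# Old behaviour: unpacks into (predictions, label_ids) when inputs is not set.
p, l = EvalPrediction(predictions=preds, label_ids=labels)

# New behaviour: a third element is exposed only when inputs is provided.
p, l, i = EvalPrediction(predictions=preds, label_ids=labels, inputs=input_ids)
assert EvalPrediction(predictions=preds, label_ids=labels, inputs=input_ids)[2] is input_ids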
Hi @sgugger, thanks for adding this change. I've tested this code with the PyTorch summarization examples and I'm afraid this change will break other people's code. The problem is that, since the inputs are passed in the trainer by default (based on my PR, see `trainer.py`), the `compute_metrics` in `run_summarization.py` does not expect them. You can see in the debugger that all 3 (inputs, preds and labels) are coming through. In my case I have my own implementation of `compute_metrics`; however, for other users it will fail by default. When you suggest the flag, where should it be located? In the `Trainer`? This is my testing case: `test.json`.

Thanks, Laura
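To make the failure mode concrete, here is a minimal sketch (hypothetical arrays, assuming the usual two-value unpacking pattern from the summarization example) of what breaks when the inputs are always passed:

def compute_metrics(eval_preds):
    # Typical pattern in the examples: expects exactly two values.
    preds, labels = eval_preds  # raises "too many values to unpack" when inputs are also passed
    return {"n_preds": len(preds)}

eval_preds = EvalPrediction(
    predictions=np.zeros((2, 3)),
    label_ids=np.zeros((2, 3)),
    inputs=np.zeros((2, 5)),
)
compute_metrics(eval_preds)  # ValueError unless inputs are omitted or gated behind a flag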
As I said above, this needs to be controlled by a flag (for instance `include_inputs_for_metrics`).
Thanks for the clarification, that sounds better. I've submitted an additional commit in the PR. It's the first time I update a PR, so let me know if I've missed something.
Thanks for adding the flag. It should be in the `TrainingArguments` class however, not the `Trainer` class itself. Also, the inputs should only be gathered when the flag is set to `True`, otherwise users that do not need it might get OOM errors.
src/transformers/trainer.py
Outdated
@@ -257,6 +257,9 @@ class Trainer:
A function that preprocess the logits right before caching them at each evaluation step. Must take two
tensors, the logits and the labels, and return the logits once processed as desired. The modifications made
by this function will be reflected in the predictions received by `compute_metrics`.
include_inputs_for_metrics: bool = None:
Please follow the same format as for other arguments :-)
Perhaps like this?

include_inputs_for_metrics (`bool`, *optional*):
src/transformers/trainer.py
Outdated
@@ -257,6 +257,9 @@ class Trainer:
A function that preprocess the logits right before caching them at each evaluation step. Must take two
tensors, the logits and the labels, and return the logits once processed as desired. The modifications made
by this function will be reflected in the predictions received by `compute_metrics`.
include_inputs_for_metrics: bool = None:
    A flag (True|False) determining if inputs will be passed to the EvalPrediction class. This is intended for
Suggested change:
- A flag (True|False) determining if inputs will be passed to the EvalPrediction class. This is intended for
+ Whether or not the inputs will be passed to the `compute_metrics` function. This is intended for
src/transformers/trainer.py
Outdated
@@ -293,6 +296,8 @@ def __init__(
callbacks: Optional[List[TrainerCallback]] = None,
optimizers: Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None),
preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None,
include_inputs_for_metrics: bool = None
This flag should be added in the `TrainingArguments` class, not here.
I thought about the OOM issue, but should I add the `if args.include_inputs_for_metrics:` check everywhere in the code? I can't think of a more elegant way :) And for the new changes, is it OK to just push another commit to the pull request like I did the last time? Just checking, so I don't make a mess :P
src/transformers/trainer.py
Outdated
@@ -2452,6 +2465,7 @@ def evaluation_loop(

# Prediction step
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
inputs_decode = inputs['input_ids']
Set `None` here if the flag is `False`.

You should make sure that the inputs are left as `None`, like the labels/losses, by only setting them when the flag is `True`. I've left a comment to show you where.
That's completely ok, we will squash everything when merging.
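For reference, a rough sketch of the guard being suggested inside `evaluation_loop` (simplified fragment; the pad/gather helper names are assumed from how logits and labels are handled in the same loop):

# Only keep the inputs when the flag is on, so they are never padded/gathered otherwise.
inputs_decode = inputs["input_ids"] if args.include_inputs_for_metrics else None

if inputs_decode is not None:
    inputs_decode = self._pad_across_processes(inputs_decode)
    inputs_decode = self._nested_gather(inputs_decode)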
Changes done, let me know if you have any further feedback :)
Thanks! Left some last comments and we should be good to merge.
Also remember to run `make style` on your branch once you're done to apply the code-formatting tools :-)
src/transformers/trainer.py
Outdated
if args.include_inputs_for_metrics:
    args.include_inputs_for_metrics = True
else:
    args.include_inputs_for_metrics = False
The `TrainingArguments` should not be touched by the `Trainer`, I don't think this is necessary in any case.
src/transformers/trainer.py
Outdated
preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None
We'll need the comma back and no new line here.
src/transformers/trainer.py
Outdated
if args.include_inputs_for_metrics:
    metrics = self.compute_metrics(EvalPrediction(inputs=all_inputs, predictions=all_preds,
                                                  label_ids=all_labels))
Suggested change:
if args.include_inputs_for_metrics:
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds,
                                                  label_ids=all_labels, inputs=all_inputs))
`inputs` needs to be last, I think.
src/transformers/training_args.py
Outdated
@@ -417,6 +417,9 @@ class TrainingArguments:
`huggingface-cli login`.
gradient_checkpointing (`bool`, *optional*, defaults to `False`):
If True, use gradient checkpointing to save memory at the expense of slower backward pass.
include_inputs_for_metrics (bool, optional):
Suggested change:
include_inputs_for_metrics (`bool`, *optional*, defaults to `False`):
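For reference, a minimal sketch of the corresponding `TrainingArguments` field (the help-text wording here is only an assumption, not the merged code):

from dataclasses import dataclass, field

@dataclass
class TrainingArguments:  # excerpt for illustration only
    include_inputs_for_metrics: bool = field(
        default=False,
        metadata={"help": "Whether or not the inputs will be passed to the `compute_metrics` function."},
    )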
I've added the requested changes. Also, I've run `make style`.
Thanks a lot for all your work on this! It's all good to me apart from the changes at the start of the trainer file.
src/transformers/trainer.py
Outdated
import numpy as np
import torch
from huggingface_hub import Repository
from packaging import version
from torch import nn
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.auto import tqdm

from . import __version__
from .configuration_utils import PretrainedConfig
from .data.data_collator import DataCollator, DataCollatorWithPadding, default_data_collator
from .debug_utils import DebugOption, DebugUnderflowOverflow
from .deepspeed import deepspeed_init, deepspeed_reinit, is_deepspeed_zero3_enabled
from .dependency_versions_check import dep_version_check
This shouldn't be changed by `make style`. Are you sure you have the proper version of the formatting tools installed (`pip install .[quality]` in the repo)?
Yes, I'm sorry. I tried to install some of the dependencies manually, but it was still failing in some parts. I've made the changes to bring it back to its original state, also applying your comments about the empty lines.
src/transformers/trainer.py
Outdated
@@ -181,13 +178,11 @@ | |||
|
|||
from .trainer_pt_utils import smp_forward_backward, smp_forward_only, smp_gather, smp_nested_concat | |||
|
|||
|
No need to remove this line.
src/transformers/trainer.py
Outdated
if TYPE_CHECKING:
    import optuna

logger = logging.get_logger(__name__)
Same here
src/transformers/trainer.py
Outdated
@@ -2387,7 +2382,6 @@ def evaluation_loop( | |||
|
|||
# if eval is called w/o train init deepspeed here | |||
if args.deepspeed and not self.deepspeed: | |||
|
And same here
Thanks! There is one last issue with a badly formatted docstring. Could you run the styling command once more?
Sure, no problem. I'm having issues with the dependencies to run it, though. This is my log:
It looks like your branch is not up to date with the main branch for the setup. Can you manually install the missing dependency?
Now it's working :) Thanks for that one. I've committed the updated file.
Yes, it's all good now. Thanks again for your contribution!
Awesome! :) Thanks for leading this effort, good job :D
What does this PR do?
This PR proposes including the inputs in the `EvalPrediction` object so that metrics that depend on the inputs can be computed. For example, simplification metrics such as SARI use not only the predictions and references but also the inputs for the score calculation.
The proposed implementation enables the `Trainer` to work with such metrics. However, the `compute_metrics` function should be implemented locally (in the metrics file, for example), since the original method still only receives predictions and references, as sketched below.
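As a hypothetical illustration of such a local implementation, here is a sketch of a `compute_metrics` that uses the inputs for SARI. It assumes `include_inputs_for_metrics=True`, generation-based predictions (token ids), a seq2seq tokenizer (the `t5-small` checkpoint is only an example), and that a `sari` metric with a sources/predictions/references interface can be loaded; names and keys are assumptions, not code from this PR.

import numpy as np
from datasets import load_metric
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # any seq2seq tokenizer, for illustration
sari = load_metric("sari")  # assumed SARI metric with sources/predictions/references

def compute_metrics(eval_preds):
    # With include_inputs_for_metrics=True, EvalPrediction yields three elements.
    preds, labels, inputs = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)  # undo label padding
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_inputs = tokenizer.batch_decode(inputs, skip_special_tokens=True)
    result = sari.compute(
        sources=decoded_inputs,
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    return {"sari": result["sari"]}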
Supports #15966
Who can review?
@sgugger