
Proposed changes for test_metrics.py #1577


Merged: 7 commits into automl:development on Oct 10, 2022

Conversation

shantam-8 (Contributor)

This PR (part of #1351) proposes changes to _PredictScorer, _ProbaScorer, and _ThresholdScorer in test_metrics.py. Following the guidance provided, I have used pytest parametrization to make the tests as flexible as possible. Please let me know if I need to add or delete anything else, and I would be glad to do the same for the rest of test_metrics.py. Thanks a lot!

@eddiebergman (Contributor) left a comment

Wow, seems really good, thank you :)

black does elongate everything quite a bit, which is a bit annoying. There are workarounds, e.g.

    (
        np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]),
        "roc_auc",
        sklearn.metrics.roc_auc_score,
        *(1, 0, -1, -1.0),
    ),

Seeing all these tests like this, it seems like we require a lot of redundant info:

  • The optimum, worst_possible_result and sign can all be captured by something like bounds = (optimum, worst_possible_result), where you can infer the sign from whether optimum > worst_possible_result (see the sketch after this list)
  • Looking at how most of this test setup seems to just reconstruct metrics that already exist, we could use the ones we already have defined. That should make the parametrizes a good bit shorter, remove the make_scorer calls everywhere, and make the tests much more readable!
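
For the first point, a minimal sketch of how the sign could fall out of the ordering (the helper name here is hypothetical, not existing autosklearn API):

    def sign_from_bounds(bounds):
        """Infer the optimization sign from (optimum, worst_possible_result)."""
        optimum, worst = bounds
        # If the optimum is larger than the worst result, higher is better.
        return 1 if optimum > worst else -1

    assert sign_from_bounds((1.0, 0.0)) == 1   # e.g. accuracy: higher is better
    assert sign_from_bounds((0.0, 1.0)) == -1  # e.g. log loss: lower is better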

The first part would be for another PR, and if you want to work on making that simpler, it would be a nice simple contribution too. The second part would be pretty nice to have for this PR (a rough sketch follows) :)
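
Roughly, assuming the predefined Scorer objects exported by autosklearn.metrics are callable as scorer(y_true, y_pred), the way the existing tests use them, the parametrize could look like:

    import numpy as np
    import pytest

    import autosklearn.metrics

    @pytest.mark.parametrize(
        "scorer, y_true, y_pred, expected",
        [
            # Use the already-defined Scorer directly; no make_scorer call needed.
            (autosklearn.metrics.accuracy, np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]), 1.0),
            (autosklearn.metrics.accuracy, np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]), 0.0),
        ],
    )
    def test_predefined_scorer(scorer, y_true, y_pred, expected):
        assert scorer(y_true, y_pred) == pytest.approx(expected)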

I'll run the tests now!

codecov bot commented Aug 31, 2022

Codecov Report

Merging #1577 (47907b0) into development (ee0916e) will decrease coverage by 0.26%.
The diff coverage is n/a.

❗ Current head 47907b0 differs from pull request most recent head aeb15dd. Consider uploading reports for the commit aeb15dd to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##           development    #1577      +/-   ##
===============================================
- Coverage        84.75%   84.49%   -0.27%     
===============================================
  Files              157      155       -2     
  Lines            11981    11898      -83     
  Branches          2066     2058       -8     
===============================================
- Hits             10155    10053     -102     
- Misses            1278     1281       +3     
- Partials           548      564      +16     

Impacted file tree graph

@eddiebergman eddiebergman marked this pull request as ready for review September 26, 2022 06:52
@eddiebergman (Contributor)

Hi @shantam-8,

It looks good to me now. Sorry for the delayed responses; we had some deadlines coming up for other papers that kept us away from auto-sklearn! I pinged @mfeurer to have a quick look.

Best,
Eddie

@mfeurer (Contributor) left a comment

Hey, that looks good. Could you maybe add a comment stating that we test the assembled scorers instead of assembling the scorer objects ourselves in the test?

Besides that, this looks great. Thanks a lot!

@shantam-8 (Contributor, Author)

Hello @eddiebergman @mfeurer, sorry for the late reply. I have made a few more updates to test_metrics.py, specifically the test_sign_flip function (formerly part of the TestScorer class) and the test_regression_metrics and test_classification_metrics functions (formerly part of the TestMetricsDoNotAlterInput class). Could you please let me know if these are fine, or whether I should roll them back?
Also, I was going to approach the TestMetric class. Should I do it in another PR after this one is merged? I was not very sure how to go about it.

@eddiebergman (Contributor) commented Sep 26, 2022

Hi @shantam-8,

I looked at the TestMetric class and I think you can honestly just delete it, given that the metrics all achieve the score they are meant to. It doesn't provide much more over the tests we already have and is just extra noise (imo). @mfeurer do you agree?

The other ones look fine to me.

Edit: I think @mfeurer is busy for the next few days, so delete that class, and if all the tests pass, I think this is ready to merge :)

@eddiebergman (Contributor)

Not sure why the metadata generation tests are failing, to be honest. I'll try rerunning them.

@eddiebergman (Contributor)

Sorry to keep this going; I guess I didn't unfold the TestMetric class all the way on GitHub. Can you make sure that every metric we export is tested at least once in the big long test, i.e. that there are one or two cases for each metric with sample input, the true y, and the expected score?

@shantam-8 (Contributor, Author)

While the long test list does contain metrics like accuracy, bal_accuracy, r2, log_loss, and roc_auc, it misses a huge portion of the metrics that are part of autosklearn.metrics.REGRESSION_METRICS and autosklearn.metrics.CLASSIFICATION_METRICS, like precision, recall, f1, mse, etc. However, as the TestMetric class covered these, would it make sense to keep it? For illustration, a sketch of the kind of coverage check this implies is below.
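
Something like this (TESTED_METRIC_NAMES is a placeholder for the names exercised by the long list; with the current list it would flag precision, recall, f1, mse, and friends as missing):

    import autosklearn.metrics

    # Placeholder: the metric names currently exercised by the big parametrized list.
    TESTED_METRIC_NAMES = {"accuracy", "balanced_accuracy", "r2", "log_loss", "roc_auc"}

    # Every metric the package exports, regression and classification combined.
    ALL_METRICS = {
        **autosklearn.metrics.REGRESSION_METRICS,
        **autosklearn.metrics.CLASSIFICATION_METRICS,
    }

    def test_every_exported_metric_has_a_case():
        missing = set(ALL_METRICS) - TESTED_METRIC_NAMES
        assert not missing, f"Metrics without a test case: {sorted(missing)}"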

@eddiebergman (Contributor)

Sure, we can just keep them in instead of adding them to the huge list then :)

@shantam-8 shantam-8 requested a review from eddiebergman October 6, 2022 14:49
@eddiebergman eddiebergman merged commit f160d7d into automl:development Oct 10, 2022
@eddiebergman (Contributor)

Hey @shantam-8,

Very sorry for the slow feedback loop on all of this; we've been pretty busy recently! Many thanks for the contribution, and we'll be more responsive in the coming months :)

Best,
Eddie
