Add application err category #1925
dd61c43
to
8700e2a
Compare
9522536
to
49536fa
Compare
49536fa
to
388c9ba
Compare
internal/error.go
Outdated
@@ -137,6 +140,7 @@ type (
    //
    // NOTE: This option is supported by Temporal Server >= v1.24.2 older version will ignore this value.
    NextRetryDelay time.Duration
    Category ApplicationErrorCategory
Needs docs
test/integration_test.go
Outdated
"Benign failure", | ||
"", | ||
temporal.ApplicationErrorOptions{
    Category: internal.ErrorCategoryBenign,
Tests should not refer to internal types; you need to export ErrorCategoryBenign.
Exported/exposed in common error.
internal/error.go
Outdated
@@ -137,6 +140,7 @@ type (
    //
    // NOTE: This option is supported by Temporal Server >= v1.24.2 older version will ignore this value.
    NextRetryDelay time.Duration
    Category ApplicationErrorCategory
What is the release status of temporalio/features#614? Is it GA?
That's a good question - I'll ask drew
internal/error.go
Outdated
@@ -119,6 +119,9 @@ Workflow consumers will get an instance of *WorkflowExecutionError. This error w
*/

type (
    // Category of the error. Maps to logging/metrics behaviours.
    ApplicationErrorCategory string
Needs to be exposed outside of internal
Exposed in common error
internal/error.go
Outdated
@@ -119,6 +119,9 @@ Workflow consumers will get an instance of *WorkflowExecutionError. This error w
*/

type (
    // Category of the error. Maps to logging/metrics behaviours.
    ApplicationErrorCategory string
Hrmm, I wonder if we really want this to be a string rather than just a number (based on the proto number) that can have a String() method on it. A string may make users think it can accept any string literal. Also, for other enumerations we traditionally use the iota approach. And if/when you do change to an integer here, make sure there is an unspecified form representing 0.
Yeah not sure why I opted for a string... changed to an int enum
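For readers following the thread, here is a minimal sketch of the int-enum shape being asked for: an iota-based type whose zero value is an unspecified form, plus a String() method. It is an illustration only, not the code that landed in the PR; the name ApplicationErrorCategoryUnspecified and the exact strings returned by String() are assumptions.

```go
package main

import "fmt"

// ApplicationErrorCategory is a local stand-in for the SDK type discussed
// above, modelled as an integer enum instead of a string.
type ApplicationErrorCategory int

const (
	// ApplicationErrorCategoryUnspecified is the zero value (assumed name).
	ApplicationErrorCategoryUnspecified ApplicationErrorCategory = iota
	// ApplicationErrorCategoryBenign marks an error that is expected under
	// normal operation and should not trigger alerts.
	ApplicationErrorCategoryBenign
)

// String returns a readable name. Values without a named constant are
// printed numerically instead of being rejected, so the enum stays open-ended.
func (c ApplicationErrorCategory) String() string {
	switch c {
	case ApplicationErrorCategoryUnspecified:
		return "Unspecified"
	case ApplicationErrorCategoryBenign:
		return "Benign"
	default:
		return fmt.Sprintf("ApplicationErrorCategory(%d)", int(c))
	}
}

func main() {
	var zero ApplicationErrorCategory // zero value is the unspecified form
	fmt.Println(zero, ApplicationErrorCategoryBenign, ApplicationErrorCategory(7))
	// prints: Unspecified Benign ApplicationErrorCategory(7)
}
```

Because the underlying type is an int, an arbitrary string literal no longer type-checks as a category, which addresses the concern raised in the review comment.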
internal/error.go
Outdated
@@ -380,6 +385,11 @@ var (
    ErrMissingWorkflowID = errors.New("workflow ID is unset for Nexus operation")
)

const (
    // ErrorCategoryBenign indicates an error that is expected under normal operation and should not trigger alerts.
    ErrorCategoryBenign ApplicationErrorCategory = "BENIGN"
Suggested change:
- ErrorCategoryBenign ApplicationErrorCategory = "BENIGN"
+ ApplicationErrorCategoryBenign ApplicationErrorCategory = "BENIGN"
Need to qualify the const IMO
Done
internal/error.go
Outdated
case "":
    // Zero value maps to unspecified
    return enumspb.APPLICATION_ERROR_CATEGORY_UNSPECIFIED
default:
If/when we change to integer enums we can/should just use the enum given to us in both directions since both Go and proto allow "open-ended" enums.
Yup, removed
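A rough sketch of what "use the enum given to us in both directions" can look like once the SDK type is an integer: a plain numeric conversion each way, with no switch and no default branch. The protoCategory type below is a hypothetical local stand-in for the generated proto enum so the example runs without the API module; toProto and fromProto are illustrative names, not functions from the SDK.

```go
package main

import "fmt"

// protoCategory is a hypothetical stand-in for the generated proto enum
// (for example enumspb.APPLICATION_ERROR_CATEGORY_UNSPECIFIED = 0).
type protoCategory int32

const (
	protoCategoryUnspecified protoCategory = 0
	protoCategoryBenign      protoCategory = 1 // assumed proto number
)

// ApplicationErrorCategory mirrors the proto numbers one-to-one.
type ApplicationErrorCategory int

const (
	ApplicationErrorCategoryUnspecified ApplicationErrorCategory = iota
	ApplicationErrorCategoryBenign
)

// toProto converts by numeric value; values unknown to this SDK version
// pass through unchanged.
func toProto(c ApplicationErrorCategory) protoCategory {
	return protoCategory(c)
}

// fromProto is the inverse conversion, equally open-ended.
func fromProto(p protoCategory) ApplicationErrorCategory {
	return ApplicationErrorCategory(p)
}

func main() {
	fmt.Println(toProto(ApplicationErrorCategoryBenign) == protoCategoryBenign) // true
	fmt.Println(fromProto(protoCategory(42)))                                  // 42
}
```

The trade-off is that the Go constants must stay aligned with the proto numbers; in exchange, a category added by a newer server or API version round-trips without being collapsed to the unspecified value.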
internal/error.go
Outdated
    }
}

func IsBenignApplicationError(err error) bool {
Suggested change:
- func IsBenignApplicationError(err error) bool {
+ func isBenignApplicationError(err error) bool {
No need to expose I assume unless it's a helper we really need to expose, but I don't believe we do. A user can easily check for any value on application error without a helper.
Didn't mean to export this, unexported.
Did you want this removed btw? Just do the check inline wherever it's used?
Alternatively, should we put this check under a different name (e.g. shouldRecordFailureMetric or something similar, probably moving it to metrics), in case we adopt more conditional logic for failure metrics in the future?
test/integration_test.go
Outdated
ts.Error(err)
var appErr *temporal.ApplicationError
ts.True(errors.As(err, &appErr))
ts.False(internal.IsBenignApplicationError(err))
Try to avoid using internal in integration tests. Users can't use it.
Done - inline boolean check.
internal/internal_task_pollers.go
Outdated
@@ -474,19 +474,21 @@ func (wtp *workflowTaskPoller) RespondTaskCompletedWithMetrics(
) (response *workflowservice.RespondWorkflowTaskCompletedResponse, err error) {
    metricsHandler := wtp.metricsHandler.WithTags(metrics.WorkflowTags(task.WorkflowType.GetName()))
    if taskErr != nil {
        wtp.logger.Warn("Failed to process workflow task.",
Are any of the changes made inside of this method needed?
Nope, old change I forgot to revert
internal/error.go
Outdated
@@ -119,6 +119,9 @@ Workflow consumers will get an instance of *WorkflowExecutionError. This error w
*/

type (
    // Category of the error. Maps to logging/metrics behaviours.
    ApplicationErrorCategory string
Is ApplicationErrorCategory intended to be an enum?
Yes
c914631
to
8f0368d
Compare
- fully qualified names ApplicationErrorCategory<x>
- export category for external usage...
- make enum int instead of string
8f0368d
to
5646e0a
Compare
internal/error.go
Outdated
var appError *ApplicationError
return errors.As(err, &appError) && appError.Category() == ApplicationErrorCategoryBenign
I don't think we should assume causes no matter the depth as benign. Explained at temporalio/sdk-java#2485 (comment).
Suggested change:
- var appError *ApplicationError
- return errors.As(err, &appError) && appError.Category() == ApplicationErrorCategoryBenign
+ appError, _ := err.(*ApplicationError)
+ return appError != nil && appError.Category() == ApplicationErrorCategoryBenign
Yes, we should only look at the top-level error; causes wrapped inside it should not be inspected at any depth.
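To make the difference between the two variants concrete, here is a self-contained sketch that compares an errors.As check with a top-level type assertion. ApplicationError below is a simplified local stand-in, not the SDK type; its fields, the wrap helper, and the names isBenignAnyDepth and isBenignTopLevel are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type ApplicationErrorCategory int

const (
	ApplicationErrorCategoryUnspecified ApplicationErrorCategory = iota
	ApplicationErrorCategoryBenign
)

// ApplicationError is a simplified stand-in for the SDK error type.
type ApplicationError struct {
	msg      string
	category ApplicationErrorCategory
}

func (e *ApplicationError) Error() string                      { return e.msg }
func (e *ApplicationError) Category() ApplicationErrorCategory { return e.category }

// wrap simulates some other layer wrapping the error as its cause.
func wrap(err error) error { return fmt.Errorf("activity failed: %w", err) }

// isBenignAnyDepth uses errors.As, which walks the Unwrap chain and stops
// at the first *ApplicationError it finds, however deeply it is nested.
func isBenignAnyDepth(err error) bool {
	var appErr *ApplicationError
	return errors.As(err, &appErr) && appErr.Category() == ApplicationErrorCategoryBenign
}

// isBenignTopLevel only accepts err itself being an *ApplicationError.
// The comma-ok assertion yields nil for any other type, including nil err.
func isBenignTopLevel(err error) bool {
	appErr, _ := err.(*ApplicationError)
	return appErr != nil && appErr.Category() == ApplicationErrorCategoryBenign
}

func main() {
	benign := &ApplicationError{msg: "benign failure", category: ApplicationErrorCategoryBenign}
	wrapped := wrap(benign)

	fmt.Println(isBenignAnyDepth(benign), isBenignTopLevel(benign))   // true true
	fmt.Println(isBenignAnyDepth(wrapped), isBenignTopLevel(wrapped)) // true false
}
```

With the top-level check, a failure that merely has a benign error somewhere in its cause chain is still logged and counted as a normal failure, which is the behaviour agreed on in this thread.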
@@ -137,6 +137,8 @@ type (
    //
    // NOTE: This option is supported by Temporal Server >= v1.24.2 older version will ignore this value.
    NextRetryDelay time.Duration
    // Category of the error. Maps to logging/metrics behaviours.
Suggested change:
- // Category of the error. Maps to logging/metrics behaviours.
+ // Category of the error. Maps to SDK logging/metrics behaviours.
This Category does not affect server-side metrics, right?
Correct
Can you update the comment then if this is correct?
Done
Looks great! Would like @Quinn-With-Two-Ns's approval before merge though.
What was changed
Upgrade api-go to 1.48.0
Added support for ApplicationErrorCategory, allowing users to specify benign application errors.
Why?
Part of the benign exceptions work.
Closes #1908 (Apply application failure logging and metrics behaviour according to ApplicationErrorCategory)
How was this tested:
Modify existing failure conversion tests.
Any docs updates needed?
Not sure