-
Notifications
You must be signed in to change notification settings - Fork 158
Add application err category #2485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Don't like the API changes to ApplicationFailure, but figured I'd get something up first for discussion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems in other parts of the SDK, e.g. parent close policy, we just expose the raw enum from proto. Should we do the same here? It has "unrecognized", automatically gets new values, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't notice that, have changed to use raw proto
case UNRECOGNIZED: | ||
default: | ||
// Fallback unrecognized or unspecified proto values as UNSPECIFIED | ||
return UNSPECIFIED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the concept of unrecognized is ideal and different from unspecified (then again though, arguably we should be using raw API proto enums and not a new enumerate here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using raw API proto enums now
@@ -218,4 +274,43 @@ private static String getMessage(String message, String type, boolean nonRetryab | |||
+ ", nonRetryable=" | |||
+ nonRetryable; | |||
} | |||
|
|||
public static boolean isBenignApplicationFailure(@Nullable Throwable t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easier to justify than the Go equivalent I suppose, but I still think this helper may not be needed and may be more confusing than it's worth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to an internal failure utils file, it's convenient to have imo
// Handle ActivityFailure, which wraps the actual ApplicationFailure | ||
if (t instanceof io.temporal.failure.ActivityFailure) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to expose this kind of logic to users. A user checking whether an in-workflow-thrown application failure is benign may not want to have it match when an activity raises it. I think if a user wants to check a category for whatever reason, they can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, moved to an internal util file
@@ -153,7 +154,9 @@ private void completeWorkflow(@Nullable WorkflowExecutionException failure) { | |||
metricsScope.counter(MetricsType.WORKFLOW_CANCELED_COUNTER).inc(1); | |||
} else if (failure != null) { | |||
workflowStateMachines.failWorkflow(failure.getFailure()); | |||
metricsScope.counter(MetricsType.WORKFLOW_FAILED_COUNTER).inc(1); | |||
if (!ApplicationFailure.isBenignApplicationFailure(failure)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think you want to swallow activities that fail because of benign exceptions. They may reach a max attempts or something and throw this out. Just because an activity has a benign exception it doesn't want to be in telemetry doesn't mean a workflow doesn't want it in telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not the same pattern in:
temporalio/sdk-go#1925
temporalio/sdk-core#905
Unless I'm mistaken.
What about if a user throws a benign exception in workflow code, not in an activity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was unclear here. This is basically me restating https://github.com./temporalio/sdk-java/pull/2485/files#r2054385164.
What about if a user throws a benign exception in workflow code, not in an activity?
That is fine to skip metrics, but we don't want to skip metrics if the exception in the workflow code is an activity failure with a benign cause. There is a difference between not recording telemetry on benign exceptions and not recording telemetry on an exception that has a benign cause.
Here are some scenarios:
- Activity throws benign exception - do not record telemetry
- Workflow throws benign exception - do not record telemetry
- Activity or child fails with benign exception from workflow POV (thrown from activity and it is out of max attempts or something) - still record telemetry
- Activity uses a Temporal client that runs a workflow and that workflow throws benign exception - still record telemetry
- In Python, an activity has a general
catch
that caught the benign exception, tried to do cleanup, and raised its own exception (which automatically sets the cause as the current catch exception) - still record telemetry
In the last 3 bullets on certain languages, they will be treated as benign and shouldn't be IMO. Basically, change isBenignApplicationFailure
to stop checking causes IMO. IMO throwing a benign exception is benign, but receiving a benign exception wrapped in another exception is not benign.
Is this not the same pattern in:
temporalio/sdk-go#1925
temporalio/sdk-core#905
Unless I'm mistaken.
Not exactly because Go stops at the first application error, not every application error no matter the depth. Also, this isn't recursively checking causes to arbitrary depths like Go. Still, looking back on Go, we should only apply it to the current error and not check cause or errors.As
IMO. I added a comment at temporalio/sdk-go#1925 (comment). Didn't check Core, but I think the same applies there too.
If you think about how users use benign exceptions (as control flow), they aren't supposed to wrap them. Open to discussion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't they inherently be wrapped as other failure types though? Could be wrong here, but for example, an ApplicationFailure thrown from an activity, would it not always (or at least in the general case) be wrapped as an ActivityFailure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in that it doesn't seem relevant/useful to check for benign causes at an arbitrary depth, but I though a depth of 0 (immediate) or 1 (thinly wrapped) would be sufficient.
In Java SDK, how would one receive an ApplicationFailure that is not wrapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't they inherently be wrapped as other failure types though?
No. They are only benign where they are thrown (impl side) and they are usually only wrapped where they are caught (caller side).
Could be wrong here, but for example, an ApplicationFailure thrown from an activity, would it not always (or at least in the general case) be wrapped as an ActivityFailure?
Not on the activity side, only on the workflow caller side. But if an activity fails because of a benign error, even though it is benign on the activity side, if that activity failure bubbles out of a workflow, that is not benign.
I agree in that it doesn't seem relevant/useful to check for benign causes at an arbitrary depth, but I though a depth of 0 (immediate) or 1 (thinly wrapped) would be sufficient.
Even 1 depth treats a benign activity failure as benign on the workflow side which is incorrect IMO (not to mention Go side is arbitrary depth). I just checked Core, Core does it right in that it doesn't recurse.
In Java SDK, how would one receive an ApplicationFailure that is not wrapped?
What do you mean by "one"? If you mean how does our SDK internals receive an unwrapped thrown error, it is where you have changes in ActivityTaskExecutors.java
. If you mean how does a workflow caller receive it unwrapped, they don't, that's the point.
An error is only benign where it is thrown impl side, not where it is caught caller side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, that all sounds reasonable (and saved me some digging :))
I've removed the nested failure checks, only check immediate failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Will want to wait for @Quinn-With-Two-Ns's approval here.
import io.temporal.failure.ApplicationFailure; | ||
import javax.annotation.Nullable; | ||
|
||
public class FailureUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantic, but would add a private constructor here
@@ -51,13 +52,15 @@ | |||
* <li>nonRetryable is set to false | |||
* <li>details are set to null | |||
* <li>stack trace is copied from the original exception | |||
* <li>stack category is set to ApplicationErrorCategory.APPLICATION_ERROR_CATEGORY_UNSPECIFIED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* <li>stack category is set to ApplicationErrorCategory.APPLICATION_ERROR_CATEGORY_UNSPECIFIED | |
* <li>category is set to ApplicationErrorCategory.APPLICATION_ERROR_CATEGORY_UNSPECIFIED |
@@ -21,6 +21,7 @@ | |||
package io.temporal.failure; | |||
|
|||
import com.google.common.base.Strings; | |||
import io.temporal.api.enums.v1.ApplicationErrorCategory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use the proto enum in the user facing API, we should declare our own enum in the Java SDK and convert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I didn't notice before you were still using the proto enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I formerly had this but: #2485 (comment)
It looked like we use raw protos, when do we use each pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All new features should NOT use raw proto, only old features use raw protos
temp.txt
Outdated
@@ -0,0 +1,483 @@ | |||
diff --git a/temporal-sdk/src/main/java/io/temporal/failure/ApplicationErrorCategory.java b/temporal-sdk/src/main/java/io/temporal/failure/ApplicationErrorCategory.java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops - rm'd
package io.temporal.failure; | ||
|
||
/** | ||
* Mirrors the proto definition for ApplicationErrorCategory. Used to categorize application |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Mirrors the proto definition for ApplicationErrorCategory. Used to categorize application | |
* Used to categorize application |
I don't think we need to mention the proto def. here the docs are for the end user and that isn't really relevant for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
4182485
to
823848e
Compare
What was changed
Add support for application error category.
Why?
SDK work for benign exceptions
Closes Apply application failure logging and metrics behaviour according to
ApplicationErrorCategory
#2475How was this tested:
Integration tests
Any docs updates needed?
Likely