Skip to content

Worker Deployment Versioning #821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 16, 2025
Merged

Worker Deployment Versioning #821

merged 14 commits into from
Apr 16, 2025

Conversation

Sushisource
Copy link
Member

What was changed

Added the annotations and options for worker-deployment-based versioning

Why?

Feature parity choo choo 🚄

Checklist

  1. Closes [Feature Request] Support New Worker Versioning API #793

  2. How was this tested:
    Added tests

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner April 10, 2025 00:24
Comment on lines 1633 to 1634
# TODO: Not sure this needs to exist?
# seen_run_attr = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to mention in review since this seems sorta wrong. It sets that a run attr was seen if it was seen on the base class, but that's only allowed iff there was one on the subclass, which would mean it was already set true.

Copy link
Member

@cretz cretz Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was so that you don't have duplicate issues in the issues list. So today you'd get only @workflow.run defined on {base_member.__qualname__} but not on the override, but when not setting this, you get that and Missing @workflow.run method which is redundant.

Comment on lines 450 to +442
self._current_completion.failed.failure.SetInParent()
# TODO: Review - odd that we don't un-set success here?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to get an eye on this too. Seems strange we never un-set the set-by-default success variant if there was a failure. I'm kinda surprised this doesn't cause problems, and maybe it just works by a happy accident of field ordering. Maybe it's unset somewhere I missed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python protobuf, a oneof is automatically unset when you set another of the oneof, ref https://protobuf.dev/reference/python/python-generated/#oneof

raise ValueError(f"Unknown VersioningBehavior: {self}")


class WorkerDeploymentVersion:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should make this a dataclass (can be frozen dataclass, no prob), it gives you that init and you get equality checks, good default string representation, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry was if we add fields later on that might cause problems with compat, but I suppose that's not a real issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We can default them or we can technically even override __init__ if we must

@@ -1016,6 +1016,74 @@ def __post_init__(self):

Priority.default = Priority(priority_key=None)


class VersioningBehavior(Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other enums that map to proto equivalents use IntEnum and just set the values as int values of the actual proto values (except for some experimental versioning ones that should have been done this way IMO and are going to be deleted at some point I assume). Can look at the existing IntEnums in the SDK.

disable_safe_workflow_eviction: If true, instead of letting the
workflow collect its tasks properly, the worker will simply let
the Python garbage collector collect the tasks. WARNING: Users
should not set this value to true. The garbage collector will
throw ``GeneratorExit`` in coroutines causing them to wake up
in different threads and run ``finally`` and other code in the
wrong workflow environment.
deployment_options: Deployment options for the worker. Exclusive with `build_id` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deployment_options: Deployment options for the worker. Exclusive with `build_id` and
deployment_config: Deployment config for the worker. Exclusive with `build_id` and

We prefer the term "config" in the Python SDK. Should also change the data type to WorkerDeploymentConfig.



@dataclass
class WorkerDeploymentOptions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need to mark this and all places it is used as experimental (usually as a docstring warning)

Comment on lines 450 to +442
self._current_completion.failed.failure.SetInParent()
# TODO: Review - odd that we don't un-set success here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python protobuf, a oneof is automatically unset when you set another of the oneof, ref https://protobuf.dev/reference/python/python-generated/#oneof



@overload
def run(
Copy link
Member

@cretz cretz Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike Java, Python uses @workflow.defn for implementation options, not @workflow.run. Python (or any other SDK) does not have the forced interface separation Java does. This option, like name, dynamic, sandboxed, failure_exception_types, etc is a whole workflow implementation option and should be set where whole workflow implementation options are set (on @workflow.defn).

@@ -215,6 +255,25 @@ class UnfinishedSignalHandlersWarning(RuntimeWarning):
"""The workflow exited before all signal handlers had finished executing."""


def dynamic_versioning_behavior(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should either be available to every workflow or should not exist. There is nothing about dynamic workflows specifically that deserves special versioning treatment. Might as well allow every workflow implementation to do this. Also, we should allow dynamic workflows to set the same options as regular workflows, so dynamic workflows should be allowed to set a fixed value here on the @workflow.defn decorator instead of only allowed to set versioning this way.

Dynamic versioning behavior doesn't have to be related to dynamic workflows. We should decide if we want dynamic versioning behavior and if we do, it should be generally applicable.

Copy link
Member Author

@Sushisource Sushisource Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well isn't a strong reason here. It's a big footgun I think. It only needs to exist for dynamic workflows because the static annotation isn't sufficient here, so there is something special about them. The reason to restrict it to dynamic workflows is to mitigate the risk of that footgun.

And they are allowed to use the static version on @run-to-be-@defn. That works too.

I feel pretty strongly about that. I'm sort-of-ok with just taking it away entirely until someone asks, too, but we already added this in Java and it feels weird to not do it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only needs to exist for dynamic workflows because the static annotation isn't sufficient here

Why does it need to exist for dynamic workflows? We don't have the ability to have a callback to define failure exception types.

I feel pretty strongly about that. I'm sort-of-ok with just taking it away entirely until someone asks, too, but we already added this in Java and it feels weird to not do it here.

👍 I like this option too. I don't think we should have done it in Java either. I think we should do workflow implementation options the way that the SDK prefers to do workflow implementation options and not make this workflow implementation option any more special than any other.

And they are allowed to use the static version on @run-to-be-@defn. That works too.

The code inside of _WorkflowWorker.__init__ fails if you have a dynamic workflow without this getter and without worker-level setting

Copy link
Member Author

@Sushisource Sushisource Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed that. It's allowed now. The reason you need it is that conceptually a dynamic workflow can implement multiple workflow types. You might want those different types to have different versioning behavior.

We can yeet it until someone asks though, and considering it's easy to misuse, I can just go for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can yeet it until someone asks though, and considering it's easy to misuse, I can just go for that.

👍 Yes, I vote for this. I suspect when someone says they this or they want dynamic failure exception types per workflow type (or any other workflow implementation option per workflow type), we can definitely add them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@Sushisource
Copy link
Member Author

Note to self: Realized core isn't setting deployment options on activity task completions. Fix that and upgrade core in here before merge.

"""

version: WorkerDeploymentVersion
use_worker_versioning: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, I wonder if we should default this to False. I admit I am unsure what the situations are where you'd set deployment config without wanting to set this to True, but it is default false in other SDKs. Regardless, non-blocking...

Copy link
Member Author

@Sushisource Sushisource Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's nice to have people be explicit about it here. It'd generally be True but if you want to set a build ID, not use versioning, and not use the deprecated APIs that'd be the reason for False.

Copy link
Member

@cretz cretz Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, do we consider setting build_id= as deprecated if they don't want versioning? We'll always have to support the shortcut. I guess it's similar to worker tuning, which arguably there if we always have to support the simpler shortcut options we might as well not deprecate their use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the problem is that from a product perspective they want to de-emphasize build id in favor of the deployment as a whole

@Sushisource Sushisource enabled auto-merge (squash) April 10, 2025 22:22
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch from 6b65e36 to e5a9f82 Compare April 11, 2025 18:40
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch 2 times, most recently from 3597af6 to 1bbff88 Compare April 11, 2025 18:52
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch from 1bbff88 to ecd1fd7 Compare April 11, 2025 21:00
@Sushisource Sushisource disabled auto-merge April 11, 2025 22:37
@Sushisource
Copy link
Member Author

Can't merge because it looks like the new deadlock interruptor is causing a test to fail that doesn't fail when I comment it out. We may need to back out that change.

@Sushisource Sushisource force-pushed the worker-deployment-versioning branch 4 times, most recently from a64791c to 25ed9a4 Compare April 15, 2025 18:47
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch 2 times, most recently from c5e4604 to 0a81383 Compare April 15, 2025 23:37
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch 6 times, most recently from 4cf77fa to 3d1c940 Compare April 16, 2025 19:59
@Sushisource Sushisource force-pushed the worker-deployment-versioning branch from 3d1c940 to 707473a Compare April 16, 2025 21:16
@Sushisource Sushisource enabled auto-merge (squash) April 16, 2025 21:18
@Sushisource Sushisource merged commit fd49c4e into main Apr 16, 2025
14 checks passed
@Sushisource Sushisource deleted the worker-deployment-versioning branch April 16, 2025 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support New Worker Versioning API
2 participants