-
Notifications
You must be signed in to change notification settings - Fork 1k
[metal] Use objc2-metal
#5641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
[metal] Use objc2-metal
#5641
Conversation
This may technically be a breaking change if the user implemented these protocols themselves on a `Mutable` class, but that'd be unsound anyhow, so I'll consider this a correctness fix. This is useful for wgpu, see gfx-rs/wgpu#5641, and the hack will become unnecessary after #563.
bd69f0d
to
943f2a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great step forward from having to maintain our own set of bindings manually.
I do have some questions/comments:
- We still seem to be using the
msg_send!
macro in a bunch of places, could we use the generated methods instead? - We also use
autoreleasepool
s in a bunch of places because we had issues with leaks but as far as I understand from the docs ofRetained
, a lot of these (if not all) should now be unnecessary. Is this correct? I think it would make sense to remove them in this PR. - While I appreciate the naming being more consistent with metal's, some of the enum variants now have the enum name as their prefix which feels redundant; but not always which feels odd. Examples:
MTLFeatureSet
's variants are not prefixedMTLLanguageVersion
's variants are all prefixed- one of
MTLReadWriteTextureTier
's variants is not prefixed while the other 2 are)
- Some of the function names are quite long now (ex:
copyFromTexture_sourceSlice_sourceLevel_sourceOrigin_sourceSize_toBuffer_destinationOffset_destinationBytesPerRow_destinationBytesPerImage_options
😆) but I'm not sure how they can be shortened by the generator while also keeping things easy to search for on Apple's docs. - It would be great if all objects had docs from Apple's docs website and/or at least a link to the page on said website but no hurry, just giving my +1 for it :).
wgpu-hal/src/metal/device.rs
Outdated
// TODO: `newComputePipelineStateWithDescriptor:error:` is not exposed on | ||
// `MTLDevice`, is this always correct? | ||
fn new_compute_pipeline_state_with_descriptor( | ||
device: &ProtocolObject<dyn MTLDevice>, | ||
descriptor: &MTLComputePipelineDescriptor, | ||
) -> Result<Retained<ProtocolObject<dyn MTLComputePipelineState>>, Retained<NSError>> { | ||
unsafe { msg_send_id![device, newComputePipelineStateWithDescriptor: descriptor, error: _] } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's odd that it doesn't exist since newRenderPipelineStateWithDescriptor:error:
does.
We could use newComputePipelineStateWithDescriptor:options:reflection:error:
though and pass MTLPipelineOptionNone
for options
and nil
for reflection
.
so we did use it for a while, but then were able to remove the use on it, so yes, its not necessary anymore |
Thanks for taking a look!
I've already done this in a separate branch, but there were some issues around the semantics not being quite the same, so I wanted to keep it for a separate PR where that discussion could be more focused.
There's a lot of nuance to this question, including the fact that the optimization that allows us to avoid an autorelease is not guaranteed, only likely (depends on the exact emitted assembly, inlining, and the phase of the moon). What has changed though is that autorelease pools no longer have any effect on the program, other than in terms of memory usage (autoreleased objects can be reclaimed sooner) and runtime performance (pushing and popping an autorelease pool has a slight overhead). I'd really prefer to keep it out of this PR, mostly because it'll screw with the diff even more than it already is (indentation of large function bodies will change), but also partly because I do not know why each of these are here, and I'd like to retain that memory-usage profile until deemed wise to do otherwise.
Yup, that's a bug, the logic for implementing this name translation is very simplistic and a bit hastily thrown together - Swift has the correct rules written down, but it'll be breaking to change, so I'll fix it in
Yeah :/ Feel free to comment on madsmtm/objc2#284 if you think of a good solution (or just any solution)!
There's madsmtm/objc2#309 open for it, the local Xcode documentation is stored in an undocumented format that I spent a few hours on, but couldn't immediately reverse-engineer. Linking is similarly difficult, not for class names, but for methods, they have some ID in them, which is why I didn't pursue this - but I guess just linking to the class name would still be a step up, will try to prioritize it. |
I just tried to actually benchmark this, but it's not an art that I'm familiar with, and the results seemed inconclusive (both improvements and regressions, without any clear pattern that I could discern). I'd suggest that someone more familiar with that tries to benchmark this. |
@madsmtm For what it's worth we used to see a huge amount of time in |
I've opened #6107 to fix the semantic issues, btw. Whichever of these two PRs merge first, I'll update the other to remove the remaining |
980b16b
to
86a12fb
Compare
This PR has been open since the spring, so I want to let folks know what's going on with it. We (the wgpu maintainers) think Firefox has been incorporating wgpu Firefox uses a supply-chain auditing tool called But this means that taking this PR means that Firefox cannot update the wgpu sources in Mozilla Central until we have audited every line of The next steps for this PR are:
What I'm hoping to get across here is that, despite the delay, this PR is important to us, and although we have some very stern constraints, we're moving the process along. |
Thank you very much for the update @jimblandy! It's really nice to see that you're taking supply chain attacks seriously, it's something we very much need to be better at in our industry! Regarding the ability to audit/review the Code-wise, I can give a quick primer:
Feel free to ask if there's something here that you need help with, or if there's something I can change to make things easier to audit (now and in the future)! On the social/people side, would it help if we had an online meeting or something? Though I guess that could be construed as a form of social engineering attack, and wouldn't really add anything trust-wise. In any case, thanks again for the update, and I'm totally fine with waiting a few more months, or however long it takes you! |
In our meeting today, the wgpu maintainers decided to mark this as draft, since we'll be getting back to this in 2025. |
03eff06
to
edab1b3
Compare
Small update: I've released a new version of I also tried running the benchmarks again, seems to be better than last time I tried it, now all changes are within ~2% of the current |
ac07f5d
to
6983a9d
Compare
49e1b68
to
858f0bd
Compare
Made the diff a bit smaller by filing #7631 first. |
This removes the manual implementation introduced in 7b00140.
This is no longer necessary, since Winit can now properly issue redraw events on resize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think this thing's been open long enough, and needs some movement. I think it's easy to say that the vast majority of this code doesn't feel risky. There are some things that seem unexpected, but minor to resolve. I've filed these as individual conversations, and pinged attention from other maintainers where I thought it'd be helpful.
Once these are resolved, I'm inclined to approve, based on conversations here being resolved to everyone's satisfaction. After this, the only blocker to merging will be auditing the objc2
ecosystem for its unsafe
usage, to satisfy Mozilla's supply chain processes. This appears to be very non-trivial, and I don't have bandwidth to do this myself, but hey, at least nobody has to go over the entire review again!
CC @jimblandy.
@@ -60,47 +74,48 @@ impl super::CommandEncoder { | |||
.contains(TimestampQuerySupport::ON_BLIT_ENCODER); | |||
|
|||
if !self.state.pending_timer_queries.is_empty() && !supports_sample_counters_in_buffer { | |||
objc::rc::autoreleasepool(|| { | |||
let descriptor = metal::BlitPassDescriptor::new(); | |||
autoreleasepool(|_| unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Can we please narrow the spans that this unsafe
block applies to? If you're aware of the rationale, a SAFETY
comment would be great, but I know that in other places you've disclaimed knowledge of the Metal API specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason I did it this way is mostly the same as in #5641 (comment): this unsafe
will likely disappear some day, and it kept the diff smaller.
@@ -348,15 +366,15 @@ impl crate::CommandEncoder for super::CommandEncoder { | |||
// the amount of data to copy. | |||
0 | |||
}; | |||
encoder.copy_from_buffer_to_texture( | |||
encoder.copyFromBuffer_sourceOffset_sourceBytesPerRow_sourceBytesPerImage_sourceSize_toTexture_destinationSlice_destinationLevel_destinationOrigin_options( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Oh wow, this (what I presume to be an) overload name is awful. 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you have suggestions for better ways of doing this naming, feel free to drop them in madsmtm/objc2#284.
desc.set_depth_failure_operation(conv::map_stencil_op(face.depth_fail_op)); | ||
desc.set_depth_stencil_pass_operation(conv::map_stencil_op(face.pass_op)); | ||
) -> Retained<MTLStencilDescriptor> { | ||
let desc = unsafe { MTLStencilDescriptor::new() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: It's sorta weird that constructing a new one of these is unsafe
… 🤔 What preconditions could possibly need to be upheld? I say this as an Apple platform programming newbie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally unsafe new means something about dropping being unsafe/unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly same as in #5641 (comment): this unsafe
will likely disappear some day.
// Avoid noise, many objc2-metal functions are still `unsafe`. | ||
// See also https://github.com./madsmtm/objc2/issues/685. | ||
#![allow(unsafe_op_in_unsafe_fn)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: IMO, it's okay to have this noise, unless you're really convinced that most of it will fall away. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To this end, we actually put this lint on intentionally and migrated all of hal to use it before it was enabled by default, so we are quite happy with having unsafe around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fairly confident that a lot of the unsafe
will disappear, and so I felt that adding that line was valuable, since it made the already large diff easier to review.
Though I guess adding unsafe
now would make it easier for you to migrate once the things are marked safe? Another option would be to open a new tracking issue and assigning me to it.
Regardless, I'll try to prioritize this work in objc2
.
pub unsafe fn from_view(view: NonNull<Object>) -> Self { | ||
let layer = unsafe { Self::get_metal_layer(view) }; | ||
let layer = ManuallyDrop::new(layer); | ||
// SAFETY: The layer is an initialized instance of `CAMetalLayer`, and | ||
// we transfer the retain count to `MetalLayer` using `ManuallyDrop`. | ||
let layer = unsafe { metal::MetalLayer::from_ptr(layer.cast()) }; | ||
Self::new(layer) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Pinging @gfx-rs/wgpu here: is it problematic if metal::Surface::from_view
goes away WRT user-exposed functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is likely to be an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is public API, didn't realize that since it isn't present on docs.rs.
I'll see what I can do about keeping some form of it around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have opened #7649 for this, will rebase once that lands.
#[cfg(target_vendor = "apple")] | ||
(Rwh::AppKit(handle), _) | ||
if self.shared.extensions.contains(&ext::metal_surface::NAME) => | ||
{ | ||
self.create_surface_from_view(handle.ns_view) | ||
let layer = unsafe { raw_window_metal::Layer::from_ns_view(handle.ns_view) }; | ||
self.create_surface_from_layer(layer) | ||
} | ||
#[cfg(all(any(target_os = "ios", target_os = "visionos"), feature = "metal"))] | ||
#[cfg(target_vendor = "apple")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: How is it correct to move from target_os
checks against subsets of the Apple hardware ecosystem to target_vendor = "apple"
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of nuances to the relationship between UIKit/AppKit and the different Apple OSes. One of them is that all(target_os = "ios", target_abi = "macabi")
actually also supports AppKit, so if Wgpu wants to be maximally usable, you'd actually have to use #[cfg(any(target_os = "macos", all(target_os = "ios", target_abi = "macabi")))]
on the AppKit
variant.
Instead of handling all that, we delegate to the producer of the handle; if Winit or similar produced an AppKitWindowHandle
, we trust that the ns_view
field in there really is an NSView
, regardless of which Apple platform we're currently running on. raw-window-metal
is also designed to be "platform agnostic" in the sense that it's APIs are available on all Apple platforms, regardless of whether UIView
/NSView
makes sense for that specific platform.
To keep the diff smaller and easier to review, this uses a temporary fork of `objc2-metal` and `objc2-quartz-core` whose methods use the naming scheme of the `metal` crate. One particular difficult part with this is that the `metal` crate has several methods where the order of the arguments are swapped relative to the corresponding Objective-C methods. This includes most perilously (since these have both an offset and an index argument, both of which are integers): - `set_bytes` - `set_vertex_bytes` - `set_fragment_bytes` - `set_buffer` - `set_vertex_buffer` - `set_fragment_buffer` - `set_threadgroup_memory_length` But also: - `set_vertex_texture` - `set_fragment_texture` - `set_sampler_state` - `set_vertex_sampler_state` - `set_fragment_sampler_state` Another noteworthy thing to mention is that `objc2-metal` does not (yet) provide a fallback for MTLCopyAllDevices, so we have to do that ourselves: madsmtm/objc2@3543940
Thanks for the review Erich.
Again, do feel free to say if there's something I can do to make that process easier! |
Description
Use the
objc2-metal
crate instead of themetal
crate. This will:Arc
, as Metal objects are already reference-counted.objc_retainAutoreleasedReturnValue
underneath the hood to avoid putting objects into the autorelease pool when possible, reducing memory pressure.Background
The
metal
crate contains bindings to the Metal framework. This usesobjc
to manually perform the message sends, which is quite error-prone, see gfx-rs/metal-rs#284, gfx-rs/metal-rs#319 and gfx-rs/metal-rs#209 for a few examples of unsoundness (out of many).To solve such problems in the Rust ecosystem in general, I created a successor to
objc
calledobjc2
, which contains most notably the smart-pointerRetained
and an improvedmsg_send!
macro, that together ensure that Objective-C's memory-management rules are upheld correctly.This is only part of the solution though - we'd still have to write the bindings manually. To solve this, I created a tool (planning to integrate it with
bindgen
, but that's likely a multi-year project) to generate such framework crates automatically. In acknowledgement that this tool is by far not perfect, and never will be, I've ensured that there's a bunch of options to modify each generated crate.The modifications for
objc2-metal
in particular are currently just a few hundred lines of code, weak evidence that the generator is fairly good at this point. I'll also bring attention to the file whereunsafe
methods are marked safe - I have plans to investigate ways to semi-automatically figure out if something is safe or not, or at least reduce the burden of doing so, see madsmtm/objc2#685, but it's a hard problem to ensure is completely sound, so for now it's a bit verbose.Connections
gpu-allocator
has transitioned toobjc2-metal
in Traverse-Research/gpu-allocator#225, andblade
has transitioned in kvark/blade#229.gfx-rs/metal-rs#241 is an old open PR for using
objc2
inmetal
internally instead. There currently isn't really a clear path forwards there, and it's a lot of work for less direct benefits to the ecosystem (wgpu-hal
is by far the biggest user ofmetal
). But more fundamentally IMO, it's a problem of separation of concerns;metal
defines several Foundation types likeNSArray
,NSString
,NSError
and so on, and evenCAMetalLayer
from QuartzCore, and that's a bad idea for interoperability compared to having separate crates for each of these frameworks.#5752 removed the
link
feature which is not available in theobjc2
crates.#6210 transitively depends on
objc2
.Implementation
The first commit implements the actual migration, by using a branch of
objc2
, with a method naming scheme that (more closely) matchesmetal
, to make it easier to review and test what's changed. Especially confusing is arguments that are re-ordered between the Objective-C API, and the API thatmetal-rs
exposes, see gfx-rs/metal-rs#144.The second commit moves to the real naming scheme that
objc2
uses.I'd strongly recommend you review these two commits separately.
Testing
Tested by using the checklist below, as well as running each example individually, and checking that they seem to work.
During the development of this I made two quite critical typos, which were luckily found by the test suite, but there's bound to be at least one more lurking in here somewhere, please test this thoroughly!
Checklist
cargo fmt
.cargo clippy
.cargo xtask test
to run tests.CHANGELOG.md
. See simple instructions inside file.