-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[Inference API] Add unified api for chat completions #117589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…/elasticsearch into ml-inference-unified-api-elastic
Pinging @elastic/ml-core (Team:ML) |
Hi @maxhniebergall, I've created a changelog YAML for you. |
…ence-unified-api-elastic
Tests to add:
TODO
|
…/elasticsearch into ml-inference-unified-api-elastic
server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java
Outdated
Show resolved
Hide resolved
…/elasticsearch into ml-inference-unified-api-elastic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…/elasticsearch into ml-inference-unified-api-elastic
|
||
builder.field(MODEL_FIELD, model.getServiceSettings().modelId()); | ||
if (unifiedRequest.maxCompletionTokens() != null) { | ||
builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the OpenAiChatCompletionServiceSettings has a similar field that isn't used (even previous to this PR). I wonder if we should sync those fields up like we do the modelId
🤔 . I think we've typically used the max tokens to do the truncation for text embedding so I don't think we really used it for completions.
@maxhniebergall what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, yea it sounds like we should use the value from the service settings if its available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
public void toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException { | ||
builder.value(content); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is function required? The class does not declare that is implements toXContent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yeah we can remove that. Thanks.
...ore/src/main/java/org/elasticsearch/xpack/core/inference/action/UnifiedCompletionAction.java
Show resolved
Hide resolved
return e; | ||
} | ||
|
||
if (taskType != TaskType.COMPLETION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (taskType != TaskType.COMPLETION) { | |
if (taskType.isAnyOrSame(TaskType.COMPLETION)) { |
For the case where tasktype is not set in the URL and defaulted to ANY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch!
private Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> singleItem( | ||
StreamingUnifiedChatCompletionResults.ChatCompletionChunk result | ||
) { | ||
var deque = new ArrayDeque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk>(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why size 2 and not 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, it was in the code Pat sent me for this. I also thought it was odd prwhelan@4c573ba
for (UnifiedCompletionRequest.Message message : unifiedRequest.messages()) { | ||
builder.startObject(); | ||
{ | ||
switch (message.content()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! TIL
private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>(); | ||
|
||
@Override | ||
protected void onRequest(long n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to grok that onRequest
is not part of the Flow interface. Maybe call this upstreamRequest
not to confuse it with the various Flow on* methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep sounds good. I still don't quite understand what all that stuff is doing haha. I'll have Pat give an overview when he gets back maybe.
…ence-unified-api-elastic
…/elasticsearch into ml-inference-unified-api-elastic
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* Adding some shell classes * modeling the request objects * Writeable changes to schema * Working parsing tests * Creating a new action * Add outbound request writing (WIP) * Improvements to request serialization * Adding separate transport classes * separate out unified request and combine inputs * Reworking unified inputs * Adding unsupported operation calls * Fixing parsing logic * get the build working * Update docs/changelog/117589.yaml * Fixing injection issue * Allowing model to be overridden but not working yet * Fixing issues * Switch field name for tool * Add suport for toolCalls and refusal in streaming completion * Working tool call response * Separate unified and legacy code paths * Updated the parser, but there are some class cast exceptions to fix * Refactoring tests and request entities * Parse response from OpenAI * Removing unused request classes * precommit * Adding tests for UnifiedCompletionAction Request * Refactoring stop to be a list of strings * Testing for OpenAI response parsing * Refactoring transport action tests to test unified validation code * Fixing various tests * Fixing license header * Reformat streaming results * Finalize response format * remove debug logs * remove changes for debugging * Task type and base inference action tests * Adding openai service tests * Adding model tests * tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked * Fixing change log and removing commented out code * Switch usage to accept null * Adding test for TestStreamingCompletionServiceExtension * Avoid serializing empty lists + request entity tests * Register named writeables from UnifiedCompletionRequest * Removing commented code * Clean up and add more of an explination * remove duplicate test * remove old todos * Refactoring some duplication * Adding javadoc * Addressing feedback --------- Co-authored-by: Jonathan Buttner <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> (cherry picked from commit 467fdb8) # Conflicts: # x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
* Adding some shell classes * modeling the request objects * Writeable changes to schema * Working parsing tests * Creating a new action * Add outbound request writing (WIP) * Improvements to request serialization * Adding separate transport classes * separate out unified request and combine inputs * Reworking unified inputs * Adding unsupported operation calls * Fixing parsing logic * get the build working * Update docs/changelog/117589.yaml * Fixing injection issue * Allowing model to be overridden but not working yet * Fixing issues * Switch field name for tool * Add suport for toolCalls and refusal in streaming completion * Working tool call response * Separate unified and legacy code paths * Updated the parser, but there are some class cast exceptions to fix * Refactoring tests and request entities * Parse response from OpenAI * Removing unused request classes * precommit * Adding tests for UnifiedCompletionAction Request * Refactoring stop to be a list of strings * Testing for OpenAI response parsing * Refactoring transport action tests to test unified validation code * Fixing various tests * Fixing license header * Reformat streaming results * Finalize response format * remove debug logs * remove changes for debugging * Task type and base inference action tests * Adding openai service tests * Adding model tests * tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked * Fixing change log and removing commented out code * Switch usage to accept null * Adding test for TestStreamingCompletionServiceExtension * Avoid serializing empty lists + request entity tests * Register named writeables from UnifiedCompletionRequest * Removing commented code * Clean up and add more of an explination * remove duplicate test * remove old todos * Refactoring some duplication * Adding javadoc * Addressing feedback --------- Co-authored-by: Jonathan Buttner <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> (cherry picked from commit 467fdb8) # Conflicts: # x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* Adding some shell classes * modeling the request objects * Writeable changes to schema * Working parsing tests * Creating a new action * Add outbound request writing (WIP) * Improvements to request serialization * Adding separate transport classes * separate out unified request and combine inputs * Reworking unified inputs * Adding unsupported operation calls * Fixing parsing logic * get the build working * Update docs/changelog/117589.yaml * Fixing injection issue * Allowing model to be overridden but not working yet * Fixing issues * Switch field name for tool * Add suport for toolCalls and refusal in streaming completion * Working tool call response * Separate unified and legacy code paths * Updated the parser, but there are some class cast exceptions to fix * Refactoring tests and request entities * Parse response from OpenAI * Removing unused request classes * precommit * Adding tests for UnifiedCompletionAction Request * Refactoring stop to be a list of strings * Testing for OpenAI response parsing * Refactoring transport action tests to test unified validation code * Fixing various tests * Fixing license header * Reformat streaming results * Finalize response format * remove debug logs * remove changes for debugging * Task type and base inference action tests * Adding openai service tests * Adding model tests * tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked * Fixing change log and removing commented out code * Switch usage to accept null * Adding test for TestStreamingCompletionServiceExtension * Avoid serializing empty lists + request entity tests * Register named writeables from UnifiedCompletionRequest * Removing commented code * Clean up and add more of an explination * remove duplicate test * remove old todos * Refactoring some duplication * Adding javadoc * Addressing feedback --------- Co-authored-by: Jonathan Buttner <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> (cherry picked from commit 467fdb8) # Conflicts: # x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
…118772) * [Inference API] Add unified api for chat completions (#117589) * Adding some shell classes * modeling the request objects * Writeable changes to schema * Working parsing tests * Creating a new action * Add outbound request writing (WIP) * Improvements to request serialization * Adding separate transport classes * separate out unified request and combine inputs * Reworking unified inputs * Adding unsupported operation calls * Fixing parsing logic * get the build working * Update docs/changelog/117589.yaml * Fixing injection issue * Allowing model to be overridden but not working yet * Fixing issues * Switch field name for tool * Add suport for toolCalls and refusal in streaming completion * Working tool call response * Separate unified and legacy code paths * Updated the parser, but there are some class cast exceptions to fix * Refactoring tests and request entities * Parse response from OpenAI * Removing unused request classes * precommit * Adding tests for UnifiedCompletionAction Request * Refactoring stop to be a list of strings * Testing for OpenAI response parsing * Refactoring transport action tests to test unified validation code * Fixing various tests * Fixing license header * Reformat streaming results * Finalize response format * remove debug logs * remove changes for debugging * Task type and base inference action tests * Adding openai service tests * Adding model tests * tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked * Fixing change log and removing commented out code * Switch usage to accept null * Adding test for TestStreamingCompletionServiceExtension * Avoid serializing empty lists + request entity tests * Register named writeables from UnifiedCompletionRequest * Removing commented code * Clean up and add more of an explination * remove duplicate test * remove old todos * Refactoring some duplication * Adding javadoc * Addressing feedback --------- Co-authored-by: Jonathan Buttner <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> (cherry picked from commit 467fdb8) # Conflicts: # x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java * fix merge conflicts * formatting * Remove tests - retain feature flag * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
Unified API communicating with OpenAI
Testing
Running ES
The
_unified
route is behind a feature flag, so to enable it run es like this:Creating endpoint and sending requestions
Creating a completion endpoint
Completion request
Response format
A sequence of partial responses: