[Inference API] Add unified api for chat completions #117589

Merged
75 commits merged into main from ml-inference-unified-api-elastic on Dec 6, 2024

Contributor

@maxhniebergall maxhniebergall commented Nov 26, 2024

Adds a unified chat completion API (the _unified route) that communicates with OpenAI.

Testing

Running ES

The _unified route is behind a feature flag, so to enable it, run Elasticsearch like this:

./gradlew :run -Drun.license_type=trial -Des.inference_unified_feature_flag_enabled=true

Creating an endpoint and sending requests

Creating a completion endpoint

PUT http://localhost:9200/_inference/completion/test
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "gpt-4o"
    }
}

Completion request

POST http://localhost:9200/_inference/completion/test/_unified
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Boston today?"
        }
    ],
    "stop": "none",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
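
For scripted testing, the two requests above could also be built with the JDK's java.net.http client. This is a minimal sketch, not part of the PR: the class and method names (UnifiedApiRequests, createEndpoint, unifiedCompletion) are made up for illustration, and it assumes the unsecured local node and the "test" endpoint id used in the examples above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class UnifiedApiRequests {
    // Assumption: the local dev node started with the Gradle command above.
    static final String BASE = "http://localhost:9200/_inference/completion/test";

    /** Builds the PUT that creates the completion endpoint. */
    static HttpRequest createEndpoint(String apiKey) {
        String body = """
            {"service": "openai", "service_settings": {"api_key": "%s", "model_id": "gpt-4o"}}
            """.formatted(apiKey);
        return HttpRequest.newBuilder(URI.create(BASE))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    /** Builds the POST to the _unified route with a single user message. */
    static HttpRequest unifiedCompletion(String userMessage) {
        // No JSON escaping is done here, so this is only safe for plain test strings.
        String body = """
            {"model": "gpt-4o", "messages": [{"role": "user", "content": "%s"}]}
            """.formatted(userMessage);
        return HttpRequest.newBuilder(URI.create(BASE + "/_unified"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

Sending the POST with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofLines()) would let a caller read the streamed chunks line by line as they arrive.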

Response format

A sequence of partial responses:

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": " ol"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "iv"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "ine"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "."
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {},
            "finish_reason": "stop",
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

[DONE]
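
A client has to concatenate the delta content of each chunk to recover the full message. The sketch below shows that step on the sample chunks above. ChunkAssembler is a hypothetical helper, and the regular expression is a shortcut for this example only: it does not unescape JSON, so a real client should use a JSON parser.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkAssembler {
    // Captures the string value of a "content" field, allowing escaped characters inside it.
    private static final Pattern CONTENT =
        Pattern.compile("\"content\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Joins the delta content of each chunk, stopping at the [DONE] marker. */
    static String assemble(List<String> chunks) {
        StringBuilder message = new StringBuilder();
        for (String chunk : chunks) {
            if (chunk.strip().equals("[DONE]")) {
                break;
            }
            Matcher matcher = CONTENT.matcher(chunk);
            if (matcher.find()) {
                // Chunks with an empty delta (such as the final "stop" chunk) add nothing.
                message.append(matcher.group(1));
            }
        }
        return message.toString();
    }
}
```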

@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML label (Meta label for the ML team) Nov 26, 2024
@elasticsearchmachine
Collaborator

Hi @maxhniebergall, I've created a changelog YAML for you.

@maxhniebergall
Contributor Author

maxhniebergall commented Nov 26, 2024

Tests to add:

TODO

  • Create a list of all new named writeables and add them to the registry ✅
  • Address outstanding TODOs ✅

Contributor Author

@maxhniebergall maxhniebergall left a comment

LGTM


builder.field(MODEL_FIELD, model.getServiceSettings().modelId());
if (unifiedRequest.maxCompletionTokens() != null) {
    builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());
Contributor

I just realized that the OpenAiChatCompletionServiceSettings has a similar field that isn't used (even previous to this PR). I wonder if we should sync those fields up like we do the modelId 🤔 . I think we've typically used the max tokens to do the truncation for text embedding so I don't think we really used it for completions.

@maxhniebergall what do you think?

Contributor Author

Hmm, yeah, it sounds like we should use the value from the service settings if it's available.
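
One possible reading of this exchange is that the per-request value wins and the service setting acts as a fallback, similar to how the model id is handled. The thread does not settle the precedence, so the sketch below is only an assumption, and MaxTokensResolver is a hypothetical helper, not code from the PR.

```java
final class MaxTokensResolver {
    /**
     * Assumed precedence: request value first, then the service setting.
     * Returns null when neither is set, meaning the field would be omitted.
     */
    static Long resolve(Long requestMaxCompletionTokens, Long serviceSettingsMaxTokens) {
        if (requestMaxCompletionTokens != null) {
            return requestMaxCompletionTokens;
        }
        return serviceSettingsMaxTokens;
    }
}
```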

Member

@davidkyle davidkyle left a comment

LGTM

Comment on lines 234 to 236
public void toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException {
    builder.value(content);
}
Member

Is this function required? The class does not declare that it implements ToXContent.

Contributor

Oops, yeah we can remove that. Thanks.

return e;
}

if (taskType != TaskType.COMPLETION) {
Member

Suggested change
- if (taskType != TaskType.COMPLETION) {
+ if (taskType.isAnyOrSame(TaskType.COMPLETION)) {

This covers the case where the task type is not set in the URL and defaults to ANY.

Contributor

Ah good catch!
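
The difference between the two checks can be seen with a stand-in enum. TaskKind below is hypothetical, and its isAnyOrSame is written from the method name and the reviewer's note rather than copied from Elasticsearch. Note that if the branch is meant to reject unsupported task types, as the original != check suggests, the isAnyOrSame result needs to be negated to keep that meaning.

```java
enum TaskKind {
    COMPLETION,
    TEXT_EMBEDDING,
    ANY;

    /** Assumed semantics: true when either side is ANY or both are the same type. */
    boolean isAnyOrSame(TaskKind other) {
        return this == ANY || other == ANY || this == other;
    }
}

final class TaskTypeCheck {
    /** Strict check from the original line: also rejects a request that defaulted to ANY. */
    static boolean strictCheckRejects(TaskKind requested) {
        return requested != TaskKind.COMPLETION;
    }

    /** Relaxed check: rejects only task types that are neither ANY nor COMPLETION. */
    static boolean relaxedCheckRejects(TaskKind requested) {
        return requested.isAnyOrSame(TaskKind.COMPLETION) == false;
    }
}
```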

private Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> singleItem(
    StreamingUnifiedChatCompletionResults.ChatCompletionChunk result
) {
    var deque = new ArrayDeque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk>(2);
Member

Why size 2 and not 1?

Contributor Author

I'm not sure, it was in the code Pat sent me for this. I also thought it was odd: prwhelan@4c573ba
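
For context on the question, the int argument to the ArrayDeque constructor is an initial capacity, not a limit, so either value works and the deque grows as needed. A minimal sketch with a generic stand-in for the method in the diff (DequeCapacityDemo is a made-up name):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DequeCapacityDemo {
    /** Wraps a single item in a deque sized for one element. */
    static <T> Deque<T> singleItem(T item) {
        // The argument only sizes the initial backing array; it does not cap the deque.
        Deque<T> deque = new ArrayDeque<>(1);
        deque.offer(item);
        return deque;
    }
}
```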

for (UnifiedCompletionRequest.Message message : unifiedRequest.messages()) {
    builder.startObject();
    {
        switch (message.content()) {
Member

Nice! TIL
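
The comment presumably refers to switching directly on the content object with type patterns. The sketch below shows that language feature in isolation, assuming Java 21 or later. The Content, ContentString, and ContentObjects types here are simplified stand-ins invented for the example, not the classes from the PR.

```java
import java.util.List;

// Sealing the interface lets the compiler check that the switch covers every case.
sealed interface Content permits ContentString, ContentObjects {}

record ContentString(String text) implements Content {}

record ContentObjects(List<String> parts) implements Content {}

final class ContentWriter {
    /** Dispatches on the runtime type of the content without casts or a default branch. */
    static String describe(Content content) {
        return switch (content) {
            case ContentString s -> "string:" + s.text();
            case ContentObjects o -> "array:" + o.parts().size();
        };
    }
}
```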

private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>();

@Override
protected void onRequest(long n) {
Member

It took me a while to grok that onRequest is not part of the Flow interface. Maybe call this upstreamRequest so it isn't confused with the various Flow on* methods.

Contributor

Yep sounds good. I still don't quite understand what all that stuff is doing haha. I'll have Pat give an overview when he gets back maybe.
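
As a rough illustration of the naming point, here is a pass-through java.util.concurrent.Flow processor in which the hook for downstream demand is called upstreamRequest, so it cannot be mistaken for one of the Flow on* callbacks. ForwardingProcessor and FlowDemo are hypothetical and much simpler than the processor in the PR: there is no buffering and no thread safety.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;

abstract class ForwardingProcessor<T, R> implements Flow.Processor<T, R> {
    private Flow.Subscription upstream;
    private Flow.Subscriber<? super R> downstream;

    @Override
    public void subscribe(Flow.Subscriber<? super R> subscriber) {
        downstream = subscriber;
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { upstreamRequest(n); }
            @Override public void cancel() { upstream.cancel(); }
        });
    }

    /** Hook, not a Flow callback: runs when downstream asks for n more items. */
    protected void upstreamRequest(long n) { upstream.request(n); }

    protected abstract R transform(T item);

    @Override public void onSubscribe(Flow.Subscription subscription) { upstream = subscription; }
    @Override public void onNext(T item) { downstream.onNext(transform(item)); }
    @Override public void onError(Throwable t) { downstream.onError(t); }
    @Override public void onComplete() { downstream.onComplete(); }
}

final class FlowDemo {
    /** Pushes items through an upper-casing processor and records what downstream sees. */
    static List<String> run(List<String> items) {
        ForwardingProcessor<String, String> processor = new ForwardingProcessor<>() {
            @Override protected String transform(String item) { return item.toUpperCase(); }
        };
        Iterator<String> it = items.iterator();
        // Synchronous stand-in for an upstream publisher's subscription.
        processor.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) {
                for (long i = 0; i < n && it.hasNext(); i++) processor.onNext(it.next());
                if (it.hasNext() == false) processor.onComplete();
            }
            @Override public void cancel() {}
        });
        List<String> received = new ArrayList<>();
        processor.subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(String item) { received.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { received.add("<done>"); }
        });
        return received;
    }
}
```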

@jonathan-buttner jonathan-buttner enabled auto-merge (squash) December 6, 2024 19:16
@jonathan-buttner jonathan-buttner merged commit 467fdb8 into main Dec 6, 2024
17 of 18 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-inference-unified-api-elastic branch December 6, 2024 20:52
@maxhniebergall
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 11, 2024
* Adding some shell classes

* modeling the request objects

* Writeable changes to schema

* Working parsing tests

* Creating a new action

* Add outbound request writing (WIP)

* Improvements to request serialization

* Adding separate transport classes

* separate out unified request and combine inputs

* Reworking unified inputs

* Adding unsupported operation calls

* Fixing parsing logic

* get the build working

* Update docs/changelog/117589.yaml

* Fixing injection issue

* Allowing model to be overridden but not working yet

* Fixing issues

* Switch field name for tool

* Add support for toolCalls and refusal in streaming completion

* Working tool call response

* Separate unified and legacy code paths

* Updated the parser, but there are some class cast exceptions to fix

* Refactoring tests and request entities

* Parse response from OpenAI

* Removing unused request classes

* precommit

* Adding tests for UnifiedCompletionAction Request

* Refactoring stop to be a list of strings

* Testing for OpenAI response parsing

* Refactoring transport action tests to test unified validation code

* Fixing various tests

* Fixing license header

* Reformat streaming results

* Finalize response format

* remove debug logs

* remove changes for debugging

* Task type and base inference action tests

* Adding openai service tests

* Adding model tests

* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked

* Fixing change log and removing commented out code

* Switch usage to accept null

* Adding test for TestStreamingCompletionServiceExtension

* Avoid serializing empty lists + request entity tests

* Register named writeables from UnifiedCompletionRequest

* Removing commented code

* Clean up and add more of an explanation

* remove duplicate test

* remove old todos

* Refactoring some duplication

* Adding javadoc

* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
(cherry picked from commit 467fdb8)

# Conflicts:
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024

# Conflicts:
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
maxhniebergall added a commit that referenced this pull request Dec 16, 2024
…118772)

* [Inference API] Add unified api for chat completions (#117589)


# Conflicts:
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java

* fix merge conflicts

* formatting

* Remove tests - retain feature flag

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Labels
>enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.18.0, v9.0.0
4 participants