[Inference API] Add unified api for chat completions #117589

maxhniebergall · 2024-11-26T20:03:06Z

Unified API communicating with OpenAI

Testing

Running ES

The _unified route is behind a feature flag. To enable it, run Elasticsearch like this:

./gradlew :run -Drun.license_type=trial -Des.inference_unified_feature_flag_enabled=true

Creating an endpoint and sending requests

Creating a completion endpoint

PUT http://localhost:9200/_inference/completion/test
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "gpt-4o"
    }
}

Completion request

POST http://localhost:9200/_inference/completion/test/_unified
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Boston today?"
        }
    ],
    "stop": "none",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

Response format

A sequence of partial responses:

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": " ol"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "iv"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "ine"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "."
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {},
            "finish_reason": "stop",
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

[DONE]
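A client would typically rebuild the full message by concatenating each chunk's `delta.content` in arrival order until a chunk carries a `finish_reason`. The following is a minimal sketch of that step only. `ChunkSketch` and `StreamAssemblySketch` are hypothetical names, not Elasticsearch classes, and the chunks are assumed to be parsed from JSON already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, already-parsed view of one streamed chunk; not an Elasticsearch class.
final class ChunkSketch {
    final String content;      // choices[0].delta.content, or null when the delta is empty
    final String finishReason; // choices[0].finish_reason, or null while still streaming

    ChunkSketch(String content, String finishReason) {
        this.content = content;
        this.finishReason = finishReason;
    }
}

final class StreamAssemblySketch {
    // Concatenates delta contents in arrival order and stops at the first chunk with a finish reason.
    static String assemble(List<ChunkSketch> chunks) {
        StringBuilder text = new StringBuilder();
        for (ChunkSketch chunk : chunks) {
            if (chunk.content != null) {
                text.append(chunk.content);
            }
            if (chunk.finishReason != null) {
                break;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The same deltas as in the partial responses shown above.
        List<ChunkSketch> chunks = Arrays.asList(
            new ChunkSketch(" ol", null),
            new ChunkSketch("iv", null),
            new ChunkSketch("ine", null),
            new ChunkSketch(".", null),
            new ChunkSketch(null, "stop")
        );
        System.out.println("[" + assemble(chunks) + "]"); // prints "[ olivine.]"
    }
}
```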

elasticsearchmachine · 2024-11-26T20:03:30Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2024-11-26T20:03:32Z

Hi @maxhniebergall, I've created a changelog YAML for you.

maxhniebergall · 2024-11-26T20:16:36Z

Tests to add:

UnifiedCompletionAction (@jonathan-buttner) ✅
TestStreamingCompletionServiceExtension (@jonathan-buttner) ✅
Rolling update tests (We don't think we need these?)
TransportInferenceActionTests (@jonathan-buttner) ✅
InferenceInputs (@jonathan-buttner) ✅
- ~~we should double check the castTo method and add it to the other subclasses~~ (Let's do this later because the PR is large enough)
UnifiedChatInput (@jonathan-buttner) ✅
- for the conversions
OpenAiUnifiedCompletionRequestEntity (@maxhniebergall) ✅
BaseInferenceAction (@jonathan-buttner) ✅
RestUnifiedCompletionInferenceAction (@jonathan-buttner) ✅
OpenAiService (@jonathan-buttner) ✅
OpenAiChatCompletionModel (@jonathan-buttner) ✅
OpenAiUnifiedStreamingProcessor & StreamingUnifiedChatCompletionResults (@maxhniebergall) ✅

TODO

Create a list of all new named writables and add them to the registry ✅
Address outstanding TODOs ✅

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java

maxhniebergall

LGTM

jonathan-buttner · 2024-12-06T13:59:11Z

...search/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java

+
+        builder.field(MODEL_FIELD, model.getServiceSettings().modelId());
+        if (unifiedRequest.maxCompletionTokens() != null) {
+            builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());


I just realized that OpenAiChatCompletionServiceSettings has a similar field that isn't used (even before this PR). I wonder if we should sync those fields up like we do for the modelId 🤔. I think we've typically used max tokens to do the truncation for text embedding, so I don't think we've really used it for completions.

@maxhniebergall what do you think?

Hmm, yeah, it sounds like we should use the value from the service settings if it's available.

davidkyle

LGTM

davidkyle · 2024-12-06T13:36:14Z

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java

+        public void toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException {
+            builder.value(content);
+        }


Is this function required? The class does not declare that it implements ToXContent.

Oops, yeah we can remove that. Thanks.

davidkyle · 2024-12-06T14:31:38Z

...ore/src/main/java/org/elasticsearch/xpack/core/inference/action/UnifiedCompletionAction.java

+                return e;
+            }
+
+            if (taskType != TaskType.COMPLETION) {


Suggested change

-            if (taskType != TaskType.COMPLETION) {
+            if (taskType.isAnyOrSame(TaskType.COMPLETION)) {

This covers the case where the task type is not set in the URL and defaults to ANY.

Ah good catch!
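A minimal sketch of why the ANY case matters, using a hypothetical `TaskTypeSketch` enum in place of the real `TaskType`. The `isAnyOrSame` body here is an assumption about its semantics (true when either side is ANY or both sides match). The original `!=` check appears to guard the rejection path, so this sketch negates the call to reject only mismatched task types. The suggestion as quoted would need the same negation to keep that behavior.

```java
// Hypothetical stand-in for the real TaskType enum; only the members needed for the example.
enum TaskTypeSketch {
    COMPLETION,
    TEXT_EMBEDDING,
    ANY;

    // Assumed semantics: true when either side is ANY or both sides are the same type.
    boolean isAnyOrSame(TaskTypeSketch other) {
        return this == ANY || other == ANY || this == other;
    }
}

final class TaskTypeValidationSketch {
    // Returns an error message when the task type cannot serve a unified completion, else null.
    static String validate(TaskTypeSketch taskType) {
        if (taskType.isAnyOrSame(TaskTypeSketch.COMPLETION) == false) {
            return "task type must be completion";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(TaskTypeSketch.COMPLETION));     // prints "null"
        System.out.println(validate(TaskTypeSketch.ANY));            // prints "null"
        System.out.println(validate(TaskTypeSketch.TEXT_EMBEDDING)); // prints "task type must be completion"
    }
}
```

With a plain `!=` comparison, the ANY case in the second call would be rejected even though no conflicting task type was given.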

davidkyle · 2024-12-06T15:55:11Z

.../java/org/elasticsearch/xpack/inference/external/openai/OpenAiUnifiedStreamingProcessor.java

+    private Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> singleItem(
+        StreamingUnifiedChatCompletionResults.ChatCompletionChunk result
+    ) {
+        var deque = new ArrayDeque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk>(2);


Why size 2 and not 1?

I'm not sure; it was in the code Pat sent me for this (prwhelan@4c573ba). I also thought it was odd.
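For context on the capacity question: the int argument to the `ArrayDeque` constructor is only a lower bound on the initial capacity, and the deque grows as needed, so 1 and 2 behave the same for a single item. The sketch below uses a hypothetical `singleItem` helper over strings rather than the processor's chunk type.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

final class DequeCapacitySketch {
    // Hypothetical helper: wraps one item in a deque created with the given capacity hint.
    static Deque<String> singleItem(String item, int capacityHint) {
        Deque<String> deque = new ArrayDeque<>(capacityHint);
        deque.offer(item);
        return deque;
    }

    public static void main(String[] args) {
        Deque<String> one = singleItem("chunk", 1);
        Deque<String> two = singleItem("chunk", 2);

        // ArrayDeque does not override equals, so compare the contents through lists.
        System.out.println(new ArrayList<>(one).equals(new ArrayList<>(two))); // prints "true"

        // The hint is not a limit: the deque grows past it.
        one.offer("second");
        one.offer("third");
        System.out.println(one.size()); // prints "3"
    }
}
```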

davidkyle · 2024-12-06T15:56:08Z

...search/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java

+            for (UnifiedCompletionRequest.Message message : unifiedRequest.messages()) {
+                builder.startObject();
+                {
+                    switch (message.content()) {


davidkyle · 2024-12-06T16:13:52Z

.../java/org/elasticsearch/xpack/inference/external/openai/OpenAiUnifiedStreamingProcessor.java

+    private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>();
+
+    @Override
+    protected void onRequest(long n) {


It took me a while to grok that onRequest is not part of the Flow interface. Maybe call this upstreamRequest so it isn't confused with the various Flow on* methods.

Yep sounds good. I still don't quite understand what all that stuff is doing haha. I'll have Pat give an overview when he gets back maybe.
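To illustrate the naming point: `java.util.concurrent.Flow.Subscriber` defines exactly four callbacks (`onSubscribe`, `onNext`, `onError`, `onComplete`), while demand travels the other way through `Flow.Subscription.request(n)`. The sketch below is a simplified, synchronous illustration, not the processor from this PR. `Collector`, `publish`, and the `upstreamRequest` helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;

final class FlowNamingSketch {
    // Subscriber implementing the four callbacks that Flow.Subscriber actually declares.
    static final class Collector implements Flow.Subscriber<String> {
        final List<String> received = new ArrayList<>();
        boolean completed;
        Throwable failure;
        private Flow.Subscription upstream;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            upstream = subscription;
            upstreamRequest(1);
        }

        @Override
        public void onNext(String item) {
            received.add(item);
            upstreamRequest(1);
        }

        @Override
        public void onError(Throwable throwable) {
            failure = throwable;
        }

        @Override
        public void onComplete() {
            completed = true;
        }

        // Not a Flow callback: it forwards demand upstream via Subscription.request(n).
        void upstreamRequest(long n) {
            upstream.request(n);
        }
    }

    // Minimal synchronous publisher over a fixed list; emits one item per unit of demand.
    static void publish(List<String> items, Flow.Subscriber<String> subscriber) {
        Iterator<String> it = items.iterator();
        subscriber.onSubscribe(new Flow.Subscription() {
            boolean done;

            @Override
            public void request(long n) {
                for (long i = 0; i < n && done == false; i++) {
                    if (it.hasNext()) {
                        subscriber.onNext(it.next());
                    } else {
                        done = true;
                        subscriber.onComplete();
                    }
                }
            }

            @Override
            public void cancel() {
                done = true;
            }
        });
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        publish(Arrays.asList("a", "b", "c"), collector);
        System.out.println(collector.received + " completed=" + collector.completed); // prints "[a, b, c] completed=true"
    }
}
```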

maxhniebergall · 2024-12-11T20:48:16Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions ?

Please refer to the Backport tool documentation

* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add suport for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explination
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>

(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java

maxhniebergall · 2024-12-16T14:22:27Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions ?

Please refer to the Backport tool documentation

(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java

…118772)

* [Inference API] Add unified api for chat completions (#117589)

(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java

* fix merge conflicts
* formatting
* Remove tests - retain feature flag
* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>

jonathan-buttner and others added 14 commits November 20, 2024 17:24

Adding some shell classes

bf507e7

modeling the request objects

705aa42

Writeable changes to schema

bd5df97

Working parsing tests

bd59543

Creating a new action

1e30c6d

Add outbound request writing (WIP)

2846942

Improvements to request serialization

9cb401c

Adding separate transport classes

1e0eb20

separate out unified request and combine inputs

d6cc223

Merge branch 'ml-inference-unified-api-elastic' of github.com.:elastic…

7986c81

…/elasticsearch into ml-inference-unified-api-elastic

Reworking unified inputs

bf817d0

Adding unsupported operation calls

81a05b7

Fixing parsing logic

cb440e1

get the build working

86d477e

maxhniebergall added >enhancement :ml Machine learning v9.0.0 v8.18.0 labels Nov 26, 2024

elasticsearchmachine added the Team:ML label Nov 26, 2024

Update docs/changelog/117589.yaml

359d305

Merge branch 'main' of github.com.:elastic/elasticsearch into ml-infer…

4070231

…ence-unified-api-elastic

jonathan-buttner added 3 commits November 26, 2024 15:17

Merge branch 'ml-inference-unified-api-elastic' of github.com.:elastic…

ce57bea

…/elasticsearch into ml-inference-unified-api-elastic

Fixing injection issue

834676d

Allowing model to be overridden but not working yet

5909a7d

maxhniebergall commented Nov 27, 2024

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java (outdated, resolved)

jonathan-buttner added 2 commits November 27, 2024 10:02

Fixing issues

315be2c

Switch field name for tool

657561e

Merge branch 'ml-inference-unified-api-elastic' of github.com.:elastic…

e2ed5cc

…/elasticsearch into ml-inference-unified-api-elastic

maxhniebergall commented Dec 5, 2024

jonathan-buttner added 3 commits December 5, 2024 15:22

Refactoring some duplication

8f22f56

Adding javadoc

a9b44b5

Merge branch 'ml-inference-unified-api-elastic' of github.com.:elastic…

fc173ff

…/elasticsearch into ml-inference-unified-api-elastic

jonathan-buttner requested a review from davidkyle December 5, 2024 20:37

jonathan-buttner and others added 2 commits December 5, 2024 15:40

Merge branch 'main' of github.com.:elastic/elasticsearch into ml-infer…

4c2573e

…ence-unified-api-elastic

Merge branch 'main' into ml-inference-unified-api-elastic

e1decca

jonathan-buttner reviewed Dec 6, 2024

davidkyle approved these changes Dec 6, 2024

jonathan-buttner added 3 commits December 6, 2024 14:09

Addressing feedback

3c4428f

Merge branch 'main' of github.com.:elastic/elasticsearch into ml-infer…

b16008f

…ence-unified-api-elastic

Merge branch 'ml-inference-unified-api-elastic' of github.com.:elastic…

481aa90

…/elasticsearch into ml-inference-unified-api-elastic

jonathan-buttner enabled auto-merge (squash) December 6, 2024 19:16

Removing unused import

7fc36ce

jonathan-buttner merged commit 467fdb8 into main Dec 6, 2024
17 of 18 checks passed

jonathan-buttner deleted the ml-inference-unified-api-elastic branch December 6, 2024 20:52

maxhniebergall mentioned this pull request Dec 11, 2024

[8.x] [Inference API] Add unified api for chat completions (#117589) #118506

Closed

maxhniebergall mentioned this pull request Dec 16, 2024

[8.x] [Inference API] Add unified api for chat completions (#117589) #118772

Merged

jonathan-buttner mentioned this pull request Dec 16, 2024

Add POST _unified for the inference API elastic/elasticsearch-specification#3313

Merged

YulNaumenko mentioned this pull request Jan 17, 2025

[Epic] Enabling inference AI Connector as a default experience for all Kibana GenAI functionality elastic/kibana#207140

Open

7 tasks

pquentin mentioned this pull request Jan 20, 2025

Add rest-api-spec for unified inference API #120447

Merged

jonathan-buttner mentioned this pull request Feb 5, 2025

[REQUEST]: Inference API removing references to the _unified URL suffix elastic/docs-content#339

Closed

alvarezmelissa87 mentioned this pull request Apr 2, 2025

[ML][AI Connector] Add support for unified completion spec elastic/kibana#216942

Open
