[ML] Adding docs for the unified inference API #118696
Conversation
@@ -63,4 +63,44 @@ Specifies the chunking strategy.
It could be either `sentence` or `word`.
end::chunking-settings-strategy[]

tag::unified-schema-content-with-examples[]
I import this a couple of times, once for each of the messages, since they all have the same format (a string or an array of objects).
The text content.
+
Object representation:::
`text`::::
We get lucky that each of the messages has `content::`, so the colons work out here to nest it correctly.
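The shared tag above documents that message `content` can be either a plain string or an array of objects. As a hedged sketch (the field names follow the OpenAI-style chat schema this API mirrors, and are an assumption, not taken from this PR), the two representations might look like this, with a hypothetical helper normalizing both:

```python
# Hypothetical example of the two `content` representations described above:
# a plain string, or an array of `{"type": "text", ...}` objects.
msg_string = {"role": "user", "content": "What is Elastic?"}
msg_objects = {
    "role": "user",
    "content": [{"type": "text", "text": "What is Elastic?"}],
}

def content_text(message):
    """Normalize either representation of `content` to plain text."""
    content = message["content"]
    if isinstance(content, str):
        return content
    # Array-of-objects form: join the text parts.
    return " ".join(part["text"] for part in content if part.get("type") == "text")
```

Both forms above normalize to the same string, which is why the docs can reuse one shared tag for every message type.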
(Optional, array of objects)
A list of tools that the model can call.
+
.Structure
I'm open to other names here? Maybe `Format`?
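For the `tools` array documented above, a hedged sketch of one entry follows. The field names (`function`, `parameters` as a JSON Schema, the `get_current_weather` tool itself) follow the OpenAI-style function-calling convention and are assumptions for illustration, not taken from the PR:

```python
# Hypothetical `tools` entry; names follow the OpenAI-style schema (assumption).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical tool name
        "description": "Get the current weather for a location",
        "parameters": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# The tools list rides alongside `messages` in the request body.
request_body = {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
}
```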
(Required unless `tool_calls` is specified, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=unified-schema-content-with-examples]
I personally find it helpful if examples are sprinkled throughout the docs as we're reading, instead of just including complete ones at the bottom. I'm open to other suggestions here.
Thanks for pointing that out. I made the docs team aware.
------------------------------------------------------------
// TEST[skip:TBD]

<1> Each tool call needs a corresponding Tool message.
Maybe this is too OpenAI specific to include?
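The pairing rule in the callout above (each tool call needs a corresponding Tool message) can be sketched like this. The message shapes (`tool_calls`, `tool_call_id`) follow the OpenAI-style convention the comment refers to, and the checker helper is hypothetical:

```python
# Assistant message requesting a tool call, and the Tool message answering it.
# Field names follow the OpenAI-style schema (assumption, not from the PR).
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Paris"}',
            },
        }
    ],
}
tool_msg = {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'}

def tool_calls_answered(messages):
    """Hypothetical check: every tool call id has a matching `tool` message."""
    call_ids = {
        call["id"]
        for m in messages
        if m["role"] == "assistant"
        for call in m.get("tool_calls", [])
    }
    answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    return call_ids <= answered
```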
@elasticmachine run elasticsearch-ci/docs
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-docs (Team:Docs)
@elasticmachine run elasticsearch-ci/docs
LGTM. Thanks for writing this, Jonathan! I think these docs look great. I just left a couple of small comments which could be improvements.
==== {api-request-body-title}

`messages`::
(Required, array of objects) A list of objects representing the conversation.
What do you think about adding some information about how these message lists should be generated? Something like:
"Requests should generally only add new messages from the user. The other messages ("assistant", "system", or "tool") should generally only be copy-pasted from the response to a previous completion request, such that the messages array is built up over the course of a conversation."
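The conversation-building pattern suggested in that comment can be sketched as follows. The `ask` helper is hypothetical; `fake_reply` stands in for the assistant message a real completion response would return:

```python
# Sketch of the suggested pattern: the client appends its new user message,
# then copies the assistant message verbatim from the previous response,
# so the `messages` array grows over the course of the conversation.
messages = []

def ask(messages, user_text, fake_reply):
    """Hypothetical helper; `fake_reply` stands in for the API's response."""
    messages.append({"role": "user", "content": user_text})
    messages.append(fake_reply)
    return messages

ask(messages, "Hello", {"role": "assistant", "content": "Hi! How can I help?"})
ask(messages, "What is ELSER?",
    {"role": "assistant", "content": "ELSER is Elastic's sparse retrieval model."})
```

After two turns, the array holds all four messages in order, ready to be sent with the next request.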
`model`::
(Optional, string)
The ID of the model to use.
Suggested change:
-The ID of the model to use.
+The ID of the model to use. By default, the model ID set in the inference endpoint is used.
I left a few small comments, please take or leave them. LGTM!
==== {api-description-title}

The chat completion {infer} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation.
It only works with the `chat_completion` task type for OpenAI and Elastic Inference Service.
Suggested change:
-It only works with the `chat_completion` task type for OpenAI and Elastic Inference Service.
+It only works with the `chat_completion` task type for the `openai` and `elasticsearch` {infer} services.
Good idea. One heads up: the Elastic Inference Service (EIS) is actually different from the Elasticsearch service. EIS is called `elastic` and Elasticsearch is called `elasticsearch`. Also, we don't have docs for EIS yet, but we need them.
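The incremental delivery the description mentions is typically consumed as a stream of server-sent events. As a hedged sketch only (the exact wire format, the `data:` framing, and the `delta`/`[DONE]` conventions here follow the OpenAI-style streaming format and are assumptions, not confirmed by this PR), a client might reassemble the answer like this:

```python
import json

# Hypothetical sketch: reassemble an incremental answer from SSE lines.
def iter_deltas(sse_lines):
    """Yield content deltas from 'data: {...}' lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # assumed end-of-stream sentinel
            return
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta

# Simulated stream standing in for a real chat_completion response.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
answer = "".join(iter_deltas(stream))
```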
@@ -67,10 +67,10 @@ Click the links to review the configuration details of the services:
* <<infer-service-elasticsearch,Elasticsearch>> (`rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
Suggested change:
-* <<infer-service-elasticsearch,Elasticsearch>> (`rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
+* <<infer-service-elasticsearch,Elasticsearch>> (`chat_completion`, `rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
As we chatted about offline, the `elasticsearch` service is different from the `elastic` service. So this code will actually stay the same. We'll need to add a new entry for the `elastic` service.
* Including examples
* Using js instead of json
* Adding unified docs to main page
* Adding missing description text
* Refactoring to remove unified route
* Adding back references to the _unified route
* Update docs/reference/inference/chat-completion-inference.asciidoc
* Address feedback

Co-authored-by: István Zoltán Szabó <[email protected]>
💚 Backport successful
This PR adds docs for the new unified inference API. I tried to cover the more complicated objects with examples.

This PR is waiting on the addition of the `chat_completion` task type in this PR: #119982

Documentation preview: https://elasticsearch_bk_118696.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/chat-completion-inference-api.html