[ML] Adding docs for the unified inference API #118696
Conversation
@@ -63,4 +63,44 @@ Specifies the chunking strategy.
It could be either `sentence` or `word`.
end::chunking-settings-strategy[]

tag::unified-schema-content-with-examples[]
I import this a couple of times, once for each of the messages, since they all have the same format (a string or an array of objects).
The text content.
+
Object representation:::
`text`::::
We get lucky that each of the messages has `content::`, so the colons work out here to nest it correctly.
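The shared tag above documents that message `content` can be either a plain string or an array of objects. As a hedged sketch (the field names follow the OpenAI-style chat schema this API mirrors, and are an assumption, not taken from this PR), the two representations might look like this, with a hypothetical helper normalizing both:

```python
# Hypothetical example of the two `content` representations described above:
# a plain string, or an array of `{"type": "text", ...}` objects.
msg_string = {"role": "user", "content": "What is Elastic?"}
msg_objects = {
    "role": "user",
    "content": [{"type": "text", "text": "What is Elastic?"}],
}

def content_text(message):
    """Normalize either representation of `content` to plain text."""
    content = message["content"]
    if isinstance(content, str):
        return content
    # Array-of-objects form: join the text parts.
    return " ".join(part["text"] for part in content if part.get("type") == "text")
```

Both forms above normalize to the same string, which is why the docs can reuse one shared tag for every message type.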
(Optional, array of objects)
A list of tools that the model can call.
+
.Structure
I'm open to other names here? Maybe `Format`?
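For the `tools` array documented above, a hedged sketch of one entry follows. The field names (`function`, `parameters` as a JSON Schema, the `get_current_weather` tool itself) follow the OpenAI-style function-calling convention and are assumptions for illustration, not taken from the PR:

```python
# Hypothetical `tools` entry; names follow the OpenAI-style schema (assumption).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical tool name
        "description": "Get the current weather for a location",
        "parameters": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# The tools list rides alongside `messages` in the request body.
request_body = {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
}
```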
(Required unless `tool_calls` is specified, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=unified-schema-content-with-examples]
I personally find it helpful if examples are sprinkled throughout the docs as we're reading, instead of just including complete ones at the bottom. I'm open to other suggestions here.
Thanks for pointing that out. I made the docs team aware.
------------------------------------------------------------
// TEST[skip:TBD]

<1> Each tool call needs a corresponding Tool message.
Maybe this is too OpenAI specific to include?
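The pairing rule in the callout above (each tool call needs a corresponding Tool message) can be sketched like this. The message shapes (`tool_calls`, `tool_call_id`) follow the OpenAI-style convention the comment refers to, and the checker helper is hypothetical:

```python
# Assistant message requesting a tool call, and the Tool message answering it.
# Field names follow the OpenAI-style schema (assumption, not from the PR).
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Paris"}',
            },
        }
    ],
}
tool_msg = {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'}

def tool_calls_answered(messages):
    """Hypothetical check: every tool call id has a matching `tool` message."""
    call_ids = {
        call["id"]
        for m in messages
        if m["role"] == "assistant"
        for call in m.get("tool_calls", [])
    }
    answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    return call_ids <= answered
```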
@elasticmachine run elasticsearch-ci/docs
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-docs (Team:Docs)
@elasticmachine run elasticsearch-ci/docs
LGTM. Thanks for writing this, Jonathan! I think these docs look great. I just left a couple of small comments which could be improvements.
==== {api-request-body-title}

`messages`::
(Required, array of objects) A list of objects representing the conversation.
What do you think about adding some information about how these message lists should be generated? Something like:
"Requests should generally only add new messages from the user. The other messages ("assistant", "system", or "tool") should generally only be copy-pasted from the response to a previous completion request, such that the messages array is built up over the course of a conversation."
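The conversation-building pattern suggested in that comment can be sketched as follows. The `ask` helper is hypothetical; `fake_reply` stands in for the assistant message a real completion response would return:

```python
# Sketch of the suggested pattern: the client appends its new user message,
# then copies the assistant message verbatim from the previous response,
# so the `messages` array grows over the course of the conversation.
messages = []

def ask(messages, user_text, fake_reply):
    """Hypothetical helper; `fake_reply` stands in for the API's response."""
    messages.append({"role": "user", "content": user_text})
    messages.append(fake_reply)
    return messages

ask(messages, "Hello", {"role": "assistant", "content": "Hi! How can I help?"})
ask(messages, "What is ELSER?",
    {"role": "assistant", "content": "ELSER is Elastic's sparse retrieval model."})
```

After two turns, the array holds all four messages in order, ready to be sent with the next request.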
`model`::
(Optional, string)
The ID of the model to use.
Suggested change:
-The ID of the model to use.
+The ID of the model to use. By default, the model ID set in the inference endpoint is used.
I left a few small comments, please take or leave them. LGTM!
==== {api-description-title}

The chat completion {infer} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation.
It only works with the `chat_completion` task type for OpenAI and Elastic Inference Service.
Suggested change:
-It only works with the `chat_completion` task type for OpenAI and Elastic Inference Service.
+It only works with the `chat_completion` task type for the `openai` and `elasticsearch` {infer} services.
Good idea. One heads up: the Elastic Inference Service (EIS) is actually different from the Elasticsearch service. EIS is called `elastic` and Elasticsearch is called `elasticsearch`. Also, we don't have docs for EIS yet, but we need them.
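The incremental delivery the description mentions is typically consumed as a stream of server-sent events. As a hedged sketch only (the exact wire format, the `data:` framing, and the `delta`/`[DONE]` conventions here follow the OpenAI-style streaming format and are assumptions, not confirmed by this PR), a client might reassemble the answer like this:

```python
import json

# Hypothetical sketch: reassemble an incremental answer from SSE lines.
def iter_deltas(sse_lines):
    """Yield content deltas from 'data: {...}' lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # assumed end-of-stream sentinel
            return
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta

# Simulated stream standing in for a real chat_completion response.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
answer = "".join(iter_deltas(stream))
```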
@@ -67,10 +67,10 @@ Click the links to review the configuration details of the services:
* <<infer-service-elasticsearch,Elasticsearch>> (`rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
Suggested change:
-* <<infer-service-elasticsearch,Elasticsearch>> (`rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
+* <<infer-service-elasticsearch,Elasticsearch>> (`chat_completion`, `rerank`, `sparse_embedding`, `text_embedding` - this service is for built-in models and models uploaded through Eland)
As we chatted about offline, the `elasticsearch` service is different from the `elastic` service. So this code will actually stay the same. We'll need to add a new entry for the `elastic` service.
* Including examples
* Using js instead of json
* Adding unified docs to main page
* Adding missing description text
* Refactoring to remove unified route
* Adding back references to the _unified route
* Update docs/reference/inference/chat-completion-inference.asciidoc
* Address feedback

Co-authored-by: István Zoltán Szabó <[email protected]>
💚 Backport successful
This PR adds docs for the new unified inference API. I tried to cover the more complicated objects with examples.

This PR is waiting on the addition of the `chat_completion` task type in this PR: #119982

Documentation preview: https://elasticsearch_bk_118696.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/chat-completion-inference-api.html