diff --git a/solutions/search/inference-api/chat-completion-inference-api.md b/solutions/search/inference-api/chat-completion-inference-api.md
index 97ff911df..1369efee4 100644
--- a/solutions/search/inference-api/chat-completion-inference-api.md
+++ b/solutions/search/inference-api/chat-completion-inference-api.md
@@ -23,9 +23,9 @@ The {{infer}} APIs enable you to use certain services, such as built-in {{ml}} m
 
 ## {{api-request-title}} [chat-completion-inference-api-request]
 
-`POST /_inference/<inference_id>/_unified`
+`POST /_inference/<inference_id>/_stream`
 
-`POST /_inference/chat_completion/<inference_id>/_unified`
+`POST /_inference/chat_completion/<inference_id>/_stream`
 
 ## {{api-prereq-title}} [chat-completion-inference-api-prereqs]
 
@@ -38,8 +38,8 @@ The {{infer}} APIs enable you to use certain services, such as built-in {{ml}} m
 
 The chat completion {{infer}} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation. It only works with the `chat_completion` task type for `openai` and `elastic` {{infer}} services.
 
-::::{note} 
-* The `chat_completion` task type is only available within the _unified API and only supports streaming.
+::::{note}
+* The `chat_completion` task type is only available within the `_stream` API and only supports streaming.
 * The Chat completion {{infer}} API and the Stream {{infer}} API differ in their response structure and capabilities. The Chat completion {{infer}} API provides more comprehensive customization options through more fields and function calling support. If you use the `openai` service or the `elastic` service, use the Chat completion {{infer}} API.
 ::::
 
diff --git a/solutions/search/inference-api/elastic-inference-service-eis.md b/solutions/search/inference-api/elastic-inference-service-eis.md
index 7b337615a..3eaf36f17 100644
--- a/solutions/search/inference-api/elastic-inference-service-eis.md
+++ b/solutions/search/inference-api/elastic-inference-service-eis.md
@@ -36,7 +36,7 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` se
 
 ::::{note}
 
-The `chat_completion` task type only supports streaming and only through the `_unified` API.
+The `chat_completion` task type only supports streaming and only through the `_stream` API.
 
 For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](/solutions/search/inference-api/chat-completion-inference-api.md).
 
diff --git a/solutions/search/inference-api/openai-inference-integration.md b/solutions/search/inference-api/openai-inference-integration.md
index 4bdefa8a8..712922678 100644
--- a/solutions/search/inference-api/openai-inference-integration.md
+++ b/solutions/search/inference-api/openai-inference-integration.md
@@ -37,7 +37,7 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `openai` ser
 
 ::::{note}
 
-The `chat_completion` task type only supports streaming and only through the `_unified` API.
+The `chat_completion` task type only supports streaming and only through the `_stream` API.
 
 For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/chat-completion-inference-api.html).
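The three hunks above all rename the streaming chat completion route from `_unified` to `_stream`. As a hedged illustration of the renamed route, here is a minimal Python sketch using `requests`; the host, API key, and endpoint name `my-chat-endpoint` are placeholder assumptions, not values taken from these docs:

```py
# Minimal sketch: call the renamed `_stream` chat completion route directly.
# "my-chat-endpoint" is a hypothetical chat_completion endpoint id, and the
# host/API key below are assumed local-cluster placeholders.
import requests

resp = requests.post(
    "https://localhost:9200/_inference/chat_completion/my-chat-endpoint/_stream",
    headers={
        "Authorization": "ApiKey <your-api-key>",
        "Content-Type": "application/json",
    },
    json={"messages": [{"role": "user", "content": "Say hello in one sentence."}]},
    stream=True,  # the task type only supports streaming; chunks arrive incrementally
)

for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))  # each non-empty line is a server-sent event chunk
```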
diff --git a/solutions/search/semantic-search/cohere-es.md b/solutions/search/semantic-search/cohere-es.md
index 2fda0c058..c88b0010f 100644
--- a/solutions/search/semantic-search/cohere-es.md
+++ b/solutions/search/semantic-search/cohere-es.md
@@ -258,7 +258,7 @@ Rerank the results using the new {{infer}} endpoint.
 
 ```py
 # Pass the query and the search results to the service
-response = client.inference.inference(
+response = client.inference.rerank(
     inference_id="cohere_rerank",
     body={
         "query": query,
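The fix above swaps the generic `client.inference.inference` helper for the task-specific `client.inference.rerank`. A self-contained sketch of the corrected call follows; the connection details, query, and document list are hypothetical stand-ins, while the `cohere_rerank` endpoint id and the `body` shape come from the tutorial diff itself:

```py
from elasticsearch import Elasticsearch

# Assumed connection details for a local cluster (placeholders).
client = Elasticsearch("https://localhost:9200", api_key="<your-api-key>")

query = "What is biosimilarity?"  # hypothetical query
documents = [  # hypothetical candidates, e.g. hits from an earlier semantic search
    "Biosimilars are biologic products highly similar to an approved reference product.",
    "Cohere provides embedding and reranking models.",
]

# Rerank the candidate documents against the query using the task-specific helper.
response = client.inference.rerank(
    inference_id="cohere_rerank",
    body={
        "query": query,
        "input": documents,
    },
)
print(response)
```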