elastic · lcawl · Mar 6, 2025 · Mar 5, 2025 · Mar 6, 2025
diff --git a/solutions/search/inference-api/chat-completion-inference-api.md b/solutions/search/inference-api/chat-completion-inference-api.md
@@ -23,9 +23,9 @@ The {{infer}} APIs enable you to use certain services, such as built-in {{ml}} m
 
 ## {{api-request-title}} [chat-completion-inference-api-request] 
 
-`POST /_inference/<inference_id>/_unified`
+`POST /_inference/<inference_id>/_stream`
 
-`POST /_inference/chat_completion/<inference_id>/_unified`
+`POST /_inference/chat_completion/<inference_id>/_stream`
 
 
 ## {{api-prereq-title}} [chat-completion-inference-api-prereqs] 
@@ -38,8 +38,8 @@ The {{infer}} APIs enable you to use certain services, such as built-in {{ml}} m
 
 The chat completion {{infer}} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation. It only works with the `chat_completion` task type for `openai` and `elastic` {{infer}} services.
 
-::::{note} 
-* The `chat_completion` task type is only available within the _unified API and only supports streaming.
+::::{note}
+* The `chat_completion` task type is only available within the `_stream` API and only supports streaming.
 * The Chat completion {{infer}} API and the Stream {{infer}} API differ in their response structure and capabilities. The Chat completion {{infer}} API provides more comprehensive customization options through more fields and function calling support. If you use the `openai` service or the `elastic` service, use the Chat completion {{infer}} API.
 
 ::::

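Reviewer note: the two request paths changed in the hunk above can be sketched as a small path builder. This is only an illustration of the renamed `_stream` suffix, not code from the PR; the helper name `chat_completion_stream_path` and its `include_task_type` parameter are hypothetical, and percent-encoding the ID is an assumption made for safety.

```python
from urllib.parse import quote


def chat_completion_stream_path(inference_id: str, include_task_type: bool = True) -> str:
    """Build a request path for the chat completion inference API.

    Hypothetical helper: mirrors the two `POST` paths shown in the diff,
    both of which now end in `_stream` instead of `_unified`.
    """
    if not inference_id:
        raise ValueError("inference_id must be non-empty")
    # Assumption: percent-encode the ID so it is safe as a single URL path segment.
    encoded = quote(inference_id, safe="")
    if include_task_type:
        return f"/_inference/chat_completion/{encoded}/_stream"
    return f"/_inference/{encoded}/_stream"


if __name__ == "__main__":
    print(chat_completion_stream_path("my-endpoint"))
    print(chat_completion_stream_path("my-endpoint", include_task_type=False))
```

A caller would send a `POST` to the returned path on its cluster URL; the request body and the streamed response format are described in the chat completion documentation and are not modelled here.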
diff --git a/solutions/search/inference-api/elastic-inference-service-eis.md b/solutions/search/inference-api/elastic-inference-service-eis.md
@@ -36,7 +36,7 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` se
 
 
 ::::{note} 
-The `chat_completion` task type only supports streaming and only through the `_unified` API.
+The `chat_completion` task type only supports streaming and only through the `_stream` API.
 
 For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](/solutions/search/inference-api/chat-completion-inference-api.md).
 

diff --git a/solutions/search/inference-api/openai-inference-integration.md b/solutions/search/inference-api/openai-inference-integration.md
@@ -37,7 +37,7 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `openai` ser
 
 
 ::::{note}
-The `chat_completion` task type only supports streaming and only through the `_unified` API.
+The `chat_completion` task type only supports streaming and only through the `_stream` API.
 
 For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/chat-completion-inference-api.html).
 

diff --git a/solutions/search/semantic-search/cohere-es.md b/solutions/search/semantic-search/cohere-es.md
@@ -258,7 +258,7 @@ Rerank the results using the new {{infer}} endpoint.
 
 ```py
 # Pass the query and the search results to the service
-response = client.inference.inference(
+response = client.inference.rerank(
     inference_id="cohere_rerank",
     body={
         "query": query,