[8.x] [Inference API] Add unified api for chat completions (#117589) #118506

maxhniebergall · 2024-12-11T20:48:14Z

Backport

This will backport the following commits from main to 8.x:

[Inference API] Add unified api for chat completions (#117589)

Questions?

Please refer to the Backport tool documentation

* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java

…18455) Fixes: elastic#118311 elastic#118310 elastic#118309 Same issue that was fixed in: elastic#110963 `@BeforeClass` is executed after the test rules. This means it creates the clusters for all the invalid versions, which sometimes doesn't work. Change it to a rule which definitely evaluates before the clusters are created. This will also speed up this test in CI.

…lastic#118533) * Handle all exceptions in data nodes can match (elastic#117469) During the can match phase, prior to the query phase, we may have exceptions that are returned back to the coordinating node, handled gracefully as if the shard returned canMatch=true. During the query phase, we perform an additional rewrite and can match phase to eventually shortcut the query phase for the shard. That needs to handle exceptions as well. Currently, an exception there causes shard failures, while we should rather go ahead and execute the query on the shard. Instead of adding another try catch on consumers code, this commit adds exception handling to the method itself so that it can no longer throw exceptions and similar mistakes can no longer be made in the future. At the same time, this commit makes the can match method more easily testable without requiring a full-blown SearchService instance. Closes elastic#104994 * fix compile
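The idea described above, moving the exception handling into the can match method so that callers never see an exception, can be sketched as follows. This is a minimal illustration only: `CanMatchSketch` and the `BooleanSupplier` parameter are hypothetical names and do not reflect the actual `SearchService` code. A failure is treated as `canMatch=true`, so the query still runs on the shard.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the Elasticsearch implementation.
final class CanMatchSketch {
    private CanMatchSketch() {}

    // Runs the (assumed) rewrite-and-check step. Any exception is swallowed and
    // reported as "can match", so the caller goes ahead and executes the query
    // on the shard instead of failing the shard.
    static boolean canMatch(BooleanSupplier rewriteAndCheck) {
        try {
            return rewriteAndCheck.getAsBoolean();
        } catch (Exception e) {
            return true;
        }
    }
}
```

Because the method itself cannot throw, consumers do not need their own try/catch around it.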

…ic#118331) (elastic#118550) SmbTestContainer base image upgraded from Ubuntu 16.04 to 24.04 to avoid hanging Python module compilation when installing samba package. Installing SMB had to be moved from container building to starting because SYS_ADMIN capability is required. (cherry picked from commit a0f64d2) # Conflicts: # .buildkite/pipelines/pull-request/packaging-tests-unix.yml

elastic#117245) (elastic#118698) We don't need to use this request, the handler for freeing of scroll requests literally goes to the same transport handler and doesn't come with the list of indices. The original security need for keeping the list of indices around is long gone.

…stic#118700)

* [ML] Inference duration and error metrics (elastic#115876)

Add `es.inference.requests.time` metric around `infer` API.

As recommended by OTel spec, errors are determined by the presence or absence of the `error.type` attribute in the metric. "error.type" will be the http status code (as a string) if it is available, otherwise it will be the name of the exception (e.g. NullPointerException).

Additional notes:
- ApmInferenceStats is merged into InferenceStats. Originally we planned to have multiple implementations, but now we're only using APM.
- Request count is now always recorded, even when there are failures loading the endpoint configuration.
- Added a hook in streaming for cancel messages, so we can close the metrics when a user cancels the stream.

(cherry picked from commit 26870ef)

* fixing switch with class issue

---------

Co-authored-by: Pat Whelan <[email protected]>
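The `error.type` rule in the commit message above can be sketched roughly as below. The class and method names are hypothetical and are not taken from `InferenceStats`. The sketch only shows the selection logic: no attribute on success, the HTTP status code as a string when one is available, and otherwise the simple name of the exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the attribute selection, not the real InferenceStats code.
final class ErrorTypeSketch {
    private ErrorTypeSketch() {}

    // Builds metric attributes for one request. httpStatus and failure are both
    // null on success, in which case "error.type" is left out entirely.
    static Map<String, Object> attributes(Integer httpStatus, Throwable failure) {
        Map<String, Object> attrs = new HashMap<>();
        if (httpStatus != null) {
            // Prefer the HTTP status code, recorded as a string.
            attrs.put("error.type", Integer.toString(httpStatus));
        } else if (failure != null) {
            // Fall back to the exception name, e.g. "NullPointerException".
            attrs.put("error.type", failure.getClass().getSimpleName());
        }
        return attrs;
    }
}
```

A consumer of the metric can then tell failed requests from successful ones by checking whether `error.type` is present.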

…18702) Improve the planner to detect filters that can be pushed down 'through' a LOOKUP JOIN by determining the conditions scoped to the left/main side and moving them closer to the source. Relates elastic#118305

Add an action to reindex a single index from a source index to a destination index. Unlike the reindex action, this action copies settings and mappings from the source index to the dest index before performing the reindex. This action is part of work to reindex data streams and will be called on each of the backing indices within a data stream. (cherry picked from commit 0a6ce27)

… when settings have not changed (elastic#118704) (elastic#118706) If the input index already has the `index.hidden` setting set to `true`, MetadataMigrateToDataStreamService::prepareBackingIndex can incorrectly increment the settings version even if it does not change the settings. This results in an assertion failure in IndexService::updateMetadata that will take down a node if assertions are enabled. This fixes that, only incrementing the settings version if the settings actually changed.
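The fix described above amounts to bumping the settings version only when the settings really differ. A minimal sketch, assuming a simplified immutable holder (`IndexMetaSketch` is a hypothetical stand-in, not `IndexMetadata` or `MetadataMigrateToDataStreamService`):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for index metadata.
final class IndexMetaSketch {
    final Map<String, String> settings;
    final long settingsVersion;

    IndexMetaSketch(Map<String, String> settings, long settingsVersion) {
        this.settings = new HashMap<>(settings);
        this.settingsVersion = settingsVersion;
    }

    // Marks the index as hidden. The settings version is incremented only if
    // the resulting settings differ from the current ones.
    IndexMetaSketch withHidden() {
        Map<String, String> updated = new HashMap<>(settings);
        updated.put("index.hidden", "true");
        if (updated.equals(settings)) {
            return this; // nothing changed, keep the version as-is
        }
        return new IndexMetaSketch(updated, settingsVersion + 1);
    }
}
```

With this guard, an index that already has `index.hidden` set to `true` keeps its settings version, which is the case that previously triggered the assertion failure.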

This commit upgrades to Lucene 9.12.1. Among the bug fixes that Lucene 9.12.1 brings, it also allows easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for. We'll need to carefully bump this system property in the Elasticsearch 8.x series alongside the JDK upgrade.

) * Introduce TranslationAware interface * Serialize query builder * Fix EsqlNodeSubclassTests * Add javadoc * Address review comments * Revert changes on making constructors private Co-authored-by: Elastic Machine <[email protected]>

…tic#118565) (elastic#118740) (cherry picked from commit 4279281) # Conflicts: # .buildkite/pipelines/periodic-packaging.template.yml # .buildkite/pipelines/periodic-packaging.yml # .buildkite/pipelines/periodic-platform-support.yml # .buildkite/pipelines/pull-request/packaging-tests-unix.yml

The test setup for `ProfileIntegTests` is flawed, where the full name of a user can be a substring of other profile names (e.g., `SER` is a substring of `User <random-string>-space1`) -- when that's passed into suggest call with the `*` space, we get a match on all profiles, instead of only the one profile expected in the test, since we are matching on e.g. `SER*`. This PR restricts the setup to avoid the wildcard profile for that particular test. Closes: elastic#117782

…lastic#118743) * Support ST_ENVELOPE and related ST_XMIN, etc. (elastic#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in elastic#118051

…astic#118655) (elastic#118685) * ESQL: Disable grok.OverwriteName* on pre-8.13 BWC tests (elastic#118655) This prevents two tests in `grok` and `dissect` suites - `overwriteName` and `overwriteNameWhere` and one in the `stats` suite - `byStringAndLongWithAlias` - to run against pre-8.13.0 versions. Reason being that coordinators prior to that version can generate invalid node plans, that'd fail (verification) on 8.18+ nodes. (cherry picked from commit 0441555) * re-enabled disabled tests

maxhniebergall added the backport label Dec 11, 2024

maxhniebergall mentioned this pull request Dec 11, 2024

[Inference API] Add unified api for chat completions #117589

Merged

elasticsearchmachine added the v8.18.0 label Dec 11, 2024

jonathan-buttner mentioned this pull request Dec 13, 2024

[8.x] [ML] Inference duration and error metrics (#115876) #118700

Merged

jonathan-buttner and others added 2 commits December 13, 2024 16:08

fixing switch with class issue

7a81b24

Merge pull request #3 from jonathan-buttner/ml-fix-unified-switch-cas…

952953a

…e-usage fixing switch with class issue

maxhniebergall changed the base branch from 8.x to main December 16, 2024 13:57

maxhniebergall requested review from a team as code owners December 16, 2024 13:57

maxhniebergall changed the base branch from main to 8.x December 16, 2024 13:58

maxhniebergall force-pushed the backport/8.x/pr-117589 branch from 9e8cbd7 to 952953a Compare December 16, 2024 14:05

joegallo and others added 17 commits December 16, 2024 09:10

Miscellaneous ILM cleanups (elastic#118488) (elastic#118498)

691ecbb

Correcting the index version filter in migration reindex logic (elast…

efba5b7

…ic#118487) (elastic#118503) * Correcting the index version filter in migration reindex logic (elastic#118487) * fixing the version for 8.x

Convert some ILM classes to records (elastic#118466) (elastic#118507)

81044c3

Improve InputStreamIndexInput testSkipBytes (elastic#118485) (elastic…

be3e3ad

…#118522) Relates ES-10234

ESQL: Add CCS tests for FLS and DLS against data streams (elastic#118423

ff7fd22

) (elastic#118526) CCS test coverage for elastic#118378

Building scope -> entitlements map during PolicyManager initialization (

2aabb59

elastic#118070) (elastic#118528)

Add one test for plugin type to PluginsLoaderTests (elastic#117725) (…

cf60500

…elastic#118529) * Add one test for plugin type to PluginsLoaderTests * Suppress ExtraFs (or PluginsUtils etc could fail with extra0 files)

[8.x] Disable check_on_startup for KibanaUserRoleIntegTests (elastic#…

5d74b4d

…118428) (elastic#118532) * Disable check_on_startup for KibanaUserRoleIntegTests (elastic#118428) (cherry picked from commit c30ba12) # Conflicts: # muted-tests.yml * fixup! Unmute test

Adds CCS matrix for 8.17 (elastic#118527) (elastic#118539)

7e60ffc

Co-authored-by: Liam Thompson <[email protected]>

Suppress the for-loop warnings since it is a conscious performance ch…

5096fca

…oice. (elastic#118530) (elastic#118537)

[ML] Fix timeout ingesting an empty string into a semantic_text field (…

1e114a7

…elastic#117840) (elastic#118540)

Changes elser service to elasticsearch service in the Semantic search…

c13e78a

… with the inference API page (elastic#118536) (elastic#118546)

Mute org.elasticsearch.repositories.blobstore.testkit.analyze.MinioRe…

ce1ad81

…positoryAnalysisRestIT org.elasticsearch.repositories.blobstore.testkit.analyze.MinioRepositoryAnalysisRestIT elastic#118548

Add discovery-ec2 integration test for AZ attr (elastic#118452) (el…

a6c75d2

…astic#118541) Verifies that the plugin sets the `aws_availability_zone` automatically by reading the AZ name from the IMDS at startup.

original-brownbear and others added 29 commits December 16, 2024 09:11

Mute org.elasticsearch.xpack.application.OpenAiServiceUpgradeIT testO…

f771e24

…penAiEmbeddings {upgradedNodes=1} elastic#118156

Mute org.elasticsearch.xpack.application.HuggingFaceServiceUpgradeIT …

fbff5d1

…testHFEmbeddings {upgradedNodes=1} elastic#118197

Mute org.elasticsearch.xpack.application.OpenAiServiceUpgradeIT testO…

2c0b6f1

…penAiCompletions {upgradedNodes=2} elastic#118163

Mute org.elasticsearch.xpack.application.OpenAiServiceUpgradeIT testO…

14b68dd

…penAiCompletions {upgradedNodes=1} elastic#118162

Mute org.elasticsearch.xpack.application.HuggingFaceServiceUpgradeIT …

b61aa4b

…testElser {upgradedNodes=1} elastic#118127

Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTest…

eea6da5

…s testVerifierOnMissingReferencesWithBinaryPlans {default} elastic#118707

Mute org.elasticsearch.xpack.application.CohereServiceUpgradeIT testC…

271d447

…ohereEmbeddings {upgradedNodes=1} elastic#116974

Update docker.elastic.co/wolfi/chainguard-base:latest Docker digest t…

19d879b

…o 1b51ff6 (elastic#117903) Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com.>

Mute org.elasticsearch.index.engine.RecoverySourcePruneMergePolicyTes…

06dd703

…ts testPruneSome elastic#118728

Mute org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests…

a341a79

… org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests elastic#118721

[8.x] fix typo in muted CSV test for scoring in ES|QL (elastic#118665) (

20793ed

elastic#118738) * fix typo in muted CSV test for scoring in ES|QL (elastic#118665) (cherry picked from commit a583a38)

[test] Avoid running the NoImds test on AWS (elastic#118675) (elastic…

0e5810f

…#118688) Disabled the NoImds test on AWS EC2 instance where it fails because the AWS metadata are available, which is not expected by this test.

Tweak data node request index handling (elastic#118542) (elastic#118756)

69d0d3c

Small tweak around how data node requests handle no indices w.r.t. shards. (cherry picked from commit 7585f02)

[ci] Add ubuntu-2404 to matrix in packaging and platform jobs (elasti…

63d89f2

…c#118566) (elastic#118748)

ESQL: Fix lookup optimizer tests on release (elastic#118742) (elastic…

2fa7aac

…#118750) Fix elastic#118721 * Skip corresponding optimizer tests if `LOOKUP JOIN` is disabled. * Enable LogicalPlanOptimizerTests again. (cherry picked from commit bb8503a) # Conflicts: # muted-tests.yml

[8.x] ESQL: Disable LOOKUP JOIN physical optimizer test on release (e…

c696266

…lastic#118754) * Disable test on release builds * Unmute

Fix moving function linear weighted avg (elastic#118516) (elastic#118751

7e57d58

) Fix moving function linear weighted avg Co-authored-by: Quentin Deschamps <[email protected]>

maxhniebergall closed this Dec 16, 2024
