[8.x] [Inference API] Add unified api for chat completions (#117589) #118506
Closed
maxhniebergall wants to merge 91 commits into elastic:8.x from maxhniebergall:backport/8.x/pr-117589
* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback
---------
Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
(cherry picked from commit 467fdb8)
# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
…e-usage fixing switch with class issue
9e8cbd7 to 952953a
…ic#118487) (elastic#118503) * Correcting the index version filter in migration reindex logic (elastic#118487) * fixing the version for 8.x
) (elastic#118526) CCS test coverage for elastic#118378
…elastic#118529) * Add one test for plugin type to PluginsLoaderTests * Suppress ExtraFs (or PluginsUtils etc could fail with extra0 files)
…18455) Fixes: elastic#118311 elastic#118310 elastic#118309 Same issue that was fixed in: elastic#110963 `@BeforeClass` is executed after the test rules. This means it creates the clusters for all the invalid versions, which sometimes doesn't work. Change it to a rule which definitely evaluates before the clusters are created. This will also speed up this test in CI.
…118428) (elastic#118532) * Disable check_on_startup for KibanaUserRoleIntegTests (elastic#118428) (cherry picked from commit c30ba12) # Conflicts: # muted-tests.yml * fixup! Unmute test
Co-authored-by: Liam Thompson <[email protected]>
…lastic#118533) * Handle all exceptions in data nodes can match (elastic#117469) During the can match phase, prior to the query phase, we may have exceptions that are returned back to the coordinating node, handled gracefully as if the shard returned canMatch=true. During the query phase, we perform an additional rewrite and can match phase to eventually shortcut the query phase for the shard. That needs to handle exceptions as well. Currently, an exception there causes shard failures, while we should rather go ahead and execute the query on the shard. Instead of adding another try catch on consumers code, this commit adds exception handling to the method itself so that it can no longer throw exceptions and similar mistakes can no longer be made in the future. At the same time, this commit makes the can match method more easily testable without requiring a full-blown SearchService instance. Closes elastic#104994 * fix compile
… with the inference API page (elastic#118536) (elastic#118546)
…positoryAnalysisRestIT org.elasticsearch.repositories.blobstore.testkit.analyze.MinioRepositoryAnalysisRestIT elastic#118548
…astic#118541) Verifies that the plugin sets the `aws_availability_zone` automatically by reading the AZ name from the IMDS at startup.
…ic#118331) (elastic#118550) SmbTestContainer base image upgraded from Ubuntu 16.04 to 24.04 to avoid hanging Python module compilation when installing samba package. Installing SMB had to be moved from container building to starting because SYS_ADMIN capability is required. (cherry picked from commit a0f64d2) # Conflicts: # .buildkite/pipelines/pull-request/packaging-tests-unix.yml
elastic#117245) (elastic#118698) We don't need to use this request, the handler for freeing of scroll requests literally goes to the same transport handler and doesn't come with the list of indices. The original security need for keeping the list of indices around is long gone.
…penAiEmbeddings {upgradedNodes=1} elastic#118156
…testHFEmbeddings {upgradedNodes=1} elastic#118197
…penAiCompletions {upgradedNodes=2} elastic#118163
…penAiCompletions {upgradedNodes=1} elastic#118162
…stic#118700) * [ML] Inference duration and error metrics (elastic#115876)
Add `es.inference.requests.time` metric around `infer` API. As recommended by OTel spec, errors are determined by the presence or absence of the `error.type` attribute in the metric. `error.type` will be the HTTP status code (as a string) if it is available, otherwise it will be the name of the exception (e.g. NullPointerException).
Additional notes:
- ApmInferenceStats is merged into InferenceStats. Originally we planned to have multiple implementations, but now we're only using APM.
- Request count is now always recorded, even when there are failures loading the endpoint configuration.
- Added a hook in streaming for cancel messages, so we can close the metrics when a user cancels the stream.
(cherry picked from commit 26870ef)
* fixing switch with class issue
---------
Co-authored-by: Pat Whelan <[email protected]>
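The `error.type` rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `ErrorTypeSketch`, the method `attributes`, and the way the HTTP status is passed in are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the error.type rule; not the Elasticsearch implementation.
final class ErrorTypeSketch {
    static final String ERROR_TYPE = "error.type";

    /**
     * Builds metric attributes for one request.
     * The attribute is absent on success, so its presence alone marks an error.
     *
     * @param httpStatus status code of the failed call, or null if none is available
     * @param failure    the exception that ended the request, or null on success
     */
    static Map<String, String> attributes(Integer httpStatus, Throwable failure) {
        Map<String, String> attrs = new HashMap<>();
        if (failure == null) {
            return attrs; // success: no error.type attribute at all
        }
        if (httpStatus != null) {
            // Prefer the HTTP status code, recorded as a string.
            attrs.put(ERROR_TYPE, String.valueOf(httpStatus));
        } else {
            // Otherwise fall back to the exception's simple class name.
            attrs.put(ERROR_TYPE, failure.getClass().getSimpleName());
        }
        return attrs;
    }
}
```

A consumer of the metric would then count errors by checking whether `error.type` is present, rather than by reading a separate success flag.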
…testElser {upgradedNodes=1} elastic#118127
…18702) Improve the planner to detect filters that can be pushed down 'through' a LOOKUP JOIN by determining the conditions scoped to the left/main side and moving them closer to the source. Relates elastic#118305
Add an action to reindex a single index from a source index to a destination index. Unlike the reindex action, this action copies settings and mappings from the source index to the dest index before performing the reindex. This action is part of work to reindex data streams and will be called on each of the backing indices within a data stream. (cherry picked from commit 0a6ce27)
…s testVerifierOnMissingReferencesWithBinaryPlans {default} elastic#118707
… when settings have not changed (elastic#118704) (elastic#118706) If the input index already has the `index.hidden` setting set to `true`, MetadataMigrateToDataStreamService::prepareBackingIndex can incorrectly increment the settings version even if it does not change the settings. This results in an assertion failure in IndexService::updateMetadata that will take down a node if assertions are enabled. This fixes that, only incrementing the settings version if the settings actually changed.
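The fix described here amounts to bumping the version only when the new settings differ from the old ones. A minimal sketch of that idea, assuming a simplified `IndexState` holder and a `markHidden` helper that are both hypothetical and stand in for the real metadata classes:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; IndexState and markHidden are illustrative names only.
final class SettingsVersionSketch {
    static final class IndexState {
        final Map<String, String> settings;
        final long settingsVersion;

        IndexState(Map<String, String> settings, long settingsVersion) {
            this.settings = Collections.unmodifiableMap(new HashMap<>(settings));
            this.settingsVersion = settingsVersion;
        }
    }

    /** Sets index.hidden=true and increments the version only if that changed anything. */
    static IndexState markHidden(IndexState current) {
        Map<String, String> updated = new HashMap<>(current.settings);
        updated.put("index.hidden", "true");
        if (updated.equals(current.settings)) {
            // Already hidden: keep the same settings version.
            return current;
        }
        return new IndexState(updated, current.settingsVersion + 1);
    }
}
```

Calling `markHidden` twice in a row increments the version once, which is the property the assertion in the commit message depends on.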
…ohereEmbeddings {upgradedNodes=1} elastic#116974
…o 1b51ff6 (elastic#117903) Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com.>
This commit upgrades to Lucene 9.12.1. Among the bug fixes that Lucene 9.12.1 brings, it also allows easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for. We'll need to carefully bump this system property in the Elasticsearch 8.x series alongside the JDK upgrade.
… org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests elastic#118721
) * Introduce TranslationAware interface * Serialize query builder * Fix EsqlNodeSubclassTests * Add javadoc * Address review comments * Revert changes on making constructors private Co-authored-by: Elastic Machine <[email protected]>
elastic#118738) * fix typo in muted CSV test for scoring in ES|QL (elastic#118665) (cherry picked from commit a583a38)
…#118688) Disabled the NoImds test on AWS EC2 instance where it fails because the AWS metadata are available, which is not expected by this test.
…tic#118565) (elastic#118740) (cherry picked from commit 4279281) # Conflicts: # .buildkite/pipelines/periodic-packaging.template.yml # .buildkite/pipelines/periodic-packaging.yml # .buildkite/pipelines/periodic-platform-support.yml # .buildkite/pipelines/pull-request/packaging-tests-unix.yml
The test setup for `ProfileIntegTests` is flawed, where the full name of a user can be a substring of other profile names (e.g., `SER` is a substring of `User <random-string>-space1`) -- when that's passed into suggest call with the `*` space, we get a match on all profiles, instead of only the one profile expected in the test, since we are matching on e.g. `SER*`. This PR restricts the setup to avoid the wildcard profile for that particular test. Closes: elastic#117782
…lastic#118743) * Support ST_ENVELOPE and related ST_XMIN, etc. (elastic#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in elastic#118051
Small tweak around how data node requests handle no indices w.r.t. shards. (cherry picked from commit 7585f02)
…#118750) Fix elastic#118721 * Skip corresponding optimizer tests if `LOOKUP JOIN` is disabled. * Enable LogicalPlanOptimizerTests again. (cherry picked from commit bb8503a) # Conflicts: # muted-tests.yml
…lastic#118754) * Disable test on release builds * Unmute
) Fix moving function linear weighted avg Co-authored-by: Quentin Deschamps <[email protected]>
…astic#118655) (elastic#118685) * ESQL: Disable grok.OverwriteName* on pre-8.13 BWC tests (elastic#118655) This prevents two tests in `grok` and `dissect` suites - `overwriteName` and `overwriteNameWhere` and one in the `stats` suite - `byStringAndLongWithAlias` - to run against pre-8.13.0 versions. Reason being that coordinators prior to that version can generate invalid node plans, that'd fail (verification) on 8.18+ nodes. (cherry picked from commit 0441555) * re-enabled disabled tests
Backport
This will backport the following commits from main to 8.x:
Questions?
Please refer to the Backport tool documentation