[8.x] [Inference API] Add unified api for chat completions (#117589) #118506

Closed

maxhniebergall
Contributor

Backport

This will backport the following commits from main to 8.x:

Questions?

Please refer to the Backport tool documentation

* Adding some shell classes

* modeling the request objects

* Writeable changes to schema

* Working parsing tests

* Creating a new action

* Add outbound request writing (WIP)

* Improvements to request serialization

* Adding separate transport classes

* separate out unified request and combine inputs

* Reworking unified inputs

* Adding unsupported operation calls

* Fixing parsing logic

* get the build working

* Update docs/changelog/117589.yaml

* Fixing injection issue

* Allowing model to be overridden but not working yet

* Fixing issues

* Switch field name for tool

* Add support for toolCalls and refusal in streaming completion

* Working tool call response

* Separate unified and legacy code paths

* Updated the parser, but there are some class cast exceptions to fix

* Refactoring tests and request entities

* Parse response from OpenAI

* Removing unused request classes

* precommit

* Adding tests for UnifiedCompletionAction Request

* Refactoring stop to be a list of strings

* Testing for OpenAI response parsing

* Refactoring transport action tests to test unified validation code

* Fixing various tests

* Fixing license header

* Reformat streaming results

* Finalize response format

* remove debug logs

* remove changes for debugging

* Task type and base inference action tests

* Adding openai service tests

* Adding model tests

* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked

* Fixing change log and removing commented out code

* Switch usage to accept null

* Adding test for TestStreamingCompletionServiceExtension

* Avoid serializing empty lists + request entity tests

* Register named writeables from UnifiedCompletionRequest

* Removing commented code

* Clean up and add more of an explanation

* remove duplicate test

* remove old todos

* Refactoring some duplication

* Adding javadoc

* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
(cherry picked from commit 467fdb8)

# Conflicts:
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
@maxhniebergall maxhniebergall changed the base branch from 8.x to main December 16, 2024 13:57
@maxhniebergall maxhniebergall requested review from a team as code owners December 16, 2024 13:57
@maxhniebergall maxhniebergall changed the base branch from main to 8.x December 16, 2024 13:58
joegallo and others added 17 commits December 16, 2024 09:10
…ic#118487) (elastic#118503)

* Correcting the index version filter in migration reindex logic (elastic#118487)

* fixing the version for 8.x
…elastic#118529)

* Add one test for plugin type to PluginsLoaderTests

* Suppress ExtraFs (or PluginsUtils etc could fail with extra0 files)
…18455)

Fixes: elastic#118311
elastic#118310
elastic#118309

Same issue that was fixed in:
elastic#110963

`@BeforeClass` is executed after the test rules. This means it creates
the clusters for all the invalid versions, which sometimes doesn't work.

Change it to a rule which definitely evaluates before the clusters are
created. This will also speed up this test in CI.
…118428) (elastic#118532)

* Disable check_on_startup for KibanaUserRoleIntegTests (elastic#118428)

(cherry picked from commit c30ba12)

# Conflicts:
#	muted-tests.yml

* fixup! Unmute test
…lastic#118533)

* Handle all exceptions in data nodes can match (elastic#117469)

During the can match phase, prior to the query phase, we may have exceptions
that are returned back to the coordinating node, handled gracefully as if the
shard returned canMatch=true.

During the query phase, we perform an additional rewrite and can match phase
to eventually shortcut the query phase for the shard. That needs to handle
exceptions as well. Currently, an exception there causes shard failures, while
we should rather go ahead and execute the query on the shard.

Instead of adding another try/catch in the consumers' code, this commit adds exception handling to the method itself, so that it can no longer throw exceptions and similar mistakes cannot be made in the future.

At the same time, this commit makes the can match method more easily testable without requiring a full-blown SearchService instance.
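
A minimal sketch of the idea described above, assuming a hypothetical `CanMatchSketch.canMatch` helper rather than the actual `SearchService` code: the exception handling lives inside the method, so callers never see a throw, and a failed check is treated the same as canMatch=true.

```java
import java.util.function.Supplier;

// Hypothetical illustration; names are assumptions, not the Elasticsearch implementation.
final class CanMatchSketch {
    private CanMatchSketch() {}

    /**
     * Runs the supplied can-match check and never throws. Any runtime
     * failure is treated as "the shard may match", so the query still
     * executes on the shard instead of surfacing as a shard failure.
     */
    static boolean canMatch(Supplier<Boolean> check) {
        try {
            return check.get();
        } catch (RuntimeException e) {
            return true;
        }
    }
}
```

Because the helper takes only a supplier, it can be exercised in a unit test without constructing a full service instance.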

Closes elastic#104994

* fix compile
…positoryAnalysisRestIT org.elasticsearch.repositories.blobstore.testkit.analyze.MinioRepositoryAnalysisRestIT elastic#118548
…astic#118541)

Verifies that the plugin sets the `aws_availability_zone` automatically
by reading the AZ name from the IMDS at startup.
…ic#118331) (elastic#118550)

The SmbTestContainer base image is upgraded from Ubuntu 16.04 to 24.04 to
avoid a hanging Python module compilation when installing the samba package.
Installing SMB had to move from container build time to container start
because the SYS_ADMIN capability is required.

(cherry picked from commit a0f64d2)

# Conflicts:
#	.buildkite/pipelines/pull-request/packaging-tests-unix.yml
original-brownbear and others added 29 commits December 16, 2024 09:11
elastic#117245) (elastic#118698)

We don't need to use this request, the handler for freeing of scroll requests literally goes
to the same transport handler and doesn't come with the list of indices.
The original security need for keeping the list of indices around is long gone.
…stic#118700)

* [ML] Inference duration and error metrics (elastic#115876)

Add `es.inference.requests.time` metric around `infer` API.

As recommended by OTel spec, errors are determined by the
presence or absence of the `error.type` attribute in the metric.
"error.type" will be the http status code (as a string) if it is
available, otherwise it will be the name of the exception (e.g.
NullPointerException).
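
The rule above for choosing the `error.type` value could be sketched as follows. The class and method names here are assumptions for illustration, not the code from the commit.

```java
// Hypothetical illustration; the class and method names are assumptions.
final class ErrorTypeSketch {
    private ErrorTypeSketch() {}

    /**
     * Returns the value for the "error.type" attribute: the HTTP status
     * code as a string when one is available, otherwise the simple name
     * of the exception class.
     */
    static String errorType(Integer httpStatus, Throwable error) {
        if (httpStatus != null) {
            return String.valueOf(httpStatus);
        }
        return error.getClass().getSimpleName();
    }
}
```

On a successful request the attribute would simply be left off the metric, which is how its absence signals success.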

Additional notes:
- ApmInferenceStats is merged into InferenceStats. Originally we planned
  to have multiple implementations, but now we're only using APM.
- Request count is now always recorded, even when there are failures
  loading the endpoint configuration.
- Added a hook in streaming for cancel messages, so we can close the
  metrics when a user cancels the stream.

(cherry picked from commit 26870ef)

* fixing switch with class issue

---------

Co-authored-by: Pat Whelan <[email protected]>
…18702)

Improve the planner to detect filters that can be pushed down 'through'
 a LOOKUP JOIN by determining the conditions scoped to the left/main
 side and moving them closer to the source.

Relates elastic#118305
Add an action to reindex a single index from a source index to a destination index. Unlike the reindex action, this action copies settings and mappings from the source index to the dest index before performing the reindex. This action is part of work to reindex data streams and will be called on each of the backing indices within a data stream.

(cherry picked from commit 0a6ce27)
…s testVerifierOnMissingReferencesWithBinaryPlans {default} elastic#118707
… when settings have not changed (elastic#118704) (elastic#118706)

If the input index already has the `index.hidden` setting set to `true`,
MetadataMigrateToDataStreamService::prepareBackingIndex can incorrectly
increment the settings version even if it does not change the settings.
This results in an assertion failure in IndexService::updateMetadata
that will take down a node if assertions are enabled. This fixes that,
only incrementing the settings version if the settings actually changed.
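
A minimal sketch of the fix described above, using a hypothetical `IndexMeta` record as a stand-in for the real index metadata classes: the settings version is bumped only when applying `index.hidden` actually changes the settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the MetadataMigrateToDataStreamService code.
final class SettingsVersionSketch {
    private SettingsVersionSketch() {}

    /** Immutable stand-in for index metadata: settings plus their version. */
    record IndexMeta(Map<String, String> settings, long settingsVersion) {}

    /**
     * Marks the index as hidden. The settings version is incremented only
     * when the settings map actually changes.
     */
    static IndexMeta hide(IndexMeta current) {
        Map<String, String> updated = new HashMap<>(current.settings());
        updated.put("index.hidden", "true");
        if (updated.equals(current.settings())) {
            return current; // nothing changed, keep the version as-is
        }
        return new IndexMeta(Map.copyOf(updated), current.settingsVersion() + 1);
    }
}
```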
…o 1b51ff6 (elastic#117903)

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com.>
This commit upgrades to Lucene 9.12.1.

Among the bug fixes that Lucene 9.12.1 brings, it also allows easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for. We'll need to carefully bump this system property in the Elasticsearch 8.x series alongside the JDK upgrade.
… org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizerTests elastic#118721
)

* Introduce TranslationAware interface

* Serialize query builder

* Fix EsqlNodeSubclassTests

* Add javadoc

* Address review comments

* Revert changes on making constructors private

Co-authored-by: Elastic Machine <[email protected]>
elastic#118738)

* fix typo in muted CSV test for scoring in ES|QL (elastic#118665)

(cherry picked from commit a583a38)
…#118688)

Disabled the NoImds test on AWS EC2 instance where it fails because the
AWS metadata are available, which is not expected by this test.
…tic#118565) (elastic#118740)

(cherry picked from commit 4279281)

# Conflicts:
#	.buildkite/pipelines/periodic-packaging.template.yml
#	.buildkite/pipelines/periodic-packaging.yml
#	.buildkite/pipelines/periodic-platform-support.yml
#	.buildkite/pipelines/pull-request/packaging-tests-unix.yml
The test setup for `ProfileIntegTests` is flawed, where the full name of
a user can be a substring of other profile names (e.g., `SER` is a
substring of `User <random-string>-space1`) -- when that's passed into
suggest call with the `*` space, we get a match on all profiles, instead
of only the one profile expected in the test, since we are matching on
e.g. `SER*`. This PR restricts the setup to avoid the wildcard profile
for that particular test.

Closes: elastic#117782
Small tweak around how data node requests handle no indices w.r.t.
shards.

(cherry picked from commit 7585f02)
…#118750)

Fix elastic#118721

* Skip corresponding optimizer tests if `LOOKUP JOIN` is disabled.
* Enable LogicalPlanOptimizerTests again.

(cherry picked from commit bb8503a)

# Conflicts:
#	muted-tests.yml
)

Fix moving function linear weighted avg

Co-authored-by: Quentin Deschamps <[email protected]>
…astic#118655) (elastic#118685)

* ESQL: Disable grok.OverwriteName* on pre-8.13 BWC tests (elastic#118655)

This prevents two tests in the `grok` and `dissect` suites (`overwriteName` and `overwriteNameWhere`) and one in the `stats` suite (`byStringAndLongWithAlias`) from running against pre-8.13.0 versions. Coordinators prior to that version can generate invalid node plans that would fail verification on 8.18+ nodes.

(cherry picked from commit 0441555)

* re-enabled disabled tests