
Commit ec243ca

server: #5655 - continue to update other slots on embedding concurrent request.
server: tests: add multi users embeddings as fixed
1 parent 525213d commit ec243ca

File tree: 3 files changed, +25 -34 lines

examples/server/server.cpp (+1, -1)

@@ -1836,7 +1836,7 @@ struct llama_server_context
                     send_embedding(slot);
                     slot.release();
                     slot.i_batch = -1;
-                    return true;
+                    continue;
                 }

                 completion_token_output result;
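
The one-line change above is the whole fix: the embedding branch runs inside the loop that walks every slot of the batch that was just decoded. Returning from the function as soon as one embedding slot finished meant the remaining slots of the same batch were never serviced in that pass, which is what stalled concurrent embedding requests in #5655; `continue` simply moves on to the next slot. Below is a standalone sketch of that control flow; the struct, fields, and helper names are simplified stand-ins for illustration, not the real llama_server_context API.

// Standalone sketch of the slot-update control flow (illustrative names only).
#include <cstdio>
#include <vector>

struct server_slot {
    int  id;
    int  i_batch;   // index of this slot's token in the decoded batch; -1 = none
    bool embedding; // true if the slot serves an embedding request

    void release() { std::printf("slot %d released\n", id); }
};

static void send_embedding(server_slot & slot) {
    std::printf("slot %d: embedding sent\n", slot.id);
}

static void sample_and_send_token(server_slot & slot) {
    std::printf("slot %d: next completion token sent\n", slot.id);
}

// One pass over all slots after a batch has been decoded.
static bool update_slots(std::vector<server_slot> & slots) {
    for (auto & slot : slots) {
        if (slot.i_batch < 0) {
            continue; // this slot had nothing in the current batch
        }

        if (slot.embedding) {
            send_embedding(slot);
            slot.release();
            slot.i_batch = -1;
            // Before the fix this was `return true;`, which ended the pass
            // here and left every later slot in the loop un-serviced.
            continue;
        }

        sample_and_send_token(slot); // completion slots keep generating
    }
    return true;
}

int main() {
    // Two concurrent embedding slots plus one completion slot: with the old
    // `return true;` only slot 0 would have been handled in this pass.
    std::vector<server_slot> slots = { {0, 0, true}, {1, 1, true}, {2, 2, false} };
    update_slots(slots);
    return 0;
}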
examples/server/tests/features/issues.feature (+1, -33)

@@ -1,36 +1,4 @@
 # List of ongoing issues
 @bug
 Feature: Issues
-  # Issue #5655
-  Scenario: Multi users embeddings
-    Given a server listening on localhost:8080
-    And   a model file stories260K.gguf
-    And   a model alias tinyllama-2
-    And   42 as server seed
-    And   64 KV cache size
-    And   2 slots
-    And   continuous batching
-    And   embeddings extraction
-    Then  the server is starting
-    Then  the server is healthy
-
-    Given a prompt:
-      """
-      Write a very long story about AI.
-      """
-    And a prompt:
-      """
-      Write another very long music lyrics.
-      """
-    And a prompt:
-      """
-      Write a very long poem.
-      """
-    And a prompt:
-      """
-      Write a very long joke.
-      """
-    Given concurrent embedding requests
-    Then the server is busy
-    Then the server is idle
-    Then all embeddings are generated
+  # No confirmed issue at the moment

examples/server/tests/features/parallel.feature (+23)

@@ -8,6 +8,7 @@ Feature: Parallel
     And   42 as server seed
     And   64 KV cache size
     And   2 slots
+    And   embeddings extraction
     And   continuous batching
     Then  the server is starting
     Then  the server is healthy
@@ -75,3 +76,25 @@ Feature: Parallel
     Then the server is busy
     Then the server is idle
     Then all prompts are predicted
+
+  Scenario: Multi users embeddings
+    Given a prompt:
+      """
+      Write a very long story about AI.
+      """
+    And a prompt:
+      """
+      Write another very long music lyrics.
+      """
+    And a prompt:
+      """
+      Write a very long poem.
+      """
+    And a prompt:
+      """
+      Write a very long joke.
+      """
+    Given concurrent embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are generated
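
The scenario moved here from issues.feature drives the fixed path: four prompts are submitted as concurrent embedding requests against a 2-slot server and all of them must complete. For reproducing the same situation outside the behave harness, here is a rough client sketch using cpp-httplib (the header-only HTTP library bundled with the server example); the `/embedding` route and the `{"content": ...}` payload are assumptions about this server version, so check the server README of your checkout before relying on them.

// Rough reproduction of "Multi users embeddings" with concurrent HTTP clients.
// Assumes a llama.cpp server on localhost:8080 started with embeddings enabled;
// the /embedding route and JSON shape are assumptions, not taken from this commit.
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

#include "httplib.h"  // cpp-httplib

int main() {
    const std::vector<std::string> prompts = {
        "Write a very long story about AI.",
        "Write another very long music lyrics.",
        "Write a very long poem.",
        "Write a very long joke.",
    };

    std::vector<std::thread> workers;
    for (const auto & prompt : prompts) {
        workers.emplace_back([prompt] {
            httplib::Client cli("localhost", 8080);
            cli.set_read_timeout(600, 0);  // small models can still take a while

            const std::string body = "{\"content\": \"" + prompt + "\"}";
            auto res = cli.Post("/embedding", body, "application/json");

            if (res && res->status == 200) {
                std::printf("ok: %zu bytes of embedding JSON\n", res->body.size());
            } else {
                std::printf("request failed\n");
            }
        });
    }

    for (auto & w : workers) {
        w.join();  // before the fix, all but one of these requests could hang
    }
    return 0;
}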
