3 files changed: +25 -34 lines changed

@@ -1836,7 +1836,7 @@ struct llama_server_context
         send_embedding(slot);
         slot.release();
         slot.i_batch = -1;
-        return true;
+        continue;
     }

     completion_token_output result;
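
For context, a minimal self-contained sketch of why `continue` is the right replacement for `return true` here (hypothetical stand-in types, not the actual llama.cpp server code): the embedding branch runs inside a loop over the active slots, so returning from the surrounding update function after the first embedding slot would leave the remaining slots of the same batch unserved, which is the concurrent-embeddings failure the @bug scenario below was tracking.

// Hypothetical sketch with stand-in types; the real branch lives in the
// server's slot-update loop.
#include <vector>

struct slot_t {
    bool embedding = false;   // slot is serving an embedding request
    int  i_batch   = 0;       // position of this slot's token in the current batch
    void release() { /* mark the slot as free for the next request */ }
};

static void send_embedding(slot_t & /*slot*/) { /* return the embedding to the client */ }

static void process_batch(std::vector<slot_t> & slots) {
    for (auto & slot : slots) {
        if (slot.embedding) {
            send_embedding(slot);
            slot.release();
            slot.i_batch = -1;
            continue;   // `return true;` here stopped processing after the first embedding slot
        }
        // ... completion handling for the non-embedding slots ...
    }
}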

@@ -1,36 +1,4 @@
 # List of ongoing issues
 @bug
 Feature: Issues
-  # Issue #5655
-  Scenario: Multi users embeddings
-    Given a server listening on localhost:8080
-    And a model file stories260K.gguf
-    And a model alias tinyllama-2
-    And 42 as server seed
-    And 64 KV cache size
-    And 2 slots
-    And continuous batching
-    And embeddings extraction
-    Then the server is starting
-    Then the server is healthy
-
-    Given a prompt:
-      """
-      Write a very long story about AI.
-      """
-    And a prompt:
-      """
-      Write another very long music lyrics.
-      """
-    And a prompt:
-      """
-      Write a very long poem.
-      """
-    And a prompt:
-      """
-      Write a very long joke.
-      """
-    Given concurrent embedding requests
-    Then the server is busy
-    Then the server is idle
-    Then all embeddings are generated
+  # No confirmed issue at the moment

@@ -8,6 +8,7 @@ Feature: Parallel
     And 42 as server seed
     And 64 KV cache size
     And 2 slots
+    And embeddings extraction
     And continuous batching
     Then the server is starting
     Then the server is healthy
@@ -75,3 +76,25 @@ Feature: Parallel
     Then the server is busy
     Then the server is idle
     Then all prompts are predicted
+
+  Scenario: Multi users embeddings
+    Given a prompt:
+      """
+      Write a very long story about AI.
+      """
+    And a prompt:
+      """
+      Write another very long music lyrics.
+      """
+    And a prompt:
+      """
+      Write a very long poem.
+      """
+    And a prompt:
+      """
+      Write a very long joke.
+      """
+    Given concurrent embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are generated