Skip to content

ESQL: Lookup Join meta issue #116208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
33 of 48 tasks
costin opened this issue Nov 4, 2024 · 7 comments · Fixed by #116515
Open
33 of 48 tasks

ESQL: Lookup Join meta issue #116208

costin opened this issue Nov 4, 2024 · 7 comments · Fixed by #116515
Assignees
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@costin
Copy link
Member

costin commented Nov 4, 2024

Description

Meta ticket around the main implementation topics required for lookup join

Compute

  • Make field extractor work with LookupOperators (keep field extraction in one place)
  • Bifurcate join operator/allow other operators between join and Lookup source (eval, filter, etc..)
  • Align LookupJoin and LookupJoinExec - the first allows branching, the second does not.

Update: decided to postpone all this until we either evolve the way we perform the actual lookup join, or allow more features like actual condition expressions.

Later

Maybe

@costin costin self-assigned this Nov 4, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin linked a pull request Nov 8, 2024 that will close this issue
costin added a commit that referenced this issue Nov 20, 2024
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details).

Co-authored-by: Nik Everett <[email protected]>
@costin costin reopened this Nov 20, 2024
costin added a commit to costin/elasticsearch that referenced this issue Nov 20, 2024
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query
 ; follow-ups are required (see elastic#116208 for more details).

Co-authored-by: Nik Everett <[email protected]>
(cherry picked from commit bc785f5)
elasticsearchmachine pushed a commit that referenced this issue Nov 20, 2024
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query
 ; follow-ups are required (see #116208 for more details).

Co-authored-by: Nik Everett <[email protected]>
(cherry picked from commit bc785f5)
rjernst pushed a commit to rjernst/elasticsearch that referenced this issue Nov 20, 2024
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details).

Co-authored-by: Nik Everett <[email protected]>
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this issue Nov 28, 2024
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details).

Co-authored-by: Nik Everett <[email protected]>
@nicpenning
Copy link

I have been monitoring this feature and looking forward to it's capabilities.

I noticed this PR (elastic/kibana#207350) came down the pipeline and it made me wonder.

For LOOKUP to be usable on multiple indices, does it require the indices that will be joined to have Lookup Index mode or is that just a more efficient way to perform the operation?

I ask because I am trying to understand how the LOOKUP varies from some of the way that ENRICH works. It is painful today having to create Enrich policy after Enrich policy on the same dataset because you can only have 1 match field.

One of many use cases is enriching data from 1 source with another. Let's say I have firewall logs with source.ip : 1.2.3.4, event.action : block, url.domain : bad.xyz and I want to join that with DHCP data around that same time with the host.ip : 1.2.3.4., host.name : computer1, I would want to combine this into a single row showing something like:
host.ip : 1.2.3.4, host.name : computer 1, source.ip : 1.2.3.4, url.domain: bad.xyz, event.action : block

This would allow for joining data in way to simplify outcomes of multiple data sources.

Another use case is simply enriching without the need for enrich policies as stated above and just "enrich" using other indices ad-hoc. However, from what I have observed so far, there may be additional steps to perform this LOOKUP.

Any additional details on what the goal here is with some examples could help understand what is to come.

Thanks!

@nik9000
Copy link
Member

nik9000 commented Jan 28, 2025

For LOOKUP to be usable on multiple indices, does it require the indices that will be joined to have Lookup Index mode or is that just a more efficient way to perform the operation?

The LOOKUP JOIN will require that the index we join into the main stream has the lookup mode. Fundamentally it's about having a single sharded index which made the problem simpler. It was easier to get there from our ENRICH behavior than arbitrary joining.

@nicpenning
Copy link

Thanks for the clarification!

@getkub
Copy link

getkub commented Mar 9, 2025

This feature is so important, for many of Enterprise customers like ours which are now exporting data outside Elastic to do the same. Can we have an example docs page to show the various abilities of these commands please?

When we say 'lookup' join searchtime , it is broad statement (for instance we are really looking for joining with ANOTHER index). So the ability to do 'sub-search' is quite important too as part of lookup. Hence if we can update into a list or preview of Elastic v9.x somewhere please I can produce some list of lookup examples and update you for testing ? Thanks again

@tylerperk
Copy link

Hi @getkub Lookup Join was released as Tech Preview in 8.18. Please check the announcement blog and the docs for more information.

We are aware of the need for joining against any index and sub searches - both are on the future roadmap.

@getkub
Copy link

getkub commented Apr 24, 2025

We are aware of the need for joining against any index and sub searches - both are on the future roadmap.

Great progress.
In the blogpost example, if you could solve it using "<subsearch>" instead of an "<index>", that would be ideal

FROM kibana_sample_data_logs | WHERE response.keyword != "200" 
| LOOKUP JOIN   [ FROM env_index |  stats count(*) by client.ip, environment | keep client.ip, environment] ON client.ip
| STATS COUNT(*) by response, environment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants