Skip to content
This repository was archived by the owner on Sep 27, 2019. It is now read-only.

Multi-column IndexScan plan selection fix #1305

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

vkonagar
Copy link

This PR fixes the issue #1299. This changes the way we find index match for predicate columns in IndexScan rule implementation. Specifically, this change makes sure that the optimizer picks a multi-column index only if the predicate columns match the index columns in order.

@coveralls
Copy link

coveralls commented Apr 17, 2018

Coverage Status

Coverage decreased (-0.002%) to 77.267% when pulling 3a20b40 on vkonagar:master into 54b02a4 on cmu-db:master.

@apavlo
Copy link
Member

apavlo commented Apr 18, 2018

@vkonagar Can you provide test cases for this to know what you fixed? Thanks!

@vkonagar
Copy link
Author

@apavlo Andy, I have added a test to verify the query plan correctness with respect to multi-column indexes.

This doesn't fix the cost model for multi-column indexes, which is not currently supported in the optimizer. I have talked to bowei and we will look into that.

@chenboy
Copy link
Contributor

chenboy commented Apr 19, 2018

As discussed in today's meeting, we want to fix the cost model to consider multi-column indices. Let me see if I can fix it. @GustavoAngulo @nappelson I'm wondering if we have a testing infrastructure for cost model correctness right now?

Copy link
Member

@linmagit linmagit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment.

@linmagit linmagit dismissed their stale review April 21, 2018 00:23

Forgot to add the comment... will submit again

Copy link
Member

@linmagit linmagit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment.

index_expr_type_list.push_back(expr_type_list[offset]);
index_value_list.push_back(value_list[offset]);
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should check for an exact same ordering here. For example, if you have an index on column (a, b), and your predicates are "b = 5 and a = 1", then we should be able to use the index scan. However, the check here won't identify that because it requires the order in the predicates to be exactly the same as in the index.

After thinking about this, I actually think that you should just keep the old index_key_column_id_list. You just need to add a flag about whether the lead (highest) column in the index has been referenced in the index. As long as that is true, we should be able to use the index for the scan. Thoughts? @chenboy @vkonagar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I also think we don't need to consider order here. The way to fix this issue is letting the cost model compute the correct cost for these indices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @chenboy! @vkonagar and I are discussing some implementation details for this on Slack. We've added you into the channel. There's some cost model related issue we think you probably have a better idea on what's going on. Can you take a look at Slack? Thanks!

@apavlo apavlo added this to the tpcc milestone Jun 21, 2018
@apavlo
Copy link
Member

apavlo commented Jun 21, 2018

This is another important fix that we are going to need for TPC-C.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants