Skip to content

GH-6114: Static path matching #6146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

dantleech
Copy link
Contributor

@dantleech dantleech commented Mar 10, 2025

This (early stage) WIP PR introduces a static path matcher which intends to emulate the behavior of the PHPUnit FileIterator in order to prevent PHPUnit traversing the filesystem when a deprecation is triggered.

The PHPUnit FileIterator uses glob to find directories and we therefore need to support the glob patterns - which can vary according to the platform. This PR uses https://man7.org/linux/man-pages/man7/glob.7.html as a reference in addition to testing the behavior locally to confirm assumptions.

The webmozart/glob provides a similar feature however it's behavior is different as it supports curly braces, and * is restricted to a single directory level, while * in PHPUnit will return all descendants and I'm sure there are other differences - however I've used that as a starting point.

TODO:

  • Escaping unhandled regex characters, or escaping the regex before parsing the pattern.
  • Character classes [:alnum:] etc.
  • Collating symbols
  • Equivalence class expressions
  • Emulating glob behavior of unterminated [ character groups.
  • Seems to be a PHPUnit globstar bug whereby /a** will match /b and all other directories, where as /ab* will not match anything. We can either copy that behavior or "fix" it.

and maybe writing the implementation from scratch if regex turns out to be a bad fit.

Usages on Github:

@dantleech dantleech marked this pull request as draft March 10, 2025 12:07
@sebastianbergmann sebastianbergmann added feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) labels Mar 10, 2025

class FileMatcherPattern
{
public function __construct(public string $path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will/could include the $suffix, $prefix and $exclude also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or could simply remove this and handle the prefix/suffix independently of the "file matcher"

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from b2da2f7 to 797d6b7 Compare March 12, 2025 11:47
@dantleech
Copy link
Contributor Author

Have refactored to tokenize the glob string, while less performant it's easier to reason about and we only need to compile the regex for each <include or <exclude - not for each path.

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch 3 times, most recently from 792bae8 to 6f99ddc Compare March 12, 2025 23:21
continue;
}

$resolved[] = [$type, $char];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this handles any "left over tokens" - including T_ASTERISK - maybe all tokens should be handled explicitly?

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from ba1d99d to 6fc5179 Compare March 13, 2025 11:56
self::T_BRACKET_CLOSE => ']',
self::T_HYPHEN => '-',
self::T_COLON => ':',
self::T_BACKSLASH => '\\',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COLON and BACKSLASH are not tested

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from 6fc5179 to e07a96f Compare March 15, 2025 19:55
@dantleech
Copy link
Contributor Author

@sebastianbergmann this PR is indicative of the approach but needs the final 20% of work:

  • Refactor the source filter/mapper tests to create the paths they need in a temporary directory? this would make it easier to test edge cases without contaminating other tests and grant mode confidence.
  • Implement the exclude/include logic (either by compiling that into the regex or as a separate concern).
  • We can implement or forbid the use of [[:character-ranges:]] and other "esoteric" glob patterns.

But I'm keen for any thought you have on the approach so far - the biggest concern for me is providing confidence that it will be compatible with at least 99.99% of configuration files.

}

foreach ($this->includeDirectoryRegexes as $directoryRegex) {
if ($directoryRegex->matches($path)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need to change to account for include/exclude matching on the basename either in the regex or otherwise.

@sebastianbergmann
Copy link
Owner

@sebastianbergmann this PR is indicative of the approach but needs the final 20% of work:

* Refactor the source filter/mapper tests to create the paths they need in a temporary directory? this would make it easier to test edge cases without contaminating other tests and grant mode confidence.

* Implement the exclude/include logic (either by compiling that into the regex or as a separate concern).

* We can implement or forbid the use of `[[:character-ranges:]]` and other "esoteric" glob patterns.

Thank you for your work so far. I am afraid that I might not be able to look into this before the PHPUnit Code Sprint in April in Berlin. There I will be able to look into this, for sure, and together with other developers. I hope I can look into this before then, but I cannot promise that.

But I'm keen for any thought you have on the approach so far - the biggest concern for me is providing confidence that it will be compatible with at least 99.99% of configuration files.

We could delay this until PHPUnit 13 and break backward compatibility (for edge cases, "esoteric" glob patterns, etc.).

@sebastianbergmann
Copy link
Owner

Is it expected that the tests are currently failing?

@dantleech
Copy link
Contributor Author

dantleech commented Apr 3, 2025

Yes - the prefix/suffix filtering hasn't been added yet and the tests fail due to the tests we added upstream. The PR is at an inflection point:

  • Do we want to commit to supprting all the glob features? or could we simplify it.
  • Do we want to refactor the source-finding/filtering tests to be more dynamic?[1]
  • Are we happy with the general approach in this PR?

And then I was also considering how to implement the suffix/prefix logic - as it probably makes sense to include them in the compiled regex - but then it intrudes on the "copy the behavior of glob" approach" - so if we decide to simplify the globbing behavior it makes that decision easier too.

[1] currently all the source mapping/filtering tests share the same fixture which makes it hard to test edge cases without breaking other tests - I think it could generate the file tree instead.

@sebastianbergmann sebastianbergmann force-pushed the main branch 4 times, most recently from 1bb9097 to 5af6639 Compare April 20, 2025 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants