(71B) Wildcard pattern matching, directory listing
|
04-16-2021, 02:49 PM
(This post was last modified: 04-16-2021 11:35 PM by Dave Britten.)
Post: #1
|
|||
|
|||
(71B) Wildcard pattern matching, directory listing
EDIT: Added a couple of bug fixes, and incorporated Rob's suggestion regarding the quick-abort behavior.
This isn't so much a standalone program, but a routine that can be used for simple wildcard pattern matching of strings, similar to DOS/Windows directory listings (a*.txt, test?.dat, etc.) There are two callable subprograms here, one for comparing a string to a pattern containing any number of wildcard characters, and another that mimics CAT ALL, but allows for filtering the results with wildcards. Pattern Matching CALL MATCH(S$,P$,M) S$ contains the string to test, P$ contains the pattern to test against, and M is an output that will be 1 indicating a match, or 0 indicating not a match. Available wildcard characters: ? - Match any one character * - Match any zero or more characters Any number of wildcards can appear in any quantity. MATCH will automatically trim all trailing spaces off of both the input string and the pattern. Leading spaces, or those within the two arguments will be left in place and compared like any other character. Comparisons are case sensitive; if you want a case-insensitive comparison, simply use the built-in function UPRC$() on the arguments to MATCH. Examples: CALL MATCH("AAABBBCCC","A*B*C*"): 1 CALL MATCH("ABCDEFGHIJ", "AB*F?H*"): 1 CALL MATCH("ABCDEFGHIJKLM", "*DF*L?"): 0 You can also simply RUN the program if you want to test a single input string and pattern. Directory Listing CALL DIR(F$,T$) Where F$ is the pattern to apply to the filename, and T$ is the pattern to apply to the type string. Examples: CALL DIR("A*","BASIC"): List of all BASIC files beginning with A CALL DIR("*S","*"): List of all files ending in S Technical Info The algorithm is relatively simple. It compares the input string to the pattern one character at a time. A ? in the pattern is automatically considered a match against the current character. If * is encountered, it moves the pointer in the pattern to the next character, and starts recursively calling the matching function (FNM), incrementing the pointer to the input string one character at a time until it either finds a match or gives up. There are some assorted optimizations included, like immediately returning 1 if the entire input pattern is a single *, attempting to match with a simple string=string comparison before comparing one character at a time and handling wildcards, immediately comparing at the end of the input string when encountering * and there are no further * in the pattern, etc. Code: 0001 DIM S$[100],P$[100] MATCH.zip (Size: 817 bytes / Downloads: 3) |
|||
04-16-2021, 07:35 PM
Post: #2
|
|||
|
|||
RE: (71B) Wildcard pattern matching, directory listing
(04-16-2021 02:49 PM)Dave Britten Wrote: The algorithm is relatively simple. It compares the input string to the pattern one character at a time. A ? in the pattern is automatically considered a match against the current character. If * is encountered, it moves the pointer in the pattern to the next character, and starts recursively calling the matching function (FNM), incrementing the pointer to the input string one character at a time until it either finds a match or gives up. Recursion is not needed and can result in exponential blow-up of the matching time. For details see my Code Project post why blow-up happens and how to optimize to match * and ? wildcards and [a-z] ranges efficiently. Hope this helps! - Rob "I count on old friends to remain rational" |
|||
04-16-2021, 10:14 PM
Post: #3
|
|||
|
|||
RE: (71B) Wildcard pattern matching, directory listing
Thanks, I'll give it a look. Surprisingly, I didn't see a ton of articles diving into the problem in my quick search. This algorithm seemed like a reasonable balance of small size and performance for the kind of problems you'd likely throw at it on a 71B. (I don't expect to need to match patterns with 10 or more asterisks in them!) I'll see if I can work in the quick-abort logic.
|
|||
04-17-2021, 01:55 AM
Post: #4
|
|||
|
|||
RE: (71B) Wildcard pattern matching, directory listing
Seems to me that you could step through any real world file list just using CAT and [v] in less keystrokes than invoking that CALL MATCH... command line, lol.
Still, it looks interesting and I'll likely give it a whirl, just for the experience. Thanks for sharing this Dave. --Bob Prosperi |
|||
04-17-2021, 03:42 AM
Post: #5
|
|||
|
|||
RE: (71B) Wildcard pattern matching, directory listing
(04-17-2021 01:55 AM)rprosperi Wrote: Seems to me that you could step through any real world file list just using CAT and [v] in less keystrokes than invoking that CALL MATCH... command line, lol. Yeah, the DIR sub is just one very simple example with questionable utility, it can have other more meaningful uses. Might be useful for reading and filtering lines out of a text file, for instance. I don't know that I'd attempt to make a whole regex engine in BASIC, but I might add some other features to this. The ability to match character classes as with the LIKE operator in SQL Server would be neat, e.g. '[0-9][0-9]*' to match strings beginning with a pair of numbers. Could just expand the class into a string that's tested with POS() or something. |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 1 Guest(s)