Post Reply 
(71B) Wildcard pattern matching, directory listing
04-16-2021, 02:49 PM (This post was last modified: 04-16-2021 11:35 PM by Dave Britten.)
Post: #1
(71B) Wildcard pattern matching, directory listing
EDIT: Added a couple of bug fixes, and incorporated Rob's suggestion regarding the quick-abort behavior.

This isn't so much a standalone program, but a routine that can be used for simple wildcard pattern matching of strings, similar to DOS/Windows directory listings (a*.txt, test?.dat, etc.)

There are two callable subprograms here, one for comparing a string to a pattern containing any number of wildcard characters, and another that mimics CAT ALL, but allows for filtering the results with wildcards.

Pattern Matching

CALL MATCH(S$,P$,M)

S$ contains the string to test, P$ contains the pattern to test against, and M is an output that will be 1 indicating a match, or 0 indicating not a match.

Available wildcard characters:

? - Match any one character
* - Match any zero or more characters

Any number of wildcards can appear in any quantity. MATCH will automatically trim all trailing spaces off of both the input string and the pattern. Leading spaces, or those within the two arguments will be left in place and compared like any other character. Comparisons are case sensitive; if you want a case-insensitive comparison, simply use the built-in function UPRC$() on the arguments to MATCH.

Examples:

CALL MATCH("AAABBBCCC","A*B*C*"): 1
CALL MATCH("ABCDEFGHIJ", "AB*F?H*"): 1
CALL MATCH("ABCDEFGHIJKLM", "*DF*L?"): 0

You can also simply RUN the program if you want to test a single input string and pattern.

Directory Listing

CALL DIR(F$,T$)

Where F$ is the pattern to apply to the filename, and T$ is the pattern to apply to the type string.

Examples:

CALL DIR("A*","BASIC"): List of all BASIC files beginning with A
CALL DIR("*S","*"): List of all files ending in S

Technical Info

The algorithm is relatively simple. It compares the input string to the pattern one character at a time. A ? in the pattern is automatically considered a match against the current character. If * is encountered, it moves the pointer in the pattern to the next character, and starts recursively calling the matching function (FNM), incrementing the pointer to the input string one character at a time until it either finds a match or gives up.

There are some assorted optimizations included, like immediately returning 1 if the entire input pattern is a single *, attempting to match with a simple string=string comparison before comparing one character at a time and handling wildcards, immediately comparing at the end of the input string when encountering * and there are no further * in the pattern, etc.

Code:
0001 DIM S$[100],P$[100]
0002 INPUT "STRING? ";S$
0003 INPUT "PATTERN? ";P$
0004 CALL MATCH(S$,P$,M)
0005 DISP M @ END 
0100 SUB MATCH(S$,P$,M)
0110 DIM R
0120 DEF FNM(S1,P1)
0130 IF S$[S1,L1]=P$[P1,L2] THEN FNM=1 @ STOP 
0140 IF S1>L1 AND P1>L2 THEN FNM=1 @ STOP 
0150 IF S1>L1 OR P1>L2 THEN FNM=0 @ STOP 
0160 IF P$[P1,P1]="?" THEN GOTO 190
0170 IF P$[P1,P1]="*" THEN GOTO 200
0180 IF S$[S1,S1]#P$[P1,P1] THEN FNM=0 @ STOP 
0190 S1=S1+1 @ P1=P1+1 @ GOTO 140
0200 P1=P1+1 @ IF P$[P1,P1]="*" THEN GOTO 200
0210 IF P1>L2 THEN FNM=1 @ STOP 
0220 IF POS(P$[P1],"*")=0 THEN S1=L1-(L2-P1)
0230 R=FNM(S1,P1) @ IF R#0 THEN GOTO 250
0240 S1=S1+1 @ IF S1<=L1 THEN GOTO 230
0250 IF R#1 THEN FNM=-1 @ STOP 
0260 FNM=S1<=L1
0270 END DEF 
0280 L1=LEN(S$) @ L2=LEN(P$)
0290 IF S$[L1,L1]=" " THEN L1=L1-1 @ GOTO 290
0300 IF P$[L2,L2]=" " THEN L2=L2-1 @ GOTO 300
0310 M=FNM(1,1)=1
0320 END SUB 
0330 SUB DIR(F$,T$)
0340 DIM C$[43],I
0350 I=1
0360 C$=CAT$(I)
0370 IF C$="" THEN STOP 
0380 CALL MATCH(C$[1,8],F$,M1)
0390 CALL MATCH(C$[12,16],T$,M2)
0400 IF M1 AND M2 THEN DISP C$
0410 I=I+1 @ GOTO 360
0420 END SUB


.zip  MATCH.zip (Size: 817 bytes / Downloads: 3)
Visit this user's website Find all posts by this user
Quote this message in a reply
04-16-2021, 07:35 PM
Post: #2
RE: (71B) Wildcard pattern matching, directory listing
(04-16-2021 02:49 PM)Dave Britten Wrote:  The algorithm is relatively simple. It compares the input string to the pattern one character at a time. A ? in the pattern is automatically considered a match against the current character. If * is encountered, it moves the pointer in the pattern to the next character, and starts recursively calling the matching function (FNM), incrementing the pointer to the input string one character at a time until it either finds a match or gives up.

Recursion is not needed and can result in exponential blow-up of the matching time. For details see my Code Project post why blow-up happens and how to optimize to match * and ? wildcards and [a-z] ranges efficiently. Hope this helps!

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
Visit this user's website Find all posts by this user
Quote this message in a reply
04-16-2021, 10:14 PM
Post: #3
RE: (71B) Wildcard pattern matching, directory listing
Thanks, I'll give it a look. Surprisingly, I didn't see a ton of articles diving into the problem in my quick search. This algorithm seemed like a reasonable balance of small size and performance for the kind of problems you'd likely throw at it on a 71B. (I don't expect to need to match patterns with 10 or more asterisks in them!) I'll see if I can work in the quick-abort logic.
Visit this user's website Find all posts by this user
Quote this message in a reply
04-17-2021, 01:55 AM
Post: #4
RE: (71B) Wildcard pattern matching, directory listing
Seems to me that you could step through any real world file list just using CAT and [v] in less keystrokes than invoking that CALL MATCH... command line, lol.

Still, it looks interesting and I'll likely give it a whirl, just for the experience.

Thanks for sharing this Dave. Smile

--Bob Prosperi
Find all posts by this user
Quote this message in a reply
04-17-2021, 03:42 AM
Post: #5
RE: (71B) Wildcard pattern matching, directory listing
(04-17-2021 01:55 AM)rprosperi Wrote:  Seems to me that you could step through any real world file list just using CAT and [v] in less keystrokes than invoking that CALL MATCH... command line, lol.

Still, it looks interesting and I'll likely give it a whirl, just for the experience.

Thanks for sharing this Dave. Smile

Yeah, the DIR sub is just one very simple example with questionable utility, it can have other more meaningful uses. Smile Might be useful for reading and filtering lines out of a text file, for instance.

I don't know that I'd attempt to make a whole regex engine in BASIC, but I might add some other features to this. The ability to match character classes as with the LIKE operator in SQL Server would be neat, e.g. '[0-9][0-9]*' to match strings beginning with a pair of numbers. Could just expand the class into a string that's tested with POS() or something.
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)