Post Reply 
Proposal : List of regular expressions for numeric syntax on different HP calculators
01-27-2020, 05:40 PM (This post was last modified: 01-27-2020 06:58 PM by Jonathan Busby.)
Post: #11
RE: List of regular expressions for numeric syntax on different HP calculators
Sorry for the belated reply.

(06-08-2019 05:50 AM)Paul Dale Wrote:  List all of the strings as the expression: string_1|string_2|...|string_n. A state minimisation is usually done as part of the finite automata construction. Immediate optimal matcher.

Well, actually, constructing a minimal DFA from a set of strings is NP-Complete and is "A.I.-hard" **if** the set of strings is part of a larger pattern that must be matched or inferred, otherwise, it's easy but the DFA will **only** match strings that it's seen in its "training set". It can be shown that it is equivalent to Grammar Induction.

The problem is what to do if the sets of strings do not generate the needed states for the optimal DFA in terms of a larger pattern space, if not all patterns have been seen -- there may be an infinite number of patterns that should match or not match but do anyways, at least without constraints.

I've seen the above problem being tackled with A.I. genetic programming and neural nets. For example, see here : http://regex.inginf.units.it/ . I tried, as training examples for the genetic algorithm, "abcdefghijklmnopqrstuvwxyz" and "0123456789" and got a regex of "[^\d ]++" . Although this matches the above two strings, it also matches garbage -- it would need a very huge training set to accurately match anything.

Quote:Well almost. Strictly, for a series of constant strings like this an entirely different algorithm based on a many in parallel Boyer-Moore style approach is used instead.

The only problem with the above is that, unless a slightly "fuzzy" search is used, only the strings seen so far will be matched. In the case of a DFA generated with grammar induction, it would be able to match strings from its inferred grammar, although, there may be many false positives and negatives, unless some more "fuzziness" is added, such as in agrep or Perl's "String::Approx" .

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
Find all posts by this user
Quote this message in a reply
Post Reply 


Messages In This Thread
RE: List of regular expressions for numeric syntax on different HP calculators - Jonathan Busby - 01-27-2020 05:40 PM



User(s) browsing this thread: 1 Guest(s)