Report 0678 Actions
Problem Report Number |
0678 |
Submitter's Classification |
Specification problem |
State |
Resolved |
Resolution |
Permanent Interpretation (PIN) |
Problem Resolution ID |
PIN.X.0071 |
Raised |
1970-01-01 08:00 |
Updated |
2003-03-13 08:00 |
Published |
1997-08-25 08:00 |
Product Standard |
Commands and Utilities V2 (UNIX 95) |
Certification Program |
The Open Brand certification program |
Test Suite |
VSC version 4.1.6 |
Test Identification |
POSIX.cmd/expr 105 |
Specification |
Commands and Utilities Issue 4 Version 2 |
Location in Spec |
See Problem Text |
Problem Summary |
PIN4C.00043 This IR claims that a duplicated subexpression may return null for \1 after matching a non-null subexpression. |
Problem Text |
This interpretation request pertains to the first two of the three items tested in purpose #105 of the "expr" case in the "POSIX.cmd" section. In the first part of purpose #105, the test fails when the regular expression "\([a-c]*\)\{0,\}" is applied to the string "aabcaab". In this case, expr is expected to return the string corresponding to \1, since the pattern contains at least one subexpression. The implementation is matching the string in two iterations, with the final iteration matching the null string. It returns the empty string (with a newline appended). This is consistent with the rules for making the leftmost longest matches for subpatterns, and considering a null string to be longer than no match at all. It is also consistent with the specification for regcomp() where it states that the last match of a subexpression is reported when it participates in the match multiple times. The standard does not sufficiently clarify how duplicated subexpressions should be handled with respect to matching the null string. It does not expressly forbid or allow null iterations. This ambiguity is addressed in IEEE Interpretation #43, part 1 (December 1994 edition), excerpted below. The question raised is: (1) Can a duplicated subexpression match the null string? If so, will the duplication be repeated until the expression does match the null string? Rationale - Although adjacent duplication symbols are illegal (for no apparent reason, particularly for EREs), \(x*\)* is a legal expression, in which the *s are not adjacent. This raises the question: How many times is the null string matched? Suppose that \(x*\)* were allowed. Does matching it to xxx yield \1 = xxx or \1 = null? [ ... ] The interpretation for IEEE Std 1003.2-1992: (1) The standard does not require the successful matching of a null string after a successful match on a non-null string for duplication symbols. However, the standard does not clearly say that you are not allowed to match null strings after a successful match. Therefore, the standard is ambiguous. The definition of a back reference expression in 2.8.3.3 does not specify which match of a subexpression followed by a duplication symbol is to be returned. Note, however, that the definition regcomp() defined in B.5.1, lines 344-346 on page 728, indicates that the back reference expression will match the last string matched by the subexpression. The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor. With respect to the standard, the implementation is valid; however, the test suite does not consider this alternate implementation. The implementation adheres to historic practice and appends a newline to the output because the matched string is a null string, generating an empty line. This behavior is covered by waiver TIN4C.00019. In the second part of assertion #105 the implementation adheres to traditional practice and outputs an empty line, consistent with waiver TIN4C.00019. We feel that a waiver should be granted.
|
Test Output |
****************************************************************** /tset/POSIX.cmd/expr/expr.ex 1 Failed Test Information: Assertion #105 (A): GA142 Note: An IEEE POSIX.2 interpretation request is pending on 'nothing will be written to indicate a null string'. Testing regular expression "\([a-c]*\)\{0,\}" expected exit status 0, got 1 Contents of regb_err: output differed from expected Expect output: "aabcaab" Actual output: "" Testing regular expression "\([a-c]*\)\{2,\}" output differed from expected Expect output: "" Actual output: "" Testing regular expression "\(a\{1,5100\}\)" ******************************************************************
|
Review Information
Review Type |
TSMA Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Recommendation |
No Resolution Given |
Review Response |
We recommend this request be refused. We do not believe that the referenced interpretation applies to the test in question. Underlying all regular expression matches is the following statement in section 7.1 of the XBD, If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence will be matched. The example given confirms this. For example: the BRE bb* matches the second to fourth characters of abbbc ... A null subexpression match is not considered a match in this case. Analogously, given [a-c]* for the the pattern aabcaab, the longest string clearly matches aabcaab. There are no secondary matches to consider, since the end of the string has been reached. This longest expression rule is reinforced in the following statement. Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, matches the longest possible string. For this purpose, a null string is considered to be longer than no match at all. A null string is only considered when it contributes to the entire regular expression match length. If the expression was \(e*\)\([a-c]*\) for aabcaab, then the subexpression e* would be null to allow the regular expression match to be the longest possible. So, the IEEE ruling may be true in concept, but applies rarely in practice since the longest match rule usually prevails. This is confirmed by the following statements. It is possible to determine what strings correspond to subexpressions by recursively applying the leftmost longest rule to each subexpression, but only with the proviso that the overall match is leftmost longest. [..] Conceptually, the implementation must examine every possible match and among those that yield the leftmost longest total matches, pick the one that does the longest match for the leftmost subexpression and so on. In the test of expr 'aabcaab' : '\([a-c]*\)\{0,\}' It is clear that the longest subexpression \1 must be returned given the choice between "aabcaab" and a NULL subexpression match. The second failure, Testing regular expression "\([a-c]*\)\{2,\}" output differed from expected Expect output: "" Actual output: "" is covered by the original TIN4C.00019.
|
Review Type |
SA Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Resolution |
No Resolution Given |
Review Conclusion |
There are two problems in this request. The second of these is covered by the referenced TIN, TIN4C.00019. The first of these relates to an ambiguity in the underlying specification - that is not covered by TIN4C.00019. PASC interpretation request P1003.2-1992 #43 Question 12 permits matching of a null expression after a successful match. It would be possible to recommend a PIN (and remove the reference to TIN4C.00019). However in view of the consultant's response below, the request should go for a 14 day anonymous review by the Base Working Group.
|
Review Type |
Expert Group Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Resolution |
No Resolution Given |
Review Conclusion |
There are two problems in this request. The second of these is covered by the referenced TIN, TIN4C.00019. For the first: There is an ambiguity in the underlying specification. A permanent interpretation should be granted.
|
Review Type |
SA Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Resolution |
Permanent Interpretation (PIN) |
Review Conclusion |
A Permanent Interpretation is granted.
|
Problem Reporting System Options:
|