HomeAbout Us A-Z IndexSearch * Contact Us Register LoginPress Shop

The Open Brand -- Problem Reporting and Interpretations System


Problem Report 0678 Details

Help Show help | Quick Search | Submit a Test Suite Support Request | Click here to view your privileges

This page provides all information on Problem Report 0678.


Report 0678 Actions


    Problem Report Number 0678
    Submitter's Classification Specification problem
    State Resolved
    Resolution Permanent Interpretation (PIN)
    Problem Resolution ID PIN.X.0071
    Raised 1970-01-01 08:00
    Updated 2003-03-13 08:00
    Published 1997-08-25 08:00
    Product Standard Commands and Utilities V2 (UNIX 95)
    Certification Program The Open Brand certification program
    Test Suite VSC version 4.1.6
    Test Identification POSIX.cmd/expr 105
    Specification Commands and Utilities Issue 4 Version 2
    Location in Spec See Problem Text
    Problem Summary PIN4C.00043 This IR claims that a duplicated subexpression may return null for \1 after matching a non-null subexpression.
    Problem Text
    This interpretation request pertains to the first two of the three items
    tested in purpose #105 of the "expr" case in the "POSIX.cmd" section.


    In the first part of purpose #105, the test fails when the
    regular expression "\([a-c]*\)\{0,\}" is applied to the string
    "aabcaab".
    In this case, expr is expected to return the string corresponding to
    \1, since the pattern contains at least one subexpression. The
    implementation is matching the string in two iterations, with the
    final iteration matching the null string. It returns the empty string
    (with a newline appended). This is consistent with the rules for making
    the leftmost longest matches for subpatterns, and considering a null
    string to be longer than no match at all. It is also consistent with
    the specification for regcomp() where it states that the last match of
    a subexpression is reported when it participates in the match multiple
    times.

    The standard does not sufficiently clarify how duplicated
    subexpressions should be handled with respect to matching the null
    string. It does not expressly forbid or allow null iterations. This
    ambiguity is addressed in IEEE Interpretation #43, part 1 (December
    1994 edition), excerpted below. The question raised is:

    (1) Can a duplicated subexpression match the null string? If so,
    will the duplication be repeated until the expression does match
    the null string?

    Rationale - Although adjacent duplication symbols are illegal
    (for no apparent reason, particularly for EREs), \(x*\)* is a
    legal expression, in which the *s are not adjacent. This
    raises the question: How many times is the null string
    matched?

    Suppose that \(x*\)* were allowed. Does matching it to xxx
    yield \1 = xxx or \1 = null?

    [ ... ]

    The interpretation for IEEE Std 1003.2-1992:

    (1) The standard does not require the successful matching of a null
    string after a successful match on a non-null string for
    duplication symbols. However, the standard does not clearly say
    that you are not allowed to match null strings after a
    successful match. Therefore, the standard is ambiguous.

    The definition of a back reference expression in 2.8.3.3 does
    not specify which match of a subexpression followed by a
    duplication symbol is to be returned. Note, however, that the
    definition regcomp() defined in B.5.1, lines 344-346 on page
    728, indicates that the back reference expression will match
    the last string matched by the subexpression.

    The standard is unclear on this issue, and no conformance
    distinction can be made between alternative implementations
    based on this. This is being referred to the sponsor.

    With respect to the standard, the implementation is valid; however,
    the test suite does not consider this alternate implementation.
    The implementation adheres to historic practice and appends a newline
    to the output because the matched string is a null string, generating
    an empty line. This behavior is covered by waiver TIN4C.00019.

    In the second part of assertion #105 the implementation adheres to
    traditional practice and outputs an empty line, consistent with
    waiver TIN4C.00019.

    We feel that a waiver should be granted.
    Test Output
    ******************************************************************

    /tset/POSIX.cmd/expr/expr.ex 1 Failed



    Test Information:
    Assertion #105 (A): GA142
    Note: An IEEE POSIX.2 interpretation request is pending on
    'nothing will be written to indicate a null string'.
    Testing regular expression "\([a-c]*\)\{0,\}"
    expected exit status 0, got 1
    Contents of regb_err:
    output differed from expected
    Expect output: "aabcaab"
    Actual output: ""
    Testing regular expression "\([a-c]*\)\{2,\}"
    output differed from expected
    Expect output: ""
    Actual output: ""
    Testing regular expression "\(a\{1,5100\}\)"


    ******************************************************************

    Review Information

    Review Type TSMA Review
    Start Date null
    Completed null
    Status Complete
    Review Recommendation No Resolution Given
    Review Response
    We recommend this request be refused.

    We do not believe that the referenced interpretation applies to the
    test in question.

    Underlying all regular expression matches is the following statement
    in section 7.1 of the XBD,

    If the pattern permits a variable number of matching
    characters and thus there is more than one such sequence
    starting at that point, the longest such sequence will be
    matched.

    The example given confirms this.

    For example: the BRE bb* matches the second to fourth
    characters of abbbc ...

    A null subexpression match is not considered a match in this case.
    Analogously, given [a-c]* for the the pattern aabcaab, the longest
    string clearly matches aabcaab. There are no secondary matches to
    consider, since the end of the string has been reached.

    This longest expression rule is reinforced in the following
    statement.

    Consistent with the whole match being the longest of
    the leftmost matches, each subpattern, from left to
    right, matches the longest possible string. For this
    purpose, a null string is considered to be longer than
    no match at all.

    A null string is only considered when it contributes to the entire
    regular expression match length. If the expression was \(e*\)\([a-c]*\)
    for aabcaab, then the subexpression e* would be null to allow the regular
    expression match to be the longest possible. So, the IEEE ruling may
    be true in concept, but applies rarely in practice since the longest
    match rule usually prevails.

    This is confirmed by the following statements.

    It is possible to determine what strings correspond to
    subexpressions by recursively applying the leftmost
    longest rule to each subexpression, but only with the
    proviso that the overall match is leftmost longest.

    [..]

    Conceptually, the implementation must examine every
    possible match and among those that yield the leftmost
    longest total matches, pick the one that does the
    longest match for the leftmost subexpression and so
    on.

    In the test of

    expr 'aabcaab' : '\([a-c]*\)\{0,\}'

    It is clear that the longest subexpression \1 must be returned given the
    choice between "aabcaab" and a NULL subexpression match.

    The second failure,

    Testing regular expression "\([a-c]*\)\{2,\}"
    output differed from expected
    Expect output: ""
    Actual output: ""

    is covered by the original TIN4C.00019.

    Review Type SA Review
    Start Date null
    Completed null
    Status Complete
    Review Resolution No Resolution Given
    Review Conclusion
    There are two problems in this request.
    The second of these is covered by the referenced TIN, TIN4C.00019.

    The first of these relates to an ambiguity in the underlying
    specification - that is not covered by TIN4C.00019.
    PASC interpretation request P1003.2-1992 #43 Question 12
    permits matching of a null expression after a successful match.

    It would be possible to recommend a PIN (and remove the reference to
    TIN4C.00019).

    However in view of the consultant's response below, the request should go
    for a 14 day anonymous review by the Base Working Group.

    Review Type Expert Group Review
    Start Date null
    Completed null
    Status Complete
    Review Resolution No Resolution Given
    Review Conclusion
    There are two problems in this request.
    The second of these is covered by the referenced TIN, TIN4C.00019.

    For the first:

    There is an ambiguity in the underlying specification. A permanent
    interpretation should be granted.

    Review Type SA Review
    Start Date null
    Completed null
    Status Complete
    Review Resolution Permanent Interpretation (PIN)
    Review Conclusion
    A Permanent Interpretation is granted.

    Problem Reporting System Options:

     

    Back   


Contact the Certification Authority