
The Open Brand -- Problem Reporting and Interpretations System


Problem Report 1598 Details


This page provides all information on Problem Report 1598.




    Problem Report Number: 1598
    Submitter's Classification: Test Suite problem
    State: Resolved
    Resolution: Permanent Interpretation (PIN)
    Problem Resolution ID: PIN.X.0155
    Raised: 1996-01-09 08:00
    Updated: 2003-03-13 08:00
    Published: 1996-03-19 08:00
    Product Standard: Internationalised System Calls and Libraries Extended (UNIX 95)
    Certification Program: The Open Brand certification program
    Test Suite: VSU version 4.1.0
    Test Identification: CAPI.os/genuts/regcmp 21 26
    Specification: System Interfaces and Libraries Issue 4 Version 2
    Location in Spec: See Problem Text
    Problem Summary: PIN4U.00021 Does the test require ASCII?
    Problem Text
    The assertion fregcmp21() constructs a range, [e1-e2], with
    hard-coded endpoints e1='\x40' and e2='\x41', which it compiles
    with regcmp(). It then uses regex() to test whether the compiled
    pattern for this range matches a hard-coded string, s1="\x4b".
    (Assertion fregcmp26() behaves similarly, constructing the
    range [^]e1-e2].)

    On an ASCII platform e1='@', e2='A', and the range [@-A] does
    NOT match the string s1="K". On an EBCDIC platform e1=' ',
    e2='<no-break-space>', and the range [ -<no-break-space>] DOES
    match s1="." because <period> collates between <space> and
    <no-break-space>.
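
The byte values above can be checked against both encodings. The following is a minimal sketch using Python's built-in `ascii` and `cp037` codecs; `cp037` is only a stand-in for the submitter's EBCDIC code page, which the report does not name, and the helper names are hypothetical.

```python
# Decode the test's hard-coded bytes under ASCII and under an EBCDIC code page.
# Assumption: cp037 stands in for the submitter's (unnamed) EBCDIC code page.
HARD_CODED = {"e1": 0x40, "e2": 0x41, "s1": 0x4B}


def decode_byte(value, codec):
    """Return the character that a single byte value encodes under `codec`."""
    return bytes([value]).decode(codec)


def describe(codec):
    """Map each hard-coded name to the character it denotes under `codec`."""
    return {name: decode_byte(value, codec) for name, value in HARD_CODED.items()}


if __name__ == "__main__":
    for codec in ("ascii", "cp037"):
        # repr() makes the space and the no-break space visible.
        print(codec, {name: repr(ch) for name, ch in describe(codec).items()})
```

Under `ascii` the three bytes read as '@', 'A', and 'K'; under `cp037` they read as a space, a no-break space, and a period.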

    I believe it is incorrect for testcase fregcmp21() to use hard-coded
    values e1='\x40', e2='\x41', and s1="\x4b", and to expect these
    hard-coded values to correspond to e1='@', e2='A', and s1="K" on all
    platforms. If fregcmp21() had used the EBCDIC encodings for '@', 'A',
    and "K", i.e., e1='\x7C', e2='\xC1', and s1="\xD2", it would have
    produced the expected results on our EBCDIC system. On our system,
    regcmp() constructs a bitmap indicating all EBCDIC-encoded characters
    that collate between the endpoints of any (EBCDIC-encoded) range. We
    believe this is correct, and believe the test suite as written
    precludes porting old programs that use regcmp() and regex() to
    EBCDIC platforms.
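
The approach described above can be sketched as follows: the endpoints are derived from characters rather than from hard-coded byte values, and a 256-entry bitmap marks every byte that collates between them. This is an illustration only, not the submitter's implementation. The names `range_bitmap`, `matches`, and `collation_key` are hypothetical, `cp037` stands in for the unnamed EBCDIC code page, and the default collation key (plain byte value) is a simplifying assumption; a real locale's collation order may differ.

```python
def range_bitmap(lo, hi, codec, collation_key=lambda byte: byte):
    """Build a 256-entry bitmap for the bracket range [lo-hi].

    lo and hi are characters; they are encoded with `codec`, so the same
    source text yields the right endpoints on ASCII and EBCDIC alike.
    collation_key maps a byte value to its collation position
    (assumption: byte value order unless the caller supplies another key).
    """
    lo_key = collation_key(lo.encode(codec)[0])
    hi_key = collation_key(hi.encode(codec)[0])
    return [lo_key <= collation_key(byte) <= hi_key for byte in range(256)]


def matches(bitmap, ch, codec):
    """Return True if the character's encoded byte is set in the bitmap."""
    return bitmap[ch.encode(codec)[0]]


if __name__ == "__main__":
    for codec in ("ascii", "cp037"):
        bitmap = range_bitmap("@", "A", codec)
        print(codec, [matches(bitmap, ch, codec) for ch in "@AK"])
```

With endpoints written as characters, '@' and 'A' match and 'K' does not under either codec in this sketch.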

    | The XSH regcmp spec states...
    |
    | Used within brackets, the hyphen signifies an ASCII character
    | range. For example, [a-z] is equivalent to [abcd ... xyz] .
    |
    | This clearly states that the character range is interpreted using the
    | ASCII sequence, the POSIX character set and locales have nothing to do
    | with this.

    This is a misreading of the Specification. "ASCII character range" is
    not the same as "ASCII sequence"; it refers not to the character
    encodings but to the _range_of_characters_ between the endpoints.
    The example given backs this up, using _characters_ and not
    _encodings_. Moreover, there is no mention of ASCII _encodings_
    anywhere in the text to support the VSU writer's viewpoint.

    Our reading is backed by

    XBD p.10, definition of "character" where it is called out as
    distinct from "byte" or "storage space".

    XBD Sec. 4.1 p.40, which calls out the _only_ requirements on
    encoded values of characters in the portable character set.

    XBD Sec. 4.4 p.41, last paragraph, "This document set assumes
    that the portable character set is constant across all locales,
    but does not prohibit implementations from supporting two
    incompatible encodings, such as ASCII and EBCDIC."

    The fact that the XSH is quite explicit in saying _encodings_
    when it means _encodings_ and not _characters_: see XSH p.317
    isascii() for an example.

    The intent of the entire document set, which is to support
    _source_code_portability_. That would not be possible if your
    reading were applied to other parts of the Specification.

    Test Output
    520|0 1 0 1 1|SPEC1170TESTSUITE CASE 21
    520|0 1 0 1 2|A call to char *regcmp(const char
    520|0 1 0 1 3|*string1, ...) shall interpret [A-Z] where A
    520|0 1 0 1 4|represents the first character in the range and Z is
    520|0 1 0 1 5|the last as a one-character regular expression
    520|0 1 0 1 6|matching the range of consecutive ASCII characters
    520|0 1 0 1 7|from A to Z inclusive.
    520|0 1 851983 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
    520|0 1 851983 1 2|PREP: Set POSIX locale
    520|0 1 851983 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
    520|0 1 851983 1 4|TEST: For each character in range:
    520|0 1 851983 1 5| Compiled pattern matches the characters in range
    520|0 1 851983 1 6| Compiled pattern does not match other characters
    520|0 1 851983 1 7|ERROR: Regular expression "[ -&]" matched "."
    220|0 1 1 17:18:09|FAIL



    520|0 1 0 1 1|SPEC1170TESTSUITE CASE 26
    520|0 1 0 1 2|A call to char *regcmp(const char
    520|0 1 0 1 3|*string1, ...) shall interpret [^]A-Z] where A is the
    520|0 1 0 1 4|first character in a range and Z is the last as a
    520|0 1 0 1 5|one-character regular expression matching all
    520|0 1 0 1 6|characters except a right square bracket and the range
    520|0 1 0 1 7|of consecutive ASCII characters from A to Z inclusive.
    520|0 1 36765722 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
    520|0 1 36765722 1 2|PREP: Set POSIX locale
    520|0 1 36765722 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
    520|0 1 36765722 1 4|TEST: For each character in range:
    520|0 1 36765722 1 5| Compiled pattern matches the characters in range
    520|0 1 36765722 1 6| Compiled pattern does not match other characters
    520|0 1 36765722 1 7|ERROR: Regular expression "[^] -&]" did not match "."
    220|0 1 1 17:20:21|FAIL
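
The characters printed in the two ERROR lines can be mapped back to byte values. The sketch below assumes the journal was rendered in an EBCDIC code page comparable to `cp037` (an assumption; the report does not name the code page) and shows which ASCII characters share the same byte values. The helper name `ascii_reading` is hypothetical.

```python
def ascii_reading(ch, ebcdic_codec="cp037"):
    """Return the ASCII character that shares ch's byte value in ebcdic_codec.

    Assumption: ebcdic_codec approximates the code page of the reporting
    system. Raises UnicodeDecodeError if the byte value is outside ASCII.
    """
    return ch.encode(ebcdic_codec).decode("ascii")


if __name__ == "__main__":
    # Range endpoints and the subject character from the ERROR lines.
    for ch in (" ", "&", "."):
        byte = ch.encode("cp037")[0]
        print(repr(ch), hex(byte), repr(ascii_reading(ch)))
```

Under this assumption the space, ampersand, and period occupy the byte values that ASCII assigns to '@', 'P', and 'K' respectively.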

    Review Information

    Review Type: TSMA Review
    Start Date: null
    Completed: null
    Status: Complete
    Review Recommendation: No Resolution Given
    Review Response
    We recommend this request be refused.

    ASCII and EBCDIC are specific encodings of the portable character set.

    We believe that by using "ASCII" instead of "portable character
    set" the standard is making reference to a specific portable
    character set encoding. So it is acceptable for the test to
    reference the ASCII encodings for characters in the portable
    character set and expect them to match those characters.

    We do not believe that the reference to XBD p10 is related to the
    issue under consideration. This text is disassociating
    characters from limitations on their storage.

    We do not believe that the references to XBD Sec. 4.1 p.40 or
    Sec. 4.4 p.41 apply to the issue at hand. The ASCII encoding of
    the portable character set, not the more abstract portable
    character set itself, is being directly referenced by the regcmp()
    spec.

    We do not believe that the XSH is explicit in using "ASCII
    encodings" vs "ASCII characters" in any specific way. There are
    very few references to ASCII in the XSH standard: a64l(), l64a(),
    isascii(), re_comp(), regcmp(), strcasecmp(), toascii(). We
    believe all these references use ASCII to refer to a specific
    encoding of characters.

    Review Type: Expert Group Review
    Start Date: null
    Completed: null
    Status: Complete
    Review Resolution: No Resolution Given
    Review Conclusion
    See X/Open Base Resolution 1170/165. The Base WG has resolved that the
    specification is unclear, and that the intent was that ASCII encodings
    are not required, but that the range of characters and their collation
    sequence are required.
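
The distinction between a range of characters and a range of encoded values can be illustrated with the letters A to Z, which are contiguous in ASCII but not in EBCDIC. The sketch below is not the Base WG's method; it only lists which characters fall between two endpoints by encoded value. `cp037` stands in for an EBCDIC code page, and the function name is hypothetical.

```python
import string


def chars_in_encoded_range(lo, hi, codec):
    """List the characters whose single-byte encoding under `codec`
    lies between the encodings of lo and hi, inclusive."""
    lo_byte, hi_byte = lo.encode(codec)[0], hi.encode(codec)[0]
    found = []
    for byte in range(lo_byte, hi_byte + 1):
        try:
            found.append(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            # Byte value not assigned in this codec; skip it.
            pass
    return found


if __name__ == "__main__":
    letters = set(string.ascii_uppercase)
    for codec in ("ascii", "cp037"):
        in_range = chars_in_encoded_range("A", "Z", codec)
        extras = [ch for ch in in_range if ch not in letters]
        print(codec, "non-letters between 'A' and 'Z':", extras)
```

Under `ascii` the encoded range holds exactly the 26 letters; under `cp037` it also holds other characters such as '}' and the backslash, so a test keyed to encoded values and a test keyed to characters can disagree.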

    Review Type: SA Review
    Start Date: null
    Completed: null
    Status: Complete
    Review Resolution: Permanent Interpretation (PIN)
    Review Conclusion
    A Permanent Interpretation is granted.
