Report 1598
Problem Report Number |
1598 |
Submitter's Classification |
Test Suite problem |
State |
Resolved |
Resolution |
Permanent Interpretation (PIN) |
Problem Resolution ID |
PIN.X.0155 |
Raised |
1996-01-09 08:00 |
Updated |
2003-03-13 08:00 |
Published |
1996-03-19 08:00 |
Product Standard |
Internationalised System Calls and Libraries Extended (UNIX 95) |
Certification Program |
The Open Brand certification program |
Test Suite |
VSU version 4.1.0 |
Test Identification |
CAPI.os/genuts/regcmp 21 26 |
Specification |
System Interfaces and Libraries Issue 4 Version 2 |
Location in Spec |
See Problem Text |
Problem Summary |
PIN4U.00021 Does the test require ASCII? |
Problem Text |
The assertion fregcmp21() constructs a range, [e1-e2], with hard-coded endpoints, e1='\x40' and e2='\x41', which it compiles with regcmp(). It then uses regex() to test whether the compiled pattern for this range matches a hard-coded string, s1="\x4b". (Assertion fregcmp26() behaves similarly, constructing the range [^]e1-e2].)
On an ASCII platform e1='@', e2='A', and the range [@-A] does NOT match the string s1="K". On an EBCDIC platform e1=' ', e2='<no-break-space>', and the range [ -<no-break-space>] DOES match s1="." because <period> collates between <space> and <no-break-space>.
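The difference described above can be checked by decoding the test's three hard-coded bytes under both encodings. This is only an illustrative sketch: the report does not say which EBCDIC code page the submitter's platform uses, so Python's `cp037` codec is assumed here as a representative one, and the helper name `decode_all` is hypothetical.

```python
# Hard-coded byte values used by fregcmp21(), as given in the problem text.
E1, E2, S1 = b"\x40", b"\x41", b"\x4b"


def decode_all(codec):
    """Decode the two range endpoints and the subject string with one codec."""
    return tuple(raw.decode(codec) for raw in (E1, E2, S1))


# On an ASCII platform the bytes are the characters the test author intended.
ascii_chars = decode_all("ascii")

# Assumption: cp037 stands in for "EBCDIC"; the report names no code page.
ebcdic_chars = decode_all("cp037")

print("ASCII :", [hex(ord(c)) for c in ascii_chars], ascii_chars)
print("EBCDIC:", [hex(ord(c)) for c in ebcdic_chars], ebcdic_chars)
```

Under these assumptions the same three bytes name '@', 'A' and 'K' in ASCII, but space, no-break space and period in the assumed EBCDIC code page.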
I believe it is incorrect for test case fregcmp21() to use the hard-coded values e1='\x40', e2='\x41', and s1="\x4b", and to expect these hard-coded values to correspond to e1='@', e2='A', and s1="K" on all platforms. If fregcmp21() had used the EBCDIC encodings for '@', 'A', and "K", i.e., e1='\x7C', e2='\xC1', and s1="\xD2", it would have gotten the expected results on our EBCDIC system. On our system, regcmp() constructs a bitmap indicating all EBCDIC-encoded characters which collate between any (EBCDIC-encoded) range endpoints. We believe this is correct, and believe the test suite as written precludes porting of old programs which use regcmp() and regex() to EBCDIC platforms.
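The bitmap approach described above might look roughly like the following sketch. It is not the submitter's implementation: the function names `range_bitmap` and `matches` are hypothetical, `cp037` is again assumed as the EBCDIC code page, and Unicode code point order is used as a stand-in for the platform's collation sequence, which the report does not specify.

```python
def range_bitmap(lo_char, hi_char, codec, collation_key=ord):
    """Build a 256-entry bitmap indexed by encoded byte value.

    An entry is True when the character that byte encodes collates
    between lo_char and hi_char inclusive. collation_key defaults to
    ord(), an assumed stand-in for the real collation sequence.
    """
    lo_key, hi_key = collation_key(lo_char), collation_key(hi_char)
    bitmap = [False] * 256
    for value in range(256):
        try:
            ch = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte encodes no character in the codec
        bitmap[value] = lo_key <= collation_key(ch) <= hi_key
    return bitmap


def matches(bitmap, char, codec):
    """Look up a single character via its encoded byte value."""
    return bitmap[char.encode(codec)[0]]


# Endpoints are given as characters, so the encoding is chosen per platform.
ebcdic_map = range_bitmap("@", "A", "cp037")
ascii_map = range_bitmap("@", "A", "ascii")

ebcdic_bytes = [v for v, hit in enumerate(ebcdic_map) if hit]
ascii_bytes = [v for v, hit in enumerate(ascii_map) if hit]
print([hex(v) for v in ebcdic_bytes], [hex(v) for v in ascii_bytes])
```

With the endpoints expressed as characters rather than byte values, the range [@-A] rejects "K" under both encodings, which is the behaviour the submitter argues the test should check.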
| The XSH regcmp spec states...
|
| Used within brackets, the hyphen signifies an ASCII character
| range. For example, [a-z] is equivalent to [abcd ... xyz] .
|
| This clearly states that the character range is interpreted using the
| ASCII sequence, the POSIX character set and locales have nothing to do
| with this.
This is a misreading of the Specification. "ASCII character range" is not the same as "ASCII sequence", and is not meant to mean the character encodings but instead the _range_of_characters_ between the endpoints. Also, the example given backs this up, using _characters_ and not _encodings_. Also, there is no mention of ASCII _encodings_ anywhere in the text to support the VSU writer's viewpoint.
Our reading is backed by
XBD p.10, definition of "character" where it is called out as distinct from "byte" or "storage space".
XBD Sec. 4.1 p.40, which calls out the _only_ requirements on encoded values of characters in the portable character set.
XBD Sec 4.4 p 41, last paragraph, "This document set assumes that the portable character set is constant across all locales, but does not prohibit implementations from supporting two incompatible encodings, such as ASCII and EBCDIC."
The fact that the XSH is quite explicit in saying _encodings_ when it means _encodings_ and not _characters_: see XSH p.317 isascii() for an example.
The intent of the entire document set, which is to support _source_code_portability_, which is not possible under your reading, if your reading were applied to other parts of the Specification.
|
Test Output |
520|0 1 0 1 1|SPEC1170TESTSUITE CASE 21
520|0 1 0 1 2|A call to char *regcmp(const char
520|0 1 0 1 3|*string1, ...) shall interpret [A-Z] where A
520|0 1 0 1 4|represents the first character in the range and Z is
520|0 1 0 1 5|the last as a one-character regular expression
520|0 1 0 1 6|matching the range of consecutive ASCII characters
520|0 1 0 1 7|from A to Z inclusive.
520|0 1 851983 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
520|0 1 851983 1 2|PREP: Set POSIX locale
520|0 1 851983 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
520|0 1 851983 1 4|TEST: For each character in range:
520|0 1 851983 1 5| Compiled pattern matches the characters in range
520|0 1 851983 1 6| Compiled pattern does not match other characters
520|0 1 851983 1 7|ERROR: Regular expression "[ -&]" matched "."
220|0 1 1 17:18:09|FAIL
520|0 1 0 1 1|SPEC1170TESTSUITE CASE 26
520|0 1 0 1 2|A call to char *regcmp(const char
520|0 1 0 1 3|*string1, ...) shall interpret [^]A-Z] where A is the
520|0 1 0 1 4|first character in a range and Z is the last as a
520|0 1 0 1 5|one-character regular expression matching all
520|0 1 0 1 6|characters except a right square bracket and the range
520|0 1 0 1 7|of consecutive ASCII characters from A to Z inclusive.
520|0 1 36765722 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
520|0 1 36765722 1 2|PREP: Set POSIX locale
520|0 1 36765722 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
520|0 1 36765722 1 4|TEST: For each character in range:
520|0 1 36765722 1 5| Compiled pattern matches the characters in range
520|0 1 36765722 1 6| Compiled pattern does not match other characters
520|0 1 36765722 1 7|ERROR: Regular expression "[^] -&]" did not match "."
220|0 1 1 17:20:21|FAIL
|
Review Information
Review Type |
TSMA Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Recommendation |
No Resolution Given |
Review Response |
We recommend this request be refused.
ASCII and EBCDIC are specific encodings of the portable character set.
We believe that by using "ASCII" instead of "portable character set" the standard is making reference to a specific portable character set encoding. So it is acceptable for the test to reference the ASCII encodings for characters in the portable character set and expect them to match those characters.
We do not believe that the reference to XBD p10 is related to the issue under consideration. This text is disassociating characters from limitations on their storage.
We do not believe that the references to XBD Sec. 4.1 p.40 or Sec 4.4 p 41 apply to the issue at hand. The ASCII encoding of the portable character set, not the more abstract portable character set itself, is being directly referenced by the regcmp() spec.
We do not believe that the XSH is explicit in using "ASCII encodings" vs "ASCII characters" in any specific way. There are very few references to ASCII in the XSH standard: a64l(), l64a(), isascii(), re_comp(), regcmp(), strcasecmp(), toascii(). We believe all these references use ASCII to refer to a specific encoding of characters.
|
Review Type |
Expert Group Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Resolution |
No Resolution Given |
Review Conclusion |
See X/Open Base Resolution 1170/165. The Base WG has resolved that the specification is unclear, and that the intent was that ASCII encodings are not required, but that the range of characters and their collation sequence is required.
|
Review Type |
SA Review |
Start Date |
null |
Completed |
null |
Status |
Complete |
Review Resolution |
Permanent Interpretation (PIN) |
Review Conclusion |
A Permanent Interpretation is granted.
|