
The Open Brand -- Problem Reporting and Interpretations System

Problem Report 1598 Details

This page provides all information on Problem Report 1598.


Problem Report Number: 1598

Submitter's Classification: Test Suite problem

State: Resolved

Resolution: Permanent Interpretation (PIN)

Problem Resolution ID: PIN.X.0155

Raised: 1996-01-09 08:00

Updated: 2003-03-13 08:00

Published: 1996-03-19 08:00

Product Standard: Internationalised System Calls and Libraries Extended (UNIX 95)

Certification Program: The Open Brand certification program

Test Suite: VSU version 4.1.0

Test Identification: CAPI.os/genuts/regcmp 21 26

Specification: System Interfaces and Libraries Issue 4 Version 2

Location in Spec: See Problem Text

Problem Summary: PIN4U.00021 Does the test require ASCII?

Problem Text
The assertion fregcmp21() constructs a range, [e1-e2], with
hard coded endpoints, e1='\x40' and e2='\x41', which it compiles
with regcmp(). It then uses regex() to test whether the compiled
pattern for this range matches a hard coded string, s1="\x4b".
(Assertion fregcmp26() behaves similarly, constructing the
range [^]e1-e2].)
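
The check described above can be sketched as follows. This is only an
illustration, not the VSU source: the helper name range_matches() is
hypothetical, and POSIX regcomp()/regexec() are used as a stand-in because
regcmp()/regex() are not available on many current systems. The sketch
assumes the default "C" locale and builds the pattern from whatever
characters it is given, so a caller can pass either character literals
('@', 'A', 'K') or hard coded values ('\x40', '\x41', '\x4b').

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the one-character range [lo-hi]
 * matches the single character c, 0 if it does not, and -1 if the
 * pattern fails to compile.
 * Assumption: regcomp()/regexec() stand in for regcmp()/regex(). */
static int range_matches(char lo, char hi, char c)
{
    char pattern[6];                 /* "[x-y]" plus terminating NUL */
    char subject[2] = { c, '\0' };   /* one-character subject string */
    regex_t re;
    int n, rc;

    n = snprintf(pattern, sizeof pattern, "[%c-%c]", lo, hi);
    assert(n == 5);                  /* pattern was written in full */

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Called with character literals, range_matches('@', 'A', 'K') asks the same
question on any encoding. Called with '\x40', '\x41', '\x4b', it asks that
question only where those bytes happen to encode '@', 'A' and 'K'.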

On an ASCII platform e1='@', e2='A', and the range [@-A] does
NOT match the string s1="K". On an EBCDIC platform e1=' ',
e2='<no-break-space>', and the range [ -<no-break-space>] DOES
match s1="." because <period> collates between <space> and
<no-break-space>.

I believe it is incorrect for testcase fregcmp21() to use hard
coded values e1='\x40', e2='\x41', and s1="\x4b", and to expect
that these hard coded values correspond to e1='@', e2='A', and s1="K"
on all platforms. If fregcmp21() had used the EBCDIC
encodings for '@', 'A', and "K", i.e., e1='\x7C', e2='\xC1', and
s1="\xD2", it would have produced the expected results on our EBCDIC system.
On our system, regcmp() constructs a bitmap indicating all EBCDIC encoded
characters which collate between the (EBCDIC encoded) range
endpoints. We believe this is correct, and believe the test suite
as written precludes porting old programs which use regcmp() and
regex() to EBCDIC platforms.
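
The bitmap approach described above might look roughly like the sketch
below. It is not the submitter's implementation. The names
build_range_bitmap() and bitmap_test(), and the weight[] table giving each
encoded byte a collation position, are assumptions introduced for
illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NCHARS (UCHAR_MAX + 1)

/* Hypothetical: weight[b] is the collation position of encoded byte b.
 * Sets one bit in bitmap for every byte whose collation position lies
 * between the positions of the two range endpoints, inclusive. */
static void build_range_bitmap(const unsigned int weight[NCHARS],
                               unsigned char lo, unsigned char hi,
                               unsigned char bitmap[NCHARS / CHAR_BIT])
{
    unsigned int b;

    assert(weight != NULL && bitmap != NULL);
    memset(bitmap, 0, NCHARS / CHAR_BIT);
    for (b = 0; b < NCHARS; b++) {
        if (weight[b] >= weight[lo] && weight[b] <= weight[hi])
            bitmap[b / CHAR_BIT] |= (unsigned char)(1u << (b % CHAR_BIT));
    }
}

/* Returns 1 if encoded byte b is a member of the compiled range. */
static int bitmap_test(const unsigned char bitmap[NCHARS / CHAR_BIT],
                       unsigned char b)
{
    return (bitmap[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}
```

With a table in which collation follows byte order, the range \x40-\x41
contains only those two bytes. With a table in which \x4b collates before
\x41, the same endpoints also admit \x4b, which is the behaviour the
submitter reports for <period>.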

| The XSH regcmp spec states...
|
| Used within brackets, the hyphen signifies an ASCII character
| range. For example, [a-z] is equivalent to [abcd ... xyz] .
|
| This clearly states that the character range is interpreted using the
| ASCII sequence, the POSIX character set and locales have nothing to do
| with this.

This is a misreading of the Specification. "ASCII character range" is
not the same as "ASCII sequence"; it refers not to the character
encodings but to the _range_of_characters_ between the endpoints.
The example given backs this up, using _characters_ and not
_encodings_, and there is no mention of ASCII _encodings_ anywhere in
the text to support the VSU writer's viewpoint.

Our reading is backed by:

XBD p.10, definition of "character" where it is called out as
distinct from "byte" or "storage space".

XBD Sec. 4.1 p.40, which calls out the _only_ requirements on
encoded values of characters in the portable character set.

XBD Sec 4.4 p 41, last paragraph, "This document set assumes
that the portable character set is constant across all locales,
but does not prohibit implementations from supporting two
incompatible encodings, such as ASCII and EBCDIC."

The fact that the XSH is quite explicit in saying _encodings_
when it means _encodings_ and not _characters_: see XSH p.317
isascii() for an example.

The intent of the entire document set, which is to support
_source_code_portability_. Portability would not be possible if
your reading were applied to other parts of the Specification.

Test Output
520|0 1 0 1 1|SPEC1170TESTSUITE CASE 21
520|0 1 0 1 2|A call to char *regcmp(const char
520|0 1 0 1 3|*string1, ...) shall interpret [A-Z] where A
520|0 1 0 1 4|represents the first character in the range and Z is
520|0 1 0 1 5|the last as a one-character regular expression
520|0 1 0 1 6|matching the range of consecutive ASCII characters
520|0 1 0 1 7|from A to Z inclusive.
520|0 1 851983 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
520|0 1 851983 1 2|PREP: Set POSIX locale
520|0 1 851983 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
520|0 1 851983 1 4|TEST: For each character in range:
520|0 1 851983 1 5| Compiled pattern matches the characters in range
520|0 1 851983 1 6| Compiled pattern does not match other characters
520|0 1 851983 1 7|ERROR: Regular expression "[ -&]" matched "."
220|0 1 1 17:18:09|FAIL

520|0 1 0 1 1|SPEC1170TESTSUITE CASE 26
520|0 1 0 1 2|A call to char *regcmp(const char
520|0 1 0 1 3|*string1, ...) shall interpret [^]A-Z] where A is the
520|0 1 0 1 4|first character in a range and Z is the last as a
520|0 1 0 1 5|one-character regular expression matching all
520|0 1 0 1 6|characters except a right square bracket and the range
520|0 1 0 1 7|of consecutive ASCII characters from A to Z inclusive.
520|0 1 36765722 1 1|DEBUG: Entering capi_com/locale.c:set_POSIX_locale()
520|0 1 36765722 1 2|PREP: Set POSIX locale
520|0 1 36765722 1 3|DEBUG: Exiting capi_com/locale.c:set_POSIX_locale()
520|0 1 36765722 1 4|TEST: For each character in range:
520|0 1 36765722 1 5| Compiled pattern matches the characters in range
520|0 1 36765722 1 6| Compiled pattern does not match other characters
520|0 1 36765722 1 7|ERROR: Regular expression "[^] -&]" did not match "."
220|0 1 1 17:20:21|FAIL
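
The two error lines above can be reproduced on paper by comparing encoded
values. The sketch below assumes that range membership follows encoded
byte order, which is an assumption about the failing implementation and is
not stated in the journal output. The struct and function names are
hypothetical. The byte values are those of <space>, <ampersand> and
<period> in ASCII and in the common EBCDIC code pages 037 and 1047.

```c
#include <assert.h>

/* Encoded values of a range's endpoints and of the subject character. */
struct encoded_case {
    unsigned char lo;       /* <space>     */
    unsigned char hi;       /* <ampersand> */
    unsigned char subject;  /* <period>    */
};

static const struct encoded_case ascii_case  = { 0x20, 0x26, 0x2E };
static const struct encoded_case ebcdic_case = { 0x40, 0x50, 0x4B };

/* Returns 1 if the subject's encoded value lies within [lo, hi]. */
static int in_encoded_range(const struct encoded_case *c)
{
    assert(c != 0);
    return c->lo <= c->subject && c->subject <= c->hi;
}
```

Under that assumption, in_encoded_range(&ascii_case) is 0 and
in_encoded_range(&ebcdic_case) is 1, which is consistent with "[ -&]"
matching "." in case 21 and "[^] -&]" failing to match "." in case 26 on
the EBCDIC system.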

Review Information

Review Type: TSMA Review

Start Date: null

Completed: null

Status: Complete

Review Recommendation: No Resolution Given

Review Response
We recommend this request be refused.

ASCII and EBCDIC are specific encodings of the portable character set.

We believe that by using "ASCII" instead of "portable character
set" the standard is making reference to a specific portable
character set encoding. So it is acceptable for the test to
reference the ASCII encodings for characters in the portable
character set and expect them to match those characters.

We do not believe that the reference to XBD p10 is related to the
issue under consideration. This text is disassociating
characters from limitations on their storage.

We do not believe that the references to XBD Sec. 4.1 p.40 or
Sec 4.4 p 41 apply to the issue at hand. The ASCII encoding of
the portable character set, not the more abstract portable
character set itself, is being directly referenced by the regcmp()
spec.

We do not believe that the XSH is explicit in using "ASCII
encodings" vs "ASCII characters" in any specific way. There are
very few references to ASCII in the XSH standard: a64l(), l64a(),
isascii(), re_comp(), regcmp(), strcasecmp(), toascii(). We
believe all these references use ASCII to refer to a specific
encoding of characters.

Review Type: Expert Group Review

Start Date: null

Completed: null

Status: Complete

Review Resolution: No Resolution Given

Review Conclusion
See X/Open Base Resolution 1170/165. The Base WG has resolved that the
specification is unclear, and that the intent was that ASCII encodings are
not required, but that the range of characters and their collation sequence
are required.

Review Type: SA Review

Start Date: null

Completed: null

Status: Complete

Review Resolution: Permanent Interpretation (PIN)

Review Conclusion
A Permanent Interpretation is granted.
