regexp should consider $KCODE
Reported by pluskid | February 7th, 2008 @ 03:46 PM | in 1.0 preview
The second optional argument of Regexp.new can be used to indicate the language/encoding.
The default behavior:
re = Regexp.new(".")
str = "中文" # or "\344\270\255\346\226\207" in utf-8 encoding
re.match(str)[0] # => "\344"
When specifying the language/encoding:
re = Regexp.new(".", nil, 'u')
str = "中文" # or "\344\270\255\346\226\207" in utf-8 encoding
re.match(str)[0] # => "\344\270\255"
However, there's a global variable $KCODE that indicates the current language/encoding. When it is set, regexps should behave according to it, thus:
$KCODE = 'u'
re = Regexp.new(".")
str = "中文" # or "\344\270\255\346\226\207" in utf-8 encoding
re.match(str)[0] # => "中" or "\344\270\255" in utf-8 encoding
This is all Ruby 1.8 behavior. Since Ruby 1.9 gains full Unicode support, the global variable $KCODE is no longer used there. I think Rubinius is currently targeting compatibility mainly with Ruby 1.8, so this should be considered.
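For anyone reproducing this on Ruby 1.9 or later, where $KCODE has no effect, here is a rough sketch of the same byte-versus-character difference. It assumes the encoding tagged on the string stands in for what $KCODE controlled in 1.8. The variable names are only illustrative, and this says nothing about how the Rubinius or MRI regexp engine is implemented.

```ruby
# Sketch only: on Ruby 1.9+ the string's encoding, not $KCODE, decides
# whether "." consumes a single byte or a whole character.
str = "\u4E2D\u6587"      # "中文", tagged as UTF-8

# Binary view of the same bytes: comparable to Ruby 1.8 without $KCODE.
byte_match = str.b[/./]

# UTF-8 view: comparable to Ruby 1.8 with $KCODE = 'u'.
char_match = str[/./]

p byte_match.bytes        # => [228]            i.e. "\344"
p char_match.bytes        # => [228, 184, 173]  i.e. "\344\270\255"
```

The two results line up with the "\344" and "\344\270\255" matches shown in the 1.8 examples above.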
One way to fix this, I think, is to change the default value for the second optional argument (lang) of Regexp.new from "nil" to "$KCODE".
I don't know whether Rubinius and Ruby 1.8 use the same regexp engine. But it seems that even when I set $KCODE to 'u' in Ruby 1.8, the code
Regexp.new(".").inspect
will return "/./" but
Regexp.new(".", nil, "u").inspect
returns "/./u". However, in Ruby 1.8 the "/./" regexp successfully matches a multi-byte character once $KCODE is set to 'u', whereas it fails in Rubinius. So I think a better way may be to patch the regexp engine to take the global variable into account instead of patching the Regexp.new method.
Comments and changes to this ticket
Ryan Davis March 1st, 2008 @ 01:26 AM
- → State changed from new to open
- → Assigned user changed from to Brian Ford
