[TASK] Improve performance of String
Reported by Brian Ford | February 24th, 2008 @ 04:09 PM | in 1.0 preview
Bring the performance of String on par with MRI, or at most an OOM slower. In doing so, consider the following areas:
- the current methods available on ByteArray
- the use of primitives in String or ByteArray
The goal will be to lay a solid foundation for later M17N work. The direct indexing methods need to be optimized. Several strategies will be developed for supporting multi-byte characters, including a Char immediate and CharArray that can transparently be swapped in for a ByteArray without breaking any String methods. This will allow us to convert a String from using a ByteArray to using a CharArray on the fly. Also explore offering three methods of accessing characters from a ByteArray: 1. calculated (i.e. traverse each multi-byte character to find the offset to the next); 2. "indexed" (i.e. maintain a set of indexes to the character offsets); and 3. direct (i.e. convert a String instance from using a ByteArray to using a CharArray, in which case accessing a character at an index becomes an O(1) operation at the expense of more memory used for the String).
This is a macro ticket and subtickets may be created as appropriate.
Comments and changes to this ticket
-
Ryan Davis February 29th, 2008 @ 04:07 PM
- → State changed from new to open
-
Brian Ford March 25th, 2008 @ 11:12 AM
Some comparisons as of 52d81e05 (against c39f2cb7).
Method before after ratio String#upto 0.097594 0.089297 0.914985 String#% 0.264952 0.237771 0.897412 String#rstrip 0.001074 0.000954 0.888268 String#rstrip! 0.000803 0.000713 0.887920 String#dup 0.000119 0.000104 0.873950 String#split(string, limit) 0.001435 0.001242 0.865505 String#chomp! 0.007827 0.006749 0.862272 String#lstrip! 0.000760 0.000649 0.853947 String#index(string, offset) 0.018572 0.015428 0.830713 String#to_s 0.000393 0.000321 0.816794 String#=== 0.000106 0.000086 0.811321 String#downcase 0.007860 0.005034 0.640458 String#upcase 0.093755 0.052783 0.562989 String#swapcase 0.105811 0.059537 0.562673 String#swapcase! 0.105588 0.058790 0.556787 String#upcase! 0.093324 0.050050 0.536304 String#count(string) 0.114852 0.055853 0.486304 String#downcase! 0.075648 0.036337 0.480343 String#capitalize! 0.080687 0.036927 0.457657 String#capitalize 0.091156 0.037381 0.410077 String#* 0.004540 0.001503 0.331057 String#count(^string) 0.238026 0.058308 0.244965 String#ljust 0.027946 0.006244 0.223431 String#delete! 0.104976 0.020557 0.195826 String#rjust(width) 0.034383 0.006029 0.175348 String#delete 0.133950 0.023373 0.174490 String#tr 0.207756 0.034218 0.164703 String#center 0.061345 0.009877 0.161007 String#tr! 0.212180 0.033515 0.157956 String#squeeze('m-n') 1.077389 0.116340 0.107983 String#squeeze!('m-n') 1.072962 0.114569 0.106778 String#rjust(width, padding) 0.018185 0.001687 0.092769 String#squeeze(char) 0.712580 0.038169 0.053565 String#squeeze!(char) 0.738198 0.037387 0.050646 String#tr_s 0.938404 0.044736 0.047672 String#tr_s! 0.938629 0.043725 0.046584 String#squeeze! 0.687785 0.027225 0.039584 String#squeeze 0.744735 0.027903 0.037467 String#casecmp 0.188328 0.001463 0.007768I've observed benchmarks/yarv/bm_so_count_words.rb times go from ~18 sec to ~1.9 sec.
There are a significant number of methods still to review, so I'm leaving the ticket open.
-
Brian Ford March 27th, 2008 @ 11:15 AM
This is a recent status comparing 3145a74a8 (current) with c39f2cb7 (old).
Method Before After Ratio String#== 0.000083 0.000094 1.132530 base 0.000378 0.000401 1.060800 String#to_i(base) 0.009384 0.009907 1.055733 String#oct 0.013846 0.014428 1.042034 String#sum(int) 0.032162 0.033507 1.041820 String#[int] 0.012991 0.013376 1.029636 String#strip! 0.001795 0.001846 1.028412 String#hex 0.016305 0.016754 1.027538 String#to_i 0.009677 0.009890 1.022011 String#reverse 0.018092 0.018457 1.020175 String#index(int, offset) 0.009583 0.009750 1.017427 String#each_byte 0.025686 0.026100 1.016118 String#index(int) 0.086766 0.087524 1.008736 String#reverse! 0.018001 0.018073 1.004000 String#intern 0.260485 0.261459 1.003739 String#sum 0.032523 0.032460 0.998063 String#crypt 0.016990 0.016936 0.996822 String#slice(int) 0.002615 0.002598 0.993499 String#each 0.334548 0.331547 0.991030 String#to_sym 0.260142 0.257554 0.990052 String#slice(regexp) 0.006973 0.006861 0.983938 String#=~ 0.004934 0.004848 0.982570 String#dump 0.081199 0.079687 0.981379 String#replace 0.001790 0.001742 0.973184 String#to_f 0.000360 0.000350 0.972222 String#match(regexp) 0.004797 0.004650 0.969356 String#match(string) 0.009942 0.009618 0.967411 String#<< 0.007967 0.007693 0.965608 String#rindex(regexp) 0.010973 0.010586 0.964732 String#rindex(regexp, int) 0.011603 0.011110 0.957511 String#index(string) 0.041252 0.039367 0.954305 String#<=> 0.000523 0.000498 0.952199 String#+ 0.002974 0.002820 0.948218 String#succ 0.003482 0.003284 0.943136 String#to_str 0.000341 0.000320 0.938416 String.new 0.004876 0.004573 0.937859 String#rindex(int, int) 0.038003 0.034843 0.916849 String#rstrip! 0.000803 0.000735 0.915318 String#unpack 0.114886 0.104490 0.909510 String#rindex(string) 0.032481 0.029499 0.908192 String#succ! 0.003379 0.003037 0.898787 String#rstrip 0.001074 0.000963 0.896648 String#split(regexp, limit) 0.068813 0.061321 0.891125 String#upto 0.097594 0.085998 0.881181 String#split(regexp) 0.427613 0.375902 0.879071 String#split 0.319598 0.280881 0.878857 String#sub!(pattern) { block } 0.036468 0.032030 0.878304 String#split(string) 0.241152 0.210947 0.874747 String#rindex(string, int) 0.032223 0.028124 0.872793 String#split(string, limit) 0.001435 0.001239 0.863415 String#index(regexp) 0.018529 0.015973 0.862054 String#lstrip! 0.000760 0.000655 0.861842 String#index(regexp, offset) 0.017691 0.015055 0.850998 String#[string] 0.066639 0.056664 0.850313 String#empty? 0.000146 0.000124 0.849315 String#slice(string) 0.066779 0.056373 0.844173 String#include? 0.065152 0.054802 0.841141 String#% 0.264952 0.220227 0.831196 String#to_s 0.000393 0.000325 0.826972 String#sub!(pattern, repl) 0.035746 0.029545 0.826526 String#[regexp] 0.014063 0.011520 0.819171 String#scan { block } 0.253983 0.207872 0.818448 String#sub(pattern) { block } 0.027004 0.022085 0.817842 String#scan 0.206328 0.168362 0.815992 String#[regexp,int] 0.014330 0.011688 0.815632 String#=== 0.000106 0.000086 0.811321 String#gsub(regexp, repl) 0.157000 0.124823 0.795051 String#gsub(regexp) { block } 2.152295 1.710988 0.794960 String#gsub!(regexp) { block } 2.247742 1.784612 0.793958 String#eql? 0.000070 0.000055 0.785714 String#sub(pattern, repl) 0.028269 0.021743 0.769146 String#[range] 0.008446 0.006439 0.762373 String#length 0.000041 0.000031 0.756098 String#dup 0.000119 0.000088 0.739496 String#slice(range) 0.008944 0.006423 0.718135 String#index(string, offset) 0.018572 0.013318 0.717101 String#lstrip 0.008438 0.005671 0.672079 String#downcase 0.007860 0.005246 0.667430 String#slice(int, int) 0.004752 0.002920 0.614478 String#[int,int] 0.004783 0.002930 0.612586 String#chomp 0.007325 0.004155 0.567235 String#swapcase 0.105811 0.058019 0.548327 String#upcase 0.093755 0.050663 0.540377 String#upcase! 0.093324 0.049483 0.530228 String#swapcase! 0.105588 0.055842 0.528867 String#chomp! 0.007827 0.004040 0.516162 String#count(string) 0.114852 0.057045 0.496683 String#downcase! 0.075648 0.034743 0.459272 String#capitalize! 0.080687 0.037053 0.459219 String#strip 0.013780 0.006056 0.439478 String#capitalize 0.091156 0.037206 0.408157 String#chop 0.005374 0.001887 0.351135 String#* 0.004540 0.001513 0.333260 String#chop! 0.005357 0.001678 0.313235 String#insert 0.019151 0.005710 0.298157 String#count(^string) 0.238026 0.060587 0.254539 String#ljust 0.027946 0.006121 0.219030 String#delete! 0.104976 0.021045 0.200474 String#delete 0.133950 0.023793 0.177626 String#rjust(width) 0.034383 0.006009 0.174767 String#center 0.061345 0.009903 0.161431 String#tr 0.207756 0.032670 0.157252 String#tr! 0.212180 0.032498 0.153162 String#squeeze('m-n') 1.077389 0.119240 0.110675 String#squeeze!('m-n') 1.072962 0.117104 0.109141 String#rjust(width, padding) 0.018185 0.001605 0.088260 String#squeeze(char) 0.712580 0.040846 0.057321 String#squeeze!(char) 0.738198 0.040276 0.054560 String#tr_s 0.938404 0.042907 0.045723 String#tr_s! 0.938629 0.042761 0.045557 String#squeeze! 0.687785 0.030077 0.043730 String#squeeze 0.744735 0.030585 0.041068 String#casecmp 0.188328 0.001560 0.008283
Please Login or create a free account to add a new comment.
You can update this ticket by sending an email to from your email client. (help)
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
