#359 open
Brian Ford

[TASK] Improve performance of String

Reported by Brian Ford | February 24th, 2008 @ 04:09 PM | in 1.0 preview

Bring the performance of String on par with MRI, or at most an OOM slower. In doing so, consider the following areas:

  1. the current methods available on ByteArray
  2. the use of primitives in String or ByteArray

The goal will be to lay a solid foundation for later M17N work. The direct indexing methods need to be optimized. Several strategies will be developed for supporting multi-byte characters, including a Char immediate and CharArray that can transparently be swapped in for a ByteArray without breaking any String methods. This will allow us to convert a String from using a ByteArray to using a CharArray on the fly. Also explore offering three methods of accessing characters from a ByteArray: 1. calculated (i.e. traverse each multi-byte character to find the offset to the next); 2. "indexed" (i.e. maintain a set of indexes to the character offsets); and 3. direct (i.e. convert a String instance from using a ByteArray to using a CharArray, in which case accessing a character at an index becomes an O(1) operation at the expense of more memory used for the String).

This is a macro ticket and subtickets may be created as appropriate.

Comments and changes to this ticket

  • Ryan Davis

    Ryan Davis February 29th, 2008 @ 04:07 PM

    • → State changed from “new” to “open”
  • Brian Ford

    Brian Ford March 25th, 2008 @ 11:12 AM

    Some comparisons as of 52d81e05 (against c39f2cb7).

    Method                            before      after          ratio
    String#upto                       0.097594    0.089297       0.914985
    String#%                          0.264952    0.237771       0.897412
    String#rstrip                     0.001074    0.000954       0.888268
    String#rstrip!                    0.000803    0.000713       0.887920
    String#dup                        0.000119    0.000104       0.873950
    String#split(string, limit)       0.001435    0.001242       0.865505
    String#chomp!                     0.007827    0.006749       0.862272
    String#lstrip!                    0.000760    0.000649       0.853947
    String#index(string, offset)      0.018572    0.015428       0.830713
    String#to_s                       0.000393    0.000321       0.816794
    String#===                        0.000106    0.000086       0.811321
    String#downcase                   0.007860    0.005034       0.640458
    String#upcase                     0.093755    0.052783       0.562989
    String#swapcase                   0.105811    0.059537       0.562673
    String#swapcase!                  0.105588    0.058790       0.556787
    String#upcase!                    0.093324    0.050050       0.536304
    String#count(string)              0.114852    0.055853       0.486304
    String#downcase!                  0.075648    0.036337       0.480343
    String#capitalize!                0.080687    0.036927       0.457657
    String#capitalize                 0.091156    0.037381       0.410077
    String#*                          0.004540    0.001503       0.331057
    String#count(^string)             0.238026    0.058308       0.244965
    String#ljust                      0.027946    0.006244       0.223431
    String#delete!                    0.104976    0.020557       0.195826
    String#rjust(width)               0.034383    0.006029       0.175348
    String#delete                     0.133950    0.023373       0.174490
    String#tr                         0.207756    0.034218       0.164703
    String#center                     0.061345    0.009877       0.161007
    String#tr!                        0.212180    0.033515       0.157956
    String#squeeze('m-n')             1.077389    0.116340       0.107983
    String#squeeze!('m-n')            1.072962    0.114569       0.106778
    String#rjust(width, padding)      0.018185    0.001687       0.092769
    String#squeeze(char)              0.712580    0.038169       0.053565
    String#squeeze!(char)             0.738198    0.037387       0.050646
    String#tr_s                       0.938404    0.044736       0.047672
    String#tr_s!                      0.938629    0.043725       0.046584
    String#squeeze!                   0.687785    0.027225       0.039584
    String#squeeze                    0.744735    0.027903       0.037467
    String#casecmp                    0.188328    0.001463       0.007768
    

    I've observed benchmarks/yarv/bm_so_count_words.rb times go from ~18 sec to ~1.9 sec.

    There are a significant number of methods still to review, so I'm leaving the ticket open.

  • Brian Ford

    Brian Ford March 27th, 2008 @ 11:15 AM

    This is a recent status comparing 3145a74a8 (current) with c39f2cb7 (old).

    Method                              Before       After          Ratio
    String#==                         0.000083    0.000094       1.132530
    base                              0.000378    0.000401       1.060800
    String#to_i(base)                 0.009384    0.009907       1.055733
    String#oct                        0.013846    0.014428       1.042034
    String#sum(int)                   0.032162    0.033507       1.041820
    String#[int]                      0.012991    0.013376       1.029636
    String#strip!                     0.001795    0.001846       1.028412
    String#hex                        0.016305    0.016754       1.027538
    String#to_i                       0.009677    0.009890       1.022011
    String#reverse                    0.018092    0.018457       1.020175
    String#index(int, offset)         0.009583    0.009750       1.017427
    String#each_byte                  0.025686    0.026100       1.016118
    String#index(int)                 0.086766    0.087524       1.008736
    String#reverse!                   0.018001    0.018073       1.004000
    String#intern                     0.260485    0.261459       1.003739
    String#sum                        0.032523    0.032460       0.998063
    String#crypt                      0.016990    0.016936       0.996822
    String#slice(int)                 0.002615    0.002598       0.993499
    String#each                       0.334548    0.331547       0.991030
    String#to_sym                     0.260142    0.257554       0.990052
    String#slice(regexp)              0.006973    0.006861       0.983938
    String#=~                         0.004934    0.004848       0.982570
    String#dump                       0.081199    0.079687       0.981379
    String#replace                    0.001790    0.001742       0.973184
    String#to_f                       0.000360    0.000350       0.972222
    String#match(regexp)              0.004797    0.004650       0.969356
    String#match(string)              0.009942    0.009618       0.967411
    String#<<                         0.007967    0.007693       0.965608
    String#rindex(regexp)             0.010973    0.010586       0.964732
    String#rindex(regexp, int)        0.011603    0.011110       0.957511
    String#index(string)              0.041252    0.039367       0.954305
    String#<=>                        0.000523    0.000498       0.952199
    String#+                          0.002974    0.002820       0.948218
    String#succ                       0.003482    0.003284       0.943136
    String#to_str                     0.000341    0.000320       0.938416
    String.new                        0.004876    0.004573       0.937859
    String#rindex(int, int)           0.038003    0.034843       0.916849
    String#rstrip!                    0.000803    0.000735       0.915318
    String#unpack                     0.114886    0.104490       0.909510
    String#rindex(string)             0.032481    0.029499       0.908192
    String#succ!                      0.003379    0.003037       0.898787
    String#rstrip                     0.001074    0.000963       0.896648
    String#split(regexp, limit)       0.068813    0.061321       0.891125
    String#upto                       0.097594    0.085998       0.881181
    String#split(regexp)              0.427613    0.375902       0.879071
    String#split                      0.319598    0.280881       0.878857
    String#sub!(pattern) { block }    0.036468    0.032030       0.878304
    String#split(string)              0.241152    0.210947       0.874747
    String#rindex(string, int)        0.032223    0.028124       0.872793
    String#split(string, limit)       0.001435    0.001239       0.863415
    String#index(regexp)              0.018529    0.015973       0.862054
    String#lstrip!                    0.000760    0.000655       0.861842
    String#index(regexp, offset)      0.017691    0.015055       0.850998
    String#[string]                   0.066639    0.056664       0.850313
    String#empty?                     0.000146    0.000124       0.849315
    String#slice(string)              0.066779    0.056373       0.844173
    String#include?                   0.065152    0.054802       0.841141
    String#%                          0.264952    0.220227       0.831196
    String#to_s                       0.000393    0.000325       0.826972
    String#sub!(pattern, repl)        0.035746    0.029545       0.826526
    String#[regexp]                   0.014063    0.011520       0.819171
    String#scan { block }             0.253983    0.207872       0.818448
    String#sub(pattern) { block }     0.027004    0.022085       0.817842
    String#scan                       0.206328    0.168362       0.815992
    String#[regexp,int]               0.014330    0.011688       0.815632
    String#===                        0.000106    0.000086       0.811321
    String#gsub(regexp, repl)         0.157000    0.124823       0.795051
    String#gsub(regexp) { block }     2.152295    1.710988       0.794960
    String#gsub!(regexp) { block }    2.247742    1.784612       0.793958
    String#eql?                       0.000070    0.000055       0.785714
    String#sub(pattern, repl)         0.028269    0.021743       0.769146
    String#[range]                    0.008446    0.006439       0.762373
    String#length                     0.000041    0.000031       0.756098
    String#dup                        0.000119    0.000088       0.739496
    String#slice(range)               0.008944    0.006423       0.718135
    String#index(string, offset)      0.018572    0.013318       0.717101
    String#lstrip                     0.008438    0.005671       0.672079
    String#downcase                   0.007860    0.005246       0.667430
    String#slice(int, int)            0.004752    0.002920       0.614478
    String#[int,int]                  0.004783    0.002930       0.612586
    String#chomp                      0.007325    0.004155       0.567235
    String#swapcase                   0.105811    0.058019       0.548327
    String#upcase                     0.093755    0.050663       0.540377
    String#upcase!                    0.093324    0.049483       0.530228
    String#swapcase!                  0.105588    0.055842       0.528867
    String#chomp!                     0.007827    0.004040       0.516162
    String#count(string)              0.114852    0.057045       0.496683
    String#downcase!                  0.075648    0.034743       0.459272
    String#capitalize!                0.080687    0.037053       0.459219
    String#strip                      0.013780    0.006056       0.439478
    String#capitalize                 0.091156    0.037206       0.408157
    String#chop                       0.005374    0.001887       0.351135
    String#*                          0.004540    0.001513       0.333260
    String#chop!                      0.005357    0.001678       0.313235
    String#insert                     0.019151    0.005710       0.298157
    String#count(^string)             0.238026    0.060587       0.254539
    String#ljust                      0.027946    0.006121       0.219030
    String#delete!                    0.104976    0.021045       0.200474
    String#delete                     0.133950    0.023793       0.177626
    String#rjust(width)               0.034383    0.006009       0.174767
    String#center                     0.061345    0.009903       0.161431
    String#tr                         0.207756    0.032670       0.157252
    String#tr!                        0.212180    0.032498       0.153162
    String#squeeze('m-n')             1.077389    0.119240       0.110675
    String#squeeze!('m-n')            1.072962    0.117104       0.109141
    String#rjust(width, padding)      0.018185    0.001605       0.088260
    String#squeeze(char)              0.712580    0.040846       0.057321
    String#squeeze!(char)             0.738198    0.040276       0.054560
    String#tr_s                       0.938404    0.042907       0.045723
    String#tr_s!                      0.938629    0.042761       0.045557
    String#squeeze!                   0.687785    0.030077       0.043730
    String#squeeze                    0.744735    0.030585       0.041068
    String#casecmp                    0.188328    0.001560       0.008283
    

Please Login or create a free account to add a new comment.

You can update this ticket by sending an email to from your email client. (help)

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

People watching this ticket

Tags