Linux x86_64 (amd64) SigSegV (float.rb)
Reported by Eero Saynatkari | December 8th, 2007 @ 12:29 PM | in 1.0
I have reports of various build-related problems on amd64 platforms. Please stop by #rubinius on freenode or report here. I will try to verify/reproduce and fix.
Comments and changes to this ticket
-

Thomas Lockney December 8th, 2007 @ 08:19 PM
Fresh checkout and build on Ubuntu seems to be broken:
tlockney@neuros:~/Projects/rubinius/code$ uname -a
Linux neuros 2.6.22-14-generic #1 SMP Sun Oct 14 21:45:15 GMT 2007 x86_64 GNU/Linux
tlockney@neuros:~/Projects/rubinius/code$ shotgun/rubinius -v
An error has occured: Segmentation fault (SIGSEGV) (11)
Ruby backtrace:
0x2affbf8780b0 Class#__class_init__+13 in kernel/core/float.rb:5
0x2affbf878010 Class#__script__+17 in kernel/core/float.rb:3
VM Registers:
IP: 0122
SP: 0163
Exception: none
-

Amr Malik December 8th, 2007 @ 10:16 PM
Getting the same error on my box from tonight's checkout:
Linux desktop 2.6.22-14-generic #1 SMP Sun Oct 14 21:45:15 GMT 2007 x86_64 GNU/Linux
-

David Whittington December 10th, 2007 @ 04:39 PM
Testing on b9b6c5005cd507113d1894917242b6b4ad499463 I'm seeing the same issue mentioned above on my Slicehost VPS (Xen running on AMD64):
$ uname -a
Linux marvin 2.6.16.29-xen #1 SMP Sun Sep 30 04:00:13 UTC 2007 x86_64 Dual-Core AMD Opteron(tm) Processor 2212 HE AuthenticAMD GNU/Linux
Build finishes OK, but when I try to run shotgun I see the following:
$ shotgun/rubinius -v
An error has occured: Segmentation fault (SIGSEGV) (11)
Ruby backtrace:
0x2ac80b9ca0b0 Class#__class_init__+13 in kernel/core/float.rb:5
0x2ac80b9ca010 Class#__script__+17 in kernel/core/float.rb:3
VM Registers:
IP: 0152
SP: 0163
Exception: none
-
Eero Saynatkari December 10th, 2007 @ 07:01 PM
- → State changed from new to open
- → Title changed from Potential amd64/x86-64 Problems--Report Here! to amd64/x86-64 Problems--Report Here!
Few follow-ups:
- Seems Linux amd64-specific. I use FreeBSD so I will need to do a bit of pair debugging with someone.
- Has anyone had it work in the last month or so and it has only now stopped working?
- Is your `uname -m` `x86_64` and `uname -p` `unknown`?
- In shotgun/common.mk, disable the `MARCH=amd64` force and see if it makes a difference.
- `shotgun/rubinius --gdb`, `r` to run and get the `rbt` and `bt`. This is probably an FFI issue.
- It will definitely be best if we can try to debug this on IRC (I am usually around before 3pm and after 12am EST), but if someone feels adventurous, gdb can be used to try to at least locate the error. If, as I suspect, it is an FFI issue, you will need to first get to the `ffi_call()` call that fails (if it is not an FFI issue, then do some other debugging instead :). If you are breaking right there, you can go up one frame and `rbs_symbol_to_cstring(state, method_name)` to find out which method is being run (and then back down a frame to `ffi_call()`). In `ffi_call()`, `n` until the `func()` call, at which point you want to `display /i $pc` to see the asm and then `si` into and through the generated function. The cases I have seen have failed at line 144 of shotgun/lib/subtend/ffi_amd64.c, presumably because the pointer being manipulated is corrupt. We want to find out what this pointer is (it SHOULD be the pointer to the C function being run).
-
Eero Saynatkari December 10th, 2007 @ 07:03 PM
- → Title changed from amd64/x86-64 Problems--Report Here! to Linux x86_64 (amd64) SigSegV (float.rb)
Stupid formatting above. Anyway, topic change.
-
Eero Saynatkari December 10th, 2007 @ 10:27 PM
New info: at least one candidate's system seemed to break in `cpu_clear_cache_for_method()`, which has nothing at all to do with FFI :P
So, `DEV=1 rake build`, then `shotgun/rubinius --gdb`, then `r` and then `bt`, and tell me what you get.
-

Thomas Lockney December 10th, 2007 @ 11:13 PM
Rebuilt and ran using the instructions you gave. Here's the backtrace:
#0 0x0000000000677ff8 in ?? ()
#1 0x0000000000000f95 in ?? ()
#2 0x00002b8beeebe2b0 in ?? ()
#3 0x00002b8bef2654e8 in ?? ()
#4 0x000000000063c458 in ?? ()
#5 0x0000000000001f2b in ?? ()
#6 0x00002b8beeebda78 in ?? ()
#7 0x00002b8beeebda78 in ?? ()
#8 0x000000000063c458 in ?? ()
#9 0x000000000000000e in ?? ()
#10 0x00002b8bef282e40 in ?? ()
#11 0x00007fffbd1b19c0 in ?? ()
#12 0x00002b8bedb8bb09 in object_kind_of_p (state=0x63c458, self=0x2b8beeebda78, cls=0x2b8beeebda78) at object.c:50
#13 0x00002b8bedb9c8d6 in ffi_call (state=0x6030e0, c=0x63bdb0, ptr=0x2b8beefb9840) at subtend/ffi.c:951
#14 0x00002b8bedb66453 in cpu_perform_system_primitive (state=0x6030e0, c=0x63bdb0, prim=163, mo=0x657c68, num_args=0, method_name=0x14eb, mod=0x2b8beefb64b0)
at system_primitives.gen:1898
#15 0x00002b8bedb4d84b in cpu_perform_primitive (state=0x6030e0, c=0x63bdb0, prim=163, mo=0x657c68, args=0, name=0x14eb, mod=0x2b8beefb64b0) at cpu.h:285
#16 0x00002b8bedb4d779 in cpu_try_primitive (state=0x6030e0, c=0x63bdb0, mo=0x657c68, recv=0x6579a8, args=0, sym=0x14eb, mod=0x2b8beefb64b0) at cpu_instructions.c:489
#17 0x00002b8bedb4e841 in _cpu_build_and_activate (state=0x6030e0, c=0x63bdb0, mo=0x657c68, recv=0x6579a8, sym=0x14eb, args=0, block=0xe, missing=0, mod=0x2b8beefb64b0)
at cpu_instructions.c:838
#18 0x00002b8bedb4e5c3 in cpu_unified_send (state=0x6030e0, c=0x63bdb0, recv=0x6579a8, sym=0x14eb, args=0, block=0xe) at cpu_instructions.c:986
#19 0x00002b8bedb523d5 in cpu_run (state=0x6030e0, ic=0x63bdb0, setup=0) at instructions.gen:481
#20 0x00002b8bedb866c9 in machine_run (m=0x603010) at machine.c:493
#21 0x00002b8bedb867e8 in machine_run_file (m=0x603010, path=0x642160 "runtime/core/kernel/core/float.rbc") at machine.c:518
#22 0x00002b8bedb8760f in machine_load_directory (m=0x603010, prefix=0x601680 "runtime/core") at machine.c:838
#23 0x00002b8bedb8796e in machine_load_bundle (m=0x603010, path=0x601680 "runtime/core") at machine.c:919
#24 0x0000000000400e8f in main (argc=1, argv=0x7fffbd1b36e8) at main.c:100
-

David Whittington December 11th, 2007 @ 07:24 PM
Same results for me:
#0 0x0000000000678ff8 in ?? ()
#1 0x0000000000000ed5 in ?? ()
#2 0x00002b0f90a552b0 in ?? ()
#3 0x00002b0f90dfc7c0 in ?? ()
#4 0x000000000063d458 in ?? ()
#5 0x0000000000001dab in ?? ()
#6 0x00002b0f90a54a78 in ?? ()
#7 0x0000000200000050 in ?? ()
#8 0x00002b0f90a54a78 in ?? ()
#9 0x000000010000000e in ?? ()
#10 0x00002b0f90e18db8 in ?? ()
#11 0x00007fffffb17a50 in ?? ()
#12 0x00002b0f8f85329d in object_kind_of_p (state=0x2b0f90a54a78, self=0x200000050, cls=0x2b0f90a54a78) at object.c:50
#13 0x00002b0f8f864224 in ffi_call (state=0x6040e0, c=0x63cdb0, ptr=0x2b0f90b4bc78) at subtend/ffi.c:951
#14 0x00002b0f8f82e65b in cpu_perform_system_primitive (state=0x6040e0, c=0x63cdb0, prim=163, mo=0x658b48, num_args=0, method_name=0x1493, mod=0x2b0f90b48478)
at system_primitives.gen:1898
#15 0x00002b0f8f8159f0 in cpu_perform_primitive (state=0x6040e0, c=0x63cdb0, prim=163, mo=0x658b48, args=0, name=0x1493, mod=0x2b0f90b48478) at cpu.h:285
#16 0x00002b0f8f81591e in cpu_try_primitive (state=0x6040e0, c=0x63cdb0, mo=0x658b48, recv=0x652718, args=0, sym=0x1493, mod=0x2b0f90b48478)
at cpu_instructions.c:489
#17 0x00002b0f8f816a14 in _cpu_build_and_activate (state=0x6040e0, c=0x63cdb0, mo=0x658b48, recv=0x652718, sym=0x1493, args=0, block=0xe, missing=0,
mod=0x2b0f90b48478) at cpu_instructions.c:838
#18 0x00002b0f8f81678e in cpu_unified_send (state=0x6040e0, c=0x63cdb0, recv=0x652718, sym=0x1493, args=0, block=0xe) at cpu_instructions.c:986
#19 0x00002b0f8f81a654 in cpu_run (state=0x6040e0, ic=0x63cdb0, setup=0) at instructions.gen:481
#20 0x00002b0f8f84df17 in machine_run (m=0x604010) at machine.c:493
#21 0x00002b0f8f84e036 in machine_run_file (m=0x604010, path=0x643170 "runtime/core/kernel/core/float.rbc") at machine.c:518
#22 0x00002b0f8f84ed7b in machine_load_directory (m=0x604010, prefix=0x602100 "runtime/core") at machine.c:838
#23 0x00002b0f8f84f0d9 in machine_load_bundle (m=0x604010, path=0x602100 "runtime/core") at machine.c:919
#24 0x0000000000400efe in main (argc=1, argv=0x7fffffb198b8) at main.c:100
-

Amr Malik December 12th, 2007 @ 06:00 AM
here is my backtrace (MRI ruby version is 1.8.6 fwiw)
(gdb) r
...
[Thread debugging using libthread_db enabled]
[New Thread 47146559498992 (LWP 17314)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47146559498992 (LWP 17314)]
0x0000000000677ff8 in ?? ()
(gdb) bt
#0 0x0000000000677ff8 in ?? ()
#1 0x0000000000001079 in ?? ()
#2 0x00002ae129eeb2b0 in ?? ()
#3 0x00002ae12a22c510 in ?? ()
#4 0x000000000063c458 in ?? ()
#5 0x00000000000020f3 in ?? ()
#6 0x00002ae129eeaa78 in ?? ()
#7 0x00002ae129eeaa78 in ?? ()
#8 0x000000000063c458 in ?? ()
#9 0x000000000000000e in ?? ()
#10 0x00002ae12a24b5b0 in ?? ()
#11 0x00007fff8217ca20 in ?? ()
#12 0x00002ae128bbe89d in object_kind_of_p (state=0x63c458,
self=0x2ae129eeaa78, cls=0x2ae129eeaa78) at object.c:50
#13 0x00002ae128bcf66a in ffi_call (state=0x6030e0, c=0x63bdb0,
ptr=0x2ae129fe4348) at subtend/ffi.c:951
#14 0x00002ae128b992f9 in cpu_perform_system_primitive (state=0x6030e0,
c=0x63bdb0, prim=163, mo=0x657c48, num_args=0, method_name=0x14c3,
mod=0x2ae129fe2100) at system_primitives.gen:1904
#15 0x00002ae128b8047b in cpu_perform_primitive (state=0x6030e0, c=0x63bdb0,
prim=163, mo=0x657c48, args=0, name=0x14c3, mod=0x2ae129fe2100)
at cpu.h:285
#16 0x00002ae128b803a9 in cpu_try_primitive (state=0x6030e0, c=0x63bdb0,
---Type <return> to continue, or q <return> to quit---
mo=0x657c48, recv=0x6578c8, args=0, sym=0x14c3, mod=0x2ae129fe2100)
at cpu_instructions.c:489
#17 0x00002ae128b81471 in _cpu_build_and_activate (state=0x6030e0, c=0x63bdb0,
mo=0x657c48, recv=0x6578c8, sym=0x14c3, args=0, block=0xe, missing=0,
mod=0x2ae129fe2100) at cpu_instructions.c:838
#18 0x00002ae128b811f3 in cpu_unified_send (state=0x6030e0, c=0x63bdb0,
recv=0x6578c8, sym=0x14c3, args=0, block=0xe) at cpu_instructions.c:986
#19 0x00002ae128b85005 in cpu_run (state=0x6030e0, ic=0x63bdb0, setup=0)
at instructions.gen:481
#20 0x00002ae128bb956d in machine_run (m=0x603010) at machine.c:493
#21 0x00002ae128bb968c in machine_run_file (m=0x603010,
path=0x642170 "runtime/core/kernel/core/float.rbc") at machine.c:518
#22 0x00002ae128bba4b3 in machine_load_directory (m=0x603010,
prefix=0x601680 "runtime/core") at machine.c:838
#23 0x00002ae128bba812 in machine_load_bundle (m=0x603010,
path=0x601680 "runtime/core") at machine.c:919
#24 0x0000000000400e8f in main (argc=1, argv=0x7fff8217e758) at main.c:100
(gdb)
-
Eero Saynatkari December 21st, 2007 @ 08:16 PM
I have not been able to find much more information. It looks like the problem has not gotten magically fixed for anyone in HEAD, has it?
As it appears that I will not have access to a Linux amd64 machine anytime soon, I will probably need someone with some GDB experience and a couple of spare hours to work with.
One more question--everyone clearly has an amd64 processor but is your OS running in 32-bit or 64-bit mode?
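One quick way to answer this is sketched below in Python. It is only an illustration and not part of the Rubinius tooling; the helper name `describe_bitness` is made up. It reports the kernel architecture (the value `uname -m` prints on Unix) next to the pointer width of the running process, which shows whether that userland binary is 32-bit or 64-bit.

```python
import platform
import struct


def describe_bitness():
    """Report the machine architecture and this process's pointer width."""
    # On Unix this is the same value that `uname -m` reports, e.g. "x86_64".
    machine = platform.machine()
    # Size of a C pointer in the running interpreter, in bits (32 or 64).
    pointer_bits = struct.calcsize("P") * 8
    return {"machine": machine, "pointer_bits": pointer_bits}


if __name__ == "__main__":
    info = describe_bitness()
    print(f"machine (uname -m): {info['machine']}")
    print(f"process pointer width: {info['pointer_bits']}-bit")
```

A 64-bit machine value combined with a 32-bit pointer width would indicate a 32-bit userland on a 64-bit kernel, at least for the interpreter that ran the check.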
-

David Whittington December 24th, 2007 @ 12:30 PM
Actually, seems like it has been magically fixed on HEAD. I no longer get the segfault, and I'm running the specs right now.
-

Thomas Lockney December 24th, 2007 @ 05:51 PM
Yup, same result as David here. Seems to be working fine now.
-
Eero Saynatkari December 24th, 2007 @ 11:41 PM
If you guys get the chance, please do `git-bisect` to see which commit fixes the issue (it is a pretty nifty command if you read the man page, basically a binary fault search helper).
I will leave this open for a bit to see if anyone else is still having issues.
-
Michael Neumann December 27th, 2007 @ 10:31 AM
Hm, I still get a similar problem on FreeBSD.
uname -a
FreeBSD nunus 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Mon Dec 3 16:01:08 CET 2007 mneumann@nunus:/usr/src/sys/amd64/compile/NUNUS amd64
I do a DEV=1 rake build using the HEAD revision and get the following:
CC rubinius.bin
gmake[1]: Leaving directory `/usr/home/mneumann/Dev/rubinius/shotgun'
Generating runtime/platform.conf...
An error has occured: Segmentation fault (SIGSEGV) (11)
Ruby backtrace:
0x80170156c Class#new+0 in kernel/core/regexp.rb:0
0x80170102c Class#join+92 in kernel/core/file.rb:354
0x801700ea4 #+311 in kernel/loader.rb:42
0x801700de8 Array#each+44 in kernel/core/array.rb:548
0x8016d10d8 Class#__script__+343 in kernel/loader.rb:40
VM Registers:
IP: 0213
SP: 0005
Exception: none
rake aborted!
Command failed with status (254): [shotgun/rubinius -Iruntime/stable/compiler...]
-

Amr Malik December 27th, 2007 @ 12:27 PM
Fixed for me as well (amd64 linux). I'd be interested in knowing how git-bisect fishes out a particular error.
-
Eero Saynatkari December 28th, 2007 @ 11:05 AM
Hm, platform.conf is used to generate the C struct offsets. You may need to monkey with those by hand.
`git-bisect` binary search basically operates so that you select a known "good" version A (where X worked) and a known "bad" version B (where X is broken). Then bisect automatically jumps to the halfway point T1 between the two; you test whether it works, and based on the result bisect jumps to the halfway point T2 between A and T1 or between T1 and B, and so on until only one possible revision remains. It also allows you to use a script that is run at each halfway point so that you can automate the entire process.
-
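The halving procedure described for `git-bisect` can be sketched in a few lines of Python. This is only a model of the search over a linear list of revisions, not how git itself is implemented; the function name `bisect_first_bad`, the revision labels and the `works` check are all hypothetical.

```python
def bisect_first_bad(revisions, is_good):
    """Return the first revision for which is_good() is False.

    Assumes revisions[0] is known good (version A), revisions[-1] is
    known bad (version B), and every revision after the first bad one
    is also bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2        # the halfway point (T1, T2, ...)
        if is_good(revisions[mid]):
            good = mid                 # the breakage lies after mid
        else:
            bad = mid                  # the breakage lies at or before mid
    return revisions[bad]


if __name__ == "__main__":
    history = [f"r{i}" for i in range(1, 17)]  # hypothetical revisions r1..r16
    broken_since = 11                          # hypothetical: r11 introduced the bug
    tested = []

    def works(rev):
        # Stand-in for "build this revision and see whether it segfaults".
        tested.append(rev)
        return int(rev[1:]) < broken_since

    print(bisect_first_bad(history, works), "found after testing", tested)
```

With 16 revisions only four need to be built and tested. To find the commit that fixed a problem rather than the one that introduced it, the same search applies with the meanings of "good" and "bad" swapped.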
Michael Neumann December 28th, 2007 @ 11:43 AM
It generates platform.conf successfully. So the error is not in this task. It segfaults when it tries to precompile the first file:
export RBX_PLATFORM="runtime/stable/platform.rba"
export RBX_LOADER="runtime/stable/loader.rbc"
export RBX_CORE="runtime/stable/core.rba"
export RBX_BOOTSTRAP="runtime/stable/bootstrap.rba"
shotgun/rubinius -Iruntime/stable/compiler1.rba compile kernel/core/dir.rb
runtime/core/kernel/core/dir.rbc
Leads to:
An error has occured: Segmentation fault (SIGSEGV) (11)
Ruby backtrace:
0x80170156c Class#new+0 in kernel/core/regexp.rb:0
0x80170102c Class#join+92 in kernel/core/file.rb:354
0x801700ea4 #+311 in kernel/loader.rb:42
0x801700de8 Array#each+44 in kernel/core/array.rb:548
0x8016d10d8 Class#__script__+343 in kernel/loader.rb:40
VM Registers:
IP: 0213
SP: 0005
Exception: none
-
Eero Saynatkari January 3rd, 2008 @ 08:06 AM
Michael,
If you are still getting the error on HEAD after a clean checkout or doing:
rake rbc:clean
rake pristine
DEV=1 rake rebuild
Let me know. We need to try to do a debugging session over the weekend or something.
-
Michael Neumann January 3rd, 2008 @ 09:23 AM
I'm still getting the error below using a clean checkout. I'm running FreeBSD, maybe that matters?
An error has occured: Segmentation fault (SIGSEGV) (11)
Ruby backtrace:
0x8017014dc Class#new+95 in kernel/core/regexp.rb:41
0x801700fbc Class#join+41 in kernel/core/file.rb:356
0x801700e3c #+35 in kernel/loader.rb:47
0x801700d80 Array#each+24 in kernel/core/array.rb:556
0x801700234 Class#__script__+163 in kernel/loader.rb:45
VM Registers:
IP: 0180
SP: 0005
Exception: none
rake aborted!
-
Eero Saynatkari January 3rd, 2008 @ 10:48 AM
FreeBSD amd64 6.2R here, works OK. I wonder what CURRENT changes.
-
Eero Saynatkari January 5th, 2008 @ 09:30 PM
- → State changed from open to resolved
Looks like Linux is fine so I am closing this ticket. I opened #204 for the FreeBSD-CURRENT problem.
