Opened 8 months ago

Closed 8 months ago

#15015 closed defect (fixed)

COMMON: AGS games lead to a reproducable crash on exiting ScummVM

Reported by: raziel- Owned by: PushmePullyu
Priority: normal Component: Engine: AGS
Version: Keywords:
Cc: Game:

Description

ScummVM 2.9.0git (Mar 11 2024 21:02:00)
Using SDL backend with SDL 2.30.1
Features compiled in: Vorbis MP3 RGB zLib MPEG2 MikMod Theora VPX AAC A/52 FreeType2 FriBiDi JPEG PNG GIF cloud (servers, local) ENet SDL2 TinyGL OpenGL (with shaders)

Loading any AGS game (tried with "A Golden Wake" and "Gemini Rue"), while only loading into the options/start menu of the game, then RTL and quitting ScummVM, will lead to a certain crash.

Crash and serial logs attached

Any AGS game

AmigaOS4 - PPC - BE
gcc (adtools build 11.3.0) 11.3.0

Attachments (5)

AGS crash.txt (24.0 KB ) - added by raziel- 8 months ago.
Crashlog_scummvm_2024-03-11_18-17-02.txt (41.9 KB ) - added by raziel- 8 months ago.
Grim Reaper log 1
AGS serial.txt (15.8 KB ) - added by raziel- 8 months ago.
serial log 2
AGS Grim Reaper.txt (26.7 KB ) - added by raziel- 8 months ago.
Grim Reaper log 2
Crashlog_scummvm_2024-03-14_16-22-14.txt (44.1 KB ) - added by raziel- 8 months ago.

Download all attachments as: .zip

Change History (28)

by raziel-, 8 months ago

Attachment: AGS crash.txt added

by raziel-, 8 months ago

Grim Reaper log 1

by raziel-, 8 months ago

Attachment: AGS serial.txt added

serial log 2

by raziel-, 8 months ago

Attachment: AGS Grim Reaper.txt added

Grim Reaper log 2

comment:1 by PushmePullyu, 8 months ago

I did some digging and now have a suspicion of what could be causing the crash.

Could you test with this branch?
https://github.com/PushmePullyu/scummvm/tree/amigaos-agi-crash
(Ignore the typo in the branch name ;-)

Edit: Forgot to include the revert commit, force pushed an update.

Note: This also includes the atexit fixup from https://github.com/scummvm/scummvm/pull/5696

Last edited 8 months ago by PushmePullyu (previous) (diff)

comment:2 by raziel-, 8 months ago

I built ScummVM with one engine (AGS) to test and it threw up this very strange error on start

ELF library
Unable to resolve symbol
''
in Development:Porting/ScummVM/scummvm-amigaos-ags-crash/install/plugins/ags.plugin

followed by

ELF library
Failed to resolve symbol at runtime.
Process 'Shell Process' bas been suspended.

followed by a crash, see log

After that i built the AGI engine with the exact same branch and it worked

comment:3 by PushmePullyu, 8 months ago

Weird stuff. Did you try a full rebuild (e.g. make --always-make)?

comment:4 by raziel-, 8 months ago

i *think* it has to do with a libsdl update...wasted a whole day already and slowly getting there, new builds always crash like that...trying to get back a revision and go from there

comment:5 by raziel-, 8 months ago

@PushmePullyu

Would it be possible to split the two branches?

one with the AGS exit fix and one with the unfreed signals?

something smelly hit the fan really hard over here and right now im trying to figure out why only this branch of yours is driving me mad...

could those changes affect plugin behaviour?

thank you

comment:6 by PushmePullyu, 8 months ago

@raziel-

atexit fixup only:
https://github.com/PushmePullyu/scummvm/tree/amigaos-atexit-fixup-only

ags fix only, without any atexit cleanup (i.e. this includes your revert commit):
https://github.com/PushmePullyu/scummvm/tree/ags-fix-without-atexit-cleanup

ags fix with original atexit cleanup:
https://github.com/PushmePullyu/scummvm/tree/ags-fix-with-orig-atexit

The only thing the AGS fix does is to replace a local static with a field, which removes the need to run its dtor on program exit / DSO unload.

My branches are based on 8366119da3ff867621b9164575c9df36225eabcd "AMIGAOS: clean up compiler flags", so if you were testing with a previous commit, maybe the problems are caused by some other recent changes to master.

Might be
06a87ccaf07716f3498eaa988bfe1fb83381bcb1 "AGS: Add MAD to Engine Dependencies".

Did you try a commit before that?
This branch has the atexit fixup and the AGS fix, but rebased onto the parent commit of the MAD change: https://github.com/PushmePullyu/scummvm/commits/amigaos-agi-crash-rebase-before-mad

comment:7 by raziel-, 8 months ago

Thank you for all the test branches, i'll get to it

yes, i was only testing branches earlier to your branch, but after the compiler flag changes, but those are not the culprit, i reverted those locally and it still broke.

i'll test with a fresh master branch and yours (without mad) and report back

Thanks a lot

comment:8 by raziel-, 8 months ago

@PushmePullyu

First of all...great news...and great work :-D

The "amigaos-agi-crash-rebase-before-mad" branch works and the AGS exit crash is gone as well :-)

Need to test a current master build to see if it really was the MAD implementation.

Your other branches i could not (yet) download, will do eventually

Thanks a ton

comment:9 by raziel-, 8 months ago

Yep, current master is bugged

I'll bisect tomorrow...but what would i do if it really is the MAD addition?
I do have libmad.so in sobjs/

comment:10 by raziel-, 8 months ago

@PushmePullyu

Your assumption was spot on :-)

https://github.com/scummvm/scummvm/commit/4ea65b1a963fb6988117cf7b7ed94a03c6d7d101
is fine (before AGS: Add MAD to engine dependancies)

https://github.com/scummvm/scummvm/commit/66bc819014bb675b685eabec60781442a7b2a660
is bugged (after AGS: Add MAD to engine dependancies)

Should i open a new tracker item?

NB:
Funny though https://github.com/scummvm/scummvm/commit/06a87ccaf07716f3498eaa988bfe1fb83381bcb1 which introduced the change works and starts fine...it's the one commit AFTER that crashes :-/

Last edited 8 months ago by raziel- (previous) (diff)

comment:11 by raziel-, 8 months ago

@PushmePullyu

Please go ahead and PR the unfreed and AGS exit fix...maybe it can be backported, but i guess not, master is already frozen

...and disregard the last three posts regarding libmad and/or crashes...the reason lies somewhere else and definitely stems from my compiler flag changes/PR.

Investigating...but i think it might be the shared detection that causes it...maybe AmigaOS cannot work with accessing two different plugins at once (detection and ags.plugin in this case)?

Thank you

comment:12 by PushmePullyu, 8 months ago

For reference: PR is at https://github.com/scummvm/scummvm/pull/5713

Since the removal of the static fixed the AGS crash, there seems to be some problem with DSO unloading and dtors of statics.

Dtors registered from a DSO _seem_ to run on program exit instead of SO unload (or maybe they are run twice). Perhaps your libc / build is not using __cxa_atexit?

From the gcc manpage:

-fuse-cxa-atexit
Register destructors for objects with static storage duration with the "__cxa_atexit" function rather than the "atexit" function. This option is required for fully standards-compliant handling of static destructors, but only works if your C library supports "__cxa_atexit".

comment:13 by raziel-, 8 months ago

should -fuse-cxa-exit be default or do I need to add it to the compiler flags?

but it wouldnt do anything if the c lib doesn't support it...

comment:14 by PushmePullyu, 8 months ago

I have pushed some code that tests if statics are destroyed on DSO unload here:
https://github.com/PushmePullyu/sotest/tree/main

You will have to adjust the compiler flags in build-linux.sh to your system (and maybe change the SONAME define in sotest.cpp).

Output on my system:
./sotest

sotest.cpp:35: int main(): Registering cleanup function
sotest.cpp:43: int main(): Loading SO
sotest.cpp:48: int main(): Getting function pointer from SO
sotest.cpp:53: int main(): Calling SO function
libfoo.cpp:11: void someSOFunc(): This is someSOFunc()
libfoo.cpp:6: void someSOFuncWithStatic(): This is someSOFuncWithStatic()
testclass.cpp:9: TestClass::TestClass(const char*): TestClass ctor: context: libfoo, this=0x7f5647ca4080
sotest.cpp:56: int main(): Calling local function
sotest.cpp:27: void localFuncWithStatic(): This is localFuncWithStatic()
testclass.cpp:9: TestClass::TestClass(const char*): TestClass ctor: context: sotest, this=0x4040c0
sotest.cpp:59: int main(): Unloading SO
testclass.cpp:13: TestClass::~TestClass(): TestClass dtor: context: libfoo, this=0x7f5647ca4080
sotest.cpp:62: int main(): Calling SDL_Quit
sotest.cpp:65: int main(): Returning from main
testclass.cpp:13: TestClass::~TestClass(): TestClass dtor: context: sotest, this=0x4040c0
sotest.cpp:23: void cleanup(): This is cleanup()

To see whether gcc was built with cxa_atexit, you can use -v:

g++ -v 2>&1 | grep -io '[^[:space:]]*cxa_atexit'

Output:

--enable-__cxa_atexit

If it was, -fuse-cxa-atexit should be passed by default.

Or:

g++ -Q --help=c++ | grep -iF atexit

Output:

  -fuse-cxa-atexit            		[available in C++, ObjC++]

You could also check your C lib / scummvm executable for the cxa_atexit symbol.
E.g. on a Linux system using glibc:

nm -D /lib64/libc.so.6 | grep -iF cxa_atexit

Output:

000000000003fb70 T __cxa_atexit@@GLIBC_2.2.5

comment:15 by raziel-, 8 months ago

with just a little addition the script ran perfectly :-)

unfortunately, not the same output

/Development/Porting/sotest-main> ./sotest
sotest.cpp:35: int main(): Registering cleanup function
sotest.cpp:43: int main(): Loading SO
sotest.cpp:18: void fail(): error: DLOpen failed for './libfoo.so'
sotest.cpp:23: void cleanup(): This is cleanup()

---

++ -v 2>&1 | grep -io '[^[:space:]]*cxa_atexit'

that doesn't work here, but i get this with a different method

Binary file SDK:gcc/bin/g++ matches

so i guess it's inside?

---

g++ -Q --help=c++ | grep -iF atexit

Output:

-fuse-cxa-atexit [available in C++, ObjC++]

---

nm -D /lib64/libc.so.6 | grep -iF cxa_atexit

that doesn't work either, so i guess not?

I can find a "atexit" when looking with a hex editor, but nothing "cxa"

comment:16 by PushmePullyu, 8 months ago

sotest.cpp:18: void fail(): error: DLOpen failed for './libfoo.so'

Does libfoo.so exist after building? If yes, you probably need to change sotest.cpp:8

#define SONAME "./libfoo.so"

to an AmigaOS path (can be absolute or relative).

comment:17 by raziel-, 8 months ago

i missed the path remark, sorry, went a little further :-)

sotest.cpp:35: int main(): Registering cleanup function
sotest.cpp:43: int main(): Loading SO
sotest.cpp:48: int main(): Getting function pointer from SO
sotest.cpp:53: int main(): Calling SO function
libfoo.cpp:11: void someSOFunc(): This is someSOFunc()
libfoo.cpp:6: void someSOFuncWithStatic(): This is someSOFuncWithStatic()
testclass.cpp:9: TestClass::TestClass(const char*): TestClass ctor: context: libfoo, this=0x5c65d714
sotest.cpp:56: int main(): Calling local function
sotest.cpp:27: void localFuncWithStatic(): This is localFuncWithStatic()
testclass.cpp:9: TestClass::TestClass(const char*): TestClass ctor: context: sotest, this=0x5c660870
sotest.cpp:59: int main(): Unloading SO
sotest.cpp:62: int main(): Calling SDL_Quit
sotest.cpp:65: int main(): Returning from main
testclass.cpp:13: TestClass::~TestClass(): TestClass dtor: context: sotest, this=0x5c660870

after that line it crashes, no cleanup()

comment:18 by raziel-, 8 months ago

the library TestClass is missing it's dtor?

comment:19 by PushmePullyu, 8 months ago

Yes, the output confirms that static dtors are not run on DSO unload, but at program exit. Since the SO is already unloaded, this causes a crash.

Could you test if adding -fuse-cxa-atexit to both g++ invocations changes anything?

comment:20 by raziel-, 8 months ago

sure, linker error though

/T/ccfTyWod.o:(.got2+0x20): undefined reference to `__dso_handle'
Development:Coding/SDK/gcc/ppc-amigaos/bin/ld: libfoo.so: hidden symbol `__dso_handle' isn't defined
Development:Coding/SDK/gcc/ppc-amigaos/bin/ld: final link failed: Bad value

---

#!/bin/sh

g++ -Wall  -fuse-cxa-atexit -fPIC -shared -o libfoo.so libfoo.cpp testclass.cpp -fuse-cxa-atexit && \
g++ -Wall  -fuse-cxa-atexit -o sotest sotest.cpp testclass.cpp $(sdl2-config --cflags) $(sdl2-config --libs) -athread=native

comment:21 by PushmePullyu, 8 months ago

Alright, thanks! Looks like -fuse-cxa-atexit only works for glibc currently.

So, on your platform, unloading a DSO that registers destructors/functions to be run on exit will always crash when the main program exits.

The good news is that the ScummVM coding conventions forbid constructs which require such registration (see https://wiki.scummvm.org/index.php?title=Coding_Conventions#Use_of_global_.2F_static_variables).

They still creep into the code sometimes, so should you get similar on-exit crashes in the future, this might be the cause.

In case you want to open an issue/feature request for this somewhere (adtools maybe), there is a simplified version of the repro code at https://github.com/PushmePullyu/sotest/tree/repro-dlclose-dtors, which removes the SDL dependency. Not sure if it builds on AmigaOS without modifications. Note: The build script needs a Bourne-compatible shell.

I suppose this ticket can be closed then?

P.S. Sorry for the delay, has been a super busy week.

comment:22 by raziel-, 8 months ago

no problem

thanks a lot for helping

yes, this can be closed

comment:23 by tag2015, 8 months ago

Owner: set to PushmePullyu
Resolution: fixed
Status: newclosed

Thanks a lot for your work! Closing then :)

Note: See TracTickets for help on using tickets.