Opened 18 years ago

Closed 18 years ago

Last modified 6 years ago

#8590 closed patch

SH and MIPS need memalign too

Reported by: SF/mlau2 Owned by: fingolfin
Priority: normal Component: Ports
Version: Keywords:
Cc: Game:

Description

Here's a simple patch to force memalign for SH and MIPS target processors. It's not stricly needed for both since a) SH kernel will emulate the unaligned read (slooooow) b) MIPS does not complain at all but aligned reads/writes are faster.

Run-tested on SH-4 and MIPSEL targets running linux and QNX.

Ticket imported from: #1596959. Ticket imported from: patches/695.

Attachments (1)

scummvm-align-required.diff (449 bytes ) - added by SF/mlau2 18 years ago.
patch to force memlaign for superh and mips targets

Download all attachments as: .zip

Change History (11)

by SF/mlau2, 18 years ago

Attachment: scummvm-align-required.diff added

patch to force memlaign for superh and mips targets

comment:1 by fingolfin, 18 years ago

Owner: set to fingolfin

comment:2 by fingolfin, 18 years ago

Sounds sensible. However, somebody just looking at the patch / the modified configure script, won't be able to read as much sense into those few lines. Hence, there really should be comments going in there along with your changes that explain them. I.e. add a comment to each case summing up more or less what you stated here. After that, I see nothing stopping this from being added :-)

comment:3 by SF/mlau2, 18 years ago

Thinking about this some more, is there a specific reason NOT to enable alignment? Almost every (newer) RISC design takes a performance hit /raises exceptions on unaligned accesses. (I think it even makes modern x86 cpus happier)

comment:4 by fingolfin, 18 years ago

Nope, not true. And in fact, thinking about it properly again, I am not sure anymore whether you patch does make sense after all... :-).

The macros we use to access data in an endian independant fashion can read 32 bit either bytewise, or in one go. The thing is, in the end the processor has to read the same data anyway, only once with a single instruction, and once with four. On modern desktop CPUs (x86, PowerPC), both will cause a single access to RAM (unless you are unlucky and cross a cache line boundary). But in one case, 4 read instructions have to be decoded, in the other only 1. We do lots of memory access through those macros. And most are aligned. Only a few sometimes aren't.

Usually the only real reason to disable unaligned read/write is if it causes your CPU to crash / return invalid data. Any speed hit should be negligible. Did you perform any actual benchmarking to see whether any noticable speed up occure, anywhere?

comment:5 by SF/mlau2, 18 years ago

At least the SH-4 raises an exception when a 16/32 bit load/store to unaligned addresses occures. I don't see any perf improvement in the aligned case; it just silences a kernel warning. (I though that there might be something wrong in the scummvm code; no other userspace app triggers this.)

Somewhere in the code the 32bit accessor macros must be doing accesses where "((unsigned long)address & 3)!=0" and that really upsets the SH-4 (and while other archs may not complain about it like the SH, they DO like *aligned* 16/32/64 reads/writes more :) )

Again, is there a specific need for such unaligned accesses? (datafiles layout? my test-game is monkey1 dos version)

comment:6 by fingolfin, 18 years ago

Yes, we do need to make unaligned access in some places, precsisely because of the data layout in the game data files. Those often were written on systems with tight memory constraints, so if you could get along with using a byte, you used a byte. Hence you get lots of data starting at odd memory addresses.

We work around that by using wrapper macros when accessing data blocks (and various other techniques). Those macros can be set to read single bytes and "glue" them together to 2/4 byte words, as required (they also deal with endian conversion on the fly). The thing is, doing that is on most systems not faster, or even slower, than just telling the CPU to grab that same word directly from an odd address.

All in all, it would probably be possible to just *always* read the data byte wise, as the speed impact should be negligible on most systems... "Should", though, means that I have no numbers on this and can't be sure there won't be regressions.

comment:7 by SF/mlau2, 18 years ago

Thanks for the explanation. Theoretically it should be faster (at least on the SH) to let the compiler to the byte-fiddling than trapping into the kernel. (Takes 12 insns to do a 32bit bytewise unaligned read. The sh4 has 2 pipes; with clever assembler code this would at most take 8 cycles to complete. Trapping into the kernel takes *MUCH* more cycles)

I don't care either way if you apply the patch or not, since in both cases scummvm works quite fine.

Thanks again for your time!

comment:8 by fingolfin, 18 years ago

Well I am adding this for now, it should at least not hurt. If somebody complaints we can reconsider it.

comment:9 by fingolfin, 18 years ago

Status: newclosed

comment:10 by digitall, 6 years ago

Component: Ports
Note: See TracTickets for help on using tickets.