Performance Ideas
This page will serve as a holding pen for half-baked performance ideas. Once they become more real, they will be migrated into issues we can track against the code.
-
Evaluate implementation of StringHelper::GenerateCopyCharactersLong relative to Intel/ARM versions
-
Harvest tricks from PowerPC Compiler Writer's Guide
-
Consider peep-hole optimization of multiple double access sequences
li r0, offset-1
stdx value, r0, base_reg
-
Exploit mmap of object space to limit 64-bit addresses to 32 bits
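One possible reading of this idea, sketched below: if object space is a single contiguous mapping no larger than 4 GiB, an address inside it can be held as a 32-bit offset from the mapping base. The `ObjectSpace` type and its `Compress`/`Decompress` helpers are hypothetical names, and a `std::vector` stands in for the mmap'ed region. If the region were mapped below 4 GiB, the base would effectively be zero and the address itself would fit in 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an mmap'ed object space: one contiguous region
// whose size fits in 32 bits, so any address inside it can be stored as a
// 32-bit offset from the region base.
struct ObjectSpace {
  std::vector<std::uint8_t> storage;

  explicit ObjectSpace(std::size_t size) : storage(size) {}

  std::uint8_t* base() { return storage.data(); }

  // Full pointer -> 32-bit offset from the base.
  std::uint32_t Compress(const void* p) {
    return static_cast<std::uint32_t>(
        static_cast<const std::uint8_t*>(p) - base());
  }

  // 32-bit offset -> full pointer (base + offset).
  void* Decompress(std::uint32_t offset) { return base() + offset; }
};
```

The trade-off in this sketch is one add per dereference in exchange for constants and fields that need only 32 bits.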
-
Reserve a scratch area in the context(?) for the double-to-integer backing store (instead of the stack). An experiment using space in the root showed a 26% improvement for the cordic benchmark; however, octane showed no measurable difference (perhaps 1%, but not repeatable enough to verify).
-
Modify double precision logic to take advantage of 64-bit registers. Much of the current logic manipulates high and low parts in two registers (i.e. it assumes 32-bit).
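As a rough illustration of the difference, the sketch below tests a double for plus or minus zero in two ways: once on separate high and low 32-bit words, as register-pair code would, and once on the whole 64-bit pattern. The function names are hypothetical and this is not taken from the V8 sources.

```cpp
#include <cstdint>
#include <cstring>

// Raw IEEE-754 bit pattern of a double.
inline std::uint64_t BitsOf(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// 32-bit style: split into high and low words, mask the sign bit out of
// the high word, then combine the two halves.
inline bool IsZeroHalves(double d) {
  const std::uint64_t bits = BitsOf(d);
  const std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
  const std::uint32_t lo = static_cast<std::uint32_t>(bits);
  return ((hi & 0x7FFFFFFFu) | lo) == 0;
}

// 64-bit style: shift the sign bit out and test the whole value at once.
inline bool IsZeroWhole(double d) {
  return (BitsOf(d) << 1) == 0;
}
```

On a 64-bit target the second form needs one register and fewer operations than the first.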
-
Avoid two-instruction 64-bit LoadP/StoreP sequences caused by tagged heap object pointers in cases where multiple memory accesses are performed. Modify such sequences to leverage an untagged pointer placed in a temporary register.
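A minimal sketch of the intent, assuming a tag of 1 in the low bit of heap object pointers (`kTag`, `LoadField`, and both `Sum...` functions are hypothetical names). In the tagged form every access carries an `offset - 1` displacement, which is presumably what forces the `li`/indexed-access pair on PPC64; in the untagged form the tag is removed once and each access uses a plain aligned offset.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: heap object pointers carry a tag of 1 in the low bit.
constexpr std::uintptr_t kTag = 1;

// Read one 64-bit field at an absolute address.
inline std::int64_t LoadField(std::uintptr_t addr) {
  std::int64_t v;
  std::memcpy(&v, reinterpret_cast<const void*>(addr), sizeof v);
  return v;
}

// Tagged form: every access folds the tag into its offset (offset - kTag),
// producing an odd displacement each time.
inline std::int64_t SumTagged(std::uintptr_t tagged) {
  return LoadField(tagged + 0 - kTag) + LoadField(tagged + 8 - kTag);
}

// Untagged form: strip the tag once into a temporary, then use aligned
// offsets for all following accesses.
inline std::int64_t SumUntagged(std::uintptr_t tagged) {
  const std::uintptr_t base = tagged - kTag;
  return LoadField(base + 0) + LoadField(base + 8);
}
```

The untagged form costs one extra subtract up front, so it only pays off when several accesses share the same object.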
-
Floating-point conversion creates a load-hit-store (read-after-write) scenario, which causes a stall. Inserting a nop between the store and the lfd yielded a 7% performance improvement for cordic; none was observable for octane.
-
Multiple references to an object address should instead assign the address to a register and reuse that register. This should be a high-level optimization. Today constants are resynthesized on each reference or, when the constant pool is used, reloaded.
-
Loop Rotation - replace the unconditional branch at the loop bottom with the loop test/branch, and duplicate the test prior to entry. Static analysis should eliminate the extra check prior to the loop when constant conditions are present. Unfortunately, the loop can't necessarily just fall through at the bottom, and an unconditional branch would be required there to jump to the next block of code. Oddly enough, the full compiler does loop rotation, but lithium doesn't. This should be a high-level optimization. The code in hydrogen-gvn.c would be at play here, and I considered what would be required. There are some ambiguous references to garbage collection interplay with branches, and lithium seems to have some hard-coded knowledge of loops that matches hydrogen's generation.
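The transformation itself can be shown at source level. The sketch below is only an illustration with hypothetical function names, not V8 code: the first function has the test at the top and an implied unconditional branch back from the bottom, while the second duplicates the test before entry and moves the loop test to the bottom.

```cpp
#include <cstddef>
#include <vector>

// Before rotation: test at the top, unconditional branch back at the bottom.
inline long SumTopTested(const std::vector<int>& v) {
  long sum = 0;
  std::size_t i = 0;
  while (i < v.size()) {
    sum += v[i];
    ++i;
  }
  return sum;
}

// After rotation: the test is duplicated ahead of the loop, and the loop
// ends in a single conditional branch.
inline long SumRotated(const std::vector<int>& v) {
  long sum = 0;
  std::size_t i = 0;
  if (i < v.size()) {          // duplicated entry test
    do {
      sum += v[i];
      ++i;
    } while (i < v.size());    // loop test and branch at the bottom
  }
  return sum;
}
```

Each iteration of the rotated form executes one branch instead of two, at the cost of the extra entry test.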
-
Branch straightening - the code path should be laid out so that fall-through blocks don't require an extra branch to reach. Today, the code sometimes has the overflow conditions in the fall-through path. This seems to occur because code generation is done in a choppy manner, where the code generating the branch only knows about the overflow case, generates that, and then branches to the main path. A post-pass could fix this if code generation were buffered so that such a pass could run, but a flow graph would be necessary. With V8's design, I'm uncertain whether this could be performed at the high level.