Optimization of the Day

Consider this (unoptimized) code:

.412
    ...
    sta %t0
    lda %t2
    adc %t6
    sta %t2
    lda [%t0]
    tax
    ldy #2
    lda [%t0],y
    stx %r10
    sta %r12
    __bra .414
    ; next set = { 414 }

.413
    lda [%r0]
    tax
    ldy #2
    lda [%r0],y
    stx %t0
    sta %t2
    lda [%t0]
    tax
    ldy #2
    lda [%t0],y
    stx %r10
    sta %r12
    ; next set = { 414 }

.414
    lda %r10
    sta %r4
    lda %r12
    sta %r4+2
    __bra .411
    ; prev set = { 412, 413 }
    ; next set = { 411 }

If blocks 412 and 413 look similar, it’s because their last 7 instructions (14 bytes) are identical, not counting the __bra that ends 412. Since both blocks flow only into 414, and 414 has no other predecessors, those instructions could be moved to the start of 414. After other optimizations, this results in an 8-byte improvement.

.412
    ...
    sta %t0
    lda %t2
    adc %t6
    __bra .414
    ; next set = { 414 }

.413
    lda [%r0]
    tax
    ldy #2
    lda [%r0],y
    stx %t0
    ; next set = { 414 }

.414
    sta %t2
    lda [%t0]
    tax
    ldy #2
    lda [%t0],y
    stx %r10
    sta %r12
    lda %r10
    sta %r4
    lda %r12
    sta %r4+2
    __bra .411
    ; prev set = { 412, 413 }
    ; next set = { 411 }

(This optimization is entirely theoretical and has not been implemented as of the publication date.)
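
Since no implementation exists, here is only a rough sketch of how such a pass might look, written in Python for brevity. It is a form of what is sometimes described as tail merging. Everything in it is an assumption made for illustration: blocks are plain lists of instruction strings, the prev/next sets are dictionaries, and the names split_branch, common_tail and sink_common_tail are hypothetical. The "..." entry stands in for the elided start of block 412. It also assumes no label points into the middle of the shared tail.

```python
BRANCH = "__bra"  # unconditional-branch pseudo-op, as used in the listings


def split_branch(code):
    """Separate a trailing unconditional branch (if any) from the block body."""
    if code and code[-1].startswith(BRANCH):
        return code[:-1], code[-1:]
    return code, []


def common_tail(bodies):
    """Return the longest instruction sequence that ends every body."""
    tail = []
    # Walk all bodies backwards in lockstep until they disagree.
    for column in zip(*(reversed(body) for body in bodies)):
        if any(insn != column[0] for insn in column):
            break
        tail.append(column[0])
    tail.reverse()
    return tail


def sink_common_tail(blocks, preds, succs, target):
    """Move the tail shared by all predecessors of target into target.

    Returns the number of instructions moved (0 if nothing was done).
    """
    sources = sorted(preds[target])
    # Only safe when every predecessor flows into target and nowhere else.
    if len(sources) < 2 or any(succs[p] != {target} for p in sources):
        return 0
    parts = [split_branch(blocks[p]) for p in sources]
    tail = common_tail([body for body, _ in parts])
    if not tail:
        return 0
    for p, (body, branch) in zip(sources, parts):
        blocks[p] = body[:len(body) - len(tail)] + branch
    blocks[target] = tail + blocks[target]
    return len(tail)


# The three blocks from the unoptimized listing above.
shared = ["sta %t2", "lda [%t0]", "tax", "ldy #2",
          "lda [%t0],y", "stx %r10", "sta %r12"]
blocks = {
    412: ["...", "sta %t0", "lda %t2", "adc %t6"] + shared + ["__bra .414"],
    413: ["lda [%r0]", "tax", "ldy #2", "lda [%r0],y", "stx %t0"] + shared,
    414: ["lda %r10", "sta %r4", "lda %r12", "sta %r4+2", "__bra .411"],
}
preds = {414: {412, 413}}
succs = {412: {414}, 413: {414}, 414: {411}}

moved = sink_common_tail(blocks, preds, succs, 414)
print(moved)  # 7, the seven shared lines
```

After the call, blocks 412, 413 and 414 hold the same instruction sequences as the second listing. The check on the next sets matters: if 412 could also branch somewhere other than 414, moving its tail into 414 would drop those instructions on the other path.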