ASP68K PROJECT, Sixth Edition
by Michael Glew
mglew@laurel.ocs.mq.edu.au
Technophilia BBS +61-2-8073563
January 1994
---------------------------------------------------------------------------
C O N T R I B U T O R S
---------------------------------------------------------------------------
Erik Bakke, Robert Barton, Bernd Blank, Kasimir Blomstedt, Frans Bouma,
David Carson, Nicolas Dade, Aaron Digulla, Irmen de Jong, Andy Duplain,
Denis Duplan, Steven Eker, Calle Englund, Alexander Fritsch, Charlie Gibbs,
Kurt Haenen, Jon Hudson, Kjetil Jacobsen, Olav Kalgraf, Makoto Kamada,
Markku Kolkka, John Lane, Jonathan Mahaffy, Dave Mc Mahan, Lindsay Meek,
Walter Misar, Boerge Noest, Gunnar Rønning, Jay Scott, Olaf Seibert,
Peter Simons.
---------------------------------------------------------------------------
I N T R O D U C T I O N
---------------------------------------------------------------------------
A while back, I was quite interested to find that there was an electronic
magazine called "howtocode" that included lots of interesting hints and
tips on coding. In the fifth edition, there was a list of optimizations
that really got me thinking. "What if there was a proggy that you could
put an assembler program through, that would speed it up, taking out all
the stupid things output by compilers and over-tired coders?" 8) I
started combing the networks, and came across one such program, called
the "SELCO Source Optimizer". It only had four optimizations, so I set
about writing my own.
Step one was to collect as many optimization ideas as I could. I posted
to Usenet and got an impressive response, and the contributors are listed
above. I promised a report on the optimizations received, and here it
is. My aim now is to write a program to make these optimizations, and
to distribute it. Contributors will receive a copy of the final archive,
to thank them for their time and energy. Further contributions are
welcome, so rather than making changes yourself, tell me what you want
changed, and I'll distribute it with the next update.
---------------------------------------------------------------------------
C H A N G E S
---------------------------------------------------------------------------
2nd Edition
The second edition incorporated a hell of a lot of corrections. Duplicate
copies of some optimizations were merged into one, and a few additions
were made. Sorry that the first edition was not sent out to all
contributors, but I was a tad busy. 8)
3rd Edition
After the second edition was distributed, many comments were received and
a couple of the "optimizations" were found to be incorrect. Analysis of
the mul/div optimizations led to a few modifications for safety. They
still save a huge number of clock cycles, and it is better to be safe
than sorry.
Also, I have made it so that the number of words of space saved or
increased is shown. Space savings are positive, increases are negative.
Zero means no change.
4th Edition
Some minor changes and additions, as well as new columns for the '030 and
'040 CPUs. A whole new format was required...
5th Edition
Erik Bakke released his docs on 020+ CPUs and 881/882 FPUs. I have been
given permission to use these docs to further the capabilities of asp68k.
Thanks Erik... I would really like to get hold of the 020, 030 and 040
Programmer Reference Cards or manuals, so if anyone has any copies they
wanna send me, let me know... Local Motorola distributors are not too
helpful.
6th Edition
Aaron Digulla advised that it would be helpful if the optimizations were
sorted somehow. They are now sorted by the first letters of the first line
of each optimization. Also, special thanks to Makoto Kamada for his
detailed contributions; without them this text would have died long ago.
---------------------------------------------------------------------------
O P T I M I Z A T I O N S
---------------------------------------------------------------------------
Note:-
m? = memory operand
dx = data register
ds = data register (scratch)
ax = address register
rx = either a data or address register
#n = immediate operand
??,?1,?2= address label
* = anything
.x = any size
b<cc> = branch commands
Opt = optimization
Notes = notes about where optimization is valid, and misc notes
Speed = are clock periods saved? ("Y" = yes
"y" = in some cases
"N" = no
"*" = increase
"-" = cannot be used on this cpu
"!" = must be used on this cpu)
Size = how many bytes are saved?
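As a rough illustration of how entries written in this notation could be
turned into a program, here is a minimal Python sketch of a line-by-line
rewriter. The rule list, the function names and the regex encoding are
assumptions made for this example, not part of asp68k, and only three
context-free entries from the table are encoded (clr.l dx, cmp.x #0,dx and
eor.x #-1,*).

```python
import re

# Hypothetical rule table: (pattern for one instruction, replacement template).
# Only entries that need no scratch register and carry no flag caveat are used.
RULES = [
    # clr.l dx      -> moveq #0,dx
    (re.compile(r"^clr\.l\s+(d[0-7])$"), r"moveq #0,\1"),
    # cmp.x #0,dx   -> tst.x dx
    (re.compile(r"^cmp\.([bwl])\s+#0,(d[0-7])$"), r"tst.\1 \2"),
    # eor.x #-1,*   -> not.x *
    (re.compile(r"^eor\.([bwl])\s+#-1,(\S+)$"), r"not.\1 \2"),
]


def optimize_line(line):
    """Return the rewritten instruction, or the line unchanged if no rule matches."""
    text = line.strip().lower()
    for pattern, template in RULES:
        if pattern.match(text):
            return pattern.sub(template, text)
    return line


def optimize(source):
    """Apply optimize_line to every line of a list of instructions."""
    return [optimize_line(line) for line in source]
```

A real tool would also have to keep labels, comments and indentation, and
check the per-entry notes (scratch registers, status flags) before
rewriting anything.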
-------------------------------------------------------------
Opt                                 |       Speed       | Size
                                    |000|010|020|030|040|
------------------------------------+---+---+---+---+---+----
* ??* -> * n(pc)* | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
n = ??-pc, n < 32768
------------------------------------+---+---+---+---+---+----
*0(ax)* -> *(ax)* | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
add*.x #0,dx -> tst.x dx | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
add.x #n,* -> addq.x #n,* | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
1 <= n <= 8
------------------------------------+---+---+---+---+---+----
bra ?? -> (nothing) | Y | Y | Y | ? | ? | 2/4
?? ?? | | | | | |
------------------------------------+---+---+---+---+---+----
remove null branches, but keep the label
------------------------------------+---+---+---+---+---+----
bset.b #7,m? -> tas m? | y | y | ? | ? | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
m? must be address allowing read-modify-write transfer.
Status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.b #7,m? -> tas m? | y | y | ? | ? | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
m? must be address allowing read-modify-write transfer.
Status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.b #7,m? -> tas m? | y | y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
m? must be address allowing read-modify-write transfer.
Status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.l #7,dx -> tas dx | Y | Y | ? | Y | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.l #7,dx -> tas dx | Y | Y | ? | Y | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.l #7,dx -> tas dx | Y | Y | ? | Y | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
bset.l #n,dx -> or.w #m,dx | Y | Y | ? | Y | ? | 0
------------------------------------+---+---+---+---+---+----
0 <= n <= 15, m = 2^n
status flags are wrong
------------------------------------+---+---+---+---+---+----
bsr ?? -> bra ?? | Y | Y | ? | ? | ? | 2
rts | | | | | |
------------------------------------+---+---+---+---+---+----
different stack depth
------------------------------------+---+---+---+---+---+----
btst.b #7,m? -> tst.b m? | Y | Y | ? | ? | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong. Not valid for Dn, d16(PC), d8(PC,Xn)
dest address modes.
------------------------------------+---+---+---+---+---+----
btst.b #7,m? -> tst.b m? | Y | Y | ? | ? | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong. Not valid for Dn, d16(PC), d8(PC,Xn)
dest address modes.
------------------------------------+---+---+---+---+---+----
btst.l #7,dx -> tst.b dx | Y | Y | ? | Y | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong.
------------------------------------+---+---+---+---+---+----
btst.l #7,dx -> tst.b dx | Y | Y | ? | Y | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong.
------------------------------------+---+---+---+---+---+----
btst.l #15,dx -> tst.w dx | Y | Y | ? | Y | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong.
------------------------------------+---+---+---+---+---+----
btst.l #15,dx -> tst.w dx | Y | Y | ? | Y | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong.
------------------------------------+---+---+---+---+---+----
btst.l #31,dx -> tst.l dx | Y | Y | ? | Y | ? | 2
beq ?? bpl ?? | | | | | |
------------------------------------+---+---+---+---+---+----
Status flags are wrong.
------------------------------------+---+---+---+---+---+----
btst.l #31,dx -> tst.l dx | Y | Y | ? | Y | ? | 2
bne ?? bmi ?? | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
clr.b mn -> clr.w mn | Y | Y | ? | ? | ? | 2/4/6
clr.b mn+1 | | | | | |
------------------------------------+---+---+---+---+---+----
mn must be even (word aligned) on 000/010; best if longword aligned
------------------------------------+---+---+---+---+---+----
clr.l dx -> moveq #0,dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
clr.w mn -> clr.l mn | Y | Y | ? | ? | ? | 2/4/6
clr.w mn+2 | | | | | |
------------------------------------+---+---+---+---+---+----
best if mn is longword aligned
------------------------------------+---+---+---+---+---+----
clr.x -(ax) -> move.x ds,-(ax) | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
ds must equal zero
------------------------------------+---+---+---+---+---+----
clr.x n(ax,rx) -> move.x ds,n(ax,rx)| Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
ds must equal zero
------------------------------------+---+---+---+---+---+----
cmp.x #0,ax -> move.x ax,ds | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
move ax to scratch register
------------------------------------+---+---+---+---+---+----
cmp.x #0,ax -> tst.x ax | - | - | ? | ? | ? | ?
------------------------------------+---+---+---+---+---+----
for .w and .l
------------------------------------+---+---+---+---+---+----
cmp.x #0,dx -> tst.x dx | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
cmp.x #0,m? -> tst.x m? | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
may not be legal on some early '000 CPUs
------------------------------------+---+---+---+---+---+----
divu.l #n,dx -> lsr.l #m,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= 8
------------------------------------+---+---+---+---+---+----
divu.l #n,dx -> moveq #0,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
n is 2^m, m>=32
------------------------------------+---+---+---+---+---+----
divu.l #n,dx -> moveq #m,ds | ! | ! | ? | ? | ? | 2
lsr.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8<m<32
------------------------------------+---+---+---+---+---+----
divu.w #n,dx -> lsr.l #m,dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= 8, ignore remainder
------------------------------------+---+---+---+---+---+----
divu.w #n,dx -> moveq #0,dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
n is 2^m, m>=32
------------------------------------+---+---+---+---+---+----
divu.w #n,dx -> moveq #m,ds | Y | Y | ? | ? | ? | 0
lsr.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8<m<32, ignore remainder
------------------------------------+---+---+---+---+---+----
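The divu.w entries above pick a replacement from the exponent m alone. A
small Python sketch of that choice follows. The function names and the
default register names d0 (for dx) and d1 (for the scratch ds) are
assumptions for the example, and it only handles divisors that fit the
16-bit immediate of divu.w.

```python
def divu_w_replacement(n, dx="d0", ds="d1"):
    """Suggest a shift sequence for 'divu.w #n,dx', or None if no entry applies.

    Only the quotient is reproduced; the remainder that divu.w leaves in the
    upper word is lost, as the table notes say.
    """
    if not 2 <= n <= 0xFFFF or n & (n - 1):
        return None                    # not a power of two, or outside the table
    m = n.bit_length() - 1             # n == 2**m
    if m <= 8:
        return [f"lsr.l #{m},{dx}"]    # immediate shift counts go up to 8
    return [f"moveq #{m},{ds}", f"lsr.l {ds},{dx}"]


def quotient_by_shift(x, n):
    """Quotient of the unsigned 32-bit value x by n == 2**m, as lsr.l computes it."""
    m = n.bit_length() - 1
    return (x & 0xFFFFFFFF) >> m
```

For example, divu_w_replacement(8) gives a single lsr.l, while 512 needs
the moveq/lsr.l pair because the count no longer fits an immediate shift.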
eor.x #-1,* -> not.x * | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
ext.w dx -> extb.l dx | - | - | ? | ? | ? | 2
ext.l dx | | | | | |
------------------------------------+---+---+---+---+---+----
jmp ?? -> bra.w ?? | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
abs(??-pc) < 32768, same section
------------------------------------+---+---+---+---+---+----
jsr * -> jmp * | Y | Y | ? | ? | ? | 2
rts | | | | | |
------------------------------------+---+---+---+---+---+----
different stack depth
------------------------------------+---+---+---+---+---+----
jsr ?1 -> pea ?2 | y | y | ? | ? | ? | 0
jmp ?2 jmp ?1 | | | | | |
------------------------------------+---+---+---+---+---+----
same time if jsr is abs.l
------------------------------------+---+---+---+---+---+----
jsr ?? -> bsr.w ?? | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
abs(??-pc) < 32768, same section
------------------------------------+---+---+---+---+---+----
lea (ax),ax -> (nothing) | Y | Y | Y | ? | ? | 2
------------------------------------+---+---+---+---+---+----
delete
------------------------------------+---+---+---+---+---+----
lea 0.w,ax -> sub.l ax,ax | Y | Y | - | - | - | 2
------------------------------------+---+---+---+---+---+----
lea n(ax),ax -> addq.w #n,ax | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
if 1 <= n <= 8
------------------------------------+---+---+---+---+---+----
lea n(ax),ax -> subq.w #-n,ax | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
if -8 <= n <= -1
------------------------------------+---+---+---+---+---+----
lsl.b #2,dy -> add.b dy,dy | Y | Y | ? | ? | ? | -2
add.b dy,dy | | | | | |
------------------------------------+---+---+---+---+---+----
lsl.b #n,dx -> clr.b dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=8
------------------------------------+---+---+---+---+---+----
lsl.l #16,dx -> swap dx | Y | Y | ? | ? | ? | -2
clr.w dx | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
lsl.l #n,dx -> lsl.w #(n-16),dx | Y | Y | ? | ? | ? | -4
swap dx | | | | | |
clr.w dx | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong, 16<n<32
------------------------------------+---+---+---+---+---+----
lsl.l #n,dx -> moveq #0,dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=32
------------------------------------+---+---+---+---+---+----
lsl.w #2,dy -> add.w dy,dy | Y | Y | ? | ? | ? | -2
add.w dy,dy | | | | | |
------------------------------------+---+---+---+---+---+----
lsl.w #n,dx -> clr.w dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=16
------------------------------------+---+---+---+---+---+----
lsl.x #1,dy -> add.x dy,dy | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
lsr.b #n,dx -> clr.b dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=8
------------------------------------+---+---+---+---+---+----
lsr.l #16,dx -> clr.w dx | Y | Y | ? | ? | ? | -2
swap dx | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
lsr.l #n,dx -> clr.w dx | Y | Y | ? | ? | ? | -4
swap dx | | | | | |
lsr.w #(n-16),dx | | | | | |
------------------------------------+---+---+---+---+---+----
status flags are wrong, 16<n<32
------------------------------------+---+---+---+---+---+----
lsr.l #n,dx -> moveq #0,dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=32
------------------------------------+---+---+---+---+---+----
lsr.w #n,dx -> clr.w dx | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
status flags are wrong, n>=16
------------------------------------+---+---+---+---+---+----
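The long shifts by 16 or more listed above all rely on the same identity:
moving a word across with swap accounts for 16 of the shift positions.
The Python sketch below models only the register arithmetic. It does not
model the status flags or instruction encoding (immediate shift counts on
the 68k are limited to 1..8, so larger word shifts would need a register
count). All helper names are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def swap(v):
    """swap dx: exchange the two 16-bit halves."""
    return ((v << 16) | (v >> 16)) & MASK32


def clr_w(v):
    """clr.w dx: clear the low word, keep the high word."""
    return v & 0xFFFF0000


def lsl_w(v, count):
    """lsl.w #count,dx: shift only the low word left."""
    return (v & 0xFFFF0000) | ((v << count) & 0xFFFF)


def lsr_w(v, count):
    """lsr.w #count,dx: shift only the low word right."""
    return (v & 0xFFFF0000) | ((v & 0xFFFF) >> count)


def lsl_l_via_swap(v, n):
    """lsl.w #(n-16) ; swap ; clr.w   (16 <= n < 32; n == 16 needs no lsl.w)."""
    return clr_w(swap(lsl_w(v, n - 16)))


def lsr_l_via_swap(v, n):
    """clr.w ; swap ; lsr.w #(n-16)   (16 <= n < 32; n == 16 needs no lsr.w)."""
    return lsr_w(swap(clr_w(v)), n - 16)


def check(samples=(0, 1, 0x8000, 0xFFFF, 0x12345678, 0xFFFFFFFF)):
    """Compare both sequences with a plain 32-bit shift for every n in 16..31."""
    return all(
        lsl_l_via_swap(v, n) == (v << n) & MASK32 and lsr_l_via_swap(v, n) == v >> n
        for v in samples
        for n in range(16, 32)
    )
```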
move.b #-1,(ax) -> st (ax) | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,(ax)+ -> st (ax)+ | N | N | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,-(ax) -> st -(ax) | N | N | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,?? -> st ?? | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,dx -> st dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,n(ax) -> st n(ax) | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #-1,n(ax,rx) -> st n(ax,rx) | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
status flags are wrong
------------------------------------+---+---+---+---+---+----
move.b #x,mn -> move.w #xy,mn | Y | Y | ? | ? | ? | 4/6/8
move.b #y,mn+1 | | | | | |
------------------------------------+---+---+---+---+---+----
mn must be even (word aligned) on 000/010; best if longword aligned
------------------------------------+---+---+---+---+---+----
move.l #n,-(sp) -> pea n.w | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
-32767 <= n <= 32767
------------------------------------+---+---+---+---+---+----
move.l #n,ax -> move.w #n,ax | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
-32767 <= n <= 32767
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #-128,dx | Y | Y | ? | N | * | 2
subq.l #-128-n,dx | | | | | |
------------------------------------+---+---+---+---+---+----
-136 <= n <= -129
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #m,dx | Y | Y | ? | ? | ? | 2
not.b dx | | | | | |
------------------------------------+---+---+---+---+---+----
128 <= n <= 255, m = 255-n
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #m,dx | Y | Y | ? | ? | ? | 2
not.w dx | | | | | |
| | | | | |
------------------------------------+---+---+---+---+---+----
65408 <= n <= 65535 (m = 65535-n) or -65536 <= n <= -65409 (m = -65537-n)
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #m,dx | Y | Y | ? | ? | ? | 2
swap dx | | | | | |
| | | | | |
------------------------------------+---+---+---+---+---+----
65536 <= n <= 8323072 (n = m*65536) or -8323073 <= n <= -65537 (n = m*65536+65535)
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #n,dx | Y | Y | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
if -128 <= n <= 127
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #y,dx | * | * | ? | ? | ? | 2
lsl.l #z,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n = y * 2^z
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #m,dx | Y | Y | ? | N | ? | 2
add.b dx,dx | | | | | |
------------------------------------+---+---+---+---+---+----
(128 <= n <= 254 or -256 <= n <= -130) and n is even, m = n/2
------------------------------------+---+---+---+---+---+----
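The two-instruction constant loads above (moveq followed by not.b, not.w,
swap or add.b) can be checked by modelling what each second instruction
does to the sign-extended moveq value. This is a minimal Python sketch
under that assumption. The helper names and the brute-force search are
made up for the example, and only the register contents are modelled, not
the flags or timing.

```python
MASK32 = 0xFFFFFFFF


def to_signed(v):
    """Interpret a 32-bit pattern as a signed value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def moveq(m):
    """moveq #m,dx: the 8-bit immediate is sign-extended to 32 bits."""
    if not -128 <= m <= 127:
        raise ValueError("moveq immediate out of range")
    return m & MASK32


def not_b(v):
    """not.b dx: invert the low byte only."""
    return (v & 0xFFFFFF00) | (~v & 0xFF)


def not_w(v):
    """not.w dx: invert the low word only."""
    return (v & 0xFFFF0000) | (~v & 0xFFFF)


def swap(v):
    """swap dx: exchange the two 16-bit halves."""
    return ((v << 16) | (v >> 16)) & MASK32


def add_b_self(v):
    """add.b dx,dx: double the low byte, upper 24 bits untouched."""
    return (v & 0xFFFFFF00) | ((2 * v) & 0xFF)


SECOND_OPS = {
    "not.b dx": not_b,
    "not.w dx": not_w,
    "swap dx": swap,
    "add.b dx,dx": add_b_self,
}


def find_pair(n):
    """Return (m, op) so that 'moveq #m,dx' then op leaves n in dx, else None."""
    for m in range(-128, 128):
        for name, op in SECOND_OPS.items():
            if to_signed(op(moveq(m))) == to_signed(n):
                return m, name
    return None
```

find_pair returns the first match it meets, so a constant reachable in
more than one way reports only one of them.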
move.l #n,dx -> moveq #m,dx | Y | Y | ? | * | ? | 2
bchg.l dx,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n = -32881 -> m = -113
n = -32849 -> m = -81
n = -32817 -> m = -49
n = -32785 -> m = -17
n = -16498 -> m = -114
n = -16466 -> m = -82
n = -16434 -> m = -50
n = -16402 -> m = -18
n = -8307 -> m = -115
n = -8275 -> m = -83
n = -8243 -> m = -51
n = -8211 -> m = -19
n = -4212 -> m = -116
n = -4180 -> m = -84
n = -4148 -> m = -52
n = -4116 -> m = -20
n = -2165 -> m = -117
n = -2133 -> m = -85
n = -2101 -> m = -53
n = -2069 -> m = -21
n = -1142 -> m = -118
n = -1110 -> m = -86
n = -1078 -> m = -54
n = -1046 -> m = -22
n = -631 -> m = -119
n = -599 -> m = -87
n = -567 -> m = -55
n = -535 -> m = -23
n = -376 -> m = -120
n = -344 -> m = -88
n = -312 -> m = -56
n = -280 -> m = -24
n = 264 -> m = 8
n = 296 -> m = 40
n = 328 -> m = 72
n = 360 -> m = 104
n = 521 -> m = 9
n = 553 -> m = 41
n = 585 -> m = 73
n = 617 -> m = 105
n = 1034 -> m = 10
n = 1066 -> m = 42
n = 1098 -> m = 74
n = 1130 -> m = 106
n = 2059 -> m = 11
n = 2091 -> m = 43
n = 2123 -> m = 75
n = 2155 -> m = 107
n = 4108 -> m = 12
n = 4140 -> m = 44
n = 4172 -> m = 76
n = 4204 -> m = 108
n = 8205 -> m = 13
n = 8237 -> m = 45
n = 8269 -> m = 77
n = 8301 -> m = 109
n = 16398 -> m = 14
n = 16430 -> m = 46
n = 16462 -> m = 78
n = 16494 -> m = 110
n = 32783 -> m = 15
n = 32815 -> m = 47
n = 32847 -> m = 79
n = 32879 -> m = 111
------------------------------------+---+---+---+---+---+----
move.l #n,dx -> moveq #m,dx | N | N | ? | * | ? | 2
bchg.l dx,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n = -2147483617 -> m = 31
n = -2147483585 -> m = 63
n = -2147483553 -> m = 95
n = -2147483521 -> m = 127
n = -1073741922 -> m = -98
n = -1073741890 -> m = -66
n = -1073741858 -> m = -34
n = -1073741826 -> m = -2
n = -536871011 -> m = -99
n = -536870979 -> m = -67
n = -536870947 -> m = -35
n = -536870915 -> m = -3
n = -268435556 -> m = -100
n = -268435524 -> m = -68
n = -268435492 -> m = -36
n = -268435460 -> m = -4
n = -134217829 -> m = -101
n = -134217797 -> m = -69
n = -134217765 -> m = -37
n = -134217733 -> m = -5
n = -67108966 -> m = -102
n = -67108934 -> m = -70
n = -67108902 -> m = -38
n = -67108870 -> m = -6
n = -33554535 -> m = -103
n = -33554503 -> m = -71
n = -33554471 -> m = -39
n = -33554439 -> m = -7
n = -16777320 -> m = -104
n = -16777288 -> m = -72
n = -16777256 -> m = -40
n = -16777224 -> m = -8
n = -8388713 -> m = -105
n = -8388681 -> m = -73
n = -8388649 -> m = -41
n = -8388617 -> m = -9
n = -4194410 -> m = -106
n = -4194378 -> m = -74
n = -4194346 -> m = -42
n = -4194314 -> m = -10
n = -2097259 -> m = -107
n = -2097227 -> m = -75
n = -2097195 -> m = -43
n = -2097163 -> m = -11
n = -1048684 -> m = -108
n = -1048652 -> m = -76
n = -1048620 -> m = -44
n = -1048588 -> m = -12
n = -524397 -> m = -109
n = -524365 -> m = -77
n = -524333 -> m = -45
n = -524301 -> m = -13
n = -262254 -> m = -110
n = -262222 -> m = -78
n = -262190 -> m = -46
n = -262158 -> m = -14
n = -131183 -> m = -111
n = -131151 -> m = -79
n = -131119 -> m = -47
n = -131087 -> m = -15
n = -65648 -> m = -112
n = -65616 -> m = -80
n = -65584 -> m = -48
n = -65552 -> m = -16
n = 65552 -> m = 16
n = 65584 -> m = 48
n = 65616 -> m = 80
n = 65648 -> m = 112
n = 131089 -> m = 17
n = 131121 -> m = 49
n = 131153 -> m = 81
n = 131185 -> m = 113
n = 262162 -> m = 18
n = 262194 -> m = 50
n = 262226 -> m = 82
n = 262258 -> m = 114
n = 524307 -> m = 19
n = 524339 -> m = 51
n = 524371 -> m = 83
n = 524403 -> m = 115
n = 1048596 -> m = 20
n = 1048628 -> m = 52
n = 1048660 -> m = 84
n = 1048692 -> m = 116
n = 2097173 -> m = 21
n = 2097205 -> m = 53
n = 2097237 -> m = 85
n = 2097269 -> m = 117
n = 4194326 -> m = 22
n = 4194358 -> m = 54
n = 4194390 -> m = 86
n = 4194422 -> m = 118
n = 8388631 -> m = 23
n = 8388663 -> m = 55
n = 8388695 -> m = 87
n = 8388727 -> m = 119
n = 16777240 -> m = 24
n = 16777272 -> m = 56
n = 16777304 -> m = 88
n = 16777336 -> m = 120
n = 33554457 -> m = 25
n = 33554489 -> m = 57
n = 33554521 -> m = 89
n = 33554553 -> m = 121
n = 67108890 -> m = 26
n = 67108922 -> m = 58
n = 67108954 -> m = 90
n = 67108986 -> m = 122
n = 134217755 -> m = 27
n = 134217787 -> m = 59
n = 134217819 -> m = 91
n = 134217851 -> m = 123
n = 268435484 -> m = 28
n = 268435516 -> m = 60
n = 268435548 -> m = 92
n = 268435580 -> m = 124
n = 536870941 -> m = 29
n = 536870973 -> m = 61
n = 536871005 -> m = 93
n = 536871037 -> m = 125
n = 1073741854 -> m = 30
n = 1073741886 -> m = 62
n = 1073741918 -> m = 94
n = 1073741950 -> m = 126
n = 2147483551 -> m = -97
n = 2147483583 -> m = -65
n = 2147483615 -> m = -33
n = 2147483647 -> m = -1
------------------------------------+---+---+---+---+---+----
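The two bchg tables above do not need to be typed in by hand: since
bchg.l dx,dx uses the register's own low five bits as the bit number, each
entry follows from m. A Python sketch that regenerates them, assuming the
tables list exactly the cases where the flipped bit is bit 8 or higher
(the function and variable names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def to_signed(v):
    """Interpret a 32-bit pattern as a signed value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def moveq_bchg(m):
    """Value left in dx by: moveq #m,dx ; bchg.l dx,dx."""
    v = m & MASK32      # moveq sign-extends the 8-bit immediate
    bit = v & 31        # bchg on a data register takes the bit number modulo 32
    return to_signed(v ^ (1 << bit))


# n -> m for every m whose flipped bit lies outside the low byte.
TABLE = {moveq_bchg(m): m for m in range(-128, 128) if (m & 31) >= 8}
```

Looking up a constant n in TABLE then says whether the moveq/bchg pair
applies and which m to use.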
move.l #n,m? -> moveq #n,ds | Y | Y | ? | ? | ? | 2
move.l ds,m? | | | | | |
------------------------------------+---+---+---+---+---+----
-128 <= n <= 127
------------------------------------+---+---+---+---+---+----
move.l (ax),ay -> move.x ([ax],n),dz| - | - | ? | ? | ? | ?
move.x n(ay),dz | | | | | |
------------------------------------+---+---+---+---+---+----
move.l (ax),ay -> move.x ([ax]),dz | - | - | ? | ? | ? | ?
move.x (ay),dz | | | | | |
------------------------------------+---+---+---+---+---+----
move.l (bd.x,ax),dy -> | - | - | ? | ? | ? | ?
move.l bd.x,dy | | | | | |
------------------------------------+---+---+---+---+---+----
move.l (n.w,ax),dy -> | - | - | ? | ? | ? | ?
move.l n(ax),dy | | | | | |
------------------------------------+---+---+---+---+---+----
move.l (sp),(n,sp) -> rtd #n | - | - | ? | ? | ? | ?
lea (n,sp),sp | | | | | |
rts | | | | | |
------------------------------------+---+---+---+---+---+----
move.l (sp),0(dx,sp) -> rtd dx | - | Y | ? | ? | ? | 6
lea 0(dx,sp),sp | | | | | |
rts | | | | | |
------------------------------------+---+---+---+---+---+----
move.l 12(ax),12(ay) -> move16 | - | - | - | - | ? | ?
move.l 8(ax),8(ay) (ax)+,(ay)+ | | | | | |
move.l 4(ax),4(ay) | | | | | |
move.l (ax)+,(ay)+ | | | | | |
------------------------------------+---+---+---+---+---+----
move.l ax,-(sp) -> link ax,#n | Y | Y | ? | ? | ? | 4
move.l sp,ax | | | | | |
add.w #n,sp | | | | | |
------------------------------------+---+---+---+---+---+----
-32767 <= n <= 32767
------------------------------------+---+---+---+---+---+----
move.l ax,-(sp) -> pea -n(ax) | Y | Y | ? | ? | ? | 0/4
sub*.l #n,(sp) | | | | | |
------------------------------------+---+---+---+---+---+----
move.l ax,-(sp) -> pea n(ax) | Y | Y | ? | ? | ? | 0/4
add*.l #n,(sp) | | | | | |
------------------------------------+---+---+---+---+---+----
move.l ax,az -> lea n(ax.l*4),az | - | - | ? | ? | ? | ?
asl.l #2,az | | | | | |
add.x #n,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+4*ax, -128<=n<=127
------------------------------------+---+---+---+---+---+----
move.l ax,az -> lea n(ax.l*8),az | - | - | ? | ? | ? | ?
asl.l #3,az | | | | | |
add.x #n,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+8*ax, -32767<=n<=32767
------------------------------------+---+---+---+---+---+----
move.l ax,sp -> unlk ax | Y | Y | ? | ? | ? | 2
move.l (sp)+,ax | | | | | |
------------------------------------+---+---+---+---+---+----
move.l ay,az -> lea n(ax,ay.l*4),az | - | - | ? | ? | ? | ?
asl.l #2,az | | | | | |
add.l ax,az | | | | | |
add.x #n,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+ax+4*ay, -32767<=n<=32767
------------------------------------+---+---+---+---+---+----
move.l ay,az -> lea n(ax,ay.l*8),az | - | - | ? | ? | ? | ?
asl.l #3,az | | | | | |
add.l ax,az | | | | | |
add.x #n,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+ax+8*ay, -32767<=n<=32767
------------------------------------+---+---+---+---+---+----
move.w #x,mn -> move.l #xy,mn | Y | Y | ? | ? | ? | 2/4/6
move.w #y,mn+2 | | | | | |
------------------------------------+---+---+---+---+---+----
best if mn is longword aligned
------------------------------------+---+---+---+---+---+----
move.x #0,ax -> sub.l ax,ax | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
move.x #n,ax -> lea n,ax | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
n <> 0
------------------------------------+---+---+---+---+---+----
move.x (rx,ay),az -> move.x ay,az | Y | Y | ? | ? | ? | 0
add.x rx,az | | | | | |
------------------------------------+---+---+---+---+---+----
move.x ax,ay -> lea n(ax),ay | Y | Y | ? | ? | ? | 2/4
add.x #n,ay | | | | | |
------------------------------------+---+---+---+---+---+----
-32767 <= n <= 32767
------------------------------------+---+---+---+---+---+----
move.x ax,az -> lea -n(ax,ay),az | Y | Y | ? | ? | ? | 2
sub.x #n,az | | | | | |
add.x ay,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=ax+ay-n, n<=32767
------------------------------------+---+---+---+---+---+----
move.x ax,az -> lea n(ax,ay),az | Y | Y | ? | ? | ? | 2
add.x #n,az | | | | | |
add.x ay,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+ax+ay, n<=32767
------------------------------------+---+---+---+---+---+----
movem.l (ax)+,registers | * | * | ? | ? | Y | *
-> move.l (ax)+,ry | | | | | |
for each reg | | | | | |
------------------------------------+---+---+---+---+---+----
movem.w *,dx -> move.w *,dx | Y | Y | ? | ? | ? | 0
ext.l dx | | | | | |
------------------------------------+---+---+---+---+---+----
movem.x *,@ -> move.x *,@ | Y | Y | ? | ? | ? | 2
| | | | | |
------------------------------------+---+---+---+---+---+----
@ = a single register, not (@=dx & .x=.w)
------------------------------------+---+---+---+---+---+----
movem.x @,* -> move.x @,* | Y | Y | ? | ? | ? | 2
| | | | | |
------------------------------------+---+---+---+---+---+----
@ = a single register, status flags are wrong
------------------------------------+---+---+---+---+---+----
moveq #n,az -> lea n(ax,ay.l*2),az | - | - | ? | ? | ? | ?
add.x ay,az | | | | | |
add.x ax,az | | | | | |
add.x ay,az | | | | | |
------------------------------------+---+---+---+---+---+----
az=n+ax+2*ay, -128<=n<=127
------------------------------------+---+---+---+---+---+----
mul*.l #1,dx -> (nothing) | ! | ! | Y | Y | Y | 6
------------------------------------+---+---+---+---+---+----
delete
------------------------------------+---+---+---+---+---+----
mul*.l #10,dx -> add.l dx,dx | ! | ! | ? | ? | ? | -2
move.l dx,ds | | | | | |
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #12,dx -> asl.l #2,dx | ! | ! | ? | ? | ? | -2
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #2,dx -> add.l dx,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
mul*.l #3,dx -> move.l dx,ds | ! | ! | ? | ? | ? | 0
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #5,dx -> move.l dx,ds | ! | ! | ? | ? | ? | 0
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #6,dx -> add.l dx,dx | ! | ! | ? | ? | ? | -2
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #7,dx -> move.l dx,ds | ! | ! | ? | ? | ? | 0
asl.l #3,dx | | | | | |
sub.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #9,dx -> move.l dx,ds | ! | ! | ? | ? | ? | 0
asl.l #3,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mul*.l #n,dx -> moveq #m,ds | ! | ! | ? | ? | ? | 2
asl.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8<m<14
------------------------------------+---+---+---+---+---+----
muls.l #0,dx -> moveq #0,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
muls.l #n,dx -> asl.l #m,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= 8
------------------------------------+---+---+---+---+---+----
muls.w #0,dx -> moveq #0,dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
muls.w #1,dx -> ext.l dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
muls.w #10,dx -> ext.l dx | Y | Y | ? | ? | ? | -6
add.l dx,dx | | | | | |
move.l dx,ds | | | | | |
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #11,dx -> ext.l dx | Y | Y | ? | ? | ? | -8
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l dx,ds | | | | | |
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #12,dx -> ext.l dx | Y | Y | ? | ? | ? | -6
asl.l #2,dx | | | | | |
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #2,dx -> ext.l dx | Y | Y | ? | ? | ? | 0
add.l dx,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #3,dx -> ext.l dx | Y | Y | ? | ? | ? | -4
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #5,dx -> ext.l dx | Y | Y | ? | ? | ? | -4
move.l dx,ds | | | | | |
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #6,dx -> ext.l dx | Y | Y | ? | ? | ? | -6
add.l dx,dx | | | | | |
move.l dx,ds | | | | | |
add.l ds,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #7,dx -> ext.l dx | Y | Y | ? | ? | ? | -4
move.l dx,ds | | | | | |
asl.l #3,dx | | | | | |
sub.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #9,dx -> ext.l dx | Y | Y | ? | ? | ? | -4
move.l dx,ds | | | | | |
asl.l #3,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
muls.w #n,dx -> ext.l dx | Y | Y | ? | ? | ? | 0
asl.l #m,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= 8
------------------------------------+---+---+---+---+---+----
muls.w #n,dx -> moveq #m,ds | Y | Y | ? | ? | ? | -2
ext.l dx | | | | | |
asl.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8<m<14
------------------------------------+---+---+---+---+---+----
muls.w #n,dx -> swap dx | Y | Y | ? | ? | ? | -2
clr.w dx | | | | | |
asr.l #(16-m),dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8 <= m <= 15
------------------------------------+---+---+---+---+---+----
mulu.l #0,dx -> moveq #0,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
mulu.l #n,dx -> lsl.l #m,dx | ! | ! | ? | ? | ? | 4
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= ?
------------------------------------+---+---+---+---+---+----
mulu.w #0,dx -> moveq #0,dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
mulu.w #1,dx -> swap dx | Y | Y | ? | ? | ? | -2
clr.w dx | | | | | |
swap dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #12,dx -> swap dx | Y | Y | ? | ? | ? | -10
clr.w dx | | | | | |
swap dx | | | | | |
asl.l #2,dx | | | | | |
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #2,dx -> swap dx | Y | Y | ? | ? | ? | -4
clr.w dx | | | | | |
swap dx | | | | | |
add.l dx,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #3,dx -> swap dx | Y | Y | ? | ? | ? | -8
clr.w dx | | | | | |
swap dx | | | | | |
move.l dx,ds | | | | | |
add.l dx,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #5,dx -> swap dx | Y | Y | ? | ? | ? | -8
clr.w dx | | | | | |
swap dx | | | | | |
move.l dx,ds | | | | | |
asl.l #2,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #6,dx -> swap dx | Y | Y | ? | ? | ? | -10
clr.w dx | | | | | |
swap dx | | | | | |
add.l dx,dx | | | | | |
move.l dx,ds | | | | | |
add.l ds,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #7,dx -> swap dx | Y | Y | ? | ? | ? | -8
clr.w dx | | | | | |
swap dx | | | | | |
move.l dx,ds | | | | | |
asl.l #3,dx | | | | | |
sub.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #9,dx -> swap dx | Y | Y | ? | ? | ? | -8
clr.w dx | | | | | |
swap dx | | | | | |
move.l dx,ds | | | | | |
asl.l #3,dx | | | | | |
add.l ds,dx | | | | | |
------------------------------------+---+---+---+---+---+----
mulu.w #n,dx -> swap dx | Y | Y | ? | ? | ? | -4
clr.w dx | | | | | |
swap dx | | | | | |
lsl.l #m,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 1 <= m <= 8
------------------------------------+---+---+---+---+---+----
mulu.w #n,dx -> swap dx | Y | Y | ? | ? | ? | -2
clr.w dx | | | | | |
lsr.l #(16-m),dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, 8 <= m <= 15
------------------------------------+---+---+---+---+---+----
neg.x dx -> add.x dx,dy | Y | Y | Y | ? | ? | 2
sub.x dx,dy | | | | | |
------------------------------------+---+---+---+---+---+----
dx is trashed
------------------------------------+---+---+---+---+---+----
neg.x dx -> eor.x #n-1,dx | Y | Y | ? | ? | ? | 2
add.x #n,dx | | | | | |
------------------------------------+---+---+---+---+---+----
n is 2^m, dx<n
------------------------------------+---+---+---+---+---+----
neg.x dx -> sub.x dx,dy | Y | Y | Y | ? | ? | 2
add.x dx,dy | | | | | |
------------------------------------+---+---+---+---+---+----
dx is trashed
------------------------------------+---+---+---+---+---+----
nop -> (nothing) | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
remove nops
------------------------------------+---+---+---+---+---+----
or.l #n,dx -> bset.l #b,dx | Y | Y | ? | ? | ? | 2
------------------------------------+---+---+---+---+---+----
n = 2^b (only 1 bit set)
------------------------------------+---+---+---+---+---+----
sub*.x #0,dx -> tst.x dx | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
sub.x #n,* -> addq.x #-n,* | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
-8 <= n <= -1
------------------------------------+---+---+---+---+---+----
sub.x #n,* -> subq.x #n,* | Y | Y | ? | ? | ? | 2/4
------------------------------------+---+---+---+---+---+----
if 1 <= n <= 8
------------------------------------+---+---+---+---+---+----
sub.x #n,ax -> lea -n(ax),ax | Y | Y | ? | ? | ? | 0/2
------------------------------------+---+---+---+---+---+----
-32767 <= n <= -9, 9 <= n <= 32767
------------------------------------+---+---+---+---+---+----
subq.l #n,ax -> subq.w #n,ax | Y | Y | ? | ? | ? | 0
------------------------------------+---+---+---+---+---+----
subq.w #1,dx -> db<cc> dx,?? | y | y | ? | ? | ? | -2
b<cc> ?? b<cc> ?? | | | | | |
------------------------------------+---+---+---+---+---+----
if dx=0 then will be slower
------------------------------------+---+---+---+---+---+----
subq.w #1,dx -> dbf dx,?? | Y | Y | ? | ? | ? | -2
bra ?? bra ?? | | | | | |
------------------------------------+---+---+---+---+---+----
if dx=0 then will be slower
------------------------------------+---+---+---+---+---+----
tst.w dx -> dbra dx,?? | y | y | ? | ? | ? | 2
bne ?? | | | | | |
------------------------------------+---+---+---+---+---+----
dx will be trashed
------------------------------------+---+---+---+---+---+----
---------------------------------------------------------------------------
H I N T S & T I P S
---------------------------------------------------------------------------
This new section is for hints that do not fit into the above tables, such
as pipelining optimizations and other general advice.
020+ Sequential memory accesses can cause pipeline stalls, so try to
rearrange code so that memory accesses do not immediately follow
each other. The same problem occurs if an address register updated
by one instruction is used by the very next instruction.
ALL Include small routines as macros, because inline routines will
be much faster, and in extreme cases smaller.
ALL If a subroutine is only called from one position, either move
it inline, or enter and leave it with jmp/bra instead of jsr/rts.
---------------------------------------------------------------------------
C O N C L U S I O N
---------------------------------------------------------------------------
These are the optimizations I've come up with so far. If you could check
what I've done, and report any errors, that would make this list better. I
only have so much time to spend on this, and many hands make light work.
Also, stats (and more optimizations) for 68020+ CPUs would be welcomed.
Currently this list is only for simple peephole optimization stuff, but I
will hopefully get around to more extensive optimizations. Pipeline
optimization is on the way, so look out. Any info on the 68020+ pipelines
would be appreciated.
Optimizations with question marks in the boxes next to them are ones I do
not have the data to check yet.
The latest version of the asp68k archive is available by anonymous ftp from
ftp.mq.edu.au in the /home/mglew/ directory or by calling Technophilia BBS
on +61 2 807 3563 (or (02) 807 3563 in Australia).
===========================================================================
EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF EOF
===========================================================================