Added __*shift_nz routines (for shifting by non-zero or a constant amount) #756

Draft
ZERICO2005 wants to merge 2 commits into master from shift_non_zero

@ZERICO2005 ZERICO2005 commented Mar 16, 2026

Similar in spirit to #755.
__*shift_nz may allow a small speed optimization by skipping the test for a shift-by-zero.

Clang/LLVM has some functionality to detect whether a shift amount is non-zero, so the compiler could emit __*shift_nz when applicable. __*shift_nz would be emitted either when the shift amount is a constant or when it is a variable that is proven to be non-zero.

Calling __*shift_nz with a shift amount of zero is undefined behavior.

Additionally, it is always safe to convert __*shift_nz back to __*shift.
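
As a rough illustration of the cases described above, the C sketch below shows shift amounts that are constant or provably non-zero, plus one that may be zero. The function names are hypothetical, and whether a given compiler build would actually select a __*shift_nz routine for them is an assumption, not something this PR implements.

```c
#include <stdint.h>

/* Hypothetical examples only: the names are made up for illustration. */

/* Constant, non-zero shift amount. */
uint32_t shift_const(uint32_t x)
{
    return x << 5;
}

/* (n & 31) | 1 always has bit 0 set, so the amount is in [1, 31]. */
uint32_t shift_or_one(uint32_t x, uint8_t n)
{
    return x << ((n & 31) | 1);
}

/* (n & 7) + 1 is in [1, 8], so the amount can never be zero. */
uint32_t shift_range(uint32_t x, uint8_t n)
{
    return x << ((n & 7) + 1);
}

/* The amount may be zero here, so only the checked __*shift routine
   would be a valid target. */
uint32_t shift_any(uint32_t x, uint8_t n)
{
    return x << (n & 31);
}
```

Only shift_any needs the shift-by-zero test. Replacing a call to the _nz routine with the checked routine in the other three would still be correct, just slightly slower.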

Pros:

  • Small speed optimization
  • If you see call __*shift in the compiler output, you can be almost certain that the shift amount is a variable rather than a constant
  • If only __lshl_nz is used, it might be possible to avoid linking __lshl, which would save 3 bytes (although this would need FASMG require support to implement)

Cons:

  • Potential for human error in hand written assembly code.

Here, entering at __lshl_nz saves 3F + 1 by skipping the shift-by-zero check at the top of __lshl:

__lshl:
	inc	l
	dec	l
	ret	z	; shift by zero
__lshl_nz:
	push	bc
	ld	b, l
	ex	(sp), hl
.L.loop:
	add	hl, hl
	rla
	djnz	.L.loop
	ex	(sp), hl
	pop	bc
	ret
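
To make the contract concrete, here is a minimal C model of the two entry points above. It is a sketch, not the toolchain's code: the function names are invented, the 32-bit operand is passed as one uint32_t rather than in registers, and the do/while loop imitates DJNZ, which decrements the 8-bit counter B before testing it.

```c
#include <stdint.h>

/* Model of __lshl_nz: no zero check, so count == 0 wraps the 8-bit
   counter and the loop body runs 256 times. */
uint32_t model_lshl_nz(uint32_t value, uint8_t count)
{
    uint8_t b = count;
    do {
        value <<= 1;        /* add hl, hl \ rla */
    } while (--b != 0);     /* djnz .L.loop */
    return value;
}

/* Model of __lshl: the early return mirrors inc l \ dec l \ ret z. */
uint32_t model_lshl(uint32_t value, uint8_t count)
{
    if (count == 0)
        return value;       /* shift by zero */
    return model_lshl_nz(value, count);
}
```

In this model the two functions agree for every non-zero count, which is why converting __*shift_nz back to __*shift is always safe. With a count of zero the unchecked model shifts every bit out and returns 0 instead of the original value.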

Note that __*shift_nz aliases __*shift if no optimizations are possible.

__llshru_nz:
__llshru:
; Suboptimal for large shift amounts
	push	af
	push	iy
	ld	iy, 0
	add	iy, sp
	ld	a, (iy + 9)
	or	a, a
	jr	z, .L.finish	; shift by zero
; we cannot place __llshru_nz: here
	push	de
	push	hl

Here is a list of routines where __*shift_nz is faster:

  • __bshl
  • __bshru
  • __bshrs
  • __lshl
  • __lshru
  • __lshrs
  • __i48shru
  • __i48shrs

or a, $10

-call __lshru
+call __lshru_nz

Proof: The shift amount should be in [5, 29], and the same shift amount was used as a DJNZ counter, which would break if the shift amount could be zero (DJNZ with B = 0 loops 256 times).

ex (sp), hl
; shift is non-zero and [1, 11] in the non-UB case
-call c, __llshl
+call c, __llshl_nz

Proof: A is [1, 204] here, and call c, __llshl_nz will only call the function if the shift amount is less than 31.

ld c, a ; A is [1, 23]
; shift until the MSB of the mantissa is the LSB of the exponent
-call __ishl
+call __ishl_nz

Proof: rcf \ adc hl, hl was done prior to .L.subnormal, which means that __ictlz will return a value that is at least 1 since the LSB will be cleared.


ex (sp), hl ; (SP) = shift
-call __llshru
+call __llshru_nz

Proof: Shift amount is [1, 11], and the exact same shift amount was used for DJNZ, which would break if the shift amount were zero.

call __ictlz
ld c, a
-call __ishl
+call __ishl_nz

Proof: add hl, hl is done, meaning that the LSB is 0, so __ictlz will return a value greater than 1

ld d, c ; store C
ld c, a
-call __ishl
+call __ishl_nz

Proof: Since sub a, 23 set the carry flag, A was in [0, 22] beforehand, so the subtraction left A in [-23, -1]; neg then makes A [1, 23].
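
The range argument can be checked exhaustively over all 256 accumulator values. The C sketch below models sub a, 23 followed by neg on an 8-bit value. The helper names are hypothetical and the model covers only the arithmetic, not the surrounding routine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of `sub a, 23` followed by `neg` on an 8-bit accumulator.
   Returns true when the subtraction borrows (carry flag set). */
bool sub23_neg(uint8_t a, uint8_t *result)
{
    bool carry = a < 23;               /* sub a, 23 sets carry on borrow */
    uint8_t diff = (uint8_t)(a - 23);  /* wraps modulo 256 */
    *result = (uint8_t)(0u - diff);    /* neg: two's-complement negate */
    return carry;
}

/* Returns 1 if every carry-set input yields a shift amount in [1, 23]. */
int carry_implies_nonzero(void)
{
    for (unsigned a = 0; a < 256; a++) {
        uint8_t r;
        if (sub23_neg((uint8_t)a, &r) && (r < 1 || r > 23))
            return 0;
    }
    return 1;
}
```

The boundary cases are A = 0, which gives a shift amount of 23, and A = 22, which gives 1. A = 23 does not set carry, so the zero result there never reaches the _nz call.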
