What I do is something like this:
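MAIN     CSECT
         SAVE  (14,12)            save the caller's registers
         LR    12,15              R15 holds the entry address on entry
         USING MAIN,12            R12 is the base register
         LA    15,SAVEAREA        first save area on the stack
         ST    13,4(,15)          back chain
         ST    15,8(,13)          forward chain
         LR    13,15
         BAS   14,SUBR1
         BAS   14,SUBR2
         L     13,4(,13)          back to the caller's save area
         SR    15,15
         IC    15,RC              return code set by SUBR1
         RETURN (14,12),RC=(15)
         EJECT
SUBR1    BASR  15,0               R15 = address of the SAVE expansion
         SAVE  (14,3),,SUBR1
         LA    15,72(,13)         next save area on the stack
         ST    13,4(,15)
         ST    15,8(,13)
         LR    13,15
         MVI   RC,16
         L     13,4(,13)
         RETURN (14,3)
         EJECT
SUBR2    BASR  15,0
         SAVE  (14,5),,SUBR2
         LA    15,72(,13)
         ST    13,4(,15)
         ST    15,8(,13)
         LR    13,15
         BAS   14,SUBR1
         L     13,4(,13)
         RETURN (14,5)
SAVEAREA DC    (5*18)F'0'         room for 5 save areas
RC       DC    X'00'              return code byte
         END   MAIN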
The BASR 15,0 before the SAVE macro for the function is required because a SAVE macro that specifies an identifier requires register 15 to contain the address of the start of the macro expansion. Yes, I know there are “tricks” that bypass the requirement, but I seldom bother. Some programmers who use this method take careful steps to avoid overflowing the save area stack, but since I started using it I have never once overflowed the save area stack, though I think I've gone 4 levels down from time to time.
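To make the register 15 requirement concrete, here is roughly what SAVE (14,3),,SUBR1 expands to, followed by one possible overflow guard of the kind mentioned above. Treat it as a sketch: the expansion is written from memory of the usual macro output, and SAVEEND (a label assumed to sit immediately after the save area stack) and STKOVFL (an error routine) are hypothetical names that are not part of the example program.

```hlasm
SUBR1    BASR  15,0               R15 = address of the next instruction
*        SAVE  (14,3),,SUBR1 expands to roughly the next four lines
         B     10(0,15)           branch around the identifier
         DC    AL1(5)             length of the identifier
         DC    CL5'SUBR1'         the identifier itself
         STM   14,3,12(13)        save R14 through R3 in the caller's area
*        Optional guard before chaining to the next save area
         LA    15,72(,13)         candidate address of the next save area
         CL    15,=A(SAVEEND)     hypothetical label after the stack
         BNL   STKOVFL            hypothetical routine: no room left
         ST    13,4(,15)          back chain
         ST    15,8(,13)          forward chain
         LR    13,15
```

The branch is relative to register 15, which is why register 15 has to point at the first byte of the expansion. Without the BASR, register 15 would still hold whatever the caller left in it.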
In the SAVE macro you specify registers to save. You only need to save and restore the registers the actual routine itself modifies. Any subroutine you call is responsible for the registers it uses. The STM and LM instructions in the SAVE and RETURN macros are the slowest part of this scheme; any register you do not have to save improves the performance. I have a function I regularly use that just saves register 14, calls an external function and then calls an internal function. The external function uses every register, the internal function uses several registers, but by rigorously following the SAVE and RETURN convention everyone is happy. You will find it's rarely necessary to save and restore registers 15, 0 and 1; most programmers assume they are trashed by the call. My example does not use recursion, though in the 1970s and 1980s I used essentially this scheme to implement a function stack that required recursion.
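A routine that saves only register 14 might look something like the sketch below. It is an illustration under assumptions rather than the actual function described above: SUBR3 and EXTFUNC are hypothetical names, and the single-register save and restore are written as plain ST and L instructions.

```hlasm
SUBR3    ST    14,12(,13)         save only R14 in the caller's save area
         LA    15,72(,13)         next save area on the stack
         ST    13,4(,15)          back chain
         ST    15,8(,13)          forward chain
         LR    13,15
         L     15,=V(EXTFUNC)     EXTFUNC is a hypothetical external routine
         BASR  14,15              it saves and restores whatever it uses
         BAS   14,SUBR1           internal routine, same convention
         L     13,4(,13)          back to the caller's save area
         L     14,12(,13)         restore the return address
         BR    14
```

The only registers this routine changes itself are 14 and 15, so nothing else needs saving. Both called routines save their registers in the area that register 13 points to at the time of the call.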
As you have noted, there are no PUSH and POP instructions. However, there are the BAKR, PC and PR instructions, which evidently you missed. I do not use them; they are extremely slow. The engineers who defined the System/360 architecture were very aware of architectures such as the B5500, which implemented the stack concept but had essentially no registers. They thought that using real registers, rather than s-l-o-w storage for pseudo registers in a stack, would yield better overall performance. The trade-off is that a stack design can mean faster context switching, say for subroutine calls or interrupts, than System/360 type architectures manage. In the 1980s I inherited a program that implemented PUSH and POP by macros, and grew to thoroughly detest the whole scheme.
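For comparison, a routine that uses the linkage stack instead of SAVE and RETURN could be sketched as follows. SUBR4 is a hypothetical name, and the routine is assumed to be called with BAS 14,SUBR4 like the others.

```hlasm
SUBR4    BAKR  14,0               push registers and return address
         MVI   RC,8               routine body
         PR                       pop the entry, restore R2-R14 and return
```

With a second operand of 0, BAKR does not branch; it only creates the linkage stack entry, taking the return address from register 14. PR restores registers 2 through 14 from that entry and returns. Registers 15, 0 and 1 keep whatever the routine left in them.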