And now for 3 ways to set an internal pullup
This article is part of a series.
- Part 1: Hello LED on an AVR (ATtiny45) in C
- Part 2: How can I make programming an ARM chip as hard as possible?
- Part 3: How can I get this SAMD21E18 startup code a little sturdier?
- Part 4: It's ALIVE! (SAMD21E18A, Assembly, No SDK)
- Part 5: This Article
- Part 6: It'd make sense to do some toolchain clean up
- Part 7: Neat, switching to the ItsyBitsy just... works
- Related Repo: https://github.com/carlynorama/StrippedDownChipRosetta/blob/main/ARM/SAMD21E18/04_AssemblySwtich/
Slowly working towards parity with the AVR example, this post adds the switch with an internal pullup resistor. One of the nice things about internal pullups? The part of a switch can comfortably be played by some 22 AWG yellow wire from the Dish of Scrap Miscellany.
Circuit Warning
This circuit doesn’t match the AVR circuit. The AVR circuit sank the LED and pulled up the switch, following classic recommended practice. This circuit:
- different: sources the LED, because that’s the Arduino built-in LED convention. (Off==On is kinda hard for intro classes)
- same: pulls up the switch
Since the switch and LED don’t match electrically, the code will handle the inversion to get the same result as the AVR circuit (on when closed, off when open).
“Bit Hacking”
Checking a switch and conditionally turning on an LED requires some fancier moves than setting an LED alone. I made an emulator script with some examples. The two-step instructions like MOV R1, R0, LSL #6 need to be broken up, and MOV changed to MOVS, on the actual chip (MOV is reserved for specific registers on CM0+), but the concepts translate.
References
- YouTube: 3: Bit Hacks (8:22) as part of MIT 6.172 Performance Engineering of Software Systems, Fall 2018 by MIT OpenCourseWare
- Bit Twiddling Hacks by Sean Eron Anderson at Stanford Computer Graphics Laboratory
- Knuth TAOCP, 4A section 7.1.3
Example Code
.global _start
_start:
//-------------------------------------------------
//Adding new 1 bits
MOV R0, #1
ORR R0, R0, LSL #6 //R0 will now have 1 bit at 0 and 6 (0x41)
//-------------------------------------------------
//Using TST
//TST compares two values with an AND, discarding result,
//but sets zero flag if the result of the operation was 0
//So if the two values have ANY bits the same ZF == 0
//R0 is 0x41
TST R0, #1 //result of AND is 1, ZF == 0
TST R0, #0x40 //result of AND is 0x40, ZF == 0
MOV R1, #0
TST R1, #1 //result of AND is 0, ZF == 1
TST R1, #64 //result of AND is 0, ZF == 1
MOV R1, R0, LSL #6 //R1 will have a bit at 6 and 12 (0x1040)
TST R1, R0 //have bits in common, ZF 0
MOV R0, R1
TST R1, R0
//EQ and NE are named for the CMP results (which does a SUB)
//So results confusing.
//BEQ to branch when ZF == 1 //has no things in common
//BNE to branch when ZF == 0 //has some things in common
//-------------------------------------------------
//Set specific bits to zero
LDR R0, =0xAAAAAAAA
MOV R1, #10 //put an a in the 0 position
LSLS R1, #12 //move the a to the 3rd position (4*3)
//Performs XOR operation
//on bits of Rn with bits of Rm.
EORS R2, R0, R1
//Performs AND operation
//on bits of Rn with ~bits of Rm.
BICS R3, R0, R1
//Looks the same? Not quite!
//0xA == 0b1010, 0x2 = 0b0010
LDR R4, =0x22222222
ORR R5, R4, R1 //Adds missing bits => 0b1010
EORS R5, R4, R1 //removes where duplicated => 0b1000
BICS R5, R4, R1 //removes where mask (R1) is 1 => 0b0000
//-------------------------------------------------
//Retrieve selective chunk with mask
MOV R0, #0xFF
LSL R0, #16 //(4*N)
AND R5, R4, R0
LSR R5, #16 //if need it down in lowest part
//-------------------------------------------------
//Isolate selective chunk no mask
LDR R0, =#0x22FF2222
LSL R0, #8 //(4*N)
LSR R0, 24
//-------------------------------------------------
//Set selective chunk
LSL R0, #16 //mask to clear target area (same byte the new value lands in)
BICS R4, R0 //clean out the bit area in target
MOV R1, #0xAA //put desired value in a register
ORR R4, R1, LSL #16 //orr clean target with shifted new value
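For readers who think in C first, the idioms in the listing above can be sketched roughly as follows. This is only a model of the bit math, not of the instructions themselves, and every function name here is a hypothetical one made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper names; they mirror the assembly idioms above. */

/* TST: AND two values, discard the result, report what the Z flag would be. */
int tst_zero_flag(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* BICS: clear every bit of value where mask has a 1. */
uint32_t bit_clear(uint32_t value, uint32_t mask) {
    return value & ~mask;
}

/* "No mask" extraction of the 8-bit chunk that starts at bit pos (pos <= 24):
   shift the chunk to the top of the word, then back down to the bottom. */
uint32_t extract_byte(uint32_t value, unsigned pos) {
    return (value << (24u - pos)) >> 24;
}

/* Set a chunk: clear the 8-bit target area, then OR in the shifted value. */
uint32_t insert_byte(uint32_t value, unsigned pos, uint32_t field) {
    uint32_t mask = 0xFFu << pos;
    return (value & ~mask) | ((field & 0xFFu) << pos);
}
```

With the values used in the listing, extract_byte(0x22FF2222, 16) gives 0xFF and insert_byte(0x22222222, 16, 0xAA) gives 0x22AA2222.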
Conditional Branching with the Thumb-1 Instruction Set
What’s another thing you won’t see in any of the code for the SAMD21E18A? Most of the compound conditional instructions. (cc is a placeholder for the condition code)
- BLcc
- MOVcc Rd Rm
- IT cc…
Cortex-M0/Cortex-M0+ chips use Armv6-M, which means (almost entirely) the Thumb-1 instruction set, which means NONE of the conditional niceties available to their bigger siblings. I’m not used to assembly having them anyway, but much of the example code out there seems to target M3 or higher. It just won’t assemble with the -mcpu=cortex-m0plus flag set.
References
- See also the links in the above ¶
- armv6 vs armv7 overview from Arm community
- Instruction Set Assembly Guide for Armv7 and earlier Arm architectures from Arm documentation service
- Arm Cortex-M0+ DataSheet from Arm contains image below. Figure 5, page 6.
Example Code
What does that difference look like in practice? In full Arm syntax, BLNE (branch with link if Z==0, i.e. the last result was not zero) can be called with no fuss.
//full syntax
//.syntax unified
//.arm //might also see as .code 32
TST R0, R4
BLNE doIfAnySwitch
//continue...
In thumb2 BLNE can be called inside an IT block.
//on armv7, thumb2 capable chips
//.syntax unified
//.thumb //might also see as .code 16
TST R0, R4
IT NE
BLNE doIfAnySwitch
//continue...
In thumb1 BLNE doesn’t exist at all but BEQ and BNE do. The best call will be to look for the negative branch condition and jump past the branch with link.
//------ thumb1 style only
//on armv6, thumb 1 only
//.syntax divided
//.thumb //might also see as .code 16
TST R0, R4
BEQ skipBranch
BL doIfAnySwitch
skipBranch:
//continue...
So if you’re wondering why my switch code doesn’t use any of the fancy branching syntax you’ve seen in other places, that’s why.
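The thumb1 "test the opposite condition and hop over the call" shape can be mimicked in C with a goto. This is a sketch for intuition only; the counter and function names are hypothetical stand-ins, and a C compiler is of course free to emit something different.

```c
#include <stdint.h>

/* Hypothetical counter so the effect of the call is observable. */
static int calls_made = 0;

void do_if_any_switch(void) {
    calls_made++;
}

/* Mirrors: TST R0, R4 / BEQ skipBranch / BL doIfAnySwitch / skipBranch: */
int poll_once(uint32_t reading, uint32_t mask) {
    if ((reading & mask) == 0) {   /* Z == 1, so the BEQ is taken */
        goto skip_branch;
    }
    do_if_any_switch();            /* the plain, unconditional BL */
skip_branch:
    return calls_made;             /* continue... */
}
```

The conditional part lives entirely in the short branch; the call itself never needs a condition code.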
Calling Conventions
It’s finally time to touch on proper subroutine calling conventions. The TL;DR:
- Don’t trust R0-R3 to stay the way you left them (“callee clobbered”)
- R4-R11 and the stack pointer (“SP”) need to be put back the way the callee found them (“callee preserved”).
- R12-R15 have special jobs (IP, SP, LR, PC), so calls and branches can change them as a side effect
- Subroutines with subroutines have to deal with the original LR being “clobbered”, but can handle that by putting their desired “LR” back into “PC” themselves.
It’s up to the programmer to play nice with expectations, and each manufacturer has slight variations on top of the Arm standard to be aware of.
Register Conventions for SAM MCUs
Register | Label | Role |
---|---|---|
R0 | | argument/result/scratch register 1 |
R1 | | argument/result/scratch register 2 |
R2 | | argument/scratch register 3 |
R3 | | argument/scratch register 4 |
R4 | | variable register 1 |
R5 | | variable register 2 |
R6 | | variable register 3 |
R7 | | variable register 4 |
R8 | | variable register 5* |
R9 | | variable register 6* |
R10 | | variable register 7* |
R11 | | variable register 8* |
R12 | IP | Intra-procedure call scratch register*‡ |
R13 | SP | Stack pointer |
R14 | LR | Link Register |
R15 | PC | Program Counter |
- * In the M0 and M0+ spec, registers above R7 have limitations. Only R0-R7, the low registers, can be accessed uniformly by all instructions. The higher registers can only be accessed by add, blx, bx, cmp, mov (not movs), msr, mrs. The Cortex-M0+ Technical Reference Manual (Revision: r0p1) section 3.3 indicates this by marking some instructions as usable only with “Lo” (R0-R7) registers vs “Any”. See also A4.1 in the ARMv6-M Architecture Reference Manual for a brief mention.
References
- StackOverflow discussion on what registers to save for Arm
- Microchip Register Conventions Guide for Pic32C/SAM MCUS
- Procedure Call Standard for the ARM® Architecture v2.08 (Older doc, linked to from UMich EECS 373 website.)
- Example of push/pop in the wild from FFMPEG code base
- Relationship to C function calls from TI
- “The Rust Calling Convention We Deserve” mcyoung
Code Example
It doesn’t do the full “prologue” and “epilogue” one might see in complicated programs, but it’s a start.
.syntax unified
.global _start
_start:
MOV R4, #0xAA
MOV R5, #0xAB
MOV R6, #0xAC
MOV R7, #0xAD //NOTE: regs above 7 have limits in
MOV R8, #0xAE //CM0/CM0+ chips. can only use with
MOV R9, #0xAF //add, blx, bx, cmp, mov, msr, mrs
MOV R10, #0xBA
MOV R11, #0xBC
loop:
MOV R0, #1
MOV R1, #2
MOV R2, #3
MOV R3, #4
BL subCallWithSubCall
//R4-R11 will look exactly the same
//R0-R3 will be entirely different
//R0 will have possible return value
//sp is back to zero
B loop
subCallWithSubCall:
PUSH {R4-R11, LR}
PUSH {R0-R3} //move R0-R3
POP {R4-R7} //into R4-R7
ADD R1, R5, R6
ADD R2, R7, R8
ADD R3, R8, R9
PUSH {R1-R3} //save my calculations
BL leafCall
POP {R1-R3} //retrieve them.
//R0 has "result of leafCall"
CMP R0, R10
MOVLT R0, R10 //wouldn't be possible in thumb1
//R0 has result of this routine
POP {R4-R11, PC} //no BX LR b/c return was handled by
//direct load of saved LR into PC
leafCall:
PUSH {R4-R11} //J.I.C. would remove really.
ADD R1, R5, R6
ADD R2, R7, R5
ADD R0, R1, R2
POP {R4-R11}
BX LR
Setting pullups
Using the data gleaned in the last post, I wrote three different versions of setting the pullup resistor for my switch on PA07.
V1 - inline, hard coded
I gave this code a label so I could set a break point, but it lives inline before the loop as you can see in the repo (link below).
The biggest gotcha was picking the address to put in switchPinCNFGOffset, because to use LDR that address has to be divisible by 4. So instead of portA_PINCNFG+0x07, it’s portA_PINCNFG+0x04.
I show loading a specific byte in the later examples, but I thought this was a good lesson. To use word-aligned data I fetched the nearest lower aligned location and shifted the needed bits up to the last byte to match the total offset of 7.
setPullup:
//---- For using internal pullup only
//switchPinCNFGOffset == portA_PINCNFG+0x04
LDR R5, =switchPinCNFGOffset //pinConfig closest word location
LDR R0, [R5] //load current settings into R0
MOVS R1, #6 //create value for INEN 1 (bit 1) //set PULLEN 1 (bit 2)
LSLS R1, #24 //(8*(7-4)) //move it from 4 to 7
ORRS R0, R1 //apply mask
STR R0, [R5] //put the updated word back into the config.
LDR R5, =portA_OUTSET
STR R4, [R5] //set the out of the switch high
//--- END setting internal pullup
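The alignment arithmetic behind the 0x04 offset and the shift by 24 generalizes to any pin. Here is a small C sketch of it, assuming a little-endian core (true of the Cortex-M0+ as configured on this chip) and one config byte per pin; the function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset of the word-aligned location that contains pin's config byte. */
uint32_t aligned_word_offset(uint32_t pin) {
    return pin & ~3u;
}

/* How far a byte-sized value must move up inside that word (little-endian). */
uint32_t shift_in_word(uint32_t pin) {
    return 8u * (pin & 3u);
}

/* INEN (bit 1) and PULLEN (bit 2), i.e. 6, moved into the pin's byte lane. */
uint32_t pullup_word_mask(uint32_t pin) {
    return 0x06u << shift_in_word(pin);
}
```

For pin 7 this gives offset 4, a shift of 24, and a mask of 0x06000000, matching the hard-coded values in the listing.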
V2 - one pin per branch
This code takes the pin number in R0 and sets the config for that pin. I don’t bother pushing R4-R7 because I carefully don’t touch them.
//Set R0 to contain the pin number
//function uses R0-R3
.word setPullup
.thumb_func
setPullup:
//Get base address
LDR R1, =portA_PINCNFG
//LDRB will allow byte call
LDRB R2, [R1, R0] // CFG base, with register offset (R0==pin #)
MOVS R3, #6 // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
ORRS R2, R3 // update retrieved settings with new settings
//Must save with STRB
STRB R2, [R1, R0]
//Make pull UP by setting high
LDR R1, =portA_OUTSET
MOVS R2, #1
LSLS R2, R0 //move the bit to the pin pos
STR R2, [R1] //set the out of the switch pin high
BX lr
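As a rough C model of the same read-modify-write, run against a mock structure rather than the real peripheral: MockPort and its fields are hypothetical stand-ins and do NOT reflect the actual register layout. The out field just accumulates what writes to OUTSET would leave behind.

```c
#include <stdint.h>

/* Hypothetical stand-in for a PORT group; NOT the real register layout. */
typedef struct {
    uint32_t out;          /* what OUT would hold after OUTSET writes */
    uint8_t  pincfg[32];   /* one config byte per pin */
} MockPort;

#define CFG_INEN   (1u << 1)   /* input enable */
#define CFG_PULLEN (1u << 2)   /* pull enable */

void set_pullup(MockPort *port, unsigned pin) {
    /* LDRB / ORRS / STRB: update only this pin's byte, keep its other bits */
    port->pincfg[pin] |= (uint8_t)(CFG_INEN | CFG_PULLEN);
    /* Model of the STR to OUTSET: a 1 sets the bit, 0s leave others alone */
    port->out |= (uint32_t)1u << pin;
}
```

On real hardware the struct pointer would have to be a volatile pointer to the peripheral's address, which this sketch deliberately leaves out.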
V3 - batch set
This code uses the WRCONFIG address to batch set multiple pins. See page 383, Section 23.8.11 in the Family Datasheet. I don’t need it for this code, but since they provide the address I wanted to show it!
Using the WRCONFIG address can set up to 16 pins in the same half of a pin group that should share the same configuration, all at the same time.
Bit | Name | Acronym | Usage |
---|---|---|---|
0-15 | Pin mask | PINMASK | 1: set values for this pin, 0: ignore |
16 | Peripheral Multiplexing Enable | PMUXEN | 1: enabled , 0: disabled (0 in PINCFG) |
17 | Input Enable | INEN | 1: input buffer enabled, 0: input buffer disabled (1 in PINCFG) |
18 | Pull Enable | PULLEN | 1: pull enabled , 0: pull disabled (2 in PINCFG) |
19 | — | — | No setting in SAMD21 |
20 | — | — | No setting in SAMD21 |
21 | — | — | No setting in SAMD21 |
22 | Output Driver Strength Selection | DRVSTR | 0 normal 1 strong (6 in PINCFG) |
23 | — | — | No setting in SAMD21 |
24-27 | Peripheral Multiplexing | PMUX | see 23.8.12 and Sec. 7 |
28 | Write PMUX | WRPMUX | Should update PMUX (1: yes, 0: will ignore) |
29 | — | — | No setting in SAMD21 |
30 | Write PINCFG | WRPINCFG | Should update PINCFGs (1: yes, 0: will ignore) |
31 | Half-Word Select | HWSEL | Indicates if 0-15 are the lower (0) or higher (1) pins |
This code takes in the whole 32 pin group and splits it into the necessary two batches.
Again this code carefully avoids spilling over past R3. It doesn’t use a conditional linked branch to a subroutine that does the repeated tasks of pairing a setting with a half-pin-group and loading it to the register. Little austerities that probably wouldn’t make sense to a functional programming mindset without knowing the thumb1 assembly context.
mock version in emulator
in repo
.word multiPinPullup
.thumb_func
multiPinPullup:
//Go ahead and set the ups
LDR R1, =portA_OUTSET
STR R0, [R1]
MOVS R2, #1
MOVS R3, #2
RORS R2, R3 // Bit 30 -> 1 to enable PINCFG set
MOVS R3, #6 // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
LSLS R3, #16 // move up to 3rd byte
ORRS R3, R2 // Add bit 30 to R3
LDR R2, =#0xFFFF
TST R0, R2
BEQ mpp_upperHalf
MOVS R1, R0 // copy R0 for editing
ANDS R1, R2 //isolate bottom half
ORRS R1, R3 // update R1 with settings
LDR R2, =portA_WRCONFIG
STR R1, [R2] // send WRCONFIG bottom half info
mpp_upperHalf:
LDR R1, =#0xFFFF0000
TST R0, R1
BEQ mpp_exit
LSRS R0, #16 // isolate top half
ORRS R0, R3 // add settings
LSLS R1, #15 // make R1 0x80000000
ORRS R0, R1 // Add top half flag in bit 31
LDR R2, =portA_WRCONFIG //might be this value already.
STR R0, [R2] // send WRCONFIG top half info
mpp_exit:
BX LR
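The two words this routine sends to WRCONFIG can be worked out in C from the bit table above. A sketch, assuming only the bit positions listed in that table; the macro and function names are hypothetical, and a return of 0 is used here to mean "nothing to write for this half".

```c
#include <stdint.h>

#define WRCONFIG_INEN     (1u << 17)
#define WRCONFIG_PULLEN   (1u << 18)
#define WRCONFIG_WRPINCFG (1u << 30)
#define WRCONFIG_HWSEL    (1u << 31)

/* Word for pins 0-15, or 0 if the mask has no pins in the lower half. */
uint32_t wrconfig_lower(uint32_t pins) {
    uint32_t half = pins & 0xFFFFu;
    if (half == 0) {
        return 0;
    }
    return half | WRCONFIG_INEN | WRCONFIG_PULLEN | WRCONFIG_WRPINCFG;
}

/* Word for pins 16-31, shifted down into PINMASK with HWSEL set. */
uint32_t wrconfig_upper(uint32_t pins) {
    uint32_t half = pins >> 16;
    if (half == 0) {
        return 0;
    }
    return half | WRCONFIG_INEN | WRCONFIG_PULLEN | WRCONFIG_WRPINCFG
                | WRCONFIG_HWSEL;
}
```

For the PA07 switch mask (1 << 7) the lower word comes out as 0x40060080 and there is no upper word, which lines up with the BEQ that skips to the upper-half check.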
Reacting to the Switch
I wrote two ways to react to the switch values. The first requires knowing the distance between the two pins on the PA register, but doesn’t depend on branching. The second uses a TST and BEQ to send the ledMask to OUTCLR or OUTSET.
V1 - Use the Switch Pin Directly
This code never “knows” what the state of the switch pin is. It sends whatever that state is to the LED’s OUTCLR position and its inverse to OUTSET every time. (see hardware note)
MOVS R4, #1
LSLS R4, #switchPinOffset
LDR R5, =portA_OUTSET
LDR R6, =portA_OUTCLR
LDR R7, =portA_IN
loop:
//get the value of the IN register (address in R7)
LDR R0, [R7]
//Isolate the switch reading
ANDS R0, R4 //should typically be 1, due to pullup resistor
MOVS R1, R0
EORS R1, R4 //R1 will be 1 at the switch bit iff the read was 0
LSLS R0, #3 //move bit 7 (switch) to 10 (led)
LSLS R1, #3 //move bit 7 (switch) to 10 (led)
//Sending 0s to the set and clear registers has no effect.
//The state of R0 always matches what should go to CLEAR
//The state of R1 matches what should be in SET
STR R0, [R6]
STR R1, [R5]
B loop
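The branch-free trick is easier to check on paper than on the chip. Below is a C sketch of just the arithmetic, assuming the pin positions from this post (switch on bit 7, LED on bit 10); the LedWrites struct and function name are hypothetical, and the two fields stand for the values stored to OUTCLR and OUTSET.

```c
#include <stdint.h>

#define SWITCH_PIN 7u
#define LED_PIN    10u

/* The two values the loop stores on every pass. */
typedef struct {
    uint32_t outclr;
    uint32_t outset;
} LedWrites;

LedWrites led_writes_for(uint32_t in_reg) {
    uint32_t sw_mask = 1u << SWITCH_PIN;
    uint32_t open    = in_reg & sw_mask;   /* ANDS: 1 at bit 7 when open   */
    uint32_t closed  = open ^ sw_mask;     /* EORS: 1 at bit 7 when closed */
    LedWrites w;
    w.outclr = open   << (LED_PIN - SWITCH_PIN);  /* LSLS #3 */
    w.outset = closed << (LED_PIN - SWITCH_PIN);  /* LSLS #3 */
    return w;
}
```

Exactly one of the two values is nonzero on any pass, so one store changes the LED and the other is a harmless write of zeros.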
V2 - Test the Switch Pin
Uses separate masks for the led pin and switch pins, but requires navigating conditionals in the thumb 1 instruction set!
MOVS R4, #1
LSLS R3, R4, #ledPinOffset
LSLS R4, #switchPinOffset
LDR R5, =portA_OUTSET
LDR R6, =portA_OUTCLR
LDR R7, =portA_IN
loop:
//get the value of the IN register
LDR R0, [R7]
//TST the reading against the mask.
//R0 should read 1 at switch location as default
//due to pullup resistor so if the AND result
TST R0, R4 //EQUALS 0 (no match), the switch is _closed_
BEQ ledOn //so jump to on
ledOff: //default path
STR R3, [R6] //R3==ledMask, R6==portA_OUTCLR
B loop
ledOn:
STR R3, [R5] //R3==ledMask, R5==portA_OUTSET
B loop
Complete main.s examples
Version 1
.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb
.section .text.program_code
.equ portA_address, 0x41004400
.equ portA_DIRCLR, 0x41004400+0x04
.equ portA_DIRSET, 0x41004400+0x08
.equ portA_OUTCLR, 0x41004400+0x14
.equ portA_OUTSET, 0x41004400+0x18
//.equ portA_OUTTGL, 0x41004400+0x1C
.equ portA_IN, 0x41004400+0x20
.equ portA_PINCNFG, 0x41004400+0x40
.equ ledPinOffset, 10 //PA10
.equ switchPinOffset, 7 //PA07,
//this value is NOT 7 because use LDR below
//which requires a word alignment
.equ switchPinCNFGOffset, portA_PINCNFG+0x04
//NOTE: a CLOSED (true) switch will read LOW
.thumb_func
.global _start
_start:
MOVS R3, #1
LSLS R3, #ledPinOffset //R3 is now LED mask (1 at led pos)
MOVS R4, #1
LSLS R4, #switchPinOffset //R4 is now switch mask (1 at switch pos)
//The DIRSET should have one for every LED and 0 for every switch.
//The 0 is the default, but an explicit set will be safe.
LDR R5, =portA_DIRSET
STR R3, [R5] //move the 1s in for the LEDs
LDR R5, =portA_DIRCLR
STR R4, [R5] //clear all the switches
setPullup:
//---- For using internal pullup only
LDR R5, =switchPinCNFGOffset //pinConfig closest word location
LDR R0, [R5] //load current settings into R0
MOVS R1, #6 //create value for INEN 1 (bit 1) //set PULLEN 1 (bit 2)
LSLS R1, #24 //(8*(7-4)) //move it from 4 to 7
ORRS R0, R1 //apply mask
STR R0, [R5] //put the updated word back into the config.
LDR R5, =portA_OUTSET
STR R4, [R5] //set the out of the switch high
//--- END setting internal pullup
//if remove pullup code
//set R5 to hold portA_OUTSET here
LDR R6, =portA_OUTCLR
LDR R7, =portA_IN
loop:
//get the value of the IN register
LDR R0, [R7]
//OPTION 1
//Isolate the switch reading
ANDS R0, R4 //should typically be 1, due to pullup resistor
MOVS R1, R0
EORS R1, R4 //R1 will be 1 at the switch bit iff the read was 0
LSLS R0, #3 //move bit 7 (switch) to 10 (led)
LSLS R1, #3 //move bit 7 (switch) to 10 (led)
//Sending 0s to the set and clear registers has no effect.
//The state of R0 always matches what should go to CLEAR
//The state of R1 matches what should be in SET
STR R0, [R6]
STR R1, [R5]
B loop
Version 2
.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb
.section .text.program_code
.equ portA_address, 0x41004400
.equ portA_DIRCLR, 0x41004400+0x04
.equ portA_DIRSET, 0x41004400+0x08
.equ portA_OUTCLR, 0x41004400+0x14
.equ portA_OUTSET, 0x41004400+0x18
//.equ portA_OUTTGL, 0x41004400+0x1C
.equ portA_IN, 0x41004400+0x20
.equ portA_WRCONFIG, 0x41004400+0x28
.equ portA_PINCNFG, 0x41004400+0x40
.equ ledPinOffset, 10 //PA10
.equ switchPinOffset, 7 //PA07,
//NOTE: a CLOSED switch will read LOW
//since pull a byte later, need the raw value to be a byte
.thumb_func
.global _start
_start:
MOVS R4, #1
LSLS R3, R4, #ledPinOffset //R3 is now LED mask (1 at led pos)
LSLS R4, #switchPinOffset //R4 is now switch mask (1 at switch pos)
//The DIRSET should have one for every LED and 0 for every switch.
//The 0 is the default, but an explicit set will be safe.
LDR R5, =portA_DIRSET
STR R3, [R5] //move the 1s in for the LEDs
LDR R5, =portA_DIRCLR
STR R4, [R5] //clear all the switches
@ OPTION 1: Set 1 pin
@ MOVS R0, #switchPinOffset //put switch # into R0
@ PUSH {R0-R3} // stash in stack
@ BL setPullup
@ POP {R0-R3} // restore
@ OPTION 2: Set multiple pins
MOVS R0, R4 //put switch mask into R0
PUSH {R0-R3} //stash in stack
BL multiPinPullup
POP {R0-R3} //restore
LDR R5, =portA_OUTSET
LDR R6, =portA_OUTCLR
LDR R7, =portA_IN
loop:
//get the value of the IN register
LDR R0, [R7]
//TST the reading against the mask.
//R0 should read 1 at switch location as default
//due to pullup resistor so if the AND result
TST R0, R4 //EQUALS 0 (no match), the switch is _closed_
BEQ ledOn //so jump to on
ledOff: //default path
STR R3, [R6] //R3==ledMask, R6==portA_OUTCLR
B loop
ledOn:
STR R3, [R5] //R3==ledMask, R5==portA_OUTSET
B loop
//end _start
//Set R0 to contain the 32 pin mask of pins that should have pullups
.word multiPinPullup
.thumb_func
multiPinPullup:
//Go ahead and set the ups
LDR R1, =portA_OUTSET
STR R0, [R1]
MOVS R2, #1
MOVS R3, #2
RORS R2, R3 // Bit 30 -> 1 to enable PINCFG set
MOVS R3, #6 // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
LSLS R3, #16 // move up to 3rd byte
ORRS R3, R2 // Add bit 30 to R3
LDR R2, =#0xFFFF
TST R0, R2
BEQ mpp_upperHalf
MOVS R1, R0 // copy R0 for editing
ANDS R1, R2 //isolate bottom half
ORRS R1, R3 // update R1 with settings
LDR R2, =portA_WRCONFIG
STR R1, [R2] // send WRCONFIG bottom half info
mpp_upperHalf:
LDR R1, =#0xFFFF0000
TST R0, R1
BEQ mpp_exit
LSRS R0, #16 // isolate top half
ORRS R0, R3 // add settings
LSLS R1, #15 // make R1 0x80000000
ORRS R0, R1 // Add top half flag in bit 31
LDR R2, =portA_WRCONFIG //might be this value already.
STR R0, [R2] // send WRCONFIG top half info
mpp_exit:
BX LR
Summary
Choosing the smallest of the Arm family makes writing the assembly extra fiddly. To the rescue… switching to C and letting the compiler sort it out! That better matches the AVR example anyway. I’m going to do some GDB/makefile tidy-up in prep.