And now for 3 ways to set an internal pullup

April 23, 2024

This article is part of a series.

Related Repo: https://github.com/carlynorama/StrippedDownChipRosetta/blob/main/ARM/SAMD21E18/04_AssemblySwtich/

Slowly working towards parity with the AVR example, this post adds the switch with an internal pullup resistor. One of the nice things about internal pullups? The part of a switch can comfortably be played by some 22 AWG yellow wire from the Dish of Scrap Miscellany.

Animated GIF of the LED being switched on and off

Circuit Warning

This circuit doesn’t match the AVR circuit. The AVR circuit sank the LED and pulled up the switch, following classic recommended practice. This circuit:

different: sources the LED, because that’s the Arduino built-in LED convention. (Off==On is kinda hard for intro classes)
same: pulls up the switch

Since the switch and LED don’t match electrically, the code will handle the inversion to get the same result as the AVR circuit. (On when closed, off when open)

circuit diagram of LED and switch hooked up to trinket. LED is the built in LED (PA10) and the switch is hooked up to D3 (PA07) and GND

“Bit Hacking”

Checking a switch and conditionally turning on an LED requires some fancier moves than setting an LED alone. I made an emulator script with some examples. The two-step instructions like MOV R1, R0, LSL #6 need to be broken up and MOV changed to MOVS on the actual chip (MOV is reserved for specific registers on CM0+), but the concepts translate.

References

YouTube: 3: Bit Hacks (8:22) as part of MIT 6.172 Performance Engineering of Software Systems, Fall 2018 by MIT OpenCourseWare
Bit Twiddling Hacks by Sean Eron Anderson at Stanford Computer Graphics Laboratory
Knuth TAOCP, 4A section 7.1.3

Example Code

emulator link

.global _start
_start:

  //-------------------------------------------------
  //Adding new 1 bits
  
  MOV R0, #1
  ORR R0, R0, LSL #6 //R0 will now have 1 bit at 0 and 6 (0x41)
  
  //-------------------------------------------------
  //Using TST
  //TST compares two values with an AND, discarding result, 
  //but sets zero flag if the result of the operation was 0
  //So if the two values have ANY set bits in common ZF == 0
  
  //R0 is 0x41
  TST R0, #1  //result of AND is 1, ZF == 0
  TST R0, #0x40 //result of AND is 0x40, ZF == 0
  
  MOV R1, #0
  TST R1, #1  //result of AND is 0, ZF == 1
  TST R1, #64 //result of AND is 0, ZF == 1
  
  MOV R1, R0, LSL #6 //R1 will have a bit at 6 and 12 (0x1040)
  TST R1, R0 //have bits in common, ZF 0
  
  MOV R0, R1
  TST R1, R0
  
  //EQ and NE are named for the CMP results (which does a SUB),
  //so they read backwards after a TST.
  //BEQ to branch when ZF == 1  //has no things in common
  //BNE to branch when ZF == 0  //has some things in common
  
  //-------------------------------------------------
  //Set specific bits to zero
  LDR R0, =0xAAAAAAAA
  MOV R1, #10  //put an a in the 0 position
  LSLS R1, #12 //move the a to the 3rd position (4*3)
  
  //Performs XOR operation
  //on bits of Rn with bits of Rm.
  EORS R2, R0, R1
  
  //Performs AND operation
  //on bits of Rn with ~bits of Rm.
  BICS R3, R0, R1
  
  //Looks the same? Not quite!
  
  //0xA == 0b1010, 0x2 = 0b0010
  LDR R4, =0x22222222
  ORR R5, R4, R1   //Adds missing bits => 0b1010
  EORS R5, R4, R1  //removes where duplicated => 0b1000
  BICS R5, R4, R1  //removes where mask (R1) is 1 => 0b0000
  
  //-------------------------------------------------
  //Retrieve selective chunk with mask
  MOV R0, #0xFF
  LSL R0, #16 //(4*N)
  AND R5, R4, R0
  LSR R5, #16  //if need it down in lowest part

  //-------------------------------------------------
  //Isolate selective chunk no mask
  LDR R0, =0x22FF2222
  LSL R0, #8 //(4*N)
  LSR R0, #24
  
  //-------------------------------------------------
  //Set selective chunk
  
  LSL R0, #16 //mask to clear target area (R0 was 0xFF)
  BICS R4, R0 //clean out the bit area in target
  MOV R1, #0xAA  //put desired value in a register
  ORR R4, R1, LSL #16 //orr clean target with shifted new value
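
For checking the expected values away from the emulator, the same moves can be modeled in Python. This is only a sketch under assumptions: Python integers are unbounded, so the `MASK32` constant and the helper names `bic`, `get_field` and `set_field` are hypothetical stand-ins for a 32-bit register, not part of any Arm tooling.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def bic(rn, rm):
    """Model BIC: clear every bit of rn that is set in rm."""
    return rn & ~rm & MASK32


def get_field(value, shift, width):
    """Read `width` bits starting at bit `shift` (shift down, then mask)."""
    return (value >> shift) & ((1 << width) - 1)


def set_field(value, shift, width, new):
    """Clear the target bits first (BIC), then OR in the shifted new value."""
    mask = ((1 << width) - 1) << shift
    return bic(value, mask) | ((new << shift) & mask)


r0 = 1 | (1 << 6)                 # ORR with a shifted copy: bits 0 and 6
mask = 0xA << 12                  # 0xA parked in nibble 3
same_eor = 0xAAAAAAAA ^ mask      # EOR and BIC agree when every mask bit was set
same_bic = bic(0xAAAAAAAA, mask)
diff_eor = 0x22222222 ^ mask      # they disagree when some mask bits were clear
diff_bic = bic(0x22222222, mask)

print(hex(r0), hex(same_eor), hex(same_bic), hex(diff_eor), hex(diff_bic))
```

The `set_field` helper clears and writes the same bit range in one place, which is the property the "Set selective chunk" steps rely on.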

Conditional Branching with the Thumb-1 Instruction Set

What’s another thing you won’t see in any of the code for the SAMD21E18A? Most of the compound conditional instructions. (cc is a placeholder for the condition code)

BLcc (Bcc okay though)
MOVcc Rd Rm
IT cc…

Cortex-M0/Cortex-M0+ chips use Armv6-M, which means the thumb v1 instruction set, which means NONE of the conditional niceties available to their bigger siblings. I’m not used to assembly having them anyway, but much of the example code out there seems to target M3 or higher. It just won’t assemble with the -mcpu=cortex-m0plus flag set.

References

See also the links in the above ¶
armv6 vs armv7 overview from Arm community
Instruction Set Assembly Guide for Armv7 and earlier Arm architectures from Arm documentation service
Arm Cortex-M0+ DataSheet from Arm contains image below. Figure 5, page 6.

Image that compares which assembly instructions are available across the Cortex-M family. From the Cortex-M0+ data sheet. Figure 5, page 6

Example Code

What does that difference look like in practice? In full Arm syntax, BLNE (branch with link if Z == 0, i.e. the last result was not zero) can be called with no fuss.

//full syntax
//.syntax unified
//.arm //might also see as .code 32
  TST R0, R4
  BLNE doIfAnySwitch
  //continue...

In thumb2 BLNE can be called inside an IT block.

//on armv7, thumb2 capable chips
//.syntax unified
//.thumb //might also see as .code 16
  TST R0, R4
  IT NE
  BLNE doIfAnySwitch
  //continue...

In thumb1 BLNE doesn’t exist at all but BEQ and BNE do. The best call is to test for the opposite condition and use it to jump past an unconditional BL.

//------ thumb1 style only
//on armv6, thumb 1 only
//.syntax divided 
//.thumb //might also see as .code 16
  TST R0, R4
  BEQ skipBranch
  BL doIfAnySwitch
  skipBranch:
  //continue...

So if you’re wondering why my switch code doesn’t use any of the fancy branching syntax you’ve seen in other places, that’s why.
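
The control flow of the thumb1 version can be traced with a small Python model. This is a rough sketch only: `tst` and `run` are hypothetical names, and the strings in the trace simply stand in for the labels in the listing above.

```python
def tst(rn, rm):
    """Model TST: AND the operands and keep only the Z flag (True when the result is 0)."""
    return (rn & rm) == 0


def run(r0, r4):
    """Thumb-1 shape: a BEQ hops over an unconditional BL."""
    trace = []
    z = tst(r0, r4)
    if not z:                          # BEQ skipBranch is not taken when Z == 0
        trace.append("doIfAnySwitch")  # BL doIfAnySwitch
    trace.append("continue")           # skipBranch:
    return trace


print(run(0b0100, 0b0110))  # shared bit, so the subroutine runs
print(run(0b0001, 0b0110))  # nothing shared, so the BL is skipped
```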

Calling Conventions

It’s finally time to touch on proper subroutine calling conventions. The TL;DR

Don’t trust R0-R3 to stay the way you left them (“callee clobbered”)
R4-R11 and the stack pointer (“SP”) need to be put back the way the callee found them (“callee preserved”).
R12-R15 are special purpose (see the table below) and the system can alter them at any time
Subroutines that call subroutines have to deal with the original LR being “clobbered”, but can handle that by saving LR on the stack and putting that saved value back into “PC” themselves.

It’s up to the programmer to play nice with expectations, and each manufacturer has slight variations on top of the Arm standard to be aware of.

Register Conventions for SAM MCUs

Register	Label	Role
R0		argument/result/scratch register 1
R1		argument/result/scratch register 2
R2		argument/scratch register 3
R3		argument/scratch register 4
R4		variable register 1
R5		variable register 2
R6		variable register 3
R7		variable register 4
R8		variable register 5*
R9		variable register 6*
R10		variable register 7*
R11		variable register 8*
R12	IP	Intra-procedure call scratch register*‡
R13	SP	Stack pointer
R14	LR	Link Register
R15	PC	Program Counter

* In the M0 and M0+ spec, registers above R7 have limitations. Only R0-R7, the low registers, can uniformly be accessed by all instructions. The higher registers can only be accessed by add, blx, bx, cmp, mov (not movs), msr, mrs. The Cortex-M0+ Technical Reference Manual (Revision: r0p1) section 3.3 indicates this by marking some instructions as usable only with “Lo” (R0-R7) registers rather than “Any”. See also A4.1 in the ARMv6-M Architecture Reference Manual for a brief mention.

References

StackOverflow discussion on what registers to save for Arm
Microchip Register Conventions Guide for PIC32C/SAM MCUs
Procedure Call Standard for the ARM® Architecture v2.08 (Older doc, linked to from UMich EECS 373 website.)
Example of push/pop in the wild from FFMPEG code base
Relationship to C function calls from TI
“The Rust Calling Convention We Deserve” mcyoung

Code Example

It doesn’t do the full “prologue” and “epilogue” one might see in complicated programs, but it’s a start.

emulator

.syntax unified
.global _start
_start:
  
  MOV R4,  #0xAA
  MOV R5,  #0xAB
  MOV R6,  #0xAC
  MOV R7,  #0xAD //NOTE: regs above 7 have limits in
  MOV R8,  #0xAE //CM0/CM0+ chips. can only use with
  MOV R9,  #0xAF //add, blx, bx, cmp, mov, msr, mrs
  MOV R10, #0xBA 
  MOV R11, #0xBC
  
  loop:
    MOV R0, #1
    MOV R1, #2
    MOV R2, #3
    MOV R3, #4

    BL subCallWithSubCall
    //R4-R11 will look exactly the same
    //R0-R3 will be entirely different
    //R0 will have possible return value
    //sp is back to zero
  B loop
  
subCallWithSubCall:
  PUSH {R4-R11, LR}
  PUSH {R0-R3} //move R0-R3 
  POP {R4-R7}  //into R4-R7

  ADD R1, R5, R6
  ADD R2, R7, R8
  ADD R3, R8, R9

  PUSH {R1-R3} //save my calculations
  BL leafCall
  POP {R1-R3} //retrieve them.
  //R0 has "result of leafCall"

  CMP R0, R10
  MOVLT R0, R10 //wouldn't be possible in thumb1
  //R0 has result of this routine
  POP {R4-R11, PC} //no BX LR b/c return was handled by 
                   //direct load of saved LR into PC

leafCall:
  PUSH {R4-R11} //J.I.C. would remove really. 

  ADD R1, R5, R6
  ADD R2, R7, R5
  ADD R0, R1, R2

  POP {R4-R11}
  BX LR
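
Two of the stack moves in this listing are easy to misread: the PUSH {R0-R3} / POP {R4-R7} shuffle, and the return through POP {..., PC}. The toy model below traces both. It is a sketch under assumptions: the `Stack` class is hypothetical, register lists must be passed in ascending order (as the assembler would order them), and the saved set is trimmed to R4-R7 to keep it short.

```python
class Stack:
    """Toy full-descending stack; index 0 stands for the lowest address (where SP points)."""

    def __init__(self):
        self.words = []

    def push(self, regs, names):
        # PUSH stores the lowest-numbered register at the lowest address.
        self.words[0:0] = [regs[n] for n in names]

    def pop(self, regs, names):
        # POP loads the lowest-numbered register from the lowest address.
        for n in names:
            regs[n] = self.words.pop(0)


regs = {"R0": 1, "R1": 2, "R2": 3, "R3": 4,
        "R4": 0xAA, "R5": 0xAB, "R6": 0xAC, "R7": 0xAD,
        "LR": 0x1000, "PC": 0}
stack = Stack()

stack.push(regs, ["R4", "R5", "R6", "R7", "LR"])   # callee saves what it must preserve
stack.push(regs, ["R0", "R1", "R2", "R3"])         # move R0-R3 ...
stack.pop(regs, ["R4", "R5", "R6", "R7"])          # ... into R4-R7
moved = [regs[n] for n in ["R4", "R5", "R6", "R7"]]

regs["LR"] = 0x2000                                # a nested BL overwrites LR
stack.pop(regs, ["R4", "R5", "R6", "R7", "PC"])    # saved LR lands directly in PC

print(moved, hex(regs["R4"]), hex(regs["PC"]), stack.words)
```

After the final pop the stack is empty again and PC holds the original return address, even though LR itself was overwritten in between.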

Setting pullups

Using the data gleaned in the last post, I wrote three different versions of setting the pullup resistor for my switch on PA07.

V1 - inline, hard coded

I gave this code a label so I could set a break point, but it lives inline before the loop as you can see in the repo (link below).

The biggest gotcha was picking the address to put in switchPinCNFGOffset because to use LDR that address has to be divisible by 4.

So instead of portA_PINCNFG+0x07, it’s portA_PINCNFG+0x04

I show loading a specific byte in the later examples, but I thought this was a good lesson. To use word aligned data I fetched the nearest but lower aligned location and shifted the needed bits up to the last byte to match the 7 total offset.
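
The address and shift arithmetic can be double-checked with a few lines of Python. This is a sketch only: `word_access` is a hypothetical helper name, the base address comes from the listing's `.equ` values, and the byte-to-bit mapping assumes the chip's little-endian layout.

```python
PORTA_PINCFG = 0x41004400 + 0x40   # base of the byte-wide PINCFG array (from the listing)


def word_access(byte_addr):
    """Return (aligned word address, left shift) for reaching one byte via a word load."""
    aligned = byte_addr & ~0x3           # round down to a multiple of 4
    shift = 8 * (byte_addr - aligned)    # little-endian: byte n sits at bits 8n..8n+7
    return aligned, shift


aligned, shift = word_access(PORTA_PINCFG + 7)   # PINCFG byte for PA07
mask = 0x6 << shift                               # INEN (bit 1) + PULLEN (bit 2)
print(hex(aligned), shift, hex(mask))
```

For PA07 this gives the +0x04 word and a shift of 24, matching the values used in the V1 code.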

in repo

setPullup:
  //---- For using internal pullup only
  //switchPinCNFGOffset == portA_PINCNFG+0x04 
  LDR R5, =switchPinCNFGOffset //pinConfig closest word location
  LDR R0, [R5] //load current settings into R0
  MOVS R1, #6 //create value for INEN 1 (bit 1) //set PULLEN 1 (bit 2)
  LSLS R1, #24 //(8*(7-4)) //move it from 4 to 7
  ORRS R0, R1 //apply mask
  STR R0, [R5] //put the updated word back into the config.
  LDR R5, =portA_OUTSET
  STR R4, [R5] //set the out of the switch high
  //--- END setting internal pullup

V2 - one pin per branch

This code takes the pin number in R0 and sets the config for that pin. I don’t bother pushing R4-R7 because I carefully don’t touch them.

in repo

//Set R0 to contain the pin number
//function uses R0-R3 
.word setPullup
.thumb_func
setPullup:
  //Get base address
  LDR R1, =portA_PINCNFG
  //LDRB allows a byte-wide load
  LDRB R2, [R1, R0] // CFG base, with register offset (R0==pin #)

  MOVS R3, #6 // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
  ORRS R2, R3 // update retrieved settings with new settings

  //Must save with STRB
  STRB R2, [R1, R0] 
  //Make pull UP by setting high 
  LDR R1, =portA_OUTSET
  MOVS R2, #1
  LSLS R2, R0 //move the bit to the pin pos
  STR R2, [R1] //set the out of the switch pin high
  BX lr
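
As a sanity check on what this routine leaves behind, here is a rough Python model of the same byte-wide read-modify-write. The `set_pullup` function and the `pincfg` bytearray are hypothetical stand-ins for the routine and the 32 PINCFG bytes; nothing here touches real hardware.

```python
def set_pullup(pincfg, pin):
    """Byte-wide read-modify-write, mirroring LDRB / ORRS / STRB.

    Returns the word that would be stored to OUTSET.
    """
    pincfg[pin] |= 0x06      # INEN (bit 1) + PULLEN (bit 2); other bits untouched
    return 1 << pin          # OUTSET only acts on 1 bits, so no read is needed


pincfg = bytearray(32)       # one config byte per pin in the group
pincfg[7] = 0x40             # pretend DRVSTR (bit 6) was already set on PA07
outset_word = set_pullup(pincfg, 7)
print(hex(pincfg[7]), hex(outset_word))
```

The existing DRVSTR bit survives because the new settings are ORed into the byte rather than written over it.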

V3 - batch set

This code uses the WRCONFIG address to batch set multiple pins. See page 383, Section 23.8.11 in the Family Datasheet. I don’t need it for this code, but since they provide the address I wanted to show it!

Writing to the WRCONFIG address can configure up to 16 pins at the same time, as long as they sit in the same half of a pin group and should share the same configuration.

Screenshot of bit layout of WRCONFIG from page 371 of the family datasheet (23.7 Register Summary)

Bit	Name	Acronym	Usage

0-15	Pin mask	PINMASK	1: set values for this pin, 0: ignore
16	Peripheral Multiplexing Enable	PMUXEN	1: enabled , 0: disabled (0 in PINCFG)
17	Input Enable	INEN	1: input buffer enabled, 0: input buffer disabled (1 in PINCFG)
18	Pull Enable	PULLEN	1: pull enabled , 0: pull disabled (2 in PINCFG)
19	—	—	No setting in SAMD21
20	—	—	No setting in SAMD21
21	—	—	No setting in SAMD21
22	Output Driver Strength Selection	DRVSTR	0 normal 1 strong (6 in PINCFG)
23	—	—	No setting in SAMD21
24-27	Peripheral Multiplexing	PMUX	see 23.8.12 and Sec. 7
28	Write PMUX	WRPMUX	Should update PMUX (1: yes, 0: will ignore)
29	—	—	No setting in SAMD21
30	Write PINCFG	WRPINCFG	Should update PINCFGs (1: yes, 0: will ignore)
31	Half-Word Select	HWSEL	Indicates if 0-15 are the lower (0) or higher (1) pins

This code takes in the whole 32 pin group and splits it into the necessary two batches.

Again this code carefully avoids spilling over past R3. It doesn’t use a conditional linked branch to a subroutine that does the repeated tasks of pairing a setting with a half-pin-group and loading it to the register. Little austerities that probably wouldn’t make sense to a functional programming mindset without knowing the thumb1 assembly context.

mock version in emulator
in repo

.word multiPinPullup
.thumb_func
multiPinPullup:
  //Go ahead and set the ups
  LDR R1, =portA_OUTSET
  STR R0, [R1]

  MOVS R2, #1
  MOVS R3, #2
  RORS R2, R3  // Bit 30 -> 1 to enable PINCFG set
  MOVS R3, #6  // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
  LSLS R3, #16 // move up to 3rd byte
  ORRS R3, R2  // Add bit 30 to R3

  LDR R2, =#0xFFFF
  TST  R0, R2
  BEQ  mpp_upperHalf
  MOVS R1, R0  // copy R0 for editing
  ANDS R1, R2  //isolate bottom half
  ORRS R1, R3  // update R1 with settings 
  LDR R2, =portA_WRCONFIG
  STR R1, [R2] // send WRCONFIG bottom half info

  mpp_upperHalf:
  LDR R1, =#0xFFFF0000
  TST R0, R1
  BEQ mpp_exit
  LSRS R0, #16 // isolate top half
  ORRS R0, R3  // add settings
  LSLS R1, #15 // make R1 0x80000000
  ORRS R0, R1  // Add top half flag in bit 31
  LDR R2, =portA_WRCONFIG //might be this value already. 
  STR R0, [R2] // send WRCONFIG top half info
  mpp_exit:
  BX LR
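
The words this routine sends to WRCONFIG can be predicted with a short Python sketch built from the bit layout in the table above. The constant and function names (`WRPINCFG`, `HWSEL`, `INEN_PULLEN`, `wrconfig_words`) are hypothetical labels chosen for this illustration.

```python
WRPINCFG = 1 << 30          # bit 30: update PINCFG for the selected pins
HWSEL = 1 << 31             # bit 31: PINMASK refers to pins 16-31
INEN_PULLEN = 0x6 << 16     # PINCFG bits 1 and 2, moved up to bits 17 and 18


def wrconfig_words(pin_mask):
    """Split a 32-bit pin mask into the WRCONFIG writes needed (0, 1, or 2 words)."""
    settings = WRPINCFG | INEN_PULLEN
    words = []
    low = pin_mask & 0xFFFF
    high = (pin_mask >> 16) & 0xFFFF
    if low:                                   # same test as TST R0, 0xFFFF
        words.append(settings | low)
    if high:                                  # same test as TST R0, 0xFFFF0000
        words.append(settings | HWSEL | high)
    return words


print([hex(w) for w in wrconfig_words(1 << 7)])                # PA07 only
print([hex(w) for w in wrconfig_words((1 << 7) | (1 << 20))])  # one pin in each half
```

A mask with pins in both halves produces two writes, which is why the assembly has the separate lower and upper paths.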

Reacting to the Switch

I wrote two ways to react to the switch values. The first requires knowing the distance between the two pins on the PA register, but doesn’t depend on branching. The second uses a TST and BEQ to send the ledMask to OUTCLR or OUTSET.

V1 - Use the Switch Pin Directly

This code never “knows” what the state of the switch pin is. It sends whatever that state is to the LED’s OUTCLR position and its inverse to OUTSET every time. (see hardware note)

in repo

  MOVS R4, #1  
  LSLS R4, #switchPinOffset
  LDR R5, =portA_OUTSET
  LDR R6, =portA_OUTCLR
  LDR R7, =portA_IN

  loop:
    //get the value of the IN register (address in R7)
    LDR R0, [R7]

    //Isolate the switch reading 
    ANDS R0, R4 //should typically be 1, due to pullup resistor
    MOVS R1, R0
    EORS R1, R4 //R1 will be 1 at the switch bit iff the read was 0

    LSLS R0, #3  //move bit 7 (switch) to 10 (led)
    LSLS R1, #3  //move bit 7 (switch) to 10 (led)

    //Sending 0s to the set and clear registers has no effect.
    //The state of R0 always matches what should go to CLEAR
    //The state of R1 matches what should be in SET 
    STR R0, [R6]
    STR R1, [R5]
  B loop
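
To see why no branch is needed, the loop body can be modeled as a pure function in Python. This is a sketch, with `led_writes` as a hypothetical name; the pin numbers are the ones used in the listing.

```python
SWITCH_PIN = 7   # PA07, from the listing
LED_PIN = 10     # PA10, from the listing


def led_writes(in_reg):
    """Return (OUTCLR word, OUTSET word) for one pass of the loop, with no branching."""
    switch_mask = 1 << SWITCH_PIN
    r0 = in_reg & switch_mask          # ANDS: 1 at the switch bit when open
    r1 = r0 ^ switch_mask              # EORS: 1 at the switch bit when closed
    shift = LED_PIN - SWITCH_PIN       # the pin distance the code has to know
    return r0 << shift, r1 << shift


print([hex(w) for w in led_writes(0x80)])  # switch open: pullup reads high
print([hex(w) for w in led_writes(0x00)])  # switch closed: pin pulled to GND
```

Exactly one of the two words is nonzero on every pass, so both stores can always be issued and only one of them has any effect.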

V2 - Test the Switch Pin

Uses separate masks for the led pin and switch pins, but requires navigating conditionals in the thumb 1 instruction set!

in repo

  MOVS R4, #1
  LSLS R3, R4, #ledPinOffset 
  LSLS R4, #switchPinOffset
  LDR R5, =portA_OUTSET
  LDR R6, =portA_OUTCLR
  LDR R7, =portA_IN

  loop:
    //get the value of the IN register
    LDR R0, [R7]
    //TST the reading against the mask. 
    //R0 should read 1 at switch location as default
    //due to pullup resistor so if the AND result
    TST R0, R4  //EQUALS 0 (no match), the switch is _closed_ 
    BEQ ledOn   //so jump to on

  ledOff:  //default path
    STR R3, [R6] //R3==ledMask, R6==portA_OUTCLR
    B loop

  ledOn:
    STR R3, [R5] //R3==ledMask, R5==portA_OUTSET
    B loop
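
The branching version can be modeled the same way, this time including a toy output register so the effect of OUTSET and OUTCLR is visible. Everything here is a sketch: the `Port` class and `step` function are hypothetical, and the model only captures the "writes act on 1 bits" behavior the comments describe.

```python
class Port:
    """Toy model of one PORT group: OUTSET/OUTCLR writes only act on 1 bits."""

    def __init__(self):
        self.out = 0

    def outset(self, word):
        self.out |= word

    def outclr(self, word):
        self.out &= ~word & 0xFFFFFFFF


def step(port, in_reg, switch_mask, led_mask):
    """One pass of the loop: TST, then a single store to OUTSET or OUTCLR."""
    if (in_reg & switch_mask) == 0:   # Z set: pin pulled to GND, switch closed
        port.outset(led_mask)         # ledOn
    else:                             # pullup wins: switch open
        port.outclr(led_mask)         # ledOff


port = Port()
port.out = 1 << 7                     # pullup already selected on PA07
step(port, 0x00, 1 << 7, 1 << 10)     # closed
closed_out = port.out
step(port, 0x80, 1 << 7, 1 << 10)     # open
open_out = port.out
print(hex(closed_out), hex(open_out))
```

Because the two masks are independent, the LED and switch can sit on any pins without a shift between them.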

Complete main.s examples

Version 1

.syntax unified
.cpu cortex-m0
.fpu softvfp 
.thumb  

.section .text.program_code

.equ portA_address, 0x41004400
.equ portA_DIRCLR, 0x41004400+0x04
.equ portA_DIRSET, 0x41004400+0x08
.equ portA_OUTCLR, 0x41004400+0x14
.equ portA_OUTSET, 0x41004400+0x18
//.equ portA_OUTTGL, 0x41004400+0x1C
.equ portA_IN, 0x41004400+0x20
.equ portA_PINCNFG, 0x41004400+0x40



.equ ledPinOffset, 10    //PA10
.equ switchPinOffset, 7  //PA07
//the config address below uses +0x04, NOT +0x07, because the LDR
//that reads it requires word alignment
.equ switchPinCNFGOffset, portA_PINCNFG+0x04 
//NOTE: a CLOSED (true) switch will read LOW

.thumb_func
.global _start
_start:

  MOVS R3, #1
  LSLS R3, #ledPinOffset  //R3 is now LED mask (1 at led pos)
  MOVS R4, #1
  LSLS R4, #switchPinOffset //R4 is now switch mask (1 at switch pos)

  //The DIRSET should have one for every LED and 0 for every switch.
  //The 0 is the default, but an explicit set will be safe. 

  LDR R5, =portA_DIRSET
  STR R3, [R5]   //move the 1s in for the LEDs
  LDR R5, =portA_DIRCLR
  STR R4, [R5]   //clear all the switches

setPullup:
  //---- For using internal pullup only
  LDR R5, =switchPinCNFGOffset //pinConfig closest word location
  LDR R0, [R5] //load current settings into R0
  MOVS R1, #6 //create value for INEN 1 (bit 1) //set PULLEN 1 (bit 2)
  LSLS R1, #24 //(8*(7-4)) //move it from 4 to 7
  ORRS R0, R1 //apply mask
  STR R0, [R5] //put the updated word back into the config.
  LDR R5, =portA_OUTSET
  STR R4, [R5] //set the out of the switch high
  //--- END setting internal pullup

  //if remove pullup code
  //set R5 to hold portA_OUTSET here
  LDR R6, =portA_OUTCLR
  LDR R7, =portA_IN

  loop:
    //get the value of the IN register
    LDR R0, [R7]

    //OPTION 1
    //Isolate the switch reading 
    ANDS R0, R4 //should typically be 1, due to pullup resistor
    MOVS R1, R0
    EORS R1, R4 //R1 will be 1 at the switch bit iff the read was 0

    LSLS R0, #3  //move bit 7 (switch) to 10 (led)
    LSLS R1, #3  //move bit 7 (switch) to 10 (led)

    //Sending 0s to the set and clear registers has no effect.
    //The state of R0 always matches what should go to CLEAR
    //The state of R1 matches what should be in SET 
    STR R0, [R6]
    STR R1, [R5]
  B loop

Version 2

.syntax unified
.cpu cortex-m0
.fpu softvfp 
.thumb  

.section .text.program_code

.equ portA_address, 0x41004400
.equ portA_DIRCLR, 0x41004400+0x04
.equ portA_DIRSET, 0x41004400+0x08
.equ portA_OUTCLR, 0x41004400+0x14
.equ portA_OUTSET, 0x41004400+0x18
//.equ portA_OUTTGL, 0x41004400+0x1C
.equ portA_IN, 0x41004400+0x20
.equ portA_WRCONFIG, 0x41004400+0x28
.equ portA_PINCNFG, 0x41004400+0x40

.equ ledPinOffset, 10    //PA10
.equ switchPinOffset, 7  //PA07, 
//NOTE: a CLOSED switch will read LOW

//since pull a byte later, need the raw value to be a byte


.thumb_func
.global _start
_start:


  MOVS R4, #1 
  LSLS R3, R4, #ledPinOffset //R3 is now LED mask (1 at led pos)
  LSLS R4, #switchPinOffset  //R4 is now switch mask (1 at switch pos)

  //The DIRSET should have one for every LED and 0 for every switch.
  //The 0 is the default, but an explicit set will be safe. 

  LDR R5, =portA_DIRSET
  STR R3, [R5]   //move the 1s in for the LEDs
  LDR R5, =portA_DIRCLR
  STR R4, [R5]   //clear all the switches

  @ OPTION 1: Set 1 pin
  @ MOVS R0, #switchPinOffset  //put switch # into R0 
  @ PUSH {R0-R3} // stash in stack
  @ BL setPullup
  @ POP {R0-R3}  // restore

  @ OPTION 2: Set multiple pins
  MOVS R0, R4  //put switch mask into R0
  PUSH {R0-R3} //stash in stack
  BL multiPinPullup
  POP {R0-R3}  //restore 

  LDR R5, =portA_OUTSET
  LDR R6, =portA_OUTCLR
  LDR R7, =portA_IN

  loop:
    //get the value of the IN register
    LDR R0, [R7]
    //TST the reading against the mask. 
    //R0 should read 1 at switch location as default
    //due to pullup resistor so if the AND result
    TST R0, R4  //EQUALS 0 (no match), the switch is _closed_ 
    BEQ ledOn   //so jump to on

  ledOff:  //default path
    STR R3, [R6] //R3==ledMask, R6==portA_OUTCLR
    B loop

  ledOn:
    STR R3, [R5] //R3==ledMask, R5==portA_OUTSET
    B loop

//end _start


//Set R0 to contain the 32 pin mask of pins that should have pullups
.word multiPinPullup
.thumb_func
multiPinPullup:
  //Go ahead and set the ups
  LDR R1, =portA_OUTSET
  STR R0, [R1]

  MOVS R2, #1
  MOVS R3, #2
  RORS R2, R3  // Bit 30 -> 1 to enable PINCFG set
  MOVS R3, #6  // INEN 1 (bit 1) //set PULLEN 1 (bit 2)
  LSLS R3, #16 // move up to 3rd byte
  ORRS R3, R2  // Add bit 30 to R3

  LDR R2, =#0xFFFF
  TST  R0, R2
  BEQ  mpp_upperHalf
  MOVS R1, R0  // copy R0 for editing
  ANDS R1, R2   //isolate bottom half
  ORRS R1, R3  // update R1 with settings 
  LDR R2, =portA_WRCONFIG
  STR R1, [R2] // send WRCONFIG bottom half info

  mpp_upperHalf:
  LDR R1, =#0xFFFF0000
  TST R0, R1
  BEQ mpp_exit
  LSRS R0, #16 // isolate top half
  ORRS R0, R3  // add settings
  LSLS R1, #15 // make R1 0x80000000
  ORRS R0, R1  // Add top half flag in bit 31
  LDR R2, =portA_WRCONFIG //might be this value already. 
  STR R0, [R2] // send WRCONFIG top half info
  mpp_exit:
  BX LR

Summary

Choosing the smallest of the Arm family makes writing the assembly extra fiddly. To the rescue… switching to C and letting the compiler sort it out! That better matches the AVR example anyway. I’m going to do some GDB/makefile tidy-up in prep.