STM8 Precise Cycle Delay


12 Sep 2018, 08:32am TZ +05:30
STM8, hardware
Embedded

This started as a quest to design the perfect cycle delay for one of my favorite MCUs, the STM8. In the process I learned a lot about the custom pipeline and instruction execution of a proprietary MCU such as the STM8S.

Fig: STM8S Board

After the 8051 days (1996-2000) had passed, I did not get to touch assembly language again. It was always C that I programmed MCUs in, with the occasional tinkering in Makefiles and linker (.ld) files.

The STM8 is a family of MCUs from STMicroelectronics, and the one used here belongs to the STM8S line. These are among the best low-cost debuggable MCUs: you can flash them like any normal MCU, and they also provide first-class debugging features. This works great for iterative firmware development. I am a lazy firmware designer who loves this feature: I don't need to wait for the full program or add special debug prints, since I can view memory or the variables in question through the debugger.

Well, let's look at what we want to do here.

The idea is to create a simple loop-based cycle delay.

Yes, that's about the gist of it. It looks simple at first, but as we progress we find out how differently things can behave in C and in assembly.

To begin, let's get our setup ready:

Setup

Hardware

For my work, I used a cheap STM8S103F3P3-based board from AliExpress.

You can find plenty of them using the search term “STM8 development board”.

Fig: AliExpress snapshot of the search for “STM8 development board”

I am sure you can get similar boards from Banggood and eBay.

Note: my STM8S103F3P3 board also has an LED on Port B5 in an active-high configuration.

Compiler

We will be using the free COSMIC STM8 compiler toolchain.

Fig: Cosmic STM8 Compiler

You can get the free, fully functional compiler by sending a license request to the email address provided.

Here is the official word from ST Micro on this.

IDE

I have not found a better IDE for the STM8 than ST Micro's STVD.

Fig: STVD-STM8 the IDE for STM8 MCUs

It takes a few steps to set up the IDE and link it to the Cosmic compiler:

  1. In the ST Visual Develop window, open the Options dialog via the menu:
    Tools -> Options

  2. Then, in the Options window, go to the Toolset tab.

  3. Select STM8 Cosmic and, in Root path, select the installation folder that contains the cxstm8.exe file.
    Typically it is:
    C:\Program Files (x86)\COSMIC\FSE_Compilers\CXSTM8

  4. Some permission dialogs may pop up after this. Just accept them and press OK to complete the setting.

Fig: STVD Settings to connect it to the Cosmic Compiler

Idea

Now we are all set with the setup. Let's work out a plan of action.

We will design a simple do-while loop that wastes time in a known number of CPU cycles.

Fig: Simple Do while Loop - Microchip Developer Help
Source: http://microchipdeveloper.com/tls2101:do-while-loop

Stage 1: Simple Do-While-Loop in code

Function delay_cycles

void delay_cycles(uint16_t cy)
{
    do{
        --cy;
    }while(cy);
}
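One detail worth keeping in mind: because the test happens after the decrement, this do-while shape runs its body cy times, except that cy = 0 wraps around to 65535 and runs 65536 times (the library quoted in Stage 2 notes the same thing). The host-side sketch below is not STM8 code; it uses a hypothetical count_iterations helper that counts loop passes instead of burning time, purely to make that visible.

```c
#include <stdint.h>

/* Host-side model of delay_cycles() (hypothetical helper, not STM8 code):
 * instead of burning time, count how many times the loop body runs. */
static uint32_t count_iterations(uint16_t cy)
{
    uint32_t iterations = 0;
    do {
        --cy;          /* same decrement as delay_cycles(); 0 wraps to 65535 */
        ++iterations;
    } while (cy);
    return iterations;
}
```

So a caller that might pass 0 should treat it as the longest delay, not the shortest.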

Main

#include "stm8s.h"

#define LED_PORT GPIOB
#define LED_PIN GPIO_PIN_5
#define LED_BIT 5

void delay_cycles(uint16_t cy);   // delay_cycles() as listed above

void main(void)
{
    // Speed up to 16MHz HSI
    CLK_HSIPrescalerConfig(CLK_PRESCALER_HSIDIV1);
    // Disable all Peripheral Clocks - at start
    CLK->PCKENR1 = 0;
    CLK->PCKENR2 = 0;
    // Configure GPIO
    GPIO_DeInit(LED_PORT);
    GPIO_Init(LED_PORT, LED_PIN, GPIO_MODE_OUT_PP_HIGH_FAST);
    while(1)
    {
        SetBit(LED_PORT->ODR, LED_BIT);   // LED on (active-high)
        delay_cycles(1000);
        ClrBit(LED_PORT->ODR, LED_BIT);   // LED off
        delay_cycles(1000);
    }
}

Let's look at the assembly it generates, especially for the while loop.

 180  0028               L56:
 181                     ; 57 		SetBit(LED_PORT->ODR, LED_BIT);
 183  0028 7214500f      	bset	20495,#2
 184                     ; 59 		delay_cycles(1000);
 186  002c ae03e8        	ldw	x,#1000
 187  002f add8          	call	_delay_cycles
 189                     ; 60 		ClrBit(LED_PORT->ODR, LED_BIT);
 191  0031 7215500f      	bres	20495,#2
 192                     ; 63 		delay_cycles(1000);
 194  0035 ae03e8        	ldw	x,#1000
 195  0038 adcf          	call	_delay_cycles
 198  003a 20ec          	jra	L56 

Let's look at the delay_cycles function assembly:

  97  0009               _delay_cycles:
  99  0009 89            	pushw	x
 100       00000000      OFST:	set	0
 103  000a               L54:
 104                     ; 26         --cy;
 106  000a 1e01          	ldw	x,(OFST+1,sp)
 107  000c 1d0001        	subw	x,#1
 108  000f 1f01          	ldw	(OFST+1,sp),x
 109                     ; 27     }while(cy);
 111  0011 1e01          	ldw	x,(OFST+1,sp)
 112  0013 26f5          	jrne	L54
 113                     ; 28 }
 116  0015 85            	popw	x
 117  0016 81            	ret

From the above function we can estimate two quantities:

  1. Loop Cycle count = Lcy
  2. Total Cycle count = Tcy

These will help us calculate the time delay it produces.

Where to find the cycle-count details:

STM8 CPU Programming Manual, a.k.a. PM0044

Here Lcy is calculated from listing line 103 to line 112:

  1. ldw x,(OFST+1,sp) = 3
  2. subw x,#1 = 2
  3. ldw (OFST+1,sp),x = 2
  4. ldw x,(OFST+1,sp) = 2
  5. jrne L54 = 2 (With Flush)

Hence Lcy = (3+2+2+2+2) * LoopCount - 1 = 11 * LoopCount - 1, where the -1 accounts for the final jrne falling through (1 cycle) instead of jumping (2 cycles).

For the Tcy calculation we need to add the remaining pieces:

  1. pushw x = 2
  2. popw x = 2
  3. ret = 6
  4. ldw x,#1000 (While Calling) = 3
  5. call _delay_cycles (While Calling) = 6

Hence Tcy = Lcy + (2+2+6+3+6) = 19 + Lcy = 19 + 11 * LoopCount - 1 = 11 * LoopCount + 18

In our example LoopCount is 1000.

Let's calculate the actual Tcy by substituting the value:

Tcy = 11 * 1000 + 18 = 11018

At a frequency of 16MHz = 16000000Hz the cycle time is

t = 1/f = 1/16MHz = 6.25e-8 s = 62.5 ns

Hence the total duration should be

t(delay_cycles{1000}) = Tcy * t = Tcy * 1/f = 11018 * 6.25e-8 = 6.88625e-4 seconds

which is t(delay_cycles{1000}) = 688.625 microseconds.

Let's now look at what the scope says:

Fig: Scope Plot of Stage 1: Simple Do-While-Loop in code

Surprise! The measured ON time is 756.1 µs and the OFF time is 759.8 µs.

Since our LED is active-high, it's only logical that the ON time is shorter than the OFF time.

Let's take the ON time and work forward from there.

Total deviation = 756.1 µs - 688.625 µs = 67.475 µs

That is about 1080 cycles of difference (67.475 µs × 16 cycles/µs ≈ 1079.6).
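Since this arithmetic is easy to get wrong by hand, here is a minimal host-side sketch that redoes it from per-instruction counts. The function names are hypothetical, the counts you pass in are the ones listed above (not re-checked against PM0044), and the "minus one" models the final jrne falling through instead of jumping.

```c
#include <stdint.h>

/* Cycles spent in the loop: per_iter cycles for every pass, minus one
 * because the last conditional jump falls through (1 cycle instead of 2). */
static uint32_t loop_cycles(uint32_t per_iter, uint32_t loop_count)
{
    return per_iter * loop_count - 1u;
}

/* Total cycles = fixed overhead (call setup, push/pop, return) + loop. */
static uint32_t total_cycles(uint32_t overhead, uint32_t per_iter,
                             uint32_t loop_count)
{
    return overhead + loop_cycles(per_iter, loop_count);
}

/* Cycle count -> nanoseconds at fclk_hz (truncated). */
static uint64_t cycles_to_ns(uint32_t cycles, uint32_t fclk_hz)
{
    return (uint64_t)cycles * 1000000000ull / fclk_hz;
}

/* Measured time difference in nanoseconds -> cycles at fclk_hz (truncated). */
static uint32_t ns_to_cycles(uint64_t ns, uint32_t fclk_hz)
{
    return (uint32_t)(ns * fclk_hz / 1000000000ull);
}
```

Passing the five overhead items (pushw, popw, ret, ldw, call) as a single sum keeps every term visible, which makes it harder to drop one by accident.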

And you thought calculating the timing directly would work!

There are several possible reasons:

  1. Compiler dependency
  2. Post-linking rearrangement
  3. You pick…

I could not figure out how this magic was happening. I tried many things, but the result was always the same:

  • Reducing the libraries
  • Changing compilation options
  • Writing the function's code directly in main

The results kept me in the dark, and I saw no way to fix this. If someone can help explain it, that would be nice.

Stage 2: Inline-Assembly in code

After some head-scratching I gave up on this simple idea. From past experience with AVRs, I knew about the _delay_ms function in avr-libc, the AVR standard C library. Looking in the Arduino/hardware/tools/avr/avr/include/util/ directory, I found the delay.h file, which gives some interesting insight into how loop delays are calculated. Most of it is written in assembly, and that part was hard to follow.

So I turned to Google, and it brought me to: https://github.com/Hoksmur/stm8_routines

Here too I found some interesting delay code.

/* 
 * Func delayed N cycles, where N = 3 + ( ticks * 3 )
 * so, ticks = ( N - 3 ) / 3, minimum delay is 6 CLK
 * when tick = 1, because 0 equels 65535
 */
static inline void _delay_cycl( unsigned short __ticks )
{
#if defined(__CSMC__)
/* COSMIC */
  #define T_COUNT(x) (( F_CPU * x / 1000000UL )-3)/3)
	// ldw X, __ticks ; insert automaticaly
	_asm("nop\n $N:\n decw X\n jrne $L\n nop\n ", __ticks);
#elif defined(__SDCC)
  #define T_COUNT(x) (( F_CPU * x / 1000000UL )-5)/5)
	__asm__("nop\n nop\n"); 
	do { 		// ASM: ldw X, #tick; lab$: decw X; tnzw X; jrne lab$
                __ticks--;//      2c;                 1c;     2c    ; 1/2c   
        } while ( __ticks );
	__asm__("nop\n");
#elif defined(__RCST7__)
/* RAISONANCE */
  #error ToDo for RAISONANCE
#elif defined(__ICCSTM8__)
/* IAR */
  #error ToDo for IAR
#else
 #error Unsupported Compiler!          /* Compiler defines not found */
#endif
}

Since we are targeting the COSMIC C compiler, it's best to look only at the __CSMC__ branch:

static inline void _delay_cycl( unsigned short __ticks )
{
  #define T_COUNT(x) (( F_CPU * x / 1000000UL )-3)/3)
	// ldw X, __ticks ; insert automaticaly
	_asm("nop\n $N:\n decw X\n jrne $L\n nop\n ", __ticks);
}

This is nice, small cycle-delay code. The author was also kind enough to provide the T_COUNT macro, which converts a delay in microseconds into the tick count to pass to _delay_cycl.
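To see what T_COUNT is doing, here is the same conversion written out as host-side C functions. It follows the library comment (N = 3 + ticks * 3, so ticks = (N - 3) / 3); the function names are hypothetical, and the 16 MHz clock appears only in the example values.

```c
#include <stdint.h>

/* Cycles taken by _delay_cycl(ticks) per the library comment: N = 3 + 3 * ticks.
 * ticks must be >= 1 (0 wraps around to 65536 loop passes). */
static uint32_t cycles_for_ticks(uint16_t ticks)
{
    return 3u + 3u * (uint32_t)ticks;
}

/* Inverse, truncated the same way as the macro: ticks = (N - 3) / 3.
 * Valid for cycles >= 6, so that the result is at least 1. */
static uint16_t ticks_for_cycles(uint32_t cycles)
{
    return (uint16_t)((cycles - 3u) / 3u);
}

/* Microseconds -> CPU cycles at fclk_hz (assumed to be a whole number of MHz).
 * Dividing the clock down to cycles per microsecond first keeps the product small. */
static uint32_t us_to_cycles(uint32_t us, uint32_t fclk_hz)
{
    return us * (fclk_hz / 1000000u);
}
```

At 16 MHz, 100 µs is 1600 cycles, which becomes 532 ticks; by the library's formula that is 1599 cycles, so the truncation makes the delay come out slightly short rather than long.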

However, this function was not without its quirks. Let's look at the final code to understand what changed:

#if defined(__CSMC__)
static @inline void _delay_cycl( unsigned short __ticks )
#else
static inline void _delay_cycl( unsigned short __ticks )
#endif
{
#if defined(__CSMC__)
/* COSMIC */  
	#define T_COUNT(x) (( x * (FCLK / 1000000UL) )-3)/3)
	// ldw X, __ticks ; insert automaticaly
	_asm("nop\n $N:\n decw X\n jrne $L\n nop\n ", __ticks);  
#elif defined(__SDCC)
  #define T_COUNT(x) (( x * (FCLK / 1000000UL) )-5)/5)
	__asm__("nop\n nop\n"); 
	do { 		// ASM: ldw X, #tick; lab$: decw X; tnzw X; jrne lab$
                __ticks--;//      2c;                 1c;     2c    ; 1/2c   
        } while ( __ticks );
	__asm__("nop\n");
#elif defined(__RCST7__)
/* RAISONANCE */
  #error ToDo for RAISONANCE
#elif defined(__ICCSTM8__)
/* IAR */
  #error ToDo for IAR
#else
 #error Unsupported Compiler!          /* Compiler defines not found */
#endif
}

Here are the important changes:

  1. @inline is a special keyword the COSMIC C compiler needs in order to generate inline functions.

  2. In the T_COUNT macro the operator precedence was not correct, so explicit brackets were added to make it clear. The order of operations was also changed: FCLK is divided by 1000000UL before being multiplied by x, to make sure the intermediate result stays in range.

  3. F_CPU was changed to FCLK for the system clock, since that is the define I chose for the system clock frequency in Hz, e.g. FCLK=16000000 set in the compiler's preprocessor definitions.
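The reordering in change 2 matters if unsigned long is 32 bits wide, as it is assumed to be with COSMIC on the STM8 (worth checking in the compiler manual). The host-side sketch below mimics that with uint32_t: multiplying the clock first wraps around for delays above roughly 268 µs at 16 MHz, while dividing the clock first does not. Both function names are hypothetical.

```c
#include <stdint.h>

#define FCLK_HZ 16000000u  /* example clock; the project passes FCLK as a define */

/* Original ordering: FCLK * us first, which wraps modulo 2^32 for large us. */
static uint32_t t_count_mul_first(uint32_t us)
{
    return ((FCLK_HZ * us / 1000000u) - 3u) / 3u;
}

/* Reordered: FCLK / 1000000 first, so the product stays small. */
static uint32_t t_count_div_first(uint32_t us)
{
    return ((us * (FCLK_HZ / 1000000u)) - 3u) / 3u;
}
```

Both agree for short delays, but for 300 µs the multiply-first version silently returns a much smaller tick count.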

Here is a peek at the compiler settings:

Fig: Compiler Settings for the FCLK

Now let's look at the newly generated asm code:

  83                     ; 45 	_asm("nop\n $N:\n decw X\n jrne $L\n nop\n ", __ticks);  
  86  0020 ae03e8        	ldw	x,#1000
  88  0023 9d            nop
  89  0024                L6:
  90  0024 5a             decw X
  91  0025 26fd           jrne L6
  92  0027 9d             nop

Initially we set the count to 1000; that's what the ldw x,#1000 instruction shows.

As before, let's list the instruction cycles:

  1. ldw x,#1000 = 4
  2. nop = 1
  3. decw X = 1
  4. jrne L6 = 1 when not taken / 2 when the jump is taken
  5. nop = 1

Hence Lcy = (1+2) * LoopCount - 1 = 3 * LoopCount - 1

And Tcy = Lcy + (4+1+1) = 6 + Lcy = 6 + 3 * LoopCount - 1 = 3 * LoopCount + 5

Let's calculate the actual Tcy by substituting the value:

Tcy = 3 * 1000 + 5 = 3005

At a frequency of 16MHz = 16000000Hz the cycle time is

t = 1/f = 1/16MHz = 6.25e-8 s

Hence the total duration should be

t(_delay_cycl{1000}) = Tcy * t = Tcy * 1/f = 3005 * 6.25e-8 ≈ 1.878e-4 seconds

Let's now look at what the logic analyzer says (this time we are using a Saleae Logic instead of the Tek scope):

Fig: Plot of the New _delay_cycl function

We observe that:

  1. ON-Time = 1.908e-4 Seconds
  2. OFF-Time = 1.908e-4 Seconds
  3. Period = 3.816e-4 Seconds

This is very close to the expected 1.878e-4 seconds!
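As a cross-check, the listed counts can be summed in a small host-side function. It models only the instructions shown in the listing (ldw, the two nops and the decw/jrne loop) using the counts above; the name delay_cycl_cost is hypothetical, and the GPIO toggle and the outer while(1) jump are not included.

```c
#include <stdint.h>

/* Cycle cost of the inlined _delay_cycl sequence for a given tick count,
 * using the per-instruction counts listed above:
 *   ldw x,#n = 4, nop = 1, (decw = 1 + jrne taken = 2) per pass,
 *   the last jrne is not taken (saves 1 cycle), trailing nop = 1.
 * Valid for ticks >= 1. */
static uint32_t delay_cycl_cost(uint16_t ticks)
{
    const uint32_t ldw = 4u, nop = 1u, decw = 1u, jrne_taken = 2u;
    uint32_t loop = (decw + jrne_taken) * ticks - 1u;
    return ldw + nop + loop + nop;
}
```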

This is a great achievement; we are very close.

But in actual practice we need some way to specify time in microseconds rather than absolute cycles.

This is where the T_COUNT macro comes in handy.

Stage 3: Making the _delay_cycl function useful

Let's review our setup in its entirety this time:

  1. A quick look at the files we are using:
    Fig: Files used in the Project
  2. Where to get the drivers and examples:
    STM8S Standard Peripheral Library STSW-STM8069 v2.3.0
    STM8S Code Examples STSW-STM8026 v1.02
  3. For the correct files to use in the project, look in the StdPeriph_Template\STVD\Cosmic directory.
    You need the matching stm8_interrupt_vector.c file from there for compilation to work.
    Other files such as stm8s_conf.h, stm8s_it.h and stm8s_it.c are available at the root of the template directory.

Actual Main Code Listing:


/* Includes ------------------------------------------------------------------*/
#include "stm8s.h"

/* Private defines -----------------------------------------------------------*/
/* Private function prototypes -----------------------------------------------*/
/* Private functions ---------------------------------------------------------*/
#if defined(__CSMC__)
static @inline void _delay_cycl( unsigned short __ticks )
#else
static inline void _delay_cycl( unsigned short __ticks )
#endif
{
#if defined(__CSMC__)
/* COSMIC */  
	#define T_COUNT(x) (( x * (FCLK / 1000000UL) )-3)/3)
	// ldw X, __ticks ; insert automaticaly
	_asm("nop\n $N:\n decw X\n jrne $L\n nop\n ", __ticks);  
#elif defined(__SDCC)
  #define T_COUNT(x) (( x * (FCLK / 1000000UL) )-5)/5)
	__asm__("nop\n nop\n"); 
	do { 		// ASM: ldw X, #tick; lab$: decw X; tnzw X; jrne lab$
                __ticks--;//      2c;                 1c;     2c    ; 1/2c   
        } while ( __ticks );
	__asm__("nop\n");
#elif defined(__RCST7__)
/* RAISONANCE */
  #error ToDo for RAISONANCE
#elif defined(__ICCSTM8__)
/* IAR */
  #error ToDo for IAR
#else
 #error Unsupported Compiler!          /* Compiler defines not found */
#endif
}
/* Main function -------------------------------------------------------------*/
void main(void)
{
    /* Fmaster = 16MHz */
    CLK_HSIPrescalerConfig(CLK_PRESCALER_HSIDIV1);
    /* Disable all Peripheral Clocks */
    CLK->PCKENR1 = 0;
    CLK->PCKENR2 = 0;

    GPIO_Init(GPIOD, GPIO_PIN_2, GPIO_MODE_OUT_PP_LOW_FAST);

    /* Infinite loop */
    while (1)
    {
        GPIO_WriteReverse(GPIOD, GPIO_PIN_2);
        _delay_cycl(1000);
    }
}

#ifdef USE_FULL_ASSERT

/**
  * @brief  Reports the name of the source file and the source line number
  *   where the assert_param error has occurred.
  * @param file: pointer to the source file name
  * @param line: assert_param error line source number
  * @retval : None
  */
void assert_failed(u8* file, u32 line)
{ 
  /* User can add his own implementation to report the file name and line number,
     ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */

  /* Infinite loop */
  while (1)
  {
  }
}
#endif

We will focus on this part alone:

...
  while (1)
  {
		GPIO_WriteReverse(GPIOD, GPIO_PIN_2);
		_delay_cycl(1000);
  }
...

This is where we will introduce the change.

Let's alter it using the T_COUNT macro:

...
  while (1)
  {
		GPIO_WriteReverse(GPIOD, GPIO_PIN_2);
		_delay_cycl((unsigned short) (T_COUNT(100));
  }
...

This means we are going to wait for 100 microseconds.

Surprise!

Any sane programmer would notice the problem here:

_delay_cycl((unsigned short) (T_COUNT(100));

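Why that additional bracket in front of (T_COUNT?

It looks like a bug in the COSMIC preprocessor, but the cause is actually the macro itself: T_COUNT(x) expands to (( x * (FCLK / 1000000UL) )-3)/3), which has one more closing bracket than opening ones. The extra "(" at the call site simply balances it.

So we need to modify both the #define and the place where the macro is called.

Let's not worry about this too much, since it will be buried in the API we build around the _delay_cycl function.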
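For reference, here is a sketch of a balanced version of the macro, so that no extra bracket is needed at the call site. FCLK is assumed to come from the compiler's preprocessor settings as before (the fallback #define only lets the snippet compile on its own), and the argument is also wrapped in parentheses so that expressions such as T_COUNT(50 + 50) expand correctly. DELAY_US is a hypothetical convenience wrapper, not part of the stm8_routines library.

```c
#ifndef FCLK
#define FCLK 16000000UL  /* normally supplied via the compiler's preprocessor settings */
#endif

/* Microseconds -> _delay_cycl ticks.
 * Balanced brackets, and (x) is parenthesized so T_COUNT(a + b) works. */
#define T_COUNT(x) ((((x) * (FCLK / 1000000UL)) - 3) / 3)

/* Hypothetical convenience wrapper for the target build (not part of
 * stm8_routines); _delay_cycl is the inline function shown earlier. */
#define DELAY_US(us) _delay_cycl((unsigned short)T_COUNT(us))
```

On the target, the loop body then reads _delay_cycl((unsigned short)T_COUNT(100)); with no extra bracket. T_COUNT(0) still wraps around, because 0 - 3 is computed in unsigned arithmetic, so very short delays need separate handling.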

Let's look at our results:

Fig: Results of 100uS delay

That gives 102.2 µs for both the ON and OFF periods, very close for practical use!
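As a rough cross-check of that result, the sketch below computes what the Stage 2 cycle counts predict for T_COUNT(100) at 16 MHz. It assumes the T_COUNT arithmetic and the per-instruction counts listed in Stage 2; all names are hypothetical. The remaining gap to the measurement presumably comes from the GPIO_WriteReverse call and the while(1) jump, which this model does not count.

```c
#include <stdint.h>

#define FCLK_HZ 16000000UL

/* Ticks produced by T_COUNT(us): ((us * FCLK/1e6) - 3) / 3, truncated. */
static uint32_t ticks_for_us(uint32_t us)
{
    return (uint32_t)(((us * (FCLK_HZ / 1000000UL)) - 3UL) / 3UL);
}

/* Inlined _delay_cycl cost from the Stage 2 counts:
 * 4 (ldw) + 1 (nop) + (3 * ticks - 1) (loop) + 1 (nop) = 3 * ticks + 5. */
static uint32_t delay_cost_cycles(uint32_t ticks)
{
    return 3UL * ticks + 5UL;
}

/* Cycles -> nanoseconds at FCLK_HZ, truncated. */
static uint64_t cycles_to_ns(uint32_t cycles)
{
    return (uint64_t)cycles * 1000000000ULL / FCLK_HZ;
}
```

For 100 µs this predicts 532 ticks, 1601 cycles and roughly 100.06 µs for the delay alone.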

Success at last!

This was the story of how I made the perfect STM8S cycle delay function. I am very thankful to Mr. Oleg Terentiev for publishing the library
https://github.com/Hoksmur/stm8_routines
It was the source of this effort and the path to solving the problem.

Folks, if you have any insights, please share. It took a while to complete this full story ;-).

*~~ Completed on 20th October 2018*