I did some rough measurements with my PicoScope on an EMW3165 module (i.e. STM32F411CE) to find out how deep sleep during tickless idle affects scheduling latency. I connected the logic analyzer to four GPIO signals to capture timings for SysTick, RTC wakeup, HSE oscillator startup and application task activity. The GPIO pins are driven by a simple test application on the STM32, which initializes a pico]OS timer to fire every 150 ms. To show the difference between normal and tickless mode, it also toggles the power saving mode on each pass.
First, the timings for the case where SysTick is always active and the CPU does not sleep:
The SysTick interrupt handler seems to execute in under 5 µs, and the PendSV handler that does the context switch takes 3.5 µs in this case. So, after the interrupt is raised, it takes a little more than 8 µs for the application task that serves it to be activated.
Next, the case with tickless mode. More latency is to be expected, because the HSE oscillator is stopped during sleep and must be restarted before the application continues.
In the worst case the latency is a lot bigger. It seems to take 80 µs for the CPU to start up from sleep and 720 µs for the HSE oscillator to restart. The context switch to the application task takes 6.6 µs, which is longer than in the previous case (the context switch is delayed until after HSE startup and needs an extra SVCall interrupt to handle it). Summing these up, the worst-case latency is about 807 µs, which is roughly 100 times more than in the previous case.
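The sum above can be checked with a few lines of C. This is only a sketch of the arithmetic: the struct and function names are made up for illustration, the values are the rough scope readings quoted above, and the 5 µs SysTick figure is an upper bound rather than an exact measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three measured parts of the tickless
 * wakeup path. Values are in nanoseconds to avoid floating point. */
typedef struct {
    unsigned long cpu_wakeup_ns;     /* CPU start-up from sleep  */
    unsigned long hse_restart_ns;    /* HSE oscillator restart   */
    unsigned long context_switch_ns; /* switch to the app task   */
} tickless_latency_t;

/* Worst-case latency in tickless mode: the three parts run back to back. */
unsigned long tickless_total_ns(const tickless_latency_t *m)
{
    assert(m != NULL);
    return m->cpu_wakeup_ns + m->hse_restart_ns + m->context_switch_ns;
}

/* Latency with SysTick always running: SysTick handler plus PendSV. */
unsigned long normal_total_ns(unsigned long systick_ns, unsigned long pendsv_ns)
{
    return systick_ns + pendsv_ns;
}

/* Integer ratio of the two latencies, rounded down. */
unsigned long latency_ratio(unsigned long tickless_ns, unsigned long normal_ns)
{
    assert(normal_ns > 0);
    return tickless_ns / normal_ns;
}
```

With the measured values (80 µs, 720 µs, 6.6 µs against 5 µs and 3.5 µs) this gives 806.6 µs versus 8.5 µs, a ratio a little under 95, so "roughly 100 times" is a fair rounding.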
The extra scheduling latency doesn’t hurt timer operations much, because pico]OS uses a “safety margin” for tickless sleep: it always wakes up a few ticks early, giving the HSE oscillator time to start up properly before it is needed. The extra latency does affect other interrupts, though, because there is no way to predict when they will arrive.
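The safety margin idea can be illustrated roughly like this. It is not the actual pico]OS code: the function names, the margin parameter and the tick period are all assumptions made for the example, and the real kernel may compute its sleep time differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many ticks to sleep when the next timer
 * expires in ticks_until_timer ticks and we want to be awake
 * margin_ticks ticks before that. Returns 0 if sleeping is not
 * worthwhile because the timer is already within the margin. */
uint32_t tickless_sleep_ticks(uint32_t ticks_until_timer, uint32_t margin_ticks)
{
    if (ticks_until_timer <= margin_ticks) {
        return 0;
    }
    return ticks_until_timer - margin_ticks;
}

/* Hypothetical check: is the margin long enough to hide the wakeup
 * latency (CPU start-up plus HSE restart plus context switch)? */
int margin_covers_wakeup(uint32_t margin_ticks,
                         uint32_t tick_period_us,
                         uint32_t wakeup_us)
{
    assert(tick_period_us > 0);
    /* 64-bit product so large tick counts cannot overflow. */
    return (uint64_t)margin_ticks * tick_period_us >= wakeup_us;
}
```

For example, assuming a 1 ms tick (the tick rate is not part of the measurements above), a margin of two ticks leaves 2000 µs, which comfortably covers the 807 µs worst case, while a margin of zero would not.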
If the application cannot tolerate the higher latency, it is advisable either to leave tickless mode disabled or to call posPowerDisableSleep in critical places to enforce lower latency.
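A minimal sketch of the second option is below. It is written so that it compiles on a host: the two posPower* functions here are stand-in stubs that only count calls, not the real pico]OS implementations. I am assuming that the counterpart call is posPowerEnableSleep and that both take no arguments; check picoos.h for your version. On the target you would remove the stubs and include picoos.h instead. The wrapper name run_with_low_latency is made up for the example.

```c
#include <assert.h>

/* --- Host-side stubs, NOT the real pico]OS functions. ---
 * They only track how many callers currently forbid deep sleep. */
int sleep_block_count = 0;

void posPowerDisableSleep(void)
{
    sleep_block_count++;
}

void posPowerEnableSleep(void) /* assumed counterpart of the call above */
{
    assert(sleep_block_count > 0);
    sleep_block_count--;
}

/* Records what the latency-critical work saw, for demonstration only. */
int observed_block_count = -1;

void sample_work(void)
{
    observed_block_count = sleep_block_count;
}

/* Hypothetical wrapper: keep deep sleep disabled while latency-critical
 * work (for example waiting for a fast external interrupt) is running,
 * then allow tickless sleep again. */
void run_with_low_latency(void (*work)(void))
{
    posPowerDisableSleep();
    work();
    posPowerEnableSleep();
}
```

Calling run_with_low_latency(sample_work) runs the work with sleep blocked and leaves the block count at zero afterwards, so the rest of the application still gets the power savings of tickless mode.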