The order of operations matters. A lot!

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

The order of operations matters. A lot!

Post by martincho » Wed Jun 15, 2022 5:40 am

Here's a tip.

Many decades ago I spent years developing software in a couple of languages that make you very aware of the order of operations. One was Forth and the other APL (yes, they couldn't be farther apart).

Anyhow, things are not very different in other languages. It's just that most people simply aren't aware of the order of operations. Once you are, you look for opportunities to make performance and functional gains just by moving things around.

Try this:

Code: Select all

# The order of evaluation matters
import time

# Returns True after waiting 5 seconds
def slow_return_true():
    time.sleep(5)
    return True

# Returns True after waiting only 0.5 seconds
def faster_return_true():
    time.sleep_ms(500)
    return True

# Returns False without delay
def fast_return_false():
    return False

# ------------------------------------------------------------
print("\nif slow_return_true() and faster_return_true():")

if slow_return_true() and faster_return_true():
    print("This will happen.")
print("done")

# ------------------------------------------------------------
print("\nif faster_return_true() and slow_return_true():")

if faster_return_true() and slow_return_true():
    print("This will happen.")
print("done")

# ------------------------------------------------------------
print("\nif slow_return_true() and fast_return_false():")

if slow_return_true() and fast_return_false():
    print("Not going to happen")
print("done")

# ------------------------------------------------------------
print("\nif fast_return_false() and slow_return_true():")

if fast_return_false() and slow_return_true():
    print("Not going to happen")
print("done")
Isn't that interesting?

How do you explain what happens with the second case vs. the last?

If you are looking for performance, you can't leave any stone unturned. Always measure; don't make assumptions.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: The order of operations matters. A lot!

Post by stijn » Wed Jun 15, 2022 6:22 am

Are you asking about short-circuiting boolean operators (https://docs.python.org/3/library/stdty ... and-or-not)? That's a potentially crucial difference from merely changing the order of code: it simply skips code.

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: The order of operations matters. A lot!

Post by Roberthh » Wed Jun 15, 2022 9:51 am

How do you explain what happens with the second case vs. the last?
If in an and expression one term is False, the remaining terms are not evaluated. Similarly, if in an or expression one term is True, the remaining terms are skipped. Terms are evaluated from left to right. That is common to many programming languages and well documented.
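For example, a minimal sketch of the or case, where a True first term means the rest is never evaluated:

Code: Select all

# With `or`, a True first term short-circuits the rest
import time

def fast_return_true():
    return True

def slow_return_false():
    time.sleep(5)
    return False

if fast_return_true() or slow_return_false():
    print("Printed immediately; slow_return_false() is never called")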

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: The order of operations matters. A lot!

Post by martincho » Wed Jun 15, 2022 2:01 pm

Roberthh wrote:
Wed Jun 15, 2022 9:51 am
How do you explain what happens with the second case vs. the last?
If in an and expression one term is False, the remaining terms are not evaluated. Similarly, if in an or expression one term is True, the remaining terms are skipped. Terms are evaluated from left to right. That is common to many programming languages and well documented.
Yup. Exactly.

I just wanted to point this out for those who might not realize that you can actually affect code performance by how you order operations in certain constructs.

Another one that's interesting is checking a variable against a specific value as opposed to zero. The zero check is faster because processors don't have to compare against anything; they just load the value into a single register. So, a == 0 is faster than a == 23. The exception is when "a" and "23" already happen to be in registers; then it makes no difference.
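A rough way to check this on a board is something like the sketch below. Loop overhead dominates, so read the numbers as relative only (time.ticks_us assumes a MicroPython port that provides it):

Code: Select all

# Rough sketch: time a zero test against a non-zero constant test.
# The for-loop overhead dominates; only the relative difference matters.
import time

def compare_to_zero(n):
    a = 23
    t = time.ticks_us()
    for _ in range(n):
        a == 0
    return time.ticks_diff(time.ticks_us(), t)

def compare_to_const(n):
    a = 23
    t = time.ticks_us()
    for _ in range(n):
        a == 23
    return time.ticks_diff(time.ticks_us(), t)

N = 100000
print("a == 0 :", compare_to_zero(N), "us")
print("a == 23:", compare_to_const(N), "us")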

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: The order of operations matters. A lot!

Post by martincho » Wed Jun 15, 2022 2:06 pm

stijn wrote:
Wed Jun 15, 2022 6:22 am
Are you asking about short-circuiting boolean operators (https://docs.python.org/3/library/stdty ... and-or-not)? That's a potentially crucial difference from merely changing the order of code: it simply skips code.
Sorry. Not asking. I know this. Since MicroPython is used by lots of hobby programmers, I posted this as information for those who might not.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: The order of operations matters. A lot!

Post by stijn » Wed Jun 15, 2022 7:35 pm

Another one that's interesting is checking a variable against a specific value as opposed to zero. The zero check is faster because processors don't have to compare against anything; they just load the value into a single register. So, a == 0 is faster than a == 23. The exception is when "a" and "23" already happen to be in registers; then it makes no difference.
Would be interesting to figure out if this also makes a difference in MicroPython.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: The order of operations matters. A lot!

Post by martincho » Thu Jun 16, 2022 1:23 am

stijn wrote:
Wed Jun 15, 2022 7:35 pm
Another one that's interesting is checking a variable against a specific value as opposed to zero. The zero check is faster because processors don't have to compare against anything; they just load the value into a single register. So, a == 0 is faster than a == 23. The exception is when "a" and "23" already happen to be in registers; then it makes no difference.
Would be interesting to figure out if this also makes a difference in MicroPython.
Yes, it does (also throwing in range to show that it's not always the fastest way to iterate):

Code: Select all

import time
import gc

import micropython

# @micropython.native
def counting_up():
    i = 0
    limit = 5000
    busy = False
    while i < limit:
        i += 1
        busy = True
    busy = False
    return [busy, i]


# @micropython.native
def counting_down():
    i = 5000
    busy = False
    while i:
        i -= 1
        busy = True
    busy = False
    return [busy, i]


# @micropython.native
def using_range_from_zero():
    i = 0
    busy = False
    for n in range(5000):
        i += 1
        busy = True
    busy = False
    return [busy, i]


# @micropython.native
def using_range_from_value():
    i = 5000
    busy = False
    for n in range(4999, -1, -1):
        i -= 1
        busy = True
    busy = False
    return [busy, i]


@micropython.native
def count_direction_test():
    ITERATIONS = 100

    print(f"\nTotal run time for {ITERATIONS} iterations:")

    tests = (
        ("counting_up:           ", counting_up),
        ("counting_down:         ", counting_down),
        ("using_range_from_zero: ", using_range_from_zero),
        ("using_range_from_value:", using_range_from_value),
    )
    # Same measurement for each candidate: collect garbage first so a
    # pending sweep doesn't distort the timing, then run ITERATIONS passes
    for label, func in tests:
        output = 0
        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            output = func()
        end = time.ticks_ms()
        print(f"{label}{time.ticks_diff(end, start):10.1f} ms for {ITERATIONS} iterations, output:{output}")


count_direction_test()
Results:

Code: Select all

Total run time for 100 iterations:
counting_up:               2156.0 ms for 100 iterations, output:[False, 5000]
counting_down:             1615.0 ms for 100 iterations, output:[False, 0]
using_range_from_zero:     2916.0 ms for 100 iterations, output:[False, 5000]
using_range_from_value:    2819.0 ms for 100 iterations, output:[False, 0]
That's a 25% improvement just for counting down to zero.

If I uncomment "@micropython.native" on these test functions it gets even better:

Code: Select all

Total run time for 100 iterations:
counting_up:                794.0 ms for 100 iterations, output:[False, 5000]
counting_down:              518.0 ms for 100 iterations, output:[False, 0]
using_range_from_zero:     1206.0 ms for 100 iterations, output:[False, 5000]
using_range_from_value:    1213.0 ms for 100 iterations, output:[False, 0]
That's a 35% reduction in run time, again, for counting backwards.

If, instead, I replace that with "@micropython.viper" (only for the functions, not the tester):

Code: Select all

Total run time for 100 iterations:
counting_up:                 53.0 ms for 100 iterations, output:[False, 5000]
counting_down:               37.0 ms for 100 iterations, output:[False, 0]
using_range_from_zero:      125.0 ms for 100 iterations, output:[False, 5000]
using_range_from_value:     133.0 ms for 100 iterations, output:[False, 0]
On top of the order-of-magnitude gains in execution time, the difference between counting up and down is still just over 30%.

It stands to reason. As I said, processors can test for zero in a register just by loading the register. Comparing to a number requires loading two registers. If you are comparing to a value wider than 8 bits, it takes even longer, as the value has to be built up from multiple operations.

At the end of the day everything ends up in machine code; the rules don't change.

Now, having said that, there are compilers that are very smart and look for such opportunities to optimize. I am speaking of C/C++/Objective-C compilers and the like. I don't know to what extent full Python does any such optimization.

On Python 3.9 the same basic code yields a 14% improvement for down-counting. range is worse when counting down, taking about the same time as the count-up case. This is likely because a much more sophisticated processor is involved, where instruction prefetching and other hardware techniques come into play.

scruss
Posts: 360
Joined: Sat Aug 12, 2017 2:27 pm
Location: Toronto, Canada

Re: The order of operations matters. A lot!

Post by scruss » Thu Jun 16, 2022 2:12 am

martincho wrote:
Thu Jun 16, 2022 1:23 am
... (also throwing in range to show that it's not always the fastest way to iterate)
But range is the correct "pythonic" way of doing it, and if it's fast enough, it's the one to use. Clarity and portability above all.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: The order of operations matters. A lot!

Post by martincho » Thu Jun 16, 2022 3:56 am

scruss wrote:
Thu Jun 16, 2022 2:12 am
martincho wrote:
Thu Jun 16, 2022 1:23 am
... (also throwing in range to show that it's not always the fastest way to iterate)
But range is the correct "pythonic" way of doing it, and if it's fast enough, it's the one to use. Clarity and portability above all.
Well, sorry, I don't subscribe to that idea at all. Good, well-documented code is clear and portable. I have code I wrote twenty years ago in LISP that was super simple to port over to C++ because I spent as much time documenting it as I did coding.

This is not unlike such things as unrolling loops (in any language). Sometimes you have to, and there's nothing wrong with that. I had a case where unrolling a loop (in assembler) provided a 10 microsecond gain in execution time. And, for that application, that was significant: it represented a 33% reduction.
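In Python terms, a toy sketch of the idea, doing four steps per pass so the loop's test-and-branch overhead runs a quarter as often (assuming the count is a multiple of 4):

Code: Select all

# Toy unrolling sketch: four increments per pass instead of one,
# so the while-test executes a quarter as many times (n must be a
# multiple of 4 or the unrolled version overshoots)
def rolled(n):
    i = 0
    while i < n:
        i += 1
    return i

def unrolled(n):
    i = 0
    while i < n:
        i += 1
        i += 1
        i += 1
        i += 1
    return i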

There's another argument to be had about this which very few people even realize exists. It starts with a question:

Why is it that, to use it as an example, range() takes longer?

It's simple: it has to execute more operations. It has to move more bits around in the processor and memory. Many more. In my simple example, in rough terms, counting down was half the execution time of range(). With the optimizations available in MicroPython, counting down was 1/3 the execution time. In other words, the Pythonic way took two to three times longer to execute.
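If you want to see those extra operations, CPython's dis module will show them (MicroPython doesn't ship dis, but its bytecode tells a broadly similar story):

Code: Select all

# CPython-only peek at the bytecode behind the two loop styles
import dis

def count_down():
    i = 5000
    while i:
        i -= 1

def with_range():
    for n in range(5000):
        pass

dis.dis(count_down)   # a truth-test and jump per pass
dis.dis(with_range)   # builds a range object, then FOR_ITER plus a store per pass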

Frankly, I think that's almost irresponsible, and I mean this at a global scale.

I am going to get philosophical here for a moment.

These issues of purity taken to an almost cultist extreme (not accusing you of that) have a cost in energy consumption. And, at a global scale, it is nothing less than irresponsible to take such paths for the sake of some idea of purity.

You don't have to believe what I say. There's plenty of data out there. Python requires, on average, 75 times more energy to do the same job as does C. Here's the data from tests on nearly 30 languages:

https://thenewstack.io/which-programmin ... ectricity/

In CMOS circuitry energy is consumed any time a state change occurs. You want to minimize state changes (going from a 0 to a 1 or the other way around). This, generalizing, means executing the smallest possible number of operations to get the job done. I say "generalizing" because it isn't always quite that simple; you have to look at a lot more than that. Yet, as a rough order-of-magnitude metric, fewer operations mean less energy.

Few people realize that, for example, processing a data stream with fewer bit changes consumes less energy than one with many more bit changes for the same number of bytes. I learned this the hard way (burned my finger) when doing real-time image processing work on FPGAs. The thing ran super cool when feeding it simple test signals where most pixels rarely changed. Then, one day, during testing, I fed it video captures from a helicopter in flight. I smelled something, so I touched the FPGA. After getting burned I ultimately figured out why that was happening. Nearly every pixel was changing all the time, and more bit transitions meant more energy and more heat. Wicked stuff.
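If you want to see what "busy" means for a stream, here is a toy sketch that counts the bit flips between consecutive bytes:

Code: Select all

# Toy sketch: count bit transitions between consecutive bytes of a stream
def bit_transitions(stream):
    flips = 0
    prev = stream[0]
    for b in stream[1:]:
        x = prev ^ b        # bits that changed between the two bytes
        while x:
            x &= x - 1      # clear the lowest set bit (Kernighan's trick)
            flips += 1
        prev = b
    return flips

print(bit_transitions(b"\x00\x00\x00\x00"))  # 0: a quiet stream
print(bit_transitions(b"\x55\xaa\x55\xaa"))  # 24: every bit flips, every byte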

So, yes, I prefer to go faster with more efficient code when possible, both to meet timing requirements and save energy. Shipping millions of devices that consume 10, 20 or 70 times more power than they should has consequences we might not realize.

OK, no more philosophy. I promise.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: The order of operations matters. A lot!

Post by stijn » Thu Jun 16, 2022 10:01 am

That's a 25% improvement just for counting down to zero.
Ok, but the counting-down version tests the loop variable for truth (i.e. against zero), while the other one uses a less-than comparison. That's not the same as comparing whether a == 0 is faster than a == 23, which is what we were discussing originally. I don't have time to write a complete portable script like yours, sorry, but for fun this compares 4 cases:

Code: Select all

n = 1000
inner = 10000

@timer.Profile(logEvery=n, name='eqz')
def eqz():
  for i in range(inner):
    x = i == 0

@timer.Profile(logEvery=n, name='eqnz')
def eqnz():
  for i in range(inner):
    x = i == 23

@timer.Profile(logEvery=n, name='ltz')
def ltz():
  for i in range(inner):
    x = i < 0

@timer.Profile(logEvery=n, name='ltnz')
def ltnz():
  for i in range(inner):
    x = i < 23
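(timer.Profile is a local helper of mine, not part of MicroPython. If you want to run the snippet, a minimal stand-in saved as timer.py could look like this, assuming it just times each call and logs min/max/avg every logEvery calls:)

Code: Select all

# Hypothetical stand-in for the local timer.Profile helper used above:
# times each call in ms and prints min/max/avg every logEvery calls
import time

class Profile:
    def __init__(self, logEvery, name):
        self.logEvery = logEvery
        self.name = name
        self.times = []

    def __call__(self, func):
        def wrapper(*args, **kwargs):
            t = time.ticks_us()
            result = func(*args, **kwargs)
            self.times.append(time.ticks_diff(time.ticks_us(), t) / 1000)
            if len(self.times) >= self.logEvery:
                print("%s n=%d min=%.3f max=%.3f avg=%.3f" % (
                    self.name, len(self.times), min(self.times),
                    max(self.times), sum(self.times) / len(self.times)))
                self.times = []
            return result
        return wrapper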
Idea is to test bare operations and not much more; function calls were ordered randomly in the outer loop to eliminate warmup effects and whatnot, calling each 1000 times. Tested on a PC, the results aren't quite conclusive; taking a run which has the fewest outliers:

Code: Select all

ltz n=1000 min=0.379 max=1.368 avg=0.446
ltnz n=1000 min=0.389 max=1.162 avg=0.446
eqz n=1000 min=0.388 max=1.296 avg=0.459
eqnz n=1000 min=0.391 max=1.218 avg=0.459
OK, no more philosophy. I promise.
I don't mind :) And you're not wrong, especially not in an example like yours where you have millions of devices. But I find it perfectly understandable that if one does not have a million devices, and might even have a one-shot script, one chooses range() because it's simple, clear, just works, is known by everybody, requires no tests, and so on.
