On the subject of clock cycles, the ATmega8(L) datasheet says:

The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles, the Program Vector address for the actual interrupt handling routine is executed. During this 4-clock cycle period, the Program Counter is pushed onto the Stack. The Vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served. If an interrupt occurs when the MCU is in sleep mode, the interrupt execution response time is increased by four clock cycles. This increase comes in addition to the start-up time from the selected sleep mode. A return from an interrupt handling routine takes four clock cycles. During these four clock cycles, the Program Counter (2 bytes) is popped back from the Stack, the Stack Pointer is incremented by 2, and the I-bit in SREG is set.
That should not introduce any delay, but I am not entirely sure.
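To put numbers on the quoted passage, here is a small host-side sketch in plain C (not AVR code). It only adds up the cycle counts stated in the datasheet text above. The function names and the `pending_instruction_cycles` and `prescaler` parameters are my own assumptions for illustration.

```c
#include <assert.h>

/* Cycle counts taken from the datasheet passage quoted above. */
enum {
    IRQ_RESPONSE_CYCLES = 4, /* minimum response; PC is pushed onto the stack */
    VECTOR_JUMP_CYCLES  = 3, /* jump from the vector to the handler */
    RETI_CYCLES         = 4  /* return from the handler */
};

/* Minimum CPU cycles until the first handler instruction starts.
 * pending_instruction_cycles (hypothetical parameter): cycles still needed
 * to finish a multi-cycle instruction that was running when the
 * interrupt occurred. */
unsigned isr_entry_cycles(unsigned pending_instruction_cycles)
{
    return pending_instruction_cycles + IRQ_RESPONSE_CYCLES + VECTOR_JUMP_CYCLES;
}

/* Whole timer ticks that fit into a number of CPU cycles for a given
 * timer prescaler (1, 8, 64, ...). */
unsigned cycles_to_timer_ticks(unsigned cycles, unsigned prescaler)
{
    assert(prescaler > 0);
    return cycles / prescaler;
}
```

With no instruction pending, `isr_entry_cycles(0)` gives 7 CPU cycles before the handler's first instruction. How many timer ticks that corresponds to depends on the prescaler, which this thread does not state.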

The counting direction is always up (incrementing), and no counter clear is performed. The counter simply overruns when it passes its maximum 8-bit value (MAX = 0xFF) and then restarts from the bottom (0x00). In normal operation the Timer/Counter Overflow Flag (TOV0) will be set in the same timer clock cycle as the TCNT0 becomes zero. The TOV0 Flag in this case behaves like a ninth bit, except that it is only set, not cleared. However, combined with the timer overflow interrupt that automatically clears the TOV0 Flag, the timer resolution can be increased by software. A new counter value can be written anytime.
Everything is clear here. What I am wondering is whether the 37 is added not exactly at 0 but only a few clock cycles later, because after the overflow the processor is still busy with several other things in the routine, see:
Code:
SIGNAL (SIG_OVERFLOW2)
{
  a0:	1f 92       	push	r1
  a2:	0f 92       	push	r0
  a4:	0f b6       	in	r0, 0x3f	; 63
  a6:	0f 92       	push	r0
  a8:	11 24       	eor	r1, r1
  aa:	8f 93       	push	r24
  ac:	9f 93       	push	r25
  ae:	af 93       	push	r26
  b0:	bf 93       	push	r27
  TCNT2 += 0x25;
  b2:	84 b5       	in	r24, 0x24	; 36
  b4:	8b 5d       	subi	r24, 0xDB	; 219
  b6:	84 bd       	out	0x24, r24	; 36
  count36kHz ++;
  b8:	80 91 71 00 	lds	r24, 0x0071
  bc:	8f 5f       	subi	r24, 0xFF	; 255
  be:	80 93 71 00 	sts	0x0071, r24
  if (!count36kHz)
  c2:	80 91 71 00 	lds	r24, 0x0071
  c6:	88 23       	and	r24, r24
  c8:	99 f4       	brne	.+38     	; 0xf0 <__vector_4+0x50>
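The prologue in the listing can be counted by hand. The following host-side C sketch (not AVR code) does that, and also models why a read-modify-write like `TCNT2 += 0x25` tolerates the entry delay: the counter keeps running after the overflow, so the ticks that have already passed are contained in the value that is read back. This is an idealised model that assumes no timer tick falls between the `in` and the `out`. The function names are assumptions of mine.

```c
#include <stdint.h>

/* CPU cycles of the handler prologue shown in the listing, up to (not
 * including) the `in r24, 0x24` that reads TCNT2.
 * On the ATmega8: push = 2 cycles, in = 1 cycle, eor = 1 cycle. */
unsigned prologue_cycles(void)
{
    const unsigned push = 2, in = 1, eor = 1;
    /* push r1, push r0, in r0, push r0, eor r1, push r24..r27 */
    return push + push + in + push + eor + 4 * push;
}

/* Timer ticks from one overflow to the next when the handler executes
 * TCNT2 += reload after latency_ticks timer ticks have already elapsed.
 * Idealised: ticks between the read and the write are not modelled. */
unsigned ticks_between_overflows(uint8_t latency_ticks, uint8_t reload)
{
    uint8_t tcnt = latency_ticks;        /* counter kept running from 0 */
    tcnt = (uint8_t)(tcnt + reload);     /* TCNT2 += reload */
    return latency_ticks + (256u - tcnt); /* elapsed + remaining ticks */
}
```

In this model the period comes out as 256 - 37 = 219 ticks whether the handler reaches the `+=` after 0, 3 or 23 ticks. Only ticks lost between the read and the write would shift it, and those are what a small fixed correction would have to cover.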
The line with the += is translated into assembler like this:
in r24, 0x24 ; 36
lds r25, 0x0061
add r24, r25
out 0x24, r24 ; 36
Why I only 'fudge in' 3 clock cycles in the Excel sheet, though, I can no longer say.
Regarding the += statement

In my build, the temporary file xxx.lss contains:

TCNT2 += 0x25;
b2: 84 b5 in r24, 0x24 ; 36
b4: 8b 5d subi r24, 0xDB ; 219
b6: 84 bd out 0x24, r24 ; 36

in (In Port) 1 clock
subi (Subtract Constant from Register) 1 clock
out (Out Port) 1 clock

... which makes exactly 3 clock cycles. Perfect!
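In case the `subi r24, 0xDB` looks odd for a `+=`: the AVR has no add-immediate instruction for 8-bit registers, so the compiler subtracts the two's complement instead (0x100 - 0x25 = 0xDB). A small host-side C sketch (not AVR code) that checks this for all 256 register values and adds up the three cycles listed above; the function names are hypothetical.

```c
#include <stdint.h>

/* What `subi r24, k` computes: an 8-bit subtraction that wraps modulo 256. */
uint8_t subi(uint8_t reg, uint8_t k)
{
    return (uint8_t)(reg - k);
}

/* What the C statement `TCNT2 += 0x25` computes on the 8-bit counter. */
uint8_t add_reload(uint8_t tcnt)
{
    return (uint8_t)(tcnt + 0x25);
}

/* Returns 1 if subtracting 0xDB equals adding 0x25 for every 8-bit value. */
int subi_matches_add(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        if (subi((uint8_t)v, 0xDB) != add_reload((uint8_t)v))
            return 0;
    }
    return 1;
}

/* Cycles of the in / subi / out sequence, 1 cycle each as listed above. */
unsigned reload_sequence_cycles(void)
{
    return 1 + 1 + 1;
}
```

The same trick shows up in the `count36kHz ++` line of the longer listing, where `subi r24, 0xFF` adds 1.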