Training & Learning – Blog 04 – Reward & Punishment

Training & Learning – Part 04

Reward & Punishment

In the last Blog we started talking about the ABC of Learning and explained its “A” – the “Antecedent”.

We found out that the horse’s voluntary cooperation can only be achieved if the horse experiences the situation as enjoyable and “fitness-improving” (anything that makes his survival more likely, such as food, rest or a feeling of safety).

We also heard that the choice of the “C” (the consequence that follows the behaviour the horse chooses in response to the trainer’s stimulus) will be the crux of how effective your training is.

So:

“A” – the trainer has created a whole store of incentives and tries to elicit a certain desirable behaviour from her horse.

“B” – the horse reacts out of his own free will. He offers an action – perhaps the correct one, but perhaps not (he might be trying by trial and error to guess her intentions) – or perhaps nothing at all!

“C” – now it will be the C, the consequence of his behaviour, which will help him (or not!) to find the correct answer.

If he finds the right answer, he is rewarded. In the old days, if he guessed wrong, the trainer shouted NO! or even punished him. Not with us!

In order to understand how “Shaping” happens, we first have to briefly define the concepts of reward and punishment.

While “shaping” an animal’s behaviour, trainers talk about the “reinforcement” of observable behaviour. In part 06 of the Blog series “Your horse’s IQ” we already talked about the notion of positive and negative reinforcement, so I won’t repeat it here. But Eva Wiemers, in her new book “Wer lernt mit uns?”, puts a new twist on this!

E.W. gives very specific advice for practical teaching situations, where the ordinary definitions leave us stranded. We dive deep – because lots of teaching and learning connections happen unconsciously – and re-emerge with a bucket full of most helpful tools to carry on!

Reward:

is an everyday word, which only signifies that the giver assumes the receiver will like it. However, this assumption rests on the giver’s own evaluation of the quality of the reward and is independent of the receiver’s own values.

E.W. always gives wonderful examples: what if “you reward someone with candy, who does not like sweets? Or who happens to have an open cavity in his tooth? … Or if the natives in the jungle offer you a bowl of living maggots as thanks for your help, which you are ‘allowed’ to eat as a reward?”

It is also important to notice that a typical “reward” is often only handed out once the behaviour is long finished – and we already know that a horse is then not able to make a mental connection between the action and its reward.

Punishment:

the person who dishes it out assumes that it will be disagreeable to the receiver – regardless of whether he takes something away (which he assumes the punished values) or subjects him to something unpleasant (a slap, a blow). “But what if the punished subject is a masochist? Or if the traffic offender who receives a ticket is a millionaire?”

The connection can also get lost as far as content is concerned: the parents take the toddler’s doll away because the child did not obey the babysitter hours before… A human capable of logical thought may still realize the connection – but this is totally impossible for an animal.

Lo and behold:

we realize that with reward and punishment the focus is really on the person handing it out, who mainly expresses his own mood by it!
He rewards: he is happy or satisfied.
He punishes: he is dissatisfied, angry, frustrated or worse…

E.W.: “Instead of attributing rewarding or punishing characteristics to certain things, scientists turn their attention to how the recipient experiences them.” Reward and punishment alike only take effect when they cause in the receiver the feelings which the sender intended.

Here the translation of Eva’s book becomes difficult, as she starts inventing her own words, and to get the point across I must do likewise. We create new tools – “indulgerlings” and “annoyerlings”.

To tell everyday reward and punishment apart from our “indulgerlings” and “annoyerlings”, we have to turn them into effective tools that do more than just express satisfaction or dissatisfaction: tools that actually change the horse’s behaviour in the desired way!

We find new tools:

A) useful “Indulgerlings” (rewards)

are comforts, which improve the atmosphere between the giver and the recipient and which can create a momentary favourable attitude by the receiver toward the sender.

B) useful “annoyerlings” (punishment)

are disagreeable things, which irritate, can spoil the relationship and frustrate, disappoint, even frighten or enrage the recipient.

The horse must be able to understand exactly WHAT he received the above FOR!

We already know that the horse can only form this association when the action and its consequence happen almost simultaneously.

For rewarding an action, usually the logical sequence and correct timing are enough for the horse to make the connection.

For punishment this is much more complex: the trainer has the choice to punish reactively or proactively in a given situation.

Here I really regret not being able to simply quote E.W.’s book, as that would lead too far. Please read it – her examples are so excellent! So, summarized very briefly…

1) Reactive punishment comes after the “crime”:

He bites you – you hit him. The horse does understand why he got slapped – BUT: this does not mean that “in the future he will act in the way you desire: as the punishment happened after the deed …, yielding with his head did not bring him any benefit. Only a benefit, however (fitness improvement!), would teach him what is expected of him”.

a) Physical punishment:

Causing pain is inexcusable. Physical punishment can never be applied where voluntary cooperation is the goal. It simply is no teaching tool:
it can lead to dangerous reactions from the horse
it has no influence on which behaviour the horse chooses instead (which could also be undesirable)
it spoils the relationship and the motivation

b) Removal of something as punishment: (for example taking away treats as reaction to his demanding pawing when he sees your carrot bag)

he might stop the behaviour when you remove yourself WITH the bag – but that does not lead to good training!
the removal itself does not suggest an alternative better behaviour to him.

c) Withholding something as punishment:

the horse fails to guess the correct behaviour, and doesn’t get his treat – his reaction is disappointment (because he did try) – but withholding the treat does not indicate to him what he should have done! It does not teach… just damages his motivation.

2) Proactive punishment:

Here the trainer also takes away or withholds something which he assumes the trainee would like – but this is done so early and so rapidly that the horse has no time to bring his undesired deed to an end! (You retract your hand with the treat as soon as he flattens his ears – before you get bitten.) The horse learns that his threatening gesture was a mistake – but he still does not know how to get to success easily!

The logical answer becomes clear quickly:

A punishment only says “NOT this!”, but we want to also say “Rather THAT!”
We have to give the horse a chance and show him the way to EARN his reward!

So after he threatened to bite you (and you very quickly retracted your carrot) you offer it to him AGAIN (as soon as the ears are forward again). This re-offering IS the reward for stopping the unwanted threatening attitude – but how to make sure he won’t try to bite again in his eager greed?

Here the devil lies in the detail, and in the next Blog we will summarize how Eva Wiemers teaches us to “reward like a professional”!
