Operant conditioning, also known as instrumental conditioning, is a learning process in which behavior is modified using rewards or punishments. Repeatedly pairing a behavior with a consequence forms an association between the two, and this association produces new learning.
For example, a dog trainer gives his dog a treat every time the dog raises its left paw. The dog learns that raising its left paw earns a food reward, so it raises the paw again and again for more treats.
Classical Conditioning as the Foundation of Behaviorism
We can trace the origin of operant conditioning back to its predecessor, classical conditioning.
Classical conditioning, also known as Pavlovian conditioning, likewise involves learning a new behavior through the process of association [2].
Russian physiologist Ivan Pavlov first experimented with classical conditioning in the late 1800s. He noticed that his dogs salivated whenever he entered the room to feed them.
In his experiments, Pavlov rang a bell every time he fed his dogs. Over time, the dogs became conditioned to salivate when they heard the sound of a bell, even when food wasn’t present.
Food, which triggered salivation naturally, was the unconditioned stimulus. The bell’s sound, which began to trigger salivation only after being paired with food, was the conditioned stimulus.
When the unconditioned stimulus (food) and the conditioned stimulus (sound) became associated, the conditioned stimulus could trigger the same response. This newly learned response became a conditioned response. This is a form of learning by association.
Pavlovian conditioning became the foundation of behaviorism, a leading school of thought in psychology at the time. Behaviorists held that behavior is a response to external stimuli and that humans learn only by association, not through thoughts, feelings, or inner mental events.
Law of Effect & Operant Conditioning
Later, psychologist Edward Thorndike came up with the concept of instrumental conditioning when he observed the impact of reinforcement in puzzle box experiments with cats trying to escape. He called this process “trial-and-error” learning.
Thorndike proposed the Law of Effect [3], which stated that if, in the presence of a stimulus, a response was followed by a satisfying event (reinforcer), the bond between stimulus and response was strengthened. Conversely, if the response was followed by an unsatisfying event (punisher), the bond was weakened.
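As a rough illustration of the Law of Effect, the following Python sketch treats the stimulus-response bond as a number between 0 and 1 that moves up after a satisfying event and down after an unsatisfying one. The linear update rule, the `rate` parameter, and the function name `update_bond` are assumptions made for this example, not Thorndike's own formulation.

```python
def update_bond(strength, satisfying, rate=0.2):
    """Return the new bond strength (0..1) after one response.

    Hypothetical linear update, chosen only for illustration.
    """
    if satisfying:
        # Satisfying event: move a fraction of the way toward 1.
        return strength + rate * (1.0 - strength)
    # Unsatisfying event: move a fraction of the way toward 0.
    return strength - rate * strength


# Two satisfying outcomes followed by one unsatisfying outcome.
bond = 0.5
for outcome in [True, True, False]:
    bond = update_bond(bond, outcome)
    print(round(bond, 3))
```

The bond rises after each satisfying outcome and drops after the unsatisfying one, which is the qualitative pattern the law describes.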
In the 1930s, behavioral psychologist B.F. Skinner, often called the father of operant conditioning, built on the concepts of reinforcer and punisher to create the theory of operant conditioning.
Skinner believed that Pavlovian conditioning was far too simple to explain complex human behavior thoroughly. He believed the best way to understand operant behavior was to observe its causes and consequences [1].
In Skinner’s operant conditioning paradigm, observable behavior can be manipulated when it is followed by reinforcement or punishment.
Unlike classical conditioning, which involves unconscious reflexive behavior, operant behaviors are behaviors under conscious control. Applying reinforcement and punishment creates a deliberate and conscious learning process.
To study operant conditioning, B.F. Skinner built a chamber, now called the Skinner Box, and put a small animal inside. In the experiments, each time the animal pressed a lever or a bar, it received food or water as reinforcement [4].
Through his experiments, Skinner distinguished two types of consequences that could affect new learning: reinforcement and punishment.
Reinforcement increases the target behavior, while punishment decreases it.
There are two types of reinforcement – positive reinforcement and negative reinforcement.
In psychology, positive means adding a stimulus and negative means removing one.
Positive reinforcement adds a rewarding consequence (a positive reinforcer) after a behavior, increasing the likelihood that the desired behavior will appear again.
Negative reinforcement removes an unpleasant stimulus to increase the desired behavior in the future.
Punishment is the opposite of reinforcement. It aims to reduce bad behavior.
Like reinforcement, punishment also comes in two forms: positive punishment and negative punishment.
Positive punishment adds an unpleasant stimulus to weaken or eliminate a behavior. Positive punishment is usually what we refer to as “punishment” in our everyday lives.
Negative punishment removes a pleasant stimulus to stop undesired behavior.
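The four categories above follow from two independent questions: is a stimulus added or removed, and does the behavior become more or less likely? A minimal Python sketch of that two-by-two classification follows; the names `Change`, `Effect`, and `classify` are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Change(Enum):
    ADD = "positive"       # a stimulus is added
    REMOVE = "negative"    # a stimulus is removed


class Effect(Enum):
    INCREASE = "reinforcement"  # the behavior becomes more likely
    DECREASE = "punishment"     # the behavior becomes less likely


def classify(change, effect):
    """Name the type of consequence from what happens to the stimulus
    and what happens to the behavior."""
    return f"{change.value} {effect.value}"


# Taking away a phone (stimulus removed) to reduce video watching.
print(classify(Change.REMOVE, Effect.DECREASE))
```

Note that "positive" and "negative" describe only whether something is added or removed, not whether the consequence is pleasant.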
24 Examples of operant conditioning
Here are examples of each type of operant conditioning.
Examples of positive reinforcement
- A parent gives their child an extra allowance (reinforcer) for doing the dishes (desired behavior).
- A manager offers bonuses (reinforcer) to their workers for finishing the project on time (desired behavior).
- A teacher gives students gold stars (reinforcer) for raising their hands before they speak (good behavior).
- You receive applause from the audience (reinforcer) after playing the piano (wanted behavior) in a recital.
- Young children pat a dog on the head (reinforcer) when it sits quietly in front of them (desirable behavior).
- Gamblers win monetary rewards (reinforcer) for continuing to play the slot machines (encouraged behavior).
Examples of negative reinforcement
- A child doesn’t have to clean the table (unpleasant event) after the meal if they eat their vegetables (desired behavior).
- Taking out the garbage (desired behavior) removes the rotten smell (unpleasant stimulus) from the kitchen.
- Brushing your teeth (desired behavior) prevents tooth decay (unpleasant event).
- Workers don’t get yelled at (unpleasant stimulus) when they arrive at work on time (wanted behavior).
- A teenager cleans up his room (desirable behavior) so that his phone won’t be taken away (unpleasant event).
- A child puts away toys neatly (wanted behavior) so that the parent won’t throw them away (unpleasant event).
Examples of positive punishment
- A parent assigns the child extra chores (unpleasant consequence) for playing too many video games (bad behavior).
- A teacher gives a student extra homework (aversive stimulus) for making noise in class (undesired behavior).
- Parents spank children (unpleasant stimulus) for skipping classes (unwanted behavior).
- A child is scolded (unpleasant event) for ignoring homework (undesirable behavior).
- A parent gives a child a time-out (unpleasant consequence) for throwing tantrums (unwanted behavior).
- The police give a driver a ticket (unpleasant stimulus) for speeding (unwanted behavior).
Examples of negative punishment
- A parent takes away their child’s phone (pleasant stimulus) for watching too many videos (bad behavior).
- The police revoke the driver’s license (pleasant stimulus) for reckless driving (unwanted behavior).
- Students lose recess time (pleasant stimulus) for making too much noise (undesired behavior).
- A teenager cannot go to the mall (pleasant stimulus) for missing curfew (bad behavior).
- A boy loses his tablet time (pleasant stimulus) for bullying others in school (undesirable behavior).
- Thieves lose their freedom (pleasant stimulus) for stealing (bad behavior).
Operant conditioning is widespread: parents, teachers, companies, and governments all use it.
Schedules of Reinforcement Are a Key Component
Behavior modification using reinforcers and punishers requires continued application to remain effective. Once the reinforcement or punishment stops, the learned behavior gradually weakens and finally disappears in a process called extinction.
What is surprising, even to Skinner, is that the frequency and pattern of reinforcer delivery can affect how fast reinforcement works and how robust the learning remains [5].
The two types of reinforcement schedules are interval-based schedules and ratio-based schedules.
Interval-based schedules: reinforcers are delivered after a period of time. The period can be fixed (fixed-interval schedule) or variable (variable-interval schedule).
Ratio-based schedules: reinforcers are delivered after a certain number of responses. The number of responses can be fixed (fixed-ratio schedule) or variable (variable-ratio schedule).
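The four schedules can be sketched in Python as small functions that decide, for each response, whether a reinforcer is delivered. This is a simplified model under stated assumptions: time is a plain number passed in as `t`, interval schedules reinforce the first response made after the waiting period has elapsed, and the variable schedules draw their requirement from a uniform range around the mean. All function names and parameters here are hypothetical.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a random number of responses averaging mean_n."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)

    def respond(t):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(period):
    """Reinforce the first response made at least `period` time units
    after the previous reinforcer."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= period:
            last = t
            return True
        return False
    return respond


def variable_interval(mean_period, rng):
    """Like fixed_interval, but the waiting time is redrawn at random
    (averaging mean_period) after each reinforcer."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_period)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_period)
            return True
        return False
    return respond


# Six responses on a fixed-ratio schedule of 3: every third one pays off.
fr = fixed_ratio(3)
print([fr(t) for t in range(1, 7)])
```

On the ratio schedules the learner controls how quickly reinforcers arrive by responding more, while on the interval schedules extra responses before the waiting period ends earn nothing.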
Studies found that behavior learned through variable-ratio schedules is the most robust and least susceptible to extinction.
This discovery is significant because it lets us use reinforcement and punishment effectively in different situations.
For example, when using rewards to reinforce desired behavior, we now know to give them out only occasionally and unpredictably (a variable-ratio schedule).
When a toddler throws a tantrum in the market, we now know we cannot give in and buy candy, no matter what. Giving in occasionally reinforces the tantrum on a variable schedule, which makes the habit much harder to break.
Final Thoughts on Operant Conditioning
Operant conditioning is something we often see around us. Sometimes we apply it intentionally, and sometimes we do not. Recognizing the pros and cons of this type of behavior modification can help us avoid pitfalls and reach the best results.
- 1. Staddon JER, Cerutti DT. Operant Conditioning. Annu Rev Psychol. February 2003:115-144. doi:10.1146/annurev.psych.54.101601.145124
- 2. Staats AW, Staats CK. Attitudes established by classical conditioning. The Journal of Abnormal and Social Psychology. 1958:37-40. doi:10.1037/h0042782
- 3. Thorndike EL. The Law of Effect. The American Journal of Psychology. December 1927:212. doi:10.2307/1415413
- 4. Duncan IJH, Horne AR, Hughes BO, Wood-Gush DGM. The pattern of food intake in female Brown Leghorn fowls as recorded in a Skinner box. Animal Behaviour. May 1970:245-255. doi:10.1016/s0003-3472(70)80034-3
- 5. Ferster CB, Skinner BF. Schedules of Reinforcement. Appleton-Century-Crofts; 1957. doi:10.1037/10627-000