Operant conditioning, also known as instrumental conditioning, is a learning process in which behavior is modified using rewards or punishments. Repeatedly pairing a behavior with a consequence forms an association between the two, and that association is the new learning.
For example, a dog trainer gives a dog a treat every time it raises its left paw. The dog learns that raising its left paw earns food.
Classical Conditioning as the Foundation of Behaviorism
We can trace the origin of operant conditioning back to its predecessor, classical conditioning.
Classical conditioning, also known as Pavlovian conditioning, likewise involves learning a new behavior through association [2].
Russian physiologist Ivan Pavlov first experimented with classical conditioning in the late 1800s. He noticed that his dogs salivated whenever he entered the room to feed them. In his experiments, Pavlov rang a bell every time he fed his dogs. Over time, the dogs became conditioned to salivate when they heard the bell’s sound, even when food wasn’t present.
Food, which triggered salivation naturally, was the unconditioned stimulus. The bell’s sound, which began to trigger salivation only after being paired with food, was the conditioned stimulus.
When the unconditioned stimulus (food) and the conditioned stimulus (sound) became associated, the conditioned stimulus could trigger the same response. This newly learned response became a conditioned response. This is a form of learning by association.
Pavlovian conditioning became the foundation of Behaviorism, a leading school of thought in psychology at the time. Behaviorists believed that behavior is a response to external stimuli, and that humans learn only by association, not through thoughts, feelings, or other inner mental events.
Law of Effect & Operant Conditioning
Psychologist Edward Thorndike introduced the concept of instrumental conditioning after observing the effect of reinforcement in puzzle-box experiments, in which cats tried to escape from a box. He called this process “trial-and-error” learning.
Thorndike proposed the Law of Effect [3], which stated that if, in the presence of a stimulus, a response was followed by a satisfying event (reinforcer), the bond between stimulus and response was strengthened. Conversely, if the response was followed by an unsatisfying event (punisher), the bond was weakened.
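The Law of Effect can be illustrated with a toy update rule. The sketch below is not Thorndike's own formulation: the numeric bond strength between 0 and 1, the `learning_rate` parameter, and the function name `update_bond` are all assumptions introduced purely for illustration.

```python
def update_bond(strength: float, satisfying: bool, learning_rate: float = 0.2) -> float:
    """Return the stimulus-response bond strength after one trial.

    Hypothetical toy rule (an assumption, not from the source):
    a satisfying outcome moves the bond toward 1,
    an unsatisfying outcome moves it toward 0.
    """
    if satisfying:
        return strength + learning_rate * (1.0 - strength)
    return strength - learning_rate * strength


# Two satisfying outcomes followed by one unsatisfying outcome.
bond = 0.5
for outcome in [True, True, False]:
    bond = update_bond(bond, outcome)
    print(f"satisfying={outcome}: bond strength is now {bond:.3f}")
```

Under this rule the bond rises after each satisfying outcome and falls after the unsatisfying one, which is the qualitative pattern the Law of Effect describes.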
In the 1930s, psychologist B.F. Skinner built on the concepts of reinforcer and punisher to develop the theory of operant conditioning.
Skinner believed that Pavlovian conditioning was far too simple to explain complex human behavior. He believed that the best way to understand behavior was to observe its causes and consequences [1].
In Skinner’s operant conditioning paradigm, behavior can be manipulated when it is followed by reinforcement or punishment.
Classical conditioning involves involuntary, reflexive behavior, whereas operant behaviors are voluntary. Applying reinforcement and punishment shapes behavior that the learner controls.
To study operant conditioning, B.F. Skinner built a chamber, now known as the Skinner box, and put a small animal inside. In the experiments, each time the animal pressed a lever or a bar, it received food or water as reinforcement [4].
Through his experiments, Skinner distinguished two types of consequences that could affect new learning: reinforcement and punishment.
Reinforcement increases target behavior, while punishment decreases it.
Reinforcement has two forms: positive reinforcement and negative reinforcement.
In this context, “positive” means adding a stimulus and “negative” means removing one; neither word means good or bad.
Positive reinforcement adds a rewarding consequence to behavior, therefore strengthening or increasing the likelihood that the desired behavior will appear again.
- A parent gives their child an extra allowance (reinforcer) for doing the dishes (desired behavior).
- A manager offers bonuses (reinforcer) to their workers for finishing the project on time (desired behavior).
- A teacher gives students gold stars (reinforcer) for raising their hands before they speak (desired behavior).
Negative reinforcement removes an unpleasant stimulus to increase the desired behavior in the future.
- A child doesn’t have to clean the table (unpleasant event) after the meal if they eat their vegetables (desired behavior).
- Taking out the garbage (desired behavior) removes the rotten smell (unpleasant stimulus) from the kitchen.
- Brushing your teeth (desired behavior) prevents tooth decay (unpleasant event).
Like reinforcement, punishment also comes in two forms: positive punishment and negative punishment.
Positive punishment adds an unpleasant stimulus to weaken or eliminate a behavior. Positive punishment is usually what we refer to as “punishment” in our everyday lives.
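- A parent scolds their child (unpleasant stimulus) for using inappropriate language (undesired behavior).
- A teacher assigns a student extra work (unpleasant stimulus) for disturbing the class (undesired behavior).
- A trainer gives a dog a sharp verbal reprimand (unpleasant stimulus) for jumping on people (undesired behavior).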
Negative punishment removes a pleasant stimulus to stop undesired behavior.
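- A parent takes away their child’s phone (pleasant stimulus) for refusing to do homework (undesired behavior).
- A police officer issues a driver a ticket for speeding (undesired behavior); the fine takes away money (pleasant stimulus).
- A worker loses their lunch break (pleasant stimulus) for getting to work late (undesired behavior).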
Operant conditioning is used widely: by parents, teachers, companies, and governments.
Schedules of Reinforcement Are a Key Component
Behavior modification using reinforcers and punishers requires continued application to remain effective. Once the reinforcement or punishment stops, the learned behavior gradually weakens and finally disappears in a process called extinction.
What is surprising, even to Skinner, is that the frequency and pattern of reinforcer delivery affect how fast reinforcement works and how robust the learning remains [5].
The two types of reinforcement schedules are interval-based schedules and ratio-based schedules.
Interval-based schedules: a reinforcer is delivered for the first response made after a period of time has elapsed. The period can be fixed (fixed-interval schedule) or variable (variable-interval schedule).
Ratio-based schedules: a reinforcer is delivered after a certain number of responses. The number of responses can be fixed (fixed-ratio schedule) or variable (variable-ratio schedule).
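The four schedules can be sketched as small decision functions that answer one question: is this response reinforced? The code below is an illustrative simulation, not a model taken from the literature. The function names, the uniform random draws, and the convention that an interval schedule reinforces the first response after the period has elapsed are assumptions made for this sketch.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0
    def schedule(now):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return schedule


def variable_ratio(mean_n, rng):
    """Reinforce after a random number of responses that averages mean_n."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)
    def schedule(now):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return schedule


def fixed_interval(period):
    """Reinforce the first response made at least `period` after the last reinforcer."""
    last = 0.0
    def schedule(now):
        nonlocal last
        if now - last >= period:
            last = now
            return True
        return False
    return schedule


def variable_interval(mean_period, rng):
    """Like fixed_interval, but the waiting time is redrawn after each reinforcer."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_period)
    def schedule(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_period)
            return True
        return False
    return schedule


# One response per time unit for 20 time units under each schedule.
rng = random.Random(0)
schedules = {
    "fixed-ratio": fixed_ratio(5),
    "variable-ratio": variable_ratio(5, rng),
    "fixed-interval": fixed_interval(5),
    "variable-interval": variable_interval(5, rng),
}
for name, schedule in schedules.items():
    reinforced = [t for t in range(1, 21) if schedule(t)]
    print(f"{name}: reinforced at t = {reinforced}")
```

With the fixed schedules the reinforced responses fall at regular positions, while with the variable schedules the learner cannot predict which response will pay off.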
Studies found that behavior learned through variable-ratio schedules is the most robust and least susceptible to extinction.
This discovery is significant because it tells us how to apply reinforcement and punishment effectively in different situations.
For example, when using rewards to reinforce a desired behavior, giving them after an unpredictable number of responses (variable-ratio schedule) rather than every time makes the behavior more resistant to extinction.
The same principle works against us when the behavior is unwanted. When a toddler throws a tantrum in the market, we cannot give in and buy candy, not even once in a while. Occasionally giving in reinforces the tantrum on a variable schedule, which makes the habit much harder to break.
Final Thoughts on Operant Conditioning
Operant conditioning is something we often see around us. Sometimes we do it intentionally but sometimes not. Recognizing the pros and cons of this type of behavior modification can help us avoid pitfalls and reach the best results.
- 1. Staddon JER, Cerutti DT. Operant Conditioning. Annu Rev Psychol. February 2003:115-144. doi:10.1146/annurev.psych.54.101601.145124
- 2. Staats AW, Staats CK. Attitudes established by classical conditioning. The Journal of Abnormal and Social Psychology. 1958:37-40. doi:10.1037/h0042782
- 3. Thorndike EL. The Law of Effect. The American Journal of Psychology. December 1927:212. doi:10.2307/1415413
- 4. Duncan IJH, Horne AR, Hughes BO, Wood-Gush DGM. The pattern of food intake in female Brown Leghorn fowls as recorded in a Skinner box. Animal Behaviour. May 1970:245-255. doi:10.1016/s0003-3472(70)80034-3
- 5. Ferster CB, Skinner BF. Schedules of Reinforcement. Appleton-Century-Crofts; 1957. doi:10.1037/10627-000