Behavior Learning and Individual Cooperation in Autonomous Agents by Genetic Programming

Context of the project

This project was part of the “Evolutionary Methods” module of my AI master's degree, taught by Nobert Cot. Its goal was to study one of the student papers from John Koza's (the father of genetic programming) courses at Stanford. I chose a paper called “Behavior Learning and Individual Cooperation in Autonomous Agents as a Result of Interaction Dynamics with the Environment”, written by the student S. Kamini in 1995.

About genetic programming (GP)

Genetic programming is like a genetic algorithm except that the solution is not a value but a program. It is a stochastic method for searching for an optimal program in a search space of candidate programs represented by trees. A tree is composed of nodes, which are functions with arguments (the number of arguments equals the number of child nodes), and leaves, which are functions with no arguments (for example a constant).
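As a minimal sketch (the nested-tuple encoding is my own illustration, not the paper's data structure), such a program tree can be represented and measured like this:

```python
# A program tree encoded as nested tuples: the first element is the
# function name, the rest are its child subtrees; a bare string is a
# leaf, i.e. a function with no arguments (here a variable or constant).
tree = ("ADD", ("MUL", "X", "2"), "3")   # represents ADD(MUL(X, 2), 3)

def count_nodes(t):
    """Size of the tree = internal nodes + leaves."""
    if isinstance(t, str):               # leaf
        return 1
    return 1 + sum(count_nodes(child) for child in t[1:])

print(count_nodes(tree))  # 5 nodes: ADD, MUL, X, 2, 3
```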

This method is based upon Darwin's theory of evolution: the animals that are best adapted to their environment have more chances to mate. The method has several phases. At the beginning, we randomly generate several programs (a population of 1000 in general). Then we evaluate these programs. To know whether a program is efficient or not, we evaluate its performance with a function called “fitness”, which takes our program as argument and returns a performance value. This function is defined according to the criteria we want to optimize to solve the problem. The closer the criteria are to their optimum, the lower the fitness value. Then we select the best programs in the population and generate new programs (i.e. trees) through the reproduction of several pairs of trees. Reproduction uses genetic techniques such as mutation and hybridization (crossover) at the node level. That is why this kind of search method is called “genetic”. For more information, read the tutorial [2].
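The phases above can be sketched as a tiny, self-contained GP loop on a toy problem (finding an expression for x*x + 1). Everything here is an illustrative assumption: the function set, the size cap for bloat control, and the truncation selection are simplifications of what a real GP system such as Koza's would use.

```python
import random

FUNCS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
TERMS = ["X", "1"]

def random_tree(depth=3):
    """Phase 1: grow a random program tree for the initial population."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return (random.choice(list(FUNCS)),
            random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "X":
        return x
    if tree == "1":
        return 1.0
    name, a, b = tree
    return FUNCS[name](evaluate(a, x), evaluate(b, x))

def fitness(tree):
    """Phase 2: lower is better (total error against the target x*x + 1)."""
    try:
        err = sum(abs(evaluate(tree, x) - (x * x + 1)) for x in range(-5, 6))
    except OverflowError:
        return float("inf")
    return err if err == err else float("inf")  # guard against NaN

def subtrees(tree, path=()):
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, b):
    """Swap a random subtree of `a` for a random subtree of `b`."""
    path, _ = random.choice(list(subtrees(a)))
    _, donor = random.choice(list(subtrees(b)))
    child = replace(a, path, donor)
    return child if sum(1 for _ in subtrees(child)) <= 50 else a  # bloat cap

def mutate(tree):
    path, _ = random.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(2))

random.seed(0)
pop = [random_tree() for _ in range(200)]
for generation in range(30):
    pop.sort(key=fitness)
    best = pop[:50]                                  # phase 3: selection
    pop = best + [mutate(crossover(random.choice(best), random.choice(best)))
                  for _ in range(150)]               # phase 4: reproduction
pop.sort(key=fitness)
print(fitness(pop[0]))
```

Because the best quarter of the population is carried over unchanged each generation (elitism), the best fitness can only improve over time.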

The article: “Behavior Learning and Individual Cooperation in Autonomous Agents as a Result of Interaction Dynamics with the Environment”

What is behavior learning?

The goal of the paper is to redo Luc Steels' experiment in a genetic programming context. L. Steels' experiment was built to illustrate an alternative approach to action planning. In his article [1], he presents a new kind of action planning based upon the fact that behavior results from the interaction dynamics between the agent and the environment. From this fact he uses new methods opposed to the mainstream approach: a behavior-oriented instead of goal-oriented design, a cooperative architecture instead of the subsumption architecture, and parallelism instead of an action selection mechanism. Thus the robot's behavior system is composed of 6 programmed behaviors (behavior-oriented design). At each instant, the robot's behavior is calculated as a “sum” of all these behaviors (cooperative architecture). But each programmed behavior does not have the same influence in the final computation: for each behavior, its influence is calculated according to the state of the environment, i.e. the relation between the environment's state and the agent's need to execute this behavior. (NB: this is the same kind of idea as the path planning methods using potential fields.)

Finally, the goal of the “Behavior Learning… Environment” paper is to find (learn) automatically the 6 programmed behaviors using genetic programming.

What is individual cooperation?

Individual cooperation is a cooperation in which each robot thinks it is alone: it carries out its tasks on its own, but these tasks help the other robots achieve theirs. A cooperation can genetically emerge between the robots only if it is necessary for each robot's survival. This necessity can be created by the environment, and such an environment is presented in Steels' experiment.

The experiment

The environment is composed of 2 autonomous robots, 3 lamps, some obstacles and a battery charger for the robots. All the dynamics of the environment are based upon the flow of energy, as shown in Figure 2 below:

Figure 1: Simplified physical environment

Figure 2: Environment: the energy aspect

To survive, a robot can recharge its battery by going onto the charger. To find it, the robot must move toward the charger's signal (blue signal) using its photo-sensors. But the global energy that recharges the charger is also used to charge the batteries of the 3 lamps. The problem is that when all the lamp batteries are full, the charger's battery is automatically and completely drained. To avoid this situation, which is critical for the survival of the robots, they have to seek the light of the lamps (yellow signal) with other photo-sensors and knock against the lamps to decrease their internal batteries. Note that as a lamp's battery decreases, its light dims; after several hits, the robot's photo-sensors will be attracted by another lamp because the light has become very faint. The experiment is set up in such a way that a single robot cannot survive on its own. The robots need to be two or more and to cooperate: while one agent is recharging, the other is seeking a lamp to knock against it and dim the light.
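The energy flow just described can be summarized with a toy model. All quantities and update rules below are my own assumptions for illustration, not the paper's actual parameters:

```python
# Toy model of the energy flow: a global source charges the charger and
# the lamps; when every lamp battery is full, the charger fully drains.
LAMP_MAX = 10.0

def step(charger, lamps, knocked=None):
    """One time step. `knocked` is the index of a lamp hit by a robot."""
    lamps = list(lamps)                                   # do not mutate input
    if knocked is not None:
        lamps[knocked] = max(0.0, lamps[knocked] - 3.0)   # a hit drains the lamp
    lamps = [min(LAMP_MAX, e + 1.0) for e in lamps]       # lamps keep charging
    charger = charger + 1.0                               # charger keeps charging
    if all(e >= LAMP_MAX for e in lamps):                 # critical situation:
        charger = 0.0                                     # charger fully drains
    return charger, lamps

charger, lamps = 5.0, [8.0, 9.0, 10.0]
charger, lamps = step(charger, lamps)             # no knock: lamps keep filling
charger, lamps = step(charger, lamps, knocked=0)  # a robot knocks lamp 0
print(charger, lamps)
```

If the robots never knock, the lamps eventually all reach LAMP_MAX and the charger drops to zero, which is exactly the pressure that forces cooperation.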

As already mentioned, Luc Steels programs the robots in a behavior-oriented design. Several behaviors are programmed:

  1. Move straight forward: accelerate the 2 wheels up to a given speed.
  2. Touch-based obstacle avoidance: if an obstacle ahead blocks the robot, reverse the wheel speeds.
  3. Attraction to the charging station: influence the direction of the robot (i.e. the difference between the 2 wheel speeds) as a function of the direction of the signal and of its battery level.
  4. Seek out lamps: influence the direction of the robot (i.e. the difference between the 2 wheel speeds) as a function of the direction of the light and of its battery level.
  5. Halt while recharging: the robot's speed is zero while the battery is recharging.

Applying all of these unit behaviors in parallel yields the final behavior of the robots.
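The cooperative architecture can be sketched as follows. The behavior names, weighting scheme and state encoding are my own illustration, not Steels' actual equations: each behavior proposes wheel speeds together with a weight expressing how strongly the current environment state calls for it, and the final command is the weighted sum of all votes.

```python
# Each behavior maps the sensed state to a (left, right) wheel-speed vote
# plus a weight: its influence given the current state of the environment.
def forward(state):
    return (1.0, 1.0), 1.0                    # always mildly active

def avoid(state):
    w = 1.0 if state["obstacle_ahead"] else 0.0
    return (-1.0, -1.0), w                    # reverse when blocked

def seek_charger(state):
    # The lower the battery, the stronger the attraction to the blue signal.
    w = 1.0 - state["battery"]
    turn = state["charger_direction"]         # -1 left, 0 ahead, +1 right
    return (1.0 + turn, 1.0 - turn), w

BEHAVIORS = [forward, avoid, seek_charger]

def combine(state):
    """Cooperative architecture: weighted sum of all behavior votes."""
    left = right = 0.0
    for behavior in BEHAVIORS:
        (l, r), w = behavior(state)
        left += w * l
        right += w * r
    return left, right

state = {"obstacle_ahead": False, "battery": 0.5, "charger_direction": -1}
print(combine(state))  # (1.0, 2.0): half-drained battery pulls it leftward
```

No behavior is ever switched off outright, which is the difference from a subsumption architecture: influence varies continuously with the state, as in potential-field path planning.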

Finding a program that reacts the same way as the behavior-oriented program of L. Steels' experiment

I present here the operators of a generated program for our experiment. The operators are divided into 2 sections:

  • the conditional functions: (IF (condition) (execute A) (else execute B)), which represent the decision functions.
  • the terminals, which represent the robot's effectors (moving forward, turning, etc.).
Set of conditional functions (sensors), taking 2 arguments: the first argument is executed if the condition is true, otherwise the second one is.
  • IFMBA(a, b): « IF Max Blue intensity Ahead »: true if the robot is facing the charger's signal (i.e. no need to turn).
  • IFMYA(a, b): « IF Max Yellow intensity Ahead »: true if the robot is facing the brightest light (i.e. no need to turn).
  • IFAMB(a, b): « IF At Max Blue intensity »: true if the robot has reached the charger's signal.
  • IFAMY(a, b): « IF At Max Yellow intensity »: true if the robot has reached the brightest light.
  • IFSAFE(a, b): true if its battery is full.
  • IFDIE(a, b): true if its battery is low.
  • IFOBB(a, b): « IF OBstacle Back »: true if an obstacle is behind it.
  • IFOBF(a, b): « IF OBstacle Front »: true if an obstacle is in front of it.
Set of terminals (effectors)
  • FOR: forward
  • BACK: backward
  • TURN: turn to the left (90°)
  • HALT: stop
An example of a generated program:

Explanation of the behavior of this part of the program: if the robot is dying, then if it is at the charging station (IFAMB) it stops moving in order to recharge its battery (HALT); otherwise it tries to locate the charging station (IFMBA) and moves toward it (FOR and TURN effectors). Note that in this short example we recover the programmed behaviors 3 and 5 from above.
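The fragment explained above can be executed with a small interpreter. The tuple encoding and the sensor dictionary are my own assumptions; only the operator names come from the set defined earlier:

```python
# Sensor predicates for the conditional functions (my naming assumptions).
SENSORS = {
    "IFDIE":  lambda s: s["battery_low"],
    "IFSAFE": lambda s: s["battery_full"],
    "IFAMB":  lambda s: s["at_charger"],       # at max blue intensity
    "IFMBA":  lambda s: s["charger_ahead"],    # max blue intensity ahead
    "IFAMY":  lambda s: s["at_lamp"],          # at max yellow intensity
    "IFMYA":  lambda s: s["lamp_ahead"],       # max yellow intensity ahead
    "IFOBF":  lambda s: s["obstacle_front"],
    "IFOBB":  lambda s: s["obstacle_back"],
}

def run(program, sensors):
    """Walk the tree until a terminal (effector) is reached."""
    if isinstance(program, str):               # terminal: FOR, BACK, TURN, HALT
        return program
    name, if_true, if_false = program
    branch = if_true if SENSORS[name](sensors) else if_false
    return run(branch, sensors)

# A program matching the explained behavior: when dying, halt if at the
# charging station, else turn until the blue signal is ahead, then forward.
program = ("IFDIE", ("IFAMB", "HALT", ("IFMBA", "FOR", "TURN")), "FOR")

sensors = {"battery_low": True, "at_charger": False, "charger_ahead": True,
           "battery_full": False, "at_lamp": False, "lamp_ahead": False,
           "obstacle_front": False, "obstacle_back": False}
print(run(program, sensors))  # FOR: dying, charger ahead, so move toward it
```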

The fitness

To evaluate a program, we use a function called “the fitness”. This function takes a program as argument and returns a value corresponding to its performance. The higher the fitness, the better the program. To evaluate the program, we load it into the two robots and start the simulation of the environment. The simulation ends when one of the robots dies (not enough battery). Then we compute and return a weighted sum mixing:

  • the age of the system (how many cycles did the two robots stay alive?)
  • the number of lamps with a full battery (we prefer a system in which no lamp has a full battery, a sign of a good distribution of the knocking)
  • the complexity of the evaluated program (i.e. its number of operators)
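A sketch of such a fitness follows. The weights are my own illustrative assumptions; the paper's actual coefficients are not given here:

```python
# Weighted sum rewarding survival time and penalizing both full lamps
# (poor knocking distribution) and program size (bloat).
W_AGE, W_LAMPS, W_SIZE = 1.0, 50.0, 0.1   # assumed weights

def fitness(age, full_lamps, program_size):
    """Higher is better."""
    return W_AGE * age - W_LAMPS * full_lamps - W_SIZE * program_size

print(fitness(age=1000, full_lamps=1, program_size=40))  # 1000 - 50 - 4 = 946.0
```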

The results

After spending a lot of time searching for the right parameters (for the dynamics of the system, for the effectors and sensors, for the fitness computation, for the performance of the genetic programming algorithm, etc.), which is the main difficulty with this sort of algorithm, I finally found a good solution with an initial population of 10000 randomly generated programs, after 60 generations. Two fragments of the evolved program illustrate the learned behaviors:


  • (IFMYA (IFOBF BACK FOR) TURN): moving toward the light and knocking against it (= obstacle avoidance + attraction to the light).
  • (IFAMB HALT (IFMBA FOR (IFDIE TURN (IFOBF BACK FOR)))): halting if the robot is on the charger, otherwise going toward it even if there is an obstacle in front of it (such as another robot recharging its battery on the charger).

An unexpected behavior: the radar

I forgot to mention that in Steels' experiment, a communication system is integrated into the robots so that they can communicate simply: when a robot starts running out of energy, it sends a radio signal to the second robot; if the latter is recharging, it can move away from the charger to give the other one a chance to recharge its battery. At the beginning of the simulation, I did not implement this communication system. After some generations, a strange behavior emerged on its own: while a robot was recharging, it kept turning around on the spot. It was actually checking whether another robot wanted to recharge its battery: turning while monitoring its obstacle bumper lets it detect an obstacle (a robot) in every direction, like a radar. So when another robot went to the charger to recharge its battery, the robot that was recharging could detect it and move away from the charger. This emergent behavior replaced the communication system. Funny, isn't it?