Enhancing the learning capabilities of machines

Our detailed probing of the specific motor control and neural circuit in the Pavlovian behavioural paradigm will be complemented by a more wide-ranging evaluation of the complete capacity of the larva, and its reproduction in a real-world robotic device. How does operant learning occur in the larva, and how does it interact with Pavlovian learning? If the larva is able to learn about the ‘value’ of an odour, can it use that information to control other behaviours, for example in second-order conditioning?
Can we demonstrate that it genuinely predicts the nature of reinforcement through devaluation experiments? Note that these questions lie at the heart of current learning theory: if the larva does not show these capabilities then this undermines the generality of these accounts, and suggests there are more fundamental forms of learning common to all animals (WP2 Objective 2.1 – To test larval learning in associative learning paradigms that probe the relation to current computational models). If the larva does display these abilities (and evidence for each of them in adult flies or other insects already exists) then we need to understand how they are supported in a miniature brain.
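As an illustration of what a prediction-based account implies for second-order conditioning, the sketch below uses a temporal-difference (TD) update. The choice of TD learning, the stimulus names `odour_A` and `odour_B`, and all parameter values are assumptions made for this example; they are not the project's model or results.

```python
def td_update(values, sequence, reward, alpha=0.2, gamma=0.9):
    """Apply one trial of TD(0) to a sequence of stimuli.

    `sequence` lists the stimuli in order of presentation; `reward` is
    delivered (or withheld, if 0.0) after the last stimulus.
    alpha (learning rate) and gamma (discount) are illustrative values.
    """
    for i, stimulus in enumerate(sequence):
        is_last = i == len(sequence) - 1
        next_value = 0.0 if is_last else values[sequence[i + 1]]
        r = reward if is_last else 0.0
        # Prediction error: what happened next versus what was predicted.
        error = r + gamma * next_value - values[stimulus]
        values[stimulus] += alpha * error
    return values


def second_order_demo(n_first=30, n_second=3):
    """Hypothetical two-phase protocol: A -> reward, then B -> A (no reward)."""
    values = {"odour_A": 0.0, "odour_B": 0.0}
    for _ in range(n_first):
        td_update(values, ["odour_A"], reward=1.0)
    after_first = dict(values)
    for _ in range(n_second):
        td_update(values, ["odour_B", "odour_A"], reward=0.0)
    return after_first, values


if __name__ == "__main__":
    after_first, after_second = second_order_demo()
    print("after first-order training: ", after_first)
    print("after second-order training:", after_second)
```

In this sketch `odour_B` acquires value although it is never paired with reward, because it predicts `odour_A`; at the same time the value of `odour_A` starts to extinguish. Whether the larva behaves this way is exactly the kind of question the experiments are meant to answer.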
A second important perspective needed for a technological breakthrough is to understand how the larva can learn in more natural situations with multiple stimuli and noisy or scattered rewards. This enriched description of larval learning will be replicated in the neural-agent model (WP2 Objective 2.2 – To characterise learning in more complex contexts and behavioural environments).
What is the minimal computational substrate for rich learning capacities? There is a range of possible issues to explore where we already know the biological system differs from most computational learning algorithms. Not all synaptic change related to learning is Hebbian in nature, and learning may be supported by both pre-synaptic changes and changes in post-synaptic neuronal excitability – intriguingly, in Aplysia this relates to classical vs. operant conditioning [14]. Biological memory has multiple time-scales – in Drosophila there are genetically separable short-, medium- and long-term phases – what functional role does this play? Is there more than one location within the control architecture where learning alters connectivity? Aminergic systems appear to play important roles in motivation and action selection as well as reinforcement – how do these interact? Are internal prediction mechanisms necessary to explain behaviour even in miniature brains? By understanding the crucial features at the circuit level we then aim to abstract to efficient algorithms. We expect this will produce some radically new principles, possibly less general but substantially more effective for the kinds of problems faced by real-world devices and organisms. We can use methods of system identification, circuit simplification, and dynamical systems analysis – exploring these methods for this circuit will help in the development of general methodological tools that can be applied to other brains.
Finally, it may be the case that any simplification reduces capacity, and we should look toward hardware substrates able to directly emulate the neural processing (WP4 Objective 4.3 – To derive a minimal algorithmic description of the circuit that preserves the learning capability, and demonstrate its effectiveness for a robot learning in a natural environment).
Using a robot implementation is an important validation step for biological models, as it forces consideration of the problems posed by interaction with real-world physics and real-time constraints that animals have solved, and tests algorithms in a continuous action context. Moreover, we are interested in how the apparently simple problem of associating stimuli plays out in situations where these stimuli appear in multimodal gradients (or more noisy distributions) in which the agent is continuously orienting. There are at least three application domains where robots with adaptive abilities to discover gradients signalling salient cues would be immediately useful: mineral exploration (exploiting ‘path-finder’ elements as cues to valuable resources), including interplanetary exploration; environmental survey and monitoring, particularly crucial in situations of pollution or radiation, e.g. for hot-spot mapping; and precision agriculture, in which highly localised variation of crop treatment to minimise resource use and maximise yield depends on understanding the spatial correlations of many interacting factors. In each domain we can envisage minimalist devices capable of intelligent sampling and navigation strategies guided by the spatial change in the phenomena. For this project we will focus on the precision agriculture domain and, in consultation with appropriate stakeholders, define a realistic and challenging robot task.
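To make the idea of navigation "guided by the spatial change in the phenomena" concrete, here is a minimal sketch of an agent that samples a scalar field slightly to its left and right and steps toward the higher reading. The Gaussian field, the sampling rule, and every parameter (`step`, `cast`, `sigma`) are hypothetical choices for illustration only; they are not the project's robot controller or a model of larval behaviour.

```python
import math


def concentration(x, y, source=(0.0, 0.0), sigma=3.0):
    """Hypothetical smooth field: a single Gaussian hot-spot at `source`."""
    d2 = (x - source[0]) ** 2 + (y - source[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def follow_gradient(start=(5.0, 0.0), heading=0.0, step=0.2,
                    cast=0.5, n_steps=200):
    """Step toward whichever of two sampled directions reads higher.

    At each step the agent evaluates the field one step ahead at
    heading + cast and heading - cast (radians), then moves to the
    better of the two positions and adopts that heading.
    """
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        candidates = []
        for turn in (cast, -cast):
            h = heading + turn
            nx = x + step * math.cos(h)
            ny = y + step * math.sin(h)
            candidates.append((concentration(nx, ny), h, nx, ny))
        # On a tie, max() keeps the first candidate (the left turn).
        _, heading, x, y = max(candidates, key=lambda c: c[0])
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = follow_gradient()
    print("start distance to source:", math.hypot(*path[0]))
    print("final distance to source:", math.hypot(*path[-1]))
```

The agent never estimates the gradient explicitly; two local samples per step are enough to turn it around and bring it close to the hot-spot, where it then circles within a few step lengths. Real fields with several modes or noisy readings, as discussed above, would defeat a rule this simple, which is where learning about which cues are worth following becomes relevant.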

Associated Milestones:
5 – Paradigm for operant learning (month 13)
8 – Results from advanced Pavlovian paradigms (month 19)
9 – Algorithm(s) based on behavioural results (month 19)
12 – Robot prototype (month 25)
13 – Algorithm(s) derived from neural circuit (month 31)
14 – Behavioural results for multimodal learning (month 31)
16 – Robot behaviour compared to larvae in complex environment (month 36)
