B. Gao and M. W. Spratling (2025) Filter competition results in more robust convolutional neural networks. Neurocomputing, 617:128972.
Convolutional layers, one of the basic building blocks of deep learning architectures, contain numerous trainable filters for feature extraction. These filters operate independently, which can result in distinct filters learning similar weights and extracting similar features. In contrast, competition mechanisms in the brain contribute to the sharpening of the responses of activated neurons, enhancing the contrast and selectivity of individual neurons towards specific stimuli, and simultaneously increasing the diversity of responses across the population of neurons. Inspired by this observation, this paper proposes a novel convolutional layer based on the theory of predictive coding, in which each filter effectively tries to block other filters from responding to the input features which it represents. In this way, filters learn to become more distinct which increases the diversity of the extracted features. When replacing standard convolutional layers with the proposed layers the performance of classification networks is not only improved on ImageNet but also significantly boosted on eight robustness benchmarks, as well as on downstream detection and segmentation tasks. Most notably, ResNet50/101/152 robust accuracy increases by 15.9%/20.0%/20.9% under FGSM attack, and by 10.5%/14.7%/15.0% under PGD attack.
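The idea of filters blocking one another can be illustrated with a minimal sketch of the divisive input modulation (DIM) form of predictive coding, applied to a single dense input vector. This is an assumption for illustration only, not the convolutional layer proposed in the paper: the function name `dim_inference`, the weight normalisation, the epsilon values and the iteration count are all hypothetical. The input is divided by its reconstruction from all filter responses, so once one filter explains a feature, other filters that also prefer that feature receive less drive.

```python
import numpy as np


def dim_inference(x, W, iterations=200, eps1=0.01, eps2=0.01):
    """Iterate a DIM-style predictive coding update for one input vector.

    x: (n_inputs,) non-negative input.
    W: (n_filters, n_inputs) non-negative feedforward weights, rows summing to 1.
    Feedback weights V are W transposed and rescaled so that each filter's
    largest weight is 1 (an assumed normalisation for this sketch).
    """
    V = (W / W.max(axis=1, keepdims=True)).T   # (n_inputs, n_filters)
    y = np.zeros(W.shape[0])                   # filter responses start at zero
    for _ in range(iterations):
        e = x / (eps2 + V @ y)                 # input divided by its reconstruction
        y = (eps1 + y) * (W @ e)               # multiplicative response update
    return y


# Two overlapping filters; the input matches the first filter exactly.
W = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
x = np.array([1.0, 1.0, 0.0])
feedforward = W @ x          # responses without competition
y = dim_inference(x, W)      # responses with competition
```

Without competition the second filter still responds at half the strength of the first (`feedforward` is `[1.0, 0.5]`); after the iterations it is driven close to zero while the first filter's response approaches 1, which is the sharpening and selectivity the abstract describes.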
M. Fontana, M. W. Spratling and M. Shi (2025) Optimizing dense visual predictions through multi-task coherence and prioritization. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), in press.
Multi-Task Learning (MTL) involves the concurrent training of multiple tasks, offering notable advantages for dense prediction tasks in computer vision. MTL not only reduces training and inference time compared with using multiple single-task models but also enhances task accuracy through the interaction of multiple tasks. However, existing methods face limitations. They often rely on sub-optimal cross-task interactions, resulting in task-specific predictions with poor geometric and predictive coherence. Additionally, many approaches utilize inadequate loss weighting strategies, which fail to address the inherent variability in task evolution during training. To overcome these challenges, we propose an advanced MTL model specifically designed for dense vision tasks. Our model leverages a state-of-the-art vision transformer with task-specific decoders. To enhance cross-task coherence, we introduce a trace-back method that improves both cross-task geometric and predictive features. Furthermore, we present a novel dynamic task balancing approach that projects task losses onto a common scale and prioritizes more challenging tasks during training. Extensive experiments demonstrate the superiority of our method, establishing new state-of-the-art performance across two benchmark datasets.
W. A. Phillips, T. Bachmann, M. W. Spratling, L. Muckli, L. S. Petro and T. Zolnik (2025) Cellular psychology: relating cognition to context-sensitive pyramidal cells. Trends in Cognitive Sciences, 29(1): 28-40.
'Cellular psychology' is a new field of inquiry that studies dendritic mechanisms for adapting mental events to the current context, thus increasing their coherence, flexibility, effectiveness, and comprehensibility. Apical dendrites of neocortical pyramidal cells have a crucial role in cognition: those dendrites receive input from diverse sources, including feedback, and can amplify the cell's feedforward transmission if relevant in that context. Specialized subsets of inhibitory interneurons regulate this cooperative context-sensitive processing by increasing or decreasing amplification. Apical input has different effects on cellular output depending on whether we are awake, deeply asleep, or dreaming. Furthermore, wakeful thought and imagery may depend on apical input. High-resolution neuroimaging in humans supports and complements evidence on these cellular mechanisms from other mammals.
C. Huang, H. Guan, A. Jiang, Y. Wang, M. Spratling, X. Wang, Y. Zhang (2024) Few-shot anomaly detection via category-agnostic registration learning. IEEE Transactions on Neural Networks and Learning Systems, in press.
Most existing anomaly detection methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements for real-world applications. Inspired by how humans detect anomalies, by comparing a query image to known normal ones, this paper proposes a novel few-shot anomaly detection (FSAD) framework. Using a training set of normal images from various categories, registration, which aims to align normal images of the same category, is leveraged as the proxy task for self-supervised category-agnostic representation learning. At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features. Such a setup enables the model to generalize to novel test categories. It is, to the best of our knowledge, the first FSAD method that requires no model fine-tuning for novel categories, enabling a single model to be applied to all categories. Extensive experiments demonstrate the effectiveness of the proposed method. In particular, it improves the current state-of-the-art for FSAD by 11.3% and 8.3% on the MVTec and MPDD benchmarks, respectively.
L. Li, J. Qiu and M. W. Spratling (2024) AROID: improving adversarial robustness through online instance-wise data augmentation. International Journal of Computer Vision (IJCV), in press.
Deep neural networks are vulnerable to adversarial examples. Adversarial training (AT) is an effective defense against adversarial examples. However, AT is prone to overfitting which degrades robustness substantially. Recently, data augmentation (DA) was shown to be effective in mitigating robust overfitting if appropriately designed and optimized for AT. This work proposes a new method to automatically learn online, instance-wise, DA policies to improve robust generalization for AT. A novel policy learning objective, consisting of Vulnerability, Affinity and Diversity, is proposed and shown to be sufficiently effective and efficient to be practical for automatic DA generation during AT. This allows our method to efficiently explore a large search space for a more effective DA policy and evolve the policy as training progresses. Empirically, our method is shown to outperform or match all competitive DA methods across various model architectures (CNNs and ViTs) and datasets (CIFAR10, SVHN and Imagenette). Our DA policy reinforced vanilla AT to surpass several state-of-the-art AT methods (with baseline DA) in terms of both accuracy and robustness. It can also be combined with those advanced AT methods to produce a further boost in robustness.
J. Ning, M. W. Spratling and L. Gionfrida (2024) Improving the accuracy of tiny object detection by negative sample copy-paste. Proceedings of the 31st International Conference on Neural Information Processing (ICONIP).
Detecting tiny objects is an essential task in the field of computer vision but poses a considerable challenge for existing detectors. One issue is that task-irrelevant objects or non-object background patches can be mistakenly detected as objects of interest, which significantly impairs detector precision. To tackle this issue we include an online image augmentation technique, NegCopyPaste, in the training process. This method copies regions of training images that have been falsely identified as target objects in one epoch and pastes them into the training images to be used in the next epoch. By training the model to reject false-positive predictions made in previous epochs, the proposed method effectively decreases the proportion of false-positive predictions compared to the baselines, making the network more selective in picking out the target objects. NegCopyPaste reduces the number of false-positive predictions during inference and achieves new state-of-the-art results on TinyPerson, WiderFace and DOTA, notably improving mAPtiny by 1.58% over the previous best method on TinyPerson.
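The copy-then-paste loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NegCopyPaste implementation: the helper names (`iou`, `collect_false_positives`, `paste_negatives`), the 0.5 IoU threshold used to decide that a prediction is a false positive, and uniform random paste locations are all hypothetical choices. The key point is that a pasted crop receives no box, so the detector is trained to reject it in the next epoch.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2) pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def collect_false_positives(image, pred_boxes, gt_boxes, iou_thresh=0.5):
    """Crop every predicted box that overlaps no ground-truth box (a false positive)."""
    patches = []
    for box in pred_boxes:
        if all(iou(box, gt) < iou_thresh for gt in gt_boxes):
            x1, y1, x2, y2 = box
            patches.append(image[y1:y2, x1:x2].copy())
    return patches


def paste_negatives(image, patches, rng):
    """Paste stored false-positive crops at random positions.

    No box is added for a pasted crop, so it is treated as background.
    This sketch does not check for overlap with real objects.
    """
    out = image.copy()
    h, w = out.shape[:2]
    for patch in patches:
        ph, pw = patch.shape[:2]
        if ph == 0 or pw == 0 or ph > h or pw > w:
            continue  # skip empty or oversized crops
        y = int(rng.integers(0, h - ph + 1))
        x = int(rng.integers(0, w - pw + 1))
        out[y:y + ph, x:x + pw] = patch
    return out


# Epoch t: the detector fires on a bright background blob and on a real object.
rng = np.random.default_rng(0)
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[10:14, 10:14] = 255
preds = [(10, 10, 14, 14), (40, 40, 44, 44)]
gts = [(40, 40, 44, 44)]
negatives = collect_false_positives(img, preds, gts)
# Epoch t + 1: the stored crop is pasted into another training image.
next_img = paste_negatives(np.zeros_like(img), negatives, rng)
```

Only the blob is kept as a negative, because the second prediction matches its ground-truth box exactly.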
M. Fontana, M. W. Spratling and M. Shi (2024) When multi-task learning meets partial supervision: a computer vision review. Proceedings of the IEEE, 112(6): 516-543.
Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.
N. Manchev and M. W. Spratling (2024) Learning multi-modal recurrent neural networks with target propagation. Computational Intelligence, 40(4):e12691.
Modelling one-to-many type mappings in problems with a temporal component can be challenging. Backpropagation is not applicable to networks that perform discrete sampling and is also susceptible to gradient instabilities, especially when applied to longer sequences. In this paper we propose two recurrent neural network architectures that leverage stochastic units and mixture models, and are trained with target propagation. We demonstrate that these networks can model complex conditional probability distributions, outperform backpropagation-trained alternatives, and do not rapidly degrade with increased time horizons. Our main contributions consist of the design and evaluation of the architectures that enable the networks to solve multi-modal problems with a temporal dimension. This also includes the extension of the target propagation through time algorithm to handle stochastic neurons. The use of target propagation provides an additional computational advantage, which enables the network to handle time horizons that are substantially longer than those handled by networks fitted using backpropagation.
L. Li, Y. Wang, C. Sitawarin and M. W. Spratling (2024) OODRobustBench: a benchmark and large-scale analysis of adversarial robustness under distribution shift. Proceedings of the International Conference on Machine Learning (ICML).
Existing works have made great progress in improving adversarial robustness, but typically test their methods only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This is a concerning omission as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions.
L. Li, H. Guan, J. Qiu and M. W. Spratling (2024) One prompt word is enough to boost adversarial robustness for pre-trained vision-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having remarkable generalization ability, are highly vulnerable to adversarial examples. This work studies the adversarial robustness of VLMs from the novel perspective of the text prompt instead of the extensively studied model weights (frozen in this work). We first show that the effectiveness of both adversarial attack and defense is sensitive to the text prompt used. Inspired by this, we propose a method to improve resilience to adversarial attacks by learning a robust text prompt for VLMs. The proposed method, named Adversarial Prompt Tuning (APT), is effective while being both computationally and data efficient. Extensive experiments are conducted across 15 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show APT's superiority over hand-engineered prompts and other state-of-the-art adaptation methods. APT demonstrated excellent in-distribution performance and generalization under input distribution shift and across datasets. Surprisingly, by simply adding one learned word to the prompts, APT can significantly boost the accuracy and robustness over the hand-engineered prompts by +13% and +8.5% on average, respectively. The improvement further increases, in our most effective setting, to +26.4% for accuracy and +16.7% for robustness.
S. N. Eddine, T. Brothers, L. Wang, M. Spratling, and G. R. Kuperberg (2024) A predictive coding model of the N400. Cognition, 246: 105755.
The N400 event-related component has been widely used to investigate the neural mechanisms underlying real-time language comprehension. However, despite decades of research, there is still no unifying theory that can explain both its temporal dynamics and functional properties. In this work, we show that predictive coding, a biologically plausible algorithm for approximating Bayesian inference, offers a promising framework for characterizing the N400. Using an implemented predictive coding computational model, we demonstrate how the N400 can be formalized as the lexico-semantic prediction error produced as the brain infers meaning from the linguistic form of incoming words. We show that the magnitude of lexico-semantic prediction error mirrors the functional sensitivity of the N400 to various lexical variables, priming, contextual effects, as well as their higher-order interactions. We further show that the dynamics of the predictive coding algorithm provide a natural explanation for the temporal dynamics of the N400, and a biologically plausible link to neural activity. Together, these findings directly situate the N400 within the broader context of predictive coding research, and suggest that the brain may use the same computational mechanism for inference across linguistic and non-linguistic domains.
H. Guan and M. W. Spratling (2024) Query semantic reconstruction for background in few-shot segmentation. The Visual Computer, 40:799-810.
Few-shot segmentation (FSS) aims to segment unseen classes using a few annotated samples. Typically, a prototype representing the foreground class is extracted from annotated support image(s) and is matched to features representing each pixel in the query image. However, models learnt in this way are insufficiently discriminatory, and often produce false positives: misclassifying background pixels as foreground. Some FSS methods try to address this issue by using the background in the support image(s) to help identify the background in the query image. However, the backgrounds of these images are often quite distinct, and hence the support image background information is uninformative. This article proposes a method, QSR, that extracts the background from the query image itself, and as a result is better able to discriminate between foreground and background features in the query image. This is achieved by modifying the training process to associate prototypes with class labels, including known classes from the training data and latent classes representing unknown background objects. This class information is then used to extract a background prototype from the query image. To successfully associate prototypes with class labels and extract a background prototype that is capable of predicting a mask for the background regions of the image, the machinery for extracting and using foreground prototypes is induced to become more discriminative between different classes. Experiments show that QSR achieves state-of-the-art results for both 1-shot and 5-shot FSS on the PASCAL-5i and COCO-20i datasets. As QSR operates only during training, results are produced with no extra computational complexity during testing.
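The standard prototype pipeline mentioned at the start of this abstract (extract a prototype from the annotated support image, then match it to every query pixel) can be sketched as below. This is a generic illustration, not QSR: it assumes masked average pooling for prototype extraction and cosine similarity for matching, and the function names are hypothetical. For brevity the toy example takes the background prototype from the support image, which is exactly the practice QSR argues against; QSR instead obtains it from the query image.

```python
import numpy as np


def masked_average_prototype(features, mask):
    """Average a (C, H, W) feature map over the pixels where the (H, W) mask is 1."""
    weights = mask.astype(features.dtype)
    return (features * weights).sum(axis=(1, 2)) / (weights.sum() + 1e-8)


def cosine_similarity_map(features, prototype):
    """Cosine similarity between a (C,) prototype and every pixel of a (C, H, W) map."""
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", f, p)


def segment_query(query_features, fg_prototype, bg_prototype):
    """Label a query pixel foreground (1) when it is more similar to the foreground prototype."""
    fg = cosine_similarity_map(query_features, fg_prototype)
    bg = cosine_similarity_map(query_features, bg_prototype)
    return (fg > bg).astype(np.uint8)


# Toy example: channel 0 responds to the object (top half), channel 1 to background.
support = np.zeros((2, 4, 4))
support[0, :2, :] = 1.0
support[1, 2:, :] = 1.0
support_mask = np.zeros((4, 4))
support_mask[:2, :] = 1
fg_proto = masked_average_prototype(support, support_mask)
bg_proto = masked_average_prototype(support, 1 - support_mask)
prediction = segment_query(support.copy(), fg_proto, bg_proto)
```

Where the background prototype comes from is the design choice QSR changes; the matching step itself is unchanged.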
J. Ning and M. W. Spratling (2023) The importance of anti-aliasing in tiny object detection. Proceedings of the 15th Asian Conference on Machine Learning (ACML).
Tiny object detection has gained considerable attention in the research community owing to the frequent occurrence of tiny objects in numerous critical real-world scenarios. However, convolutional neural networks (CNNs) used as the backbone for object detection architectures typically neglect Nyquist's sampling theorem during down-sampling operations, resulting in aliasing and degraded performance. This is likely to be a particular issue for tiny objects that occupy very few pixels and therefore have high spatial frequency features. This paper applies an existing anti-aliasing approach, WaveCNet, to tiny object detection. WaveCNet addresses aliasing by replacing standard down-sampling processes in CNNs with Wavelet Pooling (WaveletPool) layers, effectively suppressing aliasing. We modify the original WaveCNet to apply WaveletPool in a consistent way in the residual blocks of ResNets. Additionally, we propose a bottom-heavy version of the backbone, which further improves the performance of tiny object detection while also reducing the required number of parameters by almost a half. Experimental results on the TinyPerson, WiderFace, and DOTA datasets demonstrate the effectiveness of our method in detecting tiny objects: the proposed method achieves new state-of-the-art results on all three datasets.
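The difference between naive down-sampling and wavelet-based pooling can be sketched as follows. This minimal illustration assumes the Haar wavelet and keeps only the low-low (LL) sub-band, discarding the high-frequency sub-bands; it is not the WaveletPool code from WaveCNet or from this paper, and the function names are hypothetical. A one-pixel checkerboard, the highest frequency a grid can hold, shows the aliasing problem.

```python
import numpy as np


def haar_lowpass_pool(x):
    """Downsample (..., H, W) maps by 2 using the Haar low-low (LL) sub-band.

    For each non-overlapping 2x2 block [[a, b], [c, d]] the orthonormal Haar
    LL coefficient is (a + b + c + d) / 2; the LH, HL and HH sub-bands, which
    carry the high frequencies, are discarded.
    """
    h, w = x.shape[-2:]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    return (a + b + c + d) / 2.0


def strided_subsample(x):
    """Plain stride-2 subsampling with no low-pass filter (prone to aliasing)."""
    return x[..., ::2, ::2]


# A one-pixel checkerboard and the same pattern shifted by one pixel.
checker = np.indices((4, 4)).sum(axis=0) % 2
shifted = 1 - checker
```

Strided subsampling returns all zeros for `checker` but all ones for `shifted`, so a one-pixel shift flips the output entirely; the Haar LL band returns 1.0 everywhere for both, because the high-frequency pattern is removed rather than folded into the result.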
M. W. Spratling (2023) A comprehensive assessment benchmark for rigorously evaluating deep learning image classifiers. arXiv:2308.04137.
Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier for samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates benchmarking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure as they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future.
M. W. Spratling (2023) Comprehensive assessment methods are key to progress in deep learning [commentary]. Behavioral and Brain Sciences, 46:e407.
L. Li and M. W. Spratling (2023) Data augmentation alone can improve adversarial training. Proceedings of the 11th International Conference on Learning Representations (ICLR).
Adversarial training suffers from the issue of robust overfitting, which seriously impairs its generalization performance. Data augmentation, which is effective at preventing overfitting in standard training, has been observed by many previous works to be ineffective in mitigating overfitting in adversarial training. This work proves that, contrary to previous findings, data augmentation alone can significantly boost accuracy and robustness in adversarial training. We find that the hardness and the diversity of data augmentation are important factors in combating robust overfitting. In general, diversity can improve both accuracy and robustness, while hardness can boost robustness at the cost of accuracy within a certain limit and degrade them both over that limit. To mitigate robust overfitting, we first propose a new crop transformation Cropshift with improved diversity compared to the conventional one (Padcrop). We then propose a new data augmentation scheme, based on Cropshift, with much improved diversity and well-balanced hardness. Empirically, our augmentation method achieves the state-of-the-art accuracy and robustness for data augmentations in adversarial training. Furthermore, it matches, or even exceeds when combined with weight averaging, the performance of the best contemporary regularization methods for alleviating robust overfitting.
J. Ning, H. Guan and M. W. Spratling (2023) Rethinking the backbone architecture for tiny object detection. Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), Volume 5, pp. 103-114.
Tiny object detection has become an active area of research because images with tiny targets are common in several important real-world scenarios. However, existing tiny object detection methods use standard deep neural networks as their backbone architecture. We argue that such backbones are inappropriate for detecting tiny objects as they are designed for the classification of larger objects, and do not have the spatial resolution to identify small targets. Specifically, such backbones use max-pooling or a large stride at early stages in the architecture. This produces lower resolution feature-maps that can be efficiently processed by subsequent layers. However, such low-resolution feature-maps do not contain information that can reliably discriminate tiny objects. To solve this problem we design "bottom-heavy" versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall. We also investigate if pre-training these backbones on images of appropriate size, using CIFAR100 and ImageNet32, can further improve performance on tiny object detection. Results on TinyPerson and WiderFace show that detectors with our proposed backbones achieve better results than the current state-of-the-art methods.
L. Li and M. W. Spratling (2023) Understanding and combating robust overfitting via input loss landscape analysis and regularization. Pattern Recognition, 136 (109229).
Adversarial training is widely used to improve the robustness of deep neural networks to adversarial attack. However, adversarial training is prone to overfitting, and the cause is far from clear. This work sheds light on the mechanisms underlying overfitting through analyzing the loss landscape w.r.t. the input. We find that robust overfitting results from standard training, specifically the minimization of the clean loss, and can be mitigated by regularization of the loss gradients. Moreover, we find that robust overfitting becomes more severe during adversarial training partially because the gradient regularization effect of adversarial training becomes weaker due to the increase in the loss landscape's curvature. To improve robust generalization, we propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction. Our method significantly mitigates robust overfitting and achieves the highest robustness and efficiency compared to similar previous methods.
B. Gao and M. W. Spratling (2023) Explaining away results in more robust visual tracking. The Visual Computer, 39:2081-95.
Many current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background, meaning that target appearance alone is insufficient for robust tracking. In contrast, humans consider the distractor objects as additional visual cues, in order to infer the position of the target. Inspired by this observation, this paper proposes a novel tracking architecture in which both the appearance of the tracked object and the appearance of the distractors detected in previous frames are taken into consideration, using a form of probabilistic inference known as explaining away. This mechanism increases the robustness of tracking by making it more likely that the target appearance model is matched to the true target, rather than similar-looking regions of the current frame. The proposed method can be combined with many existing trackers. Combining it with SiamFC, DaSiamRPN, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the underlying tracker alone. When combined with Super DiMP and ARSuper DiMP the resulting trackers produce performance that is competitive with the state-of-the-art on seven popular benchmarks.
B. Gao and M. W. Spratling (2022) Shape-texture debiased training for robust template matching. Sensors, 22(17): 6658.
Finding a template in a search image is an important task underlying many computer vision applications. This is typically solved by calculating a similarity map using features extracted from the separate images. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. Inspired by these findings, in this article we investigate if enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. By comparing features from the same CNN but trained by different shape-texture training methods, we determined a feature-space which improves the performance of most template matching algorithms. When combining the proposed method with the Divisive Input Modulation (DIM) template matching algorithm, its performance is greatly improved, and the resulting method produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.
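Calculating a similarity map between template and search-image features, as described above, can be sketched with zero-mean normalised cross-correlation (ZNCC). ZNCC is a swapped-in, standard similarity measure chosen for illustration; it is not the DIM algorithm used in the paper, and the function names and random toy features are hypothetical stand-ins for CNN feature maps.

```python
import numpy as np


def ncc_similarity_map(search, template):
    """Zero-mean normalised cross-correlation of a template over a search map.

    search: (C, H, W) features of the search image.
    template: (C, h, w) features of the template.
    Returns an (H - h + 1, W - w + 1) map; values near 1 indicate a close match.
    """
    _, h, w = template.shape
    _, H, W = search.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = search[:, y:y + h, x:x + w]
            p = patch - patch.mean()
            out[y, x] = (p * t).sum() / (np.linalg.norm(p) * t_norm + 1e-8)
    return out


def best_match(search, template):
    """Top-left (row, col) of the window with the highest similarity."""
    sim = ncc_similarity_map(search, template)
    row, col = np.unravel_index(np.argmax(sim), sim.shape)
    return int(row), int(col)


# Toy features standing in for CNN activations; the template is cut from the search map.
rng = np.random.default_rng(1)
search_feats = rng.random((3, 10, 12))
template_feats = search_feats[:, 4:7, 5:9].copy()
location = best_match(search_feats, template_feats)
```

The double loop keeps the sketch readable; a practical implementation would compute the same map with a convolution. The paper's point is that the choice of feature-space, not only the matching rule, determines how distinguishable the peak is.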
N. Manchev and M. W. Spratling (2022) On the biological plausibility of orthogonal initialisation for solving gradient instability in deep neural networks. Proceedings of the 9th International Conference on Soft Computing and Machine Intelligence (ISCMI), pp. 47-55.
Initialising the synaptic weights of artificial neural networks (ANNs) with orthogonal matrices is known to alleviate vanishing and exploding gradient problems. A major objection against such initialisation schemes is that they are deemed biologically implausible as they mandate factorization techniques that are difficult to attribute to a neurobiological process. This paper presents two initialisation schemes that allow a network to naturally evolve its weights to form orthogonal matrices, provides theoretical analysis that pre-training orthogonalisation always converges, and empirically confirms that the proposed schemes outperform randomly initialised recurrent and feedforward networks.
C. Huang, H. Guan, A. Jiang, Y. Zhang, M. W. Spratling and Y.-F. Wang (2022) Registration based few-shot anomaly detection. European Conference on Computer Vision (ECCV), In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Proceedings part 24, Lecture Notes in Computer Science, Volume 13684, pp. 303-319, Springer.
This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. So far, existing FSAD studies follow the one-model-per-category learning paradigm used for standard AD, and the inter-category commonality has not been explored. Inspired by how humans detect anomalies, i.e., comparing an image in question to normal images, we here leverage registration, an image alignment task that is inherently generalizable across categories, as the proxy task, to train a category-agnostic anomaly detection model. During testing, the anomalies are identified by comparing the registered features of the test image and its corresponding support (normal) images. As far as we know, this is the first FSAD method that trains a single generalizable model and requires no re-training or parameter fine-tuning for new categories. Experimental results have shown that the proposed method outperforms the state-of-the-art FSAD methods by 3%-8% in AUC on the MVTec and MPDD benchmarks. Source code will be publicly available.
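Identifying anomalies by comparing the registered features of a test image with those of its normal support images can be sketched as below. This is a minimal illustration under explicit assumptions: features are taken to be already registered (spatially aligned), and the score at each location is a simple nearest-neighbour distance to the support features at that location. That scoring rule is a swapped-in choice for illustration and is not necessarily the one used in the paper; the function names and random toy features are hypothetical.

```python
import numpy as np


def anomaly_map(test_feats, support_feats):
    """Per-location anomaly scores from features that are already registered.

    test_feats: (C, H, W) features of the test image.
    support_feats: (K, C, H, W) features of K normal support images.
    The score at each location is the distance from the test feature vector
    to the nearest support feature vector at the same location.
    """
    diffs = support_feats - test_feats[None]   # (K, C, H, W)
    dists = np.linalg.norm(diffs, axis=1)      # (K, H, W)
    return dists.min(axis=0)                   # (H, W)


def image_anomaly_score(test_feats, support_feats):
    """Image-level score taken as the largest location-level score."""
    return float(anomaly_map(test_feats, support_feats).max())


# K=3 normal support images with C=4 channels on a 5x5 grid.
rng = np.random.default_rng(2)
support = rng.normal(size=(3, 4, 5, 5))
normal_test = support[1].copy()
defective_test = support[1].copy()
defective_test[:, 2, 3] += 10.0   # simulated defect at row 2, column 3
```

Because the comparison is made only against normal support images, nothing in the scoring step is specific to a category, which is what lets a single model serve new categories without re-training.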
H. Guan and M. W. Spratling (2022) CobNet: cross attention on object and background for few-shot segmentation. Proceedings of the 26th International Conference on Pattern Recognition (ICPR), pp. 39-45.
Few-shot segmentation aims to segment images containing objects from previously unseen classes using only a few annotated samples. Most current methods focus on using object information extracted, with the aid of human annotations, from support images to identify the same objects in new query images. However, background information can also be useful to distinguish objects from their surroundings. Hence, some previous methods also extract background information from the support images. In this paper, we argue that such information is of limited utility, as the background in different images can vary widely. To overcome this issue, we propose CobNet which utilises information about the background that is extracted from the query images without annotations of those images. Experiments show that our method achieves a mean Intersection-over-Union score of 61.4% and 37.8% for 1-shot segmentation on PASCAL-5i and COCO-20i respectively, outperforming previous methods. It is also shown to produce state-of-the-art performance of 53.7% for weakly-supervised few-shot segmentation, where no annotations are provided for the support images.
B. Gao and M. W. Spratling (2022) More robust object tracking via shape and motion cue integration. Signal Processing, 199 (108628).
Most current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background. This paper promotes an approach that can be combined with many existing trackers to tackle this issue and improve tracking robustness. The proposed approach makes use of two additional cues to target location: making the appearance model more sensitive to shape cues in offline training, and using the historical locations of the target to predict its future position during online inference. Combining these additional mechanisms with SiamFC, SiamFC++, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the corresponding underlying tracker alone. When combined with ARSuper DiMP the resulting tracker is shown to outperform all popular state-of-the-art trackers on three benchmark datasets (OTB-100, NFS, and LaSOT), and produce performance that is competitive with the state-of-the-art on the UAV123, TrackingNet, GOT-10k and VOT2020 datasets.
B. Gao and M. W. Spratling (2021) Robust template matching via hierarchical convolutional features from a shape biased CNN. Proceedings of the International Conference on Image, Vision and Intelligent Systems (ICIVIS), Lecture Notes in Electrical Engineering, Volume 813. Springer, Singapore.
Finding a template in a search image is an important task underlying many computer vision applications. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. In this article we investigate whether enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. This investigation results in a new template matching method that produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.
M. W. Spratling (2020) Explaining away results in accurate and tolerant template matching. Pattern Recognition, 104 (107337).
Recognising and locating image patches or sets of image features is an important task underlying much work in computer vision. Traditionally this has been accomplished using template matching. However, template matching is notoriously brittle in the face of changes in appearance caused by, for example, variations in viewpoint, partial occlusion, and non-rigid deformations. This article tests a method of template matching that is more tolerant to such changes in appearance and that can, therefore, more accurately identify image patches. In traditional template matching the comparison between a template and the image is independent of the other templates. In contrast, the method advocated here takes into account the evidence provided by the image for the template at each location and the full range of alternative explanations represented by the same template at other locations and by other templates. Specifically, the proposed method of template matching is performed using a form of probabilistic inference known as "explaining away". The algorithm used to implement explaining away has previously been used to simulate several neurobiological mechanisms, and been applied to image contour detection and pattern recognition tasks. Here it is applied for the first time to image patch matching, and is shown to produce superior results in comparison to the current state-of-the-art methods.
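As a point of reference for the conventional approach described above, in which each template is compared with the image independently of every other template, the following Python sketch scores a template at every position using zero-mean normalised cross-correlation (NCC). It illustrates only that baseline, not the explaining-away method proposed in the paper; the function names ncc_map and best_match are hypothetical, and the brute-force double loop is chosen for clarity rather than speed.

```python
import numpy as np

def ncc_map(image, template):
    """Zero-mean normalised cross-correlation of a template at every valid position."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Each location is scored on its own; no other template or location competes.
            scores[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

def best_match(image, template):
    """Return the (row, col) of the highest NCC score and the score itself."""
    scores = ncc_map(image, template)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(r), int(c)), float(scores[r, c])
```

For example, cutting an 8x8 template out of a random 40x50 image and calling best_match(image, template) should recover the template's top-left corner with a score of (numerically) 1.0. Because every template and location is scored in isolation, nothing in this baseline prevents several overlapping candidates from all scoring highly.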
N. Manchev and M. W. Spratling (2020) Target propagation in recurrent neural networks. Journal of Machine Learning Research, 21(7): 1-33.
Recurrent Neural Networks have been widely used to process sequence data, but have
long been criticized for their biological implausibility and training difficulties related to
vanishing and exploding gradients. This paper presents a novel algorithm for training
recurrent networks, target propagation through time (TPTT), that outperforms standard
backpropagation through time (BPTT) on four out of the five problems used for testing.
The proposed algorithm is initially tested and compared to BPTT on four synthetic time
lag tasks, and its performance is also measured using the sequential MNIST data set. In
addition, as TPTT uses target propagation, it allows for discrete nonlinearities and could
potentially mitigate the credit assignment problem in more complex recurrent architectures.
M. W. Spratling (2019) Fitting predictive coding to the neurophysiological data. Brain Research, 1720: 146313.
Recent neurophysiological data showing the effects of locomotion on neural
activity in mouse primary visual cortex has been interpreted as providing strong
support for the predictive coding account of cortical function. Specifically,
this work has been interpreted as providing direct evidence that
prediction-error, a distinguishing property of predictive coding, is encoded in
cortex. This article evaluates these claims and highlights some of the
discrepancies between the proposed predictive coding model and the
neurobiology. Furthermore, it is shown that the model can be modified so as to
fit the empirical data more successfully.
I. E. Kartoglu and M. W. Spratling (2018) Two collaborative filtering recommender systems based on sparse dictionary coding. Knowledge and Information Systems, 57(3): 709-20.
This paper proposes two types of recommender systems based on sparse dictionary
coding. The first is a novel predictive recommender system that attempts to
predict a user's future rating of a specific item. The second is a top-n
recommender system that finds a list of items predicted to be most relevant for
a given user. The proposed methods are assessed using a variety of different
metrics and are shown to be competitive with existing collaborative filtering
recommender systems. Specifically, the sparse dictionary-based predictive
recommender has advantages over existing methods in terms of a lower
computational cost and not requiring parameter tuning. The sparse
dictionary-based top-n recommender system has advantages over existing methods
in terms of the accuracy of the predictions it makes and not requiring parameter
tuning. Open-source software, implemented and used for the evaluation in this
paper, is also provided for reproducibility.
Q. Wang and M. W. Spratling (2018) Contour detection refined by a sparse reconstruction-based discrimination method. Signal, Image and Video Processing, 12(2): 207-14.
Sparse representations have been widely used for many image processing tasks. In
this paper, a sparse reconstruction-based discrimination (SRBD) method, which
was previously proposed for the classification of image patches, is utilized to
improve boundary detection in colour images. This method is applied to refining
the results generated by three different algorithms: a biologically-inspired
method, and two state-of-the-art algorithms for contour detection. All of the
contour detection results are evaluated on the BSDS300 and BSDS500 benchmarks
using the quantitative measures F-score, ODS, OIS and AP. Evaluation results
show that the performance of each algorithm is improved using the proposed
method of refinement, with at least one of the quantitative measures increased by
0.01. In particular, even the two state-of-the-art algorithms are slightly
improved by applying the SRBD method to refine their contour detection results.
M. W. Spratling (2017) A predictive coding model of gaze shifts and the underlying neurophysiology. Visual Cognition, 25(7-8): 770-801.
A comprehensive model of
gaze control must account for a number of empirical observations at both the
behavioural and neurophysiological levels. The computational model presented in
this article can simulate the coordinated movements of the eye, head, and body
required to perform horizontal gaze shifts. In doing so it reproduces the
predictable relationships between the movements performed by these different
degrees of freedom (DOFs) in the primate. The model also accounts for the
saccadic undershoot that accompanies large gaze shifts in the biological visual
system. It can also account for our perception of a stable external world
despite frequent gaze shifts and the ability to perform accurate memory-guided
and double-step saccades. The proposed model also simulates peri-saccadic
compression: the mis-localisation of a briefly presented visual stimulus towards
the location that is the target for a saccade. At the neurophysiological level,
the proposed model is consistent with the existence of cortical neurons tuned to
the retinal, head-centred, body-centred, and world-centred locations of visual
stimuli and cortical neurons that have gain-modulated responses to visual
stimuli. Finally, the model also successfully accounts for peri-saccadic
receptive field (RF) remapping which results in reduced responses to stimuli in
the current RF location and an increased sensitivity to stimuli appearing at the
location that will be occupied by the RF after the saccade. The proposed model
thus offers a unified explanation for this seemingly diverse range of
phenomena. Furthermore, as the proposed model is an implementation of the
predictive coding theory, it offers a single computational explanation for these
phenomena and relates gaze shifts to a wider framework for understanding
cortical function.
W. Muhammad and M. W. Spratling (2017) A neural model for eye-head-arm coordination. Advanced Robotics, 31(12): 650-63.
The coordinated movement of the eyes, the head and
the arm is an important ability in both animals and humanoid
robots. To achieve this the brain and the robot control system
need to be able to perform complex non-linear sensory-motor
transformations in the forward and inverse directions between
many degrees of freedom. In this article we apply an omni-directional
basis function neural network to this task. The
proposed network can perform 3-D coordinated gaze shifts and
3-D arm reach movements to a visual target. In particular, it can
perform direct sensory-motor transformations to shift gaze and
to execute arm reach movements and can also perform inverse
sensory-motor transformations in order to shift gaze to view the
hand.
M. W. Spratling (2017) A hierarchical predictive coding model of object recognition in natural images. Cognitive Computation, 9(2): 151-67.
Predictive coding has been proposed as a model of the hierarchical
perceptual inference process performed in the cortex. However, results
demonstrating that predictive coding is capable of performing the complex
inference required to recognise objects in natural images have not
previously been presented.
This article proposes a hierarchical neural network based on predictive
coding for performing visual object recognition.
This network is applied to the tasks of categorising hand-written digits,
identifying faces, and locating cars in images of street scenes. It is shown
that image recognition can be performed with tolerance to position,
illumination, size, partial occlusion and within-category variation.
The current results, therefore, provide the first practical demonstration
that predictive coding (at least the particular implementation of predictive
coding used here; the PC/BC-DIM) is capable of performing accurate
visual object recognition.
D. Re, A. Gibaldi, S. P. Sabatini and M. W. Spratling (2017) An integrated system based on binocular learned receptive fields for saccade-vergence on visually salient targets. Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP), Volume 6, pp. 204-15.
The human visual system uses saccadic and vergence eye movements to foveate interesting objects with both eyes, thus exploring the visual scene. To mimic this biological behavior in active vision, we propose a bio-inspired integrated system able to learn a functional sensory representation of the environment, together with the motor commands for binocular eye coordination, directly by interacting with the environment itself. Rather than sequentially combining different functionalities, the proposed architecture robustly integrates different modules that rely on a front-end of learned binocular receptive fields to specialize on different sub-tasks. The resulting modular architecture is able to detect salient targets in the scene and perform precise binocular saccadic and vergence movements towards them. The performance of the proposed approach has been tested on the iCub Simulator, providing a quantitative evaluation of the computational potential of the learned sensory and motor resources.
M. W. Spratling (2017) A review of predictive coding algorithms. Brain and Cognition, 112: 92-7.
Predictive coding is a
leading theory of how the brain performs probabilistic inference. However, there
are a number of distinct algorithms which are described by the term "predictive
coding". This article provides a concise review of these different predictive
coding algorithms, highlighting their similarities and differences. Five
algorithms are covered: linear predictive coding which has a long and
influential history in the signal processing literature; the first
neuroscience-related application of predictive coding to explaining the function
of the retina; and three versions of predictive coding that have been proposed
to model cortical function. While all these algorithms aim to fit a generative
model to sensory data, they differ in the type of generative model they employ,
in the process used to optimise the fit between the model and sensory data, and
in the way that they are related to neurobiology.
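The first algorithm in the review, linear predictive coding, predicts each sample of a signal as a weighted sum of the preceding samples, so that only the prediction error needs to be retained. The Python sketch below illustrates the idea under stated assumptions: the coefficients are fitted by ordinary least squares (rather than the autocorrelation/Levinson-Durbin method common in speech processing), and the names _lagged, lpc_fit and lpc_residual are hypothetical.

```python
import numpy as np

def _lagged(s, order):
    # Row i holds s[n-1], s[n-2], ..., s[n-order] for target sample n = order + i.
    return np.column_stack([s[order - 1 - k: len(s) - 1 - k] for k in range(order)])

def lpc_fit(signal, order):
    """Least-squares coefficients a such that s[n] is approximately sum_k a[k] * s[n-1-k]."""
    s = np.asarray(signal, dtype=float)
    coeffs, *_ = np.linalg.lstsq(_lagged(s, order), s[order:], rcond=None)
    return coeffs

def lpc_residual(signal, coeffs):
    """Prediction error left after subtracting the linear prediction from each sample."""
    s = np.asarray(signal, dtype=float)
    order = len(coeffs)
    return s[order:] - _lagged(s, order) @ coeffs
```

For a signal generated by a second-order autoregressive process, lpc_fit(signal, 2) should approximately recover the generating coefficients, and the residual should be much smaller in magnitude than the signal itself; that reduction is what makes the prediction error cheaper to encode than the raw signal.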
W. Muhammad and M. W. Spratling (2017) A neural model of coordinated head and eye movement control. Journal of Intelligent & Robotic Systems, 85(1):107-26.
Gaze shifts require the coordinated movement of both the eyes and the head in both animals and humanoid robots. To achieve this, the brain and the robot control system need to be able to perform complex non-linear sensory-motor transformations between many degrees of freedom and resolve the redundancy in such a system. In this article we propose a hierarchical neural network model for performing 3-D coordinated gaze shifts. The network is based on the PC/BC-DIM (Predictive Coding/Biased Competition with Divisive Input Modulation) basis function model. The proposed model consists of independent eye and head control circuits with mutual interactions for the appropriate adjustment of coordination behaviour. Based on the initial eye and head positions, the network resolves the redundancies involved in 3-D gaze shifts and produces accurate gaze control without any kinematic analysis or imposed constraints. Furthermore, the behaviour of the proposed model is consistent with coordinated eye and head movements observed in primates.
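The PC/BC-DIM algorithm named above can be sketched compactly. As we understand the published formulation, error neurons e and prediction neurons y are updated iteratively as e = x ⊘ (ε2 + Vy) and y ← (ε1 + y) ⊙ We, where ⊘ and ⊙ denote element-wise division and multiplication, W holds feedforward weights and V feedback weights. The Python sketch below is not the authors' released code: the normalisations of W and V, the values of ε1 and ε2, the iteration count, and the name pcbc_dim are all assumptions made for illustration.

```python
import numpy as np

def pcbc_dim(x, W, iterations=200, eps1=1e-6, eps2=1e-2):
    """Sketch of PC/BC-DIM inference for one input vector.

    x: non-negative input, shape (m,)
    W: non-negative feedforward weights, shape (n, m), one row per prediction neuron
    Returns responses y (n,), prediction errors e (m,) and reconstruction r (m,).
    eps1, eps2 and the normalisations below are assumed values, not taken from the papers.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    W = W / W.sum(axis=1, keepdims=True)          # assumed: each row of W sums to 1
    V = W.T / W.T.max(axis=0, keepdims=True)      # assumed: feedback weights, column max 1
    y = np.zeros(W.shape[0])
    for _ in range(iterations):
        r = V @ y                                 # top-down reconstruction of the input
        e = x / (eps2 + r)                        # divisive input modulation (error)
        y = (eps1 + y) * (W @ e)                  # multiplicative update of responses
    r = V @ y
    e = x / (eps2 + r)
    return y, e, r
```

As a small check of the competitive behaviour, suppose neuron 0 represents input features 0 and 1 together, neuron 1 represents features 2 and 3, and neuron 2 represents feature 0 alone. For an input containing features 0 and 1, neuron 0 comes to account for the whole input, while the response of neuron 2, whose weights only partially match, is suppressed: the input it would have explained is already explained by neuron 0.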
Q. Wang and M. W. Spratling (2016) Contour detection in colour images using a neurophysiologically inspired model. Cognitive Computation, 8(6):1027-35.
The predictive coding/biased competition (PC/BC) model of V1 has previously been applied to locate boundaries defined by local discontinuities in intensity within an image. Here it is extended to perform contour detection for colour images. The proposed extensions are inspired by neurophysiological data from single neurons in macaque primary visual cortex (V1), and the behaviour of this extended model is consistent with the neurophysiological experimental results. Furthermore, when compared to methods used for contour detection in computer vision, the colour PC/BC model of V1 slightly outperforms some recently proposed algorithms which use more cues and/or require a complicated training procedure.
M. W. Spratling (2016) A neural implementation of Bayesian inference based on predictive coding. Connection Science, 28(4):346-83.
Predictive coding is a leading theory of cortical function that
has previously been shown to explain a great deal of neurophysiological and
psychophysical data. Here it is shown that predictive coding can perform almost
exact Bayesian inference when applied to computing with population codes. It is
demonstrated that the proposed algorithm, based on predictive coding, can:
decode probability distributions encoded as noisy population codes; combine
priors with likelihoods to calculate posteriors; perform cue integration and cue
segregation; perform function approximation; be extended to perform hierarchical
inference; simultaneously represent and reason about multiple stimuli; and
perform inference with multi-modal and non-Gaussian probability distributions.
Predictive coding thus provides a neural network based method for performing
probabilistic computation and provides a simple, yet comprehensive, theory of
how the cerebral cortex performs Bayesian inference.
M. W. Spratling (2016) Predictive coding as a model of cognition. Cognitive Processing, 17(3): 279-305.
Previous work has shown that predictive coding can provide a detailed
explanation of a very wide range of low-level perceptual processes. It is also
widely believed that predictive coding can account for high-level, cognitive,
abilities. This article provides support for this view by showing that
predictive coding can simulate phenomena such as categorisation, the influence
of abstract knowledge on perception, recall and reasoning about conceptual
knowledge, context-dependent behavioural control, and naive physics. The
particular implementation of predictive coding used here (PC/BC-DIM) has previously
been used to simulate low-level perceptual behaviour and the neural mechanisms
that underlie them. This algorithm thus provides a single framework for
modelling both perceptual and cognitive brain function.
M. W. Spratling (2016) A neural implementation of the Hough transform and the advantages of explaining away. Image and Vision Computing, 52:15-24.
The Hough Transform (HT) is widely used for feature extraction and object
detection. However, during the HT individual image elements vote for many
possible parameter values. This results in a dense accumulator array and
problems identifying the parameter values that correspond to image
features. This article proposes a new method for implementing the voting process
in the HT. This method employs a competitive neural network algorithm to perform
a form of probabilistic inference known as "explaining away". This results in a
sparse accumulator array in which the parameter values of image features can be
more accurately identified. The proposed method is initially demonstrated using
the simple, prototypical, task of straight line detection in synthetic
images. In this task it is shown to more accurately identify straight lines, and
the parameters of those lines, compared to the standard Hough voting process. The
proposed method is further assessed using a version of the implicit shape model
(ISM) algorithm applied to car detection in natural images. In this application
it is shown to more accurately identify cars, compared to using the standard
Hough voting process in the same algorithm, and compared to the original ISM
algorithm.
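For comparison with the explaining-away variant, the following Python sketch implements the standard Hough voting process for straight lines, in which every image point votes for every (rho, theta) cell consistent with it, producing the dense accumulator array described above. The function name hough_lines, the parameterisation rho = x cos(theta) + y sin(theta), and the one-degree theta bins are illustrative assumptions; the sketch does not implement the competitive voting proposed in the paper.

```python
import numpy as np

def hough_lines(points, shape, n_theta=180):
    """Standard Hough voting for lines rho = x*cos(theta) + y*sin(theta).

    points: iterable of (x, y) pixel coordinates; shape: (height, width) of the image.
    Returns the integer accumulator, the theta values (radians) and the rho offset.
    """
    h, w = shape
    max_rho = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    acc = np.zeros((2 * max_rho + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + max_rho, cols] += 1   # each point votes once per theta
    return acc, thetas, max_rho
```

Even for a single vertical line of 30 points at x = 10, a large number of accumulator cells receive votes, while the cell for the true line (theta = 0, rho = 10) receives all 30. Neighbouring cells also collect nearly as many votes, which illustrates why identifying the correct parameter values in a dense accumulator is difficult.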
Q. Wang and M. W. Spratling (2016) A simplified texture gradient method for improved image segmentation. Signal, Image and Video Processing, 10(4):679-86.
Inspired by the probability of boundary (Pb) algorithm, a simplified texture gradient method has been developed to locate texture boundaries within grayscale images. Despite considerable simplification, the proposed algorithm's ability to locate texture boundaries is comparable with Pb's texture boundary method. The proposed texture gradient method is also integrated with a biologically inspired model, to enable boundaries defined by discontinuities in both intensity and texture to be located. The combined algorithm outperforms the current state-of-the-art image segmentation method (Pb) when this method is also restricted to using only local cues of intensity and texture at a single scale.
W. Muhammad and M. W. Spratling (2015) A neural model of binocular saccade planning and vergence control. Adaptive Behavior, 23(5):265-82.
The human visual system
uses saccadic and vergence eye movements to foveate visual targets. To mimic
this aspect of the biological visual system the PC/BC-DIM neural network is used
as an omni-directional basis function network for learning and performing
sensory-sensory and sensory-motor transformations without using any hard-coded
geometric information. A hierarchical PC/BC-DIM network is used to learn a
head-centred representation of visual targets by dividing the whole problem into
independent subtasks. The learnt head-centred representation is then used to
generate saccade and vergence motor commands. The performance of the proposed
system is tested using the iCub humanoid robot simulator.
M. W. Spratling (2014) Classification using sparse representations: a biologically plausible approach. Biological Cybernetics, 108(1):61-73.
Representing signals as
linear combinations of basis vectors sparsely selected from an overcomplete
dictionary has proven to be advantageous for many applications in pattern
recognition, machine learning, signal processing, and computer vision. While
this approach was originally inspired by insights into cortical information
processing, biologically-plausible approaches have been limited to exploring the
functionality of early sensory processing in the brain, while more practical
applications have employed non-biologically-plausible sparse-coding
algorithms. Here, a biologically-plausible algorithm is proposed that can be
applied to practical problems. This algorithm is evaluated using standard
benchmark tasks in the domain of pattern classification, and its performance is
compared to a wide range of alternative algorithms that are widely used in
signal and image processing. The results show that, for the classification
tasks performed here, the proposed method is very competitive with the best of
the alternative algorithms that have been evaluated. This demonstrates that
classification using sparse representations can be performed in a
neurally-plausible manner, and hence, that this mechanism of classification
might be exploited by the brain.
M. W. Spratling (2014) A single functional model of drivers and modulators in cortex. Journal of Computational Neuroscience, 36(1): 97-118.
A distinction is commonly made between synaptic connections capable of evoking a
response ("drivers") and those that can alter ongoing activity but not
initiate it ("modulators"). Here it is proposed that, in cortex, both drivers
and modulators are an emergent property of the perceptual inference performed by
cortical circuits. Hence, it is proposed that there is a single underlying
computational explanation for both forms of synaptic connection. This idea is
illustrated using a predictive coding model of cortical perceptual inference.
In this model all synaptic inputs are treated identically. However,
functionally, certain synaptic inputs drive neural responses while others have a
modulatory influence. This model is shown to account for driving and modulatory
influences in bottom-up, lateral, and top-down pathways, and is used to simulate
a wide range of neurophysiological phenomena including surround suppression,
contour integration, gain modulation, spatio-temporal prediction, and attention.
The proposed computational model thus provides a single functional explanation
for drivers and modulators and a unified account of a diverse range of
neurophysiological data.
M. W. Spratling (2013) Predictive coding. In Encyclopedia of Computational Neuroscience, D. Jaeger and R. Jung (Eds.), Springer, New York.
M. W. Spratling (2013) Distinguishing theory from implementation in predictive coding accounts of brain function [commentary]. Behavioral and Brain Sciences, 36(3):231-2.
M. W. Spratling (2013) Image segmentation using a sparse coding model of cortical area V1. IEEE Transactions on Image Processing, 22(4):1631-43.
Algorithms that encode images using a sparse set of basis
functions have previously been shown to explain aspects of the physiology of
primary visual cortex (V1), and have been used for applications such as image
compression, restoration, and classification. Here, a sparse coding algorithm,
that has previously been used to account for the response properties of
orientation tuned cells in primary visual cortex, is applied to the task of
perceptually salient boundary detection. The proposed algorithm is currently
limited to using only intensity information at a single scale. However, it is
shown to out-perform the current state-of-the-art image segmentation method (Pb)
when this method is also restricted to using the same information.
K. De Meyer and M. W. Spratling (2013) A model of partial reference frame transforms through pooling of gain-modulated responses. Cerebral Cortex, 23(5):1230-9.
In multimodal integration and sensorimotor transformation areas of posterior
parietal cortex (PPC), neural responses often appear encoded in spatial
reference frames that are intermediate to intrinsic sensory reference frames,
e.g., eye-centred for visual or head-centred for auditory stimulation. Many
sensory responses in these areas are also modulated by direction of gaze. We
demonstrate that certain types of mixed-frame responses can be generated by
pooling gain-modulated responses - similarly to how complex cells in visual
cortex are thought to pool the responses of simple cells. The proposed model
simulates two types of mixed-frame responses observed in PPC: in particular,
sensory responses that shift differentially with gaze in horizontal and vertical
dimensions; and sensory responses that shift differentially for different start
and end points along a single dimension of gaze. We distinguish these two types
of mixed-frame responses from a third type in which sensory responses shift a
partial yet approximately equal amount with each gaze shift. We argue that the
empirical data on mixed-frame responses may be caused by multiple mechanisms,
and we adapt existing reference-frame measures to distinguish between the
different types. Finally, we discuss how mixed-frame responses may be revealing
of the local organisation of presynaptic responses.
M. W. Spratling (2012) Predictive coding accounts for V1 response properties recorded using reverse correlation. Biological Cybernetics, 106(1):37-49.
PC/BC ("Predictive Coding/Biased Competition") is a simple computational model that has previously been shown to explain a very wide range of V1 response properties. This article extends work on the PC/BC model of V1 by showing that it can also account for V1 response properties measured using the reverse correlation methodology. Reverse correlation employs an experimental procedure that is significantly different from that used in more typical neurophysiological experiments, and measures some distinctly different response properties in V1. Despite these differences PC/BC successfully accounts for the data. The current results thus provide additional support for the PC/BC model of V1 and further demonstrate that PC/BC offers a unified explanation for the seemingly diverse range of behaviours observed in primary visual cortex.
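Reverse correlation estimates a neuron's receptive field by averaging the stimuli presented, each weighted by the response it evoked. The Python sketch below applies this to a simulated rectified-linear model neuron driven by Gaussian white-noise stimuli. The model neuron, the stimulus statistics and the function name response_weighted_average are all assumptions made for illustration; this is neither the PC/BC model nor the experimental protocol analysed in the paper.

```python
import numpy as np

def response_weighted_average(stimuli, responses):
    """Reverse-correlation estimate of a receptive field.

    stimuli: array of shape (trials, pixels); responses: array of shape (trials,).
    Returns the response-weighted average of the mean-subtracted stimuli.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    responses = np.asarray(responses, dtype=float)
    centred = stimuli - stimuli.mean(axis=0)
    return (responses[:, None] * centred).sum(axis=0) / responses.sum()
```

With Gaussian white-noise stimuli, the estimate for a linear filter followed by a static non-linearity (here, rectification) is proportional to the underlying filter, so the estimate correlates strongly with the model neuron's true receptive field.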
M. W. Spratling (2012) Predictive coding as a model of the V1 saliency map hypothesis. Neural Networks, 26:7-28.
The predictive coding/biased competition (PC/BC) model is a specific implementation of predictive coding theory that has previously been shown to provide a detailed account of the response properties of orientation tuned cells in primary visual cortex (V1). Here it is shown that the same model can successfully simulate psychophysical data relating to the saliency of unique items in search arrays, of contours embedded in random texture, and of borders between textured regions. This model thus provides a possible implementation of the hypothesis that V1 generates a bottom-up saliency map. However, PC/BC is very different from previous models of visual salience, in that it proposes that saliency results from the failure of an internal model of simple elementary image components to accurately predict the visual input. Saliency can therefore be interpreted as a mechanism by which prediction errors attract attention in an attempt to improve the accuracy of the brain's internal representation of the world.
M. W. Spratling (2012) Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. Neural Computation, 24(1): 60-103.
A method is presented for learning the reciprocal feedforward and feedback
connections required by the predictive coding model of cortical
function. Using this method feedforward and feedback connections are learnt
simultaneously and independently in a biologically plausible manner. The
performance of the proposed algorithm is evaluated by applying it to learning
the elementary components of artificial images and of natural images. For
artificial images the bars problem is employed and the proposed algorithm is
shown to produce state-of-the-art performance on this task. For natural
images, components resembling Gabor functions are learnt in the first
processing stage and neurons responsive to corners are learnt in the second
processing stage. The properties of these learnt representations are in good
agreement with neurophysiological data from V1 and V2. The proposed algorithm
demonstrates for the first time that a single computational theory can explain
the formation of cortical RFs, and also the response properties of cortical
neurons once those RFs have been learnt.
K. De Meyer and M. W. Spratling (2011) Multiplicative gain modulation arises through unsupervised learning in a predictive coding model of cortical function. Neural Computation, 23(6):1536-67.
The combination of two or more population-coded signals in a
neural model of predictive coding can give rise to multiplicative
gain modulation in the response properties of individual neurons.
Synaptic weights generating these multiplicative response
properties can be learned using an unsupervised, Hebbian, learning
rule. The behaviour of the model is compared to empirical data on
gaze-dependent gain modulation of cortical cells, and found to be
in good agreement with a range of physiological observations.
Furthermore, it is demonstrated that the model can learn to
represent a set of basis functions. The current paper thus
connects an often-observed neurophysiological phenomenon and
important neurocomputational principle (gain modulation) with an
influential theory of brain operation (predictive coding).
M. W. Spratling (2011) A single functional model accounts for the distinct properties of suppression in cortical area V1. Vision Research, 51(6):563-76.
Cross-orientation suppression and surround suppression have been extensively
studied in primary visual cortex (V1). These two forms of suppression have some
distinct properties which has led to the suggestion that they are generated by
different underlying mechanisms. Furthermore, it has been suggested that
mechanisms other than intracortical inhibition may be central to both forms of
suppression. A simple computational model (PC/BC), in which intracortical
inhibition is fundamental, is shown to simulate the distinct properties of
cross-orientation and surround suppression. The same model has previously been
shown to account for a large range of V1 response properties including
orientation-tuning, spatial and temporal frequency tuning, facilitation and
inhibition by flankers and textured surrounds as well as a range of other
experimental results on cross-orientation suppression and surround
suppression. The current results thus provide additional support for the PC/BC
model of V1 and for the proposal that the diverse range of response properties
observed in V1 neurons have a single computational explanation. Furthermore,
these results demonstrate that current neurophysiological evidence is
insufficient to discount intracortical inhibition as a central mechanism
underlying both forms of suppression.
M. W. Spratling (2010) Predictive coding as a model of response
properties in cortical area V1. Journal of
Neuroscience, 30(9):3531-43.
A simple model is shown to account for a large range of V1 classical, and
non-classical, receptive field properties including orientation-tuning,
spatial and temporal frequency tuning, cross-orientation suppression, surround
suppression, and facilitation and inhibition by flankers and textured
surrounds. The model is an implementation of the predictive coding theory of
cortical function and thus provides a single computational explanation for a
diverse range of neurophysiological findings. Furthermore, since predictive
coding can be related to the biased competition theory and is a specific
example of more general theories of hierarchical perceptual inference, the
current results relate V1 response properties to a wider, more unified,
framework for understanding cortical function.
M. W. Spratling (2009) Learning posture invariant spatial
representations through temporal correlations. IEEE Transactions on Autonomous
Mental Development, 1(4):253-63.
A hierarchical neural network model is used to learn, without supervision,
sensory-sensory coordinate transformations like those believed to be encoded
in the dorsal pathway of the cerebral cortex. The resulting representations of
visual space are invariant to eye orientation, neck orientation, or posture in
general. These posture invariant spatial representations are learned using the
same mechanisms that have previously been proposed to operate in the cortical
ventral pathway to learn object representations that are invariant to
translation, scale, orientation, or viewpoint in general. This model thus
suggests that the same mechanisms of learning and development operate across
multiple cortical hierarchies.
K. De Meyer and M. W. Spratling (2009) A model of non-linear
interactions between cortical top-down and horizontal connections explains the
attentional gating of collinear facilitation. Vision Research, 49(5):553-68.
Past physiological and psychophysical experiments have shown that attention can
modulate the effects of contextual information appearing outside the classical
receptive field of a cortical neuron. Specifically, it has been suggested that
attention, operating via cortical feedback connections, gates the effects of
long-range horizontal connections underlying collinear facilitation in cortical
area V1. This article proposes a novel mechanism, based on the computations
performed within the dendrites of cortical pyramidal cells, that can account for
these observations. Furthermore, it is shown that the top-down gating signal
into V1 can result from a process of biased competition occurring in
extrastriate cortex. A model based on these two assumptions is used to replicate
the results of physiological and psychophysical experiments on collinear
facilitation and attentional modulation.
M. W. Spratling, K. De Meyer and R. Kompass (2009)
Unsupervised learning of overlapping image components using divisive input modulation.
Computational Intelligence and Neuroscience, 2009(381457):1-19.
This paper demonstrates that non-negative matrix factorisation is mathematically
related to a class of neural networks that employ negative feedback as a
mechanism of competition. This observation inspires a novel learning algorithm
which we call Divisive Input Modulation (DIM). The proposed algorithm provides a
mathematically simple and computationally efficient method for the unsupervised
learning of image components, even in conditions where these elementary features
overlap considerably. To test the proposed algorithm, a novel artificial task is
introduced which is similar to the frequently-used bars problem but employs
squares rather than bars to increase the degree of overlap between
components. Using this task, we investigate, firstly, how the proposed method
performs on the parsing of artificial images composed of overlapping features,
given the correct representation of the individual components; and secondly,
how well it can learn the elementary components from artificial
training images. We compare the performance of the proposed algorithm with its
predecessors including variations on these algorithms that have produced
state-of-the-art performance on the bars problem. The proposed algorithm is more
successful than its predecessors in dealing with overlap and occlusion in the
artificial task that has been used to assess performance.
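The response step of DIM can be sketched in a few lines of NumPy. This is a minimal illustration, not the published code: it assumes the update form used in later PC/BC-DIM papers, e = x ⊘ (ε2 + Vy) followed by y ← (ε1 + y) ⊙ We, takes the feedback weights V to be Wᵀ with each column rescaled to a maximum of 1, and uses illustrative values for ε1 and ε2. The weight-learning rule is omitted, and the function name `dim_response` is hypothetical.

```python
import numpy as np

def dim_response(W, x, iterations=50, eps1=1e-6, eps2=1e-3):
    """Iterate a PC/BC-DIM-style response to input x (illustrative sketch).

    W: (n_nodes, n_inputs) non-negative feedforward weights.
    x: (n_inputs,) non-negative input.
    eps1, eps2: small positive constants (values assumed for illustration).
    Returns the prediction-node responses y and the reconstruction V @ y.
    """
    # Feedback weights: W transposed, each column rescaled to a maximum of 1.
    V = W.T / np.maximum(W.T.max(axis=0, keepdims=True), 1e-12)
    y = np.zeros(W.shape[0])
    for _ in range(iterations):
        # Error units divide the input by its top-down reconstruction.
        e = x / (eps2 + V @ y)
        # Prediction units are updated multiplicatively by the error.
        y = (eps1 + y) * (W @ e)
    return y, V @ y


# Two non-overlapping components over four inputs (each row sums to 1).
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
y, reconstruction = dim_response(W, np.array([1.0, 1.0, 0.0, 0.0]))
print(np.round(y, 3))  # first node close to 1, second node 0
```

Because the error units divide rather than subtract, an input that is exactly explained gives an error of about 1 and leaves the responding node unchanged (when each row of W sums to 1, as here), while a node whose inputs are over-explained by rivals sees errors below 1 and its response shrinks.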
M. W. Spratling (2008)
Predictive coding as a model of biased competition in visual attention.
Vision Research, 48(12):1391-408.
Attention acts, through cortical feedback pathways, to enhance the response of
cells encoding expected or predicted information. Such observations are
inconsistent with the predictive coding theory of cortical function which
proposes that feedback acts to suppress information predicted by higher-level
cortical regions. Despite this discrepancy, this article demonstrates that the
predictive coding model can be used to simulate a number of the effects of
attention. This is achieved via a simple mathematical rearrangement of the
predictive coding model, which allows it to be interpreted as a form of biased
competition model. Nonlinear extensions to the model are proposed that enable it
to explain a wider range of data.
M. W. Spratling (2008)
Reconciling predictive coding and biased competition models of cortical function.
Frontiers in Computational Neuroscience, 2(4):1-8.
A simple variation of the standard biased competition model is shown, via some
trivial mathematical manipulations, to be identical to predictive coding.
Specifically, it is shown that a particular implementation of the biased
competition model, in which nodes compete via inhibition that targets the inputs
to a cortical region, is mathematically equivalent to the linear predictive
coding model. This observation demonstrates that these two important and
influential rival theories of cortical function are minor variations on the same
underlying mathematical model.
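The flavour of this equivalence can be illustrated with the standard linear predictive coding update, a gradient step on the squared reconstruction error in the style of Rao and Ballard; that update is an assumption of this sketch, not the article's own notation. The first function reads either as predictive coding (explicit error units) or as competition through inhibition that targets the inputs; the second regroups the same algebra as feedforward excitation minus inhibition between nodes. The function names are hypothetical.

```python
import numpy as np

def error_unit_step(W, x, y, eta):
    # Error units subtract the top-down prediction W.T @ y from the input;
    # equivalently, each input is inhibited by the nodes' predictions
    # before being integrated through W.
    e = x - W.T @ y
    return y + eta * (W @ e)


def lateral_inhibition_step(W, x, y, eta):
    # Same algebra regrouped: feedforward drive W @ x minus inhibition
    # between nodes weighted by the overlap W @ W.T of their weights.
    return y + eta * (W @ x - (W @ W.T) @ y)


def settle(step, W, x, eta=0.1, iterations=500):
    y = np.zeros(W.shape[0])
    for _ in range(iterations):
        y = step(W, x, y, eta)
    return y


W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # two overlapping features
x = np.array([1.0, 2.0, 1.0])     # equals W.T @ [1, 1]
y_pc = settle(error_unit_step, W, x)
y_lat = settle(lateral_inhibition_step, W, x)
print(np.allclose(y_pc, y_lat), np.round(y_pc, 3))
```

Both runs follow the same trajectory and converge to y = [1, 1], the pair of features whose sum reconstructs x. The step size eta = 0.1 is an assumed value, chosen small enough for this W to converge.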
M. S. C. Thomas, G. Westermann, D. Mareschal, M. H. Johnson, S. Sirois and M. W. Spratling (2008)
Studying development in the 21st century [response to commentaries].
Behavioral and Brain Sciences, 31(3):345-56.
In this response, we consider four main issues arising
from the commentaries to the target article. These include further
details of the theory of interactive specialization, the relationship
between neuroconstructivism and selectionism, the implications
of neuroconstructivism for the notion of representation, and the
role of genetics in theories of development. We conclude by
stressing the importance of multidisciplinary approaches in the
future study of cognitive development and by identifying
the directions in which neuroconstructivism can expand in the
Twenty-first Century.
S. Sirois, M. W. Spratling, M. S. C. Thomas, G. Westermann, D. Mareschal and
M. H. Johnson (2008)
Précis of Neuroconstructivism: how the brain constructs cognition.
Behavioral and Brain Sciences, 31(3):321-31.
Neuroconstructivism proposes a unifying framework for the study of development
that brings together (1) constructivism (which views development as the
progressive elaboration of increasingly complex structures), (2) cognitive
neuroscience (which aims to understand the neural mechanisms underlying
behaviour), and (3) computational modelling (which proposes formal and explicit
specifications of information processing). The guiding principle of our approach
is context dependence, within and (in contrast to Marr) between levels of
organization. We propose that three mechanisms guide the emergence of
representations: competition, cooperation, and chronotopy, which themselves
allow for two central processes: proactivity and progressive specialization. We
suggest that the main outcome of development is partial representations,
distributed across distinct functional circuits. This framework is derived by
examining development at the level of single neurons, brain systems, and whole
organisms. We use the terms encellment, embrainment, and embodiment to describe
the higher-level contextual influences that act at each of these levels of
organization. To illustrate these mechanisms in operation we provide case
studies in early visual perception, infant habituation, phonological
development, and object representations in infancy. Three further case studies
are concerned with interactions between levels of explanation: social
development, atypical development and within that, the development of
dyslexia. We conclude that cognitive development arises from a dynamic,
contextual change in neural structures leading to partial representations across
multiple brain regions and timescales.
X. Zhang and M. W. Spratling (2008) Automated learning of coordinate
transformations. Proceedings of the Eighth International Conference on Epigenetic Robotics: Modeling
Cognitive Development in Robotic Systems (EPIROB08).
G. Westermann, D. Mareschal, M. H. Johnson, S. Sirois, M. W. Spratling and M. S. C. Thomas (2007)
Neuroconstructivism. Developmental Science, 10(1):75-83.
Neuroconstructivism is a theoretical framework focusing on the construction of
representation in the developing brain. Cognitive development is explained as
emerging from the experience-dependent development of neural structures
supporting mental representations. Neural development occurs in the context of
multiple interacting constraints acting on different levels, from the individual
cell to the external environment of the developing child. Cognitive development
can thus be understood as a trajectory originating from the constraints on the
underlying neural structures. This perspective offers an integrated view of
normal and abnormal development as well as of development and adult processing,
and it stands apart from traditional cognitive approaches in taking seriously
the constraints on cognition inherent in the substrate that delivers it.
L. A. Watling, M. W. Spratling, K. De Meyer and M. Johnson
(2007) The role of feedback in the determination of figure and ground: a
combined behavioral and modeling study. Proceedings of the 29th Meeting of
the Cognitive Science
Society (CogSci07).
Object knowledge can exert an important influence on even the earliest stages of
visual processing. This study demonstrates how a familiarity bias, acquired only
briefly before testing, can affect the subsequent segmentation of an otherwise
ambiguous figure-ground array, in favor of perceiving the familiar shape as
figure. The behavioral data are then replicated using a biologically plausible
neural network model that employs feedback connections to implement the
demonstrated familiarity bias.
D. Mareschal, M. H. Johnson, S. Sirois, M. W. Spratling, M. S. C. Thomas
and G. Westermann (2007) Neuroconstructivism: How
the Brain Constructs Cognition, Oxford University Press: Oxford,
UK.
M. W. Spratling (2006)
Learning image components for object recognition.
Journal of Machine Learning Research, 7:793-815.
In order to perform object recognition it is necessary to learn representations
of the underlying components of images. Such components correspond to objects,
object-parts, or features. Non-negative matrix factorisation is a generative
model that has been specifically proposed for finding such meaningful
representations of image data, through the use of non-negativity constraints on
the factors. This article reports on an empirical investigation of the
performance of non-negative matrix factorisation algorithms. It is found that
such algorithms need to impose additional constraints on the sparseness of the
factors in order to successfully deal with occlusion. However, these constraints
can themselves result in these algorithms failing to identify image components
under certain conditions. In contrast, a recognition model (a competitive
learning neural network algorithm) reliably and accurately learns
representations of elementary image features without such constraints.
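For reference, basic non-negative matrix factorisation can be sketched with the well-known Lee and Seung multiplicative updates for the squared-error objective. This is a generic NMF baseline, not any specific variant evaluated in the article, and it imposes no sparseness constraint; the function names and defaults are assumptions of this sketch.

```python
import numpy as np

def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W @ H with W, H >= 0.

    Multiplicative updates for the squared Frobenius error;
    eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so the
        # factors never change sign.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_error(V, W, H):
    return np.linalg.norm(V - W @ H)


# Three "images" (columns) built from two non-negative parts.
parts = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
mixing = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
V = parts @ mixing
W, H = nmf(V, rank=2)
print(reconstruction_error(V, W, H))
```

The updates never increase the squared error, but nothing in them prefers sparse factors, which is the kind of gap the article's findings on occlusion point to.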
M. W. Spratling and M. H. Johnson (2006)
A feedback model of perceptual learning and categorisation.
Visual Cognition, 13(2):129-65.
Top-down, feedback, influences are known to have significant effects on visual
information processing. Such influences are also likely to affect perceptual
learning. This article employs a computational model of the cortical region
interactions underlying visual perception to investigate possible influences of
top-down information on learning. The results suggest that feedback could bias
the way in which perceptual stimuli are categorised and could also facilitate
the learning of sub-ordinate level representations suitable for object
identification and perceptual expertise.
M. W. Spratling (2005)
Learning viewpoint invariant perceptual representations from cluttered images.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):753-61.
In order to perform object recognition, it is necessary to form perceptual
representations that are sufficiently specific to distinguish between objects,
but that are also sufficiently flexible to generalise across changes in
location, rotation and scale. A standard method for learning perceptual
representations that are invariant to viewpoint is to form temporal associations
across image sequences showing object transformations. However, this method
requires that individual stimuli are presented in isolation and is therefore
unlikely to succeed in real-world applications where multiple objects can
co-occur in the visual input. This article proposes a simple modification to the
learning method that can overcome this limitation and that results in more robust
learning of invariant representations.
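One standard way of forming temporal associations of this kind is a trace learning rule: weight changes are driven by a decaying average of recent activity, so inputs seen in quick succession become linked to the same node. The sketch shows only that generic rule, with a hard winner-take-all response as an assumption; it is not the modification proposed in the article, and `trace_learning`, `eta` and `delta` are hypothetical names and values.

```python
import numpy as np

def trace_learning(sequence, n_nodes, eta=0.05, delta=0.8, seed=0):
    """Generic trace rule for a competitive layer (illustrative sketch).

    sequence: (T, n_inputs) array of inputs in temporal order.
    delta: fraction of the previous trace retained at each step.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((n_nodes, sequence.shape[1])) * 0.1
    trace = np.zeros(n_nodes)
    for x in sequence:
        # Hard winner-take-all response.
        y = np.zeros(n_nodes)
        y[np.argmax(W @ x)] = 1.0
        # Decaying trace of recent responses.
        trace = (1.0 - delta) * y + delta * trace
        # Weights move towards the current input in proportion to the trace,
        # so a node that won recently also learns the views that follow.
        W += eta * trace[:, None] * (x[None, :] - W)
    return W


# Six one-hot "views" presented repeatedly in a fixed order.
sequence = np.tile(np.eye(6), (50, 1))
W = trace_learning(sequence, n_nodes=2)
```

Setting delta to 0 reduces the rule to ordinary competitive learning, where each update depends only on the current winner.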
M. W. Spratling (2004)
Local versus distributed: a poor taxonomy of neural coding strategies [commentary].
Behavioral and Brain Sciences, 27(5):700-2.
M. W. Spratling and M. H. Johnson (2004)
Neural coding strategies and mechanisms of competition.
Cognitive Systems
Research, 5(2):93-117.
A long running debate has concerned the question of whether neural
representations are encoded using a distributed or a local coding scheme. In
both schemes individual neurons respond to certain specific patterns of
pre-synaptic activity. Hence, rather than being dichotomous, both coding
schemes are based on the same representational mechanism. We argue that a
population of neurons needs to be capable of learning both local and distributed
representations, as appropriate to the task, and should be capable of generating
both local and distributed codes in response to different stimuli. Many neural
network algorithms, which are often employed as models of cognitive processes,
fail to meet all these requirements. In contrast, we present a neural network
architecture which enables a single algorithm to efficiently learn, and respond
using, both types of coding scheme.
M. W. Spratling and M. H. Johnson (2004)
A feedback model of visual attention.
Journal of Cognitive Neuroscience, 16(2):219-37.
Feedback connections are a prominent feature of cortical anatomy and are likely
to have a significant functional role in neural information processing. We
present a neural network model of cortical feedback that successfully simulates
neurophysiological data associated with attention. In this domain our model can
be considered a more detailed, and biologically plausible, implementation of the
biased competition model of attention. However, our model is more general as it
can also explain a variety of other top-down processes in vision, such as
figure/ground segmentation and contextual cueing. This model thus suggests that
a common mechanism, involving cortical feedback pathways, is responsible for a
range of phenomena and provides a unified account of currently disparate areas
of research.
M. W. Spratling and M. H. Johnson (2003)
Exploring the functional significance of dendritic inhibition in cortical pyramidal cells.
Neurocomputing, 52-54:389-95.
Inhibitory synapses contacting the soma and axon initial segment are commonly
presumed to participate in shaping the response properties of cortical pyramidal
cells. Such an inhibitory mechanism has been explored in numerous computational
models. However, the majority of inhibitory synapses target the dendrites of
pyramidal cells, and recent physiological data suggests that this dendritic
inhibition affects tuning properties. We describe a model that can be used to
investigate the role of dendritic inhibition in the competition between
neurons. With this model we demonstrate that dendritic inhibition significantly
enhances the computational and representational properties of neural networks.
M. W. Spratling and M. H. Johnson (2002) Exploring the functional
significance of dendritic inhibition in cortical pyramidal cells.
Proceedings of the 11th Computational Neuroscience Meeting (CNS02). (Reprinted in the journal Neurocomputing, 2003; see above)
M. W. Spratling (2002)
Cortical region interactions and the functional role of apical dendrites.
Behavioral and Cognitive Neuroscience Reviews, 1(3):219-28.
The basal and distal apical dendrites of pyramidal cells occupy distinct
cortical layers and are targeted by axons originating in different cortical
regions. Hence, apical and basal dendrites receive information from distinct
sources. Physiological evidence suggests that this anatomically observed
segregation of input sources may have functional significance. This possibility
has been explored in various connectionist models that employ neurons with
functionally distinct apical and basal compartments. A neuron in which separate
sets of inputs can be integrated independently has the potential to operate in a
variety of ways which are not possible for the conventional model of a neuron in
which all inputs are treated equally. This article thus considers how
functionally distinct apical and basal dendrites can contribute to the
information processing capacities of single neurons and, in particular, how
information from different cortical regions could have disparate effects on
neural activity and learning.
M. W. Spratling and M. H. Johnson (2002)
Pre-integration lateral inhibition enhances unsupervised learning.
Neural
Computation, 14(9):2157-79.
A large and influential class of neural network architectures uses
post-integration lateral inhibition as a mechanism for competition. We argue
that these algorithms are computationally deficient in that they fail to
generate, or learn, appropriate perceptual representations under certain
circumstances. An alternative neural network architecture is presented in which
nodes compete for the right to receive inputs rather than for the right to
generate outputs. This form of competition, implemented through pre-integration
lateral inhibition, does provide appropriate coding properties and can be used
to efficiently learn such representations. Furthermore, this architecture is
consistent with both neuro-anatomical and neuro-physiological data. We thus
argue that pre-integration lateral inhibition has computational advantages over
conventional neural network architectures while remaining equally biologically
plausible.
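Nodes that compete for inputs rather than outputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's exact equations: each input to node j is reduced by the strongest claim on that input from any rival node, a rival's claim being its normalised weight times its current response, and responses are recomputed for a fixed number of synchronous iterations. `pre_integration_response` and `alpha` are hypothetical names.

```python
import numpy as np

def pre_integration_response(W, x, alpha=1.0, iterations=20):
    """Responses of nodes that compete for inputs (illustrative sketch).

    W: (n_nodes, n_inputs) non-negative weights; x: (n_inputs,) input.
    alpha: strength of the pre-integration inhibition (assumed value).
    """
    n_nodes, n_inputs = W.shape
    # Scale each node's weights so that its strongest weight is 1.
    W_norm = W / np.maximum(W.max(axis=1, keepdims=True), 1e-12)
    y = W @ x  # start from the uninhibited response
    for _ in range(iterations):
        # claims[k, i]: how strongly node k currently accounts for input i.
        claims = W_norm * y[:, None]
        y_new = np.empty(n_nodes)
        for j in range(n_nodes):
            if n_nodes > 1:
                inhibition = np.delete(claims, j, axis=0).max(axis=0)
            else:
                inhibition = np.zeros(n_inputs)
            # Inhibition acts on each input before node j sums its inputs.
            y_new[j] = W[j] @ np.maximum(x - alpha * inhibition, 0.0)
        y = y_new
    return y


W = np.array([[1.0, 0.0],    # node 0 prefers input "a" alone
              [0.5, 0.5]])   # node 1 prefers the pair "ab"
print(pre_integration_response(W, np.array([1.0, 0.0])))  # "a"
print(pre_integration_response(W, np.array([1.0, 1.0])))  # "ab"
```

Presenting "a" leaves node 0 active and node 1 nearly silent; presenting "ab" does the reverse, because node 1 claims input "a" away from node 0. In this undamped synchronous form the responses alternate slightly while they settle, so the fixed iteration count is part of the sketch's assumptions.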
M. W. Spratling and M. H. Johnson (2001)
Dendritic inhibition enhances neural coding properties.
Cerebral Cortex, 11(12):1144-9.
The presence of a large number of inhibitory contacts at the soma and axon
initial segment of cortical pyramidal cells has inspired a large and influential
class of neural network models which use post-integration lateral inhibition as a
mechanism for competition between nodes. However, inhibitory synapses also
target the dendrites of pyramidal cells. The role of this dendritic inhibition
in competition between neurons has not previously been addressed. We
demonstrate, using a simple computational model, that such pre-integration
lateral inhibition provides networks of neurons with useful representational and
computational properties which are not provided by post-integration
inhibition.
S. J. Grice, M. W. Spratling, A. Karmiloff-Smith, H. Halit, G. Csibra, M. de Haan and M. H. Johnson (2001) Disordered visual processing and oscillatory brain activity in autism and Williams Syndrome.
NeuroReport, 12(12):2697-700.
Two developmental disorders, autism and Williams Syndrome, are both commonly
described as having difficulties in integrating perceptual features, i.e.,
binding spatially separate elements into a whole. It is already known that
healthy adults and infants display electroencephalographic (EEG) gamma band
bursts (around 40Hz) when the brain is required to achieve such binding. Here
we explore gamma band EEG in autism and Williams Syndrome and demonstrate
differential abnormalities in the two phenotypes. We show that despite putative
processing similarities at the cognitive level, binding in Williams Syndrome and
autism can be dissociated at the neurophysiological level by different
abnormalities in underlying brain oscillatory activity. Our study is the first
to identify that binding related gamma EEG can be disordered in humans.
M. W. Spratling and M. H. Johnson (2001) Activity-dependent processes in regional cortical specialization [commentary].
Developmental Science, 4(2):153-4.
G. Csibra, G. Davis, M. W. Spratling and M. H. Johnson (2000)
Gamma oscillations and object processing in the infant brain.
Science, 290(5496):1582-5.
An enduring controversy in neuroscience concerns how the brain binds
together separately coded stimulus features to form unitary
representations of objects. Recent evidence has indicated a close link
between this binding process and 40Hz (gamma-band) oscillations
generated by localized neural circuits (1). In a separate line of
research, the ability of young infants to perceive objects as unitary
and bounded has become a central focus for debates about the
mechanisms of perceptual development (2). However, to date these
infant studies have been behavioural, and there have been few, if any,
paradigms involving direct measures of neural function. Here we
demonstrate for the first time that binding-related 40Hz oscillations
are evident in the infant brain around 8 months of age, the same age
as some behavioral studies indicate the onset of perceptual binding of
spatially separated static visual features. The discovery of
binding-related gamma in infants opens up a new vista for experiments
on postnatal functional brain development in infants.
M. W. Spratling and G. M. Hayes (2000)
Learning synaptic clusters for non-linear dendritic processing.
Neural Processing Letters, 11(1):17-27.
Nonlinear dendritic processing appears to be a feature of biological neurons and
would also be of use in many applications of artificial neural networks. This
paper presents a model of an initially standard linear unit which uses
unsupervised learning to find clusters of inputs within which inactivity at one
synapse can occlude the activity at the other synapses.
M. W. Spratling (1999)
Artificial Ontogenesis: A Connectionist Model of Development.
PhD Thesis, University of Edinburgh.
This thesis suggests that ontogenetic adaptive processes are important for
generating intelligent behaviour. It is thus proposed that such processes, as
they occur in nature, need to be modelled and that such a model could be used
for generating artificial intelligence, and specifically robotic
intelligence. Hence, this thesis focuses on how mechanisms of intelligence are
specified.
A major problem in robotics is the need to predefine the behaviour to be
followed by the robot. This makes design intractable for all but the simplest
tasks and results in controllers that are specific to that particular task and
are brittle when faced with unforeseen circumstances. These problems can be
resolved by providing the robot with the ability to adapt the rules it follows
and to autonomously create new rules for controlling behaviour. This solution
thus depends on the predefinition of how rules to control behaviour are to be
learnt rather than the predefinition of rules for behaviour themselves.
Learning new rules for behaviour occurs during the developmental process in
biology. Changes in the structure of the cerebral cortex underlie behavioural and
cognitive development throughout infancy and beyond. The uniformity of the
neocortex suggests that there is significant computational uniformity across the
cortex resulting from uniform mechanisms of development, and holds out the
possibility of a general model of development. Development is an interactive
process between genetic predefinition and environmental influences. This
interactive process is constructive: qualitatively new behaviours are learnt by
using simple abilities as a basis for learning more complex ones. The
progressive increase in competence, provided by development, may be essential to
make tractable the process of acquiring higher-level abilities.
While simple behaviours can be triggered by direct sensory cues, more complex
behaviours require the use of more abstract representations. There is thus a
need to find representations at the correct level of abstraction appropriate to
controlling each ability. In addition, finding the correct level of abstraction
makes tractable the task of associating sensory representations with motor
actions. Hence, finding appropriate representations is important both for
learning behaviours and for controlling behaviours. Representations can be
found by recording regularities in the world or by discovering recurring
patterns through repeated sensory-motor interactions. By recording regularities
within the representations thus formed, more abstract representations can be
found. Simple, non-abstract, representations thus provide the basis for learning
more complex, abstract, representations.
A modular neural network architecture is presented as a basis for a model of
development. The pattern of activity of the neurons in an individual network
constitutes a representation of the input to that network. This representation
is formed through a novel, unsupervised, learning algorithm which adjusts the
synaptic weights to improve the representation of the input data.
Representations are formed by neurons learning to respond to correlated sets of
inputs. Neurons thus become feature detectors or pattern recognisers. Because
the nodes respond to patterns of inputs they encode more abstract features of
the input than are explicitly encoded in the input data itself. In this way
simple representations provide the basis for learning more complex
representations. The algorithm allows both more abstract representations to be
formed by associating correlated, coincident, features together, and invariant
representations to be formed by associating correlated, sequential, features
together.
The algorithm robustly learns accurate and stable representations, in a format
most appropriate to the structure of the input data received: it can represent
both single and multiple input features in both the discrete and continuous
domains, using either topologically or non-topologically organised nodes. The
output of one neural network is used to provide inputs for other networks. The
robustness of the algorithm enables each neural network to be implemented using
an identical algorithm. This allows a modular 'assembly' of neural networks to
be used for learning more complex abilities: the output activations of a network
can be used as the input to other networks which can then find representations
of more abstract information within the same input data; and, by defining the
output activations of neurons in certain networks to have behavioural
consequences it is possible to learn sensory-motor associations, to enable
sensory representations to be used to control behaviour.
M. W. Spratling (1999)
Pre-synaptic lateral inhibition provides a better architecture for self-organising neural networks. Network: Computation in Neural Systems, 10(4):285-301.
Unsupervised learning is an important property of the brain and of
many artificial neural networks. A large variety of unsupervised
learning algorithms have been proposed. This paper takes a different
approach in considering the architecture of the neural network rather
than the learning algorithm. It is shown that a self-organising neural
network architecture using pre-synaptic lateral inhibition enables a
single learning algorithm to find distributed, local, and topological
representations as appropriate to the structure of the input data
received. It is argued that such an architecture not only has
computational advantages but is a better model of cortical
self-organisation.
M. W. Spratling and G. M. Hayes (1998)
Learning sensory-motor cortical mappings without training.
Proceedings of the 6th European Symposium on Artificial Neural Networks (ESANN).
M. Verleysen (ed.) pp. 339-44. D-facto Publications.
This paper shows how the relationship between two arrays of artificial
neurons, representing different cortical regions, can be learned. The
algorithm enables each neural network to self-organise into a topological map
of the domain it represents at the same time as the relationship between
these maps is found. Unlike previous methods, learning is achieved without a
separate training phase; the algorithm which learns the mapping is also that
which performs the mapping.
M. W. Spratling and G. M. Hayes (1998)
A self-organising neural network for modelling cortical development.
Proceedings of the 6th European Symposium on Artificial Neural Networks (ESANN).
M. Verleysen (ed.) pp. 333-8. D-facto Publications.
This paper presents a novel self-organising neural network. It has been
developed for use as a simplified model of cortical development. Unlike
many other models of topological map formation all synaptic weights start at
zero strength (so that synaptogenesis might be modelled). In addition, the
algorithm works with the same format of encoding for both inputs to and
outputs from the network (so that the transfer and recoding of information
between cortical regions might be modelled).
M. W. Spratling (1997)
Artificial Ontogenesis: Cognitive and Behavioural Development for Robots.
Unpublished Departmental Discussion Paper,
Department of Artificial Intelligence,
University of Edinburgh.
There are three classes of adaptive process (structural definition,
structural adjustment, and parameter adjustment) which appear to underlie the
development of intelligence in nature. In artificial intelligence only two
of these processes are used; AI ignores development (structural adjustment).
While AI attempts to predefine explicit rules for behaviour, nature's
success in building complex creatures depends on predefining how rules to
control behaviour can be learned. It is the developmental processes in
biology through which such rules are learned. This proposal is to apply
mechanisms similar to those used in biological development to robots. This
will move robotics from `development' meaning design and production, towards
`development' in its biological sense meaning a process of growth and
progressive change. Defining the rules for development is design at a
meta-level to that currently used. It is proposed that the long process of
evolution used by nature to define these developmental processes might be
supplanted by another adaptive process, that of engineering, to more quickly
enable study of ontogenetic development.
This project thus aims to apply techniques inspired by animal development to
engineering robot control systems. Specifically, it is proposed that a
hierarchical control system, based on the cerebral cortex, be used and that
it develop through constructivist learning algorithms (ones in which the
interaction of a situated agent with its environment guides the creation of
cognitive machinery appropriate for representing and acting in that
environment). Such a robot would be provided with some innate, low-level
behavioural abilities and would develop more complex behaviour through experience.
M. W. Spratling and R. Cipolla (1996)
Uncalibrated visual servoing.
Proceedings of the 7th British Machine Vision Conference (BMVC).
R. B. Fisher and E. Trucco (eds.) pp. 545-54. BMVA.
Abstract
Visual servoing is a process that enables a robot to position a camera with
respect to known landmarks, using the visual data obtained by the camera
itself to guide camera motion. A solution is described which requires very
little a priori information, freeing it from being specific to a particular
configuration of robot and camera. The solution is based on closed-loop
control together with deliberate perturbations of the trajectory to provide
calibration movements for refining that trajectory. Results from
experiments in simulation and on a physical robot arm (camera-in-hand
configuration) are presented.
M. W. Spratling (1994)
Learning the Mapping Between Sensor and Motor Spaces to Produce Hand-Eye Coordination.
MSc Dissertation,
Department of Artificial Intelligence,
University of Edinburgh.
Abstract
Coordination between sensory inputs and motor actions is essential for
intelligent robotics. This dissertation considers the control of a
simple manipulator using sensory information to locate the target
position for the end-effector. The control mechanisms investigated all
form topographic maps of possible configurations of the manipulator
joints (the motor space) and the values of the sensor inputs (the
sensor space). Various methods are considered for learning to relate a
location on the sensor space map (which represents the target position
in the world) to the location in the motor space map which will
configure the manipulator to reach this target position. These methods
are analysed using a computer simulation, and a suitable algorithm to
solve the hand-eye coordination problem is presented.
M. M. Ross, M. W. Spratling, C. B. Kirkland and P. S. Story (1994)
Measurement of microfog wetness in a model steam turbine using a miniature optical spectral extinction probe.
IMechE International Symposium on Optical Methods and Data Processing In Heat and Fluid Flow.