Michael Spratling: Home Page

Research Interests

Neural Networks

deep learning; biologically-inspired neural networks; adversarial robustness; representation learning; dynamic and deformable neural networks

Computational Neuroscience

predictive coding; neural information processing and coding; dendritic computation; cortical feedback connections; cortical region interactions

Computer Vision

object recognition; viewpoint invariance; image segmentation; object detection; tracking; anomaly detection; generalisation; out-of-distribution rejection; open-set recognition

Visual Cognition

visual attention; biased competition; top-down and contextual influences; visual salience; perceptual inference

Machine Learning

sparse coding; continuous learning; few-shot learning; unsupervised and self-supervised learning; competitive learning; classification; clustering; contrastive and deep-metric learning

Learning and Development

cortical, behavioural and cognitive development; perceptual learning; perceptual and conceptual development; epigenetic and developmental robotics

Publications

Google Scholar | Semantic Scholar | Scopus | Researcher ID | ORCID

L. Li and M. W. Spratling (2025) Robust shortcut and disordered robustness: improving adversarial training through adaptive smoothing. Pattern Recognition, in press.

PDF

Code

Abstract

Deep neural networks can be easily fooled into making incorrect predictions through corruption of the input by adversarial perturbations: human-imperceptible artificial noise. So far adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples to be more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among data. Such "uneven vulnerability" is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both input and weight loss landscapes in an adaptive, instance-specific way to enhance robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against l∞-norm constrained attacks on CIFAR10 of 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data.

B. Gao and M. W. Spratling (2025) Softplus attention with re-weighting boosts length extrapolation in large language models. arXiv:2501.13428.

PDF

Code

Abstract

Large language models have achieved remarkable success in recent years, primarily due to the implementation of self-attention mechanisms. However, traditional Softmax attention suffers from numerical instability and reduced performance as the length of inference tokens increases. This paper addresses these issues by decomposing the Softmax operation into a non-linear transformation and the l1-norm. We identify the latter as essential for maintaining model performance. By replacing the non-linear transformation with the Softplus activation function and introducing a dynamic scale factor for different token lengths based on invariance entropy, we create a novel attention mechanism with performance better than conventional Softmax attention across various inference lengths. To further improve the length extrapolation ability of the proposed attention mechanism, we introduce a fine-tuning-free re-weighting mechanism that amplifies significant attention weights while diminishing weaker ones, enabling the model to concentrate more effectively on relevant tokens without requiring retraining. When combined with our proposed attention mechanism, this approach demonstrates significant promise in managing longer sequences, maintaining nearly constant validation loss even at 16× the training token length while ensuring numerical stability.
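The decomposition described in this abstract can be illustrated with a minimal sketch. It shows Softmax as an exponential transformation followed by l1 normalisation, and a variant in which the exponential is swapped for Softplus. The function names are hypothetical, and the dynamic scale factor and re-weighting mechanism are omitted because the abstract does not specify them, so this is not the authors' implementation.

```python
import math

def softmax(scores):
    # Non-linear transformation (exp) followed by l1 normalisation.
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_l1_attention(scores):
    # Same l1 normalisation, with exp replaced by Softplus (illustrative only).
    transformed = [softplus(s) for s in scores]
    total = sum(transformed)
    return [t / total for t in transformed]

scores = [2.0, 0.5, -1.0]  # hypothetical attention scores for one query
softmax_weights = softmax(scores)
softplus_weights = softplus_l1_attention(scores)
```

Both sets of weights are positive and sum to one; Softplus grows linearly rather than exponentially for large scores, so the resulting weights are less sharply peaked.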

M. W. Spratling and H. H. Schütt (2025) A margin-based replacement for cross-entropy loss. arXiv:2501.12191.

PDF

Code

Abstract

Cross-entropy (CE) loss is the de-facto standard for training deep neural networks to perform classification. However, CE-trained deep neural networks struggle with robustness and generalisation issues. To alleviate these issues, we propose high error margin (HEM) loss, a variant of multi-class margin loss that overcomes the training issues of other margin-based losses. We evaluate HEM extensively on a range of architectures and datasets. We find that HEM loss is more effective than cross-entropy loss across a wide range of tasks: unknown class rejection, adversarial robustness, learning with imbalanced data, continual learning, and semantic segmentation (a pixel-level classification task). Despite all training hyper-parameters being chosen for CE loss, HEM is inferior to CE only in terms of clean accuracy and this difference is insignificant. We also compare HEM to specialised losses that have previously been proposed to improve performance on specific tasks. LogitNorm, a loss achieving state-of-the-art performance on unknown class rejection, produces similar performance to HEM for this task, but is much poorer for continual learning and semantic segmentation. Logit-adjusted loss, designed for imbalanced data, has superior results to HEM for that task, but performs more poorly on unknown class rejection and semantic segmentation. DICE, a popular loss for semantic segmentation, is inferior to HEM loss on all tasks, including semantic segmentation. Thus, HEM often out-performs specialised losses, and in contrast to them, is a general-purpose replacement for CE loss.
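For orientation, the sketch below contrasts cross-entropy with the standard multi-class margin (hinge) loss that HEM is described as a variant of. It is not HEM itself, whose definition is not given in this abstract. The function names, the margin of 1.0 and the normalisation by the number of classes are assumptions made for illustration.

```python
import math

def cross_entropy(logits, target):
    # Negative log-probability of the target class under a softmax.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

def multiclass_margin_loss(logits, target, margin=1.0):
    # Hinge penalty for each non-target class whose logit comes within
    # `margin` of the target logit, averaged over the number of classes.
    penalties = [
        max(0.0, margin - logits[target] + z)
        for i, z in enumerate(logits)
        if i != target
    ]
    return sum(penalties) / len(logits)

logits = [2.0, 1.0, 0.0]  # hypothetical network outputs for three classes
ce_correct = cross_entropy(logits, target=0)
margin_correct = multiclass_margin_loss(logits, target=0)
margin_wrong = multiclass_margin_loss(logits, target=2)
```

When the target logit already exceeds every other logit by the margin, the margin loss is exactly zero, whereas cross-entropy remains positive and keeps pushing the logits apart.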

B. Gao and M. W. Spratling (2025) Filter competition results in more robust convolutional neural networks. Neurocomputing, 617:128972.

Code

Abstract

Convolutional layers, one of the basic building blocks of deep learning architectures, contain numerous trainable filters for feature extraction. These filters operate independently which can result in distinct filters learning similar weights and extracting similar features. In contrast, competition mechanisms in the brain contribute to the sharpening of the responses of activated neurons, enhancing the contrast and selectivity of individual neurons towards specific stimuli, and simultaneously increasing the diversity of responses across the population of neurons. Inspired by this observation, this paper proposes a novel convolutional layer based on the theory of predictive coding, in which each filter effectively tries to block other filters from responding to the input features which it represents. In this way, filters learn to become more distinct which increases the diversity of the extracted features. When replacing standard convolutional layers with the proposed layers the performance of classification networks is not only improved on ImageNet but also significantly boosted on eight robustness benchmarks, as well as on downstream detection and segmentation tasks. Most notably, ResNet50/101/152 robust accuracy increases by 15.9%/20.0%/20.9% under FGSM attack, and by 10.5%/14.7%/15.0% under PGD attack.
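The FGSM attack named in the results above perturbs each input element by a fixed step in the direction of the sign of the loss gradient. The following is a minimal sketch on a toy logistic model rather than a convolutional network; the model, weights and epsilon value are all hypothetical.

```python
import math

def sign(value):
    # Returns -1, 0 or 1.
    return (value > 0) - (value < 0)

def input_gradient(x, y, w, b):
    # Gradient of the binary cross-entropy loss of a logistic model
    # with respect to the input x: (sigmoid(w.x + b) - y) * w.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def fgsm(x, y, w, b, epsilon):
    # One-step attack: move every input element by epsilon in the
    # direction that increases the loss.
    grad = input_gradient(x, y, w, b)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x, y = [0.5, -0.2], 1          # toy input and its label
w, b = [1.0, -2.0], 0.0        # toy model parameters
x_adv = fgsm(x, y, w, b, epsilon=0.1)
```

The perturbation is bounded by epsilon in every element, and for this toy model it lowers the logit of the correct class from 0.9 to 0.6.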

M. Fontana, M. W. Spratling and M. Shi (2025) Optimizing dense visual predictions through multi-task coherence and prioritization. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), in press.

PDF

Code

Abstract

Multi-Task Learning (MTL) involves the concurrent training of multiple tasks, offering notable advantages for dense prediction tasks in computer vision. MTL not only reduces training and inference time as opposed to having multiple single-task models but also enhances task accuracy through the interaction of multiple tasks. However, existing methods face limitations. They often rely on sub-optimal cross-task interactions, resulting in task-specific predictions with poor geometric and predictive coherence. Additionally, many approaches utilize inadequate loss weighting strategies, which fail to address the inherent variability in task evolution during training. To overcome these challenges, we propose an advanced MTL model specifically designed for dense vision tasks. Our model leverages the state-of-the-art vision transformer with task-specific decoders. To enhance cross-task coherence, we introduce a trace-back method that improves both cross-task geometric and predictive features. Furthermore, we present a novel dynamic task balancing approach that projects task losses onto a common scale and prioritizes more challenging tasks during training. Extensive experiments demonstrate the superiority of our method, establishing new state-of-the-art performance across two benchmark datasets.

W. A. Phillips, T. Bachmann, M. W. Spratling, L. Muckli, L. S. Petro and T. Zolnik (2025) Cellular psychology: relating cognition to context-sensitive pyramidal cells. Trends in Cognitive Sciences, 29(1): 28-40.

PDF

Abstract

'Cellular psychology' is a new field of inquiry that studies dendritic mechanisms for adapting mental events to the current context, thus increasing their coherence, flexibility, effectiveness, and comprehensibility. Apical dendrites of neocortical pyramidal cells have a crucial role in cognition - those dendrites receive input from diverse sources, including feedback, and can amplify the cell's feedforward transmission if relevant in that context. Specialized subsets of inhibitory interneurons regulate this cooperative context-sensitive processing by increasing or decreasing amplification. Apical input has different effects on cellular output depending on whether we are awake, deeply asleep, or dreaming. Furthermore, wakeful thought and imagery may depend on apical input. High-resolution neuroimaging in humans supports and complements evidence on these cellular mechanisms from other mammals.

C. Huang, H. Guan, A. Jiang, Y. Wang, M. Spratling, X. Wang, Y. Zhang (2025) Few-shot anomaly detection via category-agnostic registration learning. IEEE Transactions on Neural Networks and Learning Systems, in press.

PDF

Code

Abstract

Most existing anomaly detection methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements for real-world applications. Inspired by how humans detect anomalies, by comparing a query image to known normal ones, this paper proposes a novel few-shot anomaly detection (FSAD) framework. Using a training set of normal images from various categories, registration, aiming to align normal images of the same category, is leveraged as the proxy task for self-supervised category-agnostic representation learning. At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features. Such a setup enables the model to generalize to novel test categories. It is, to the best of our knowledge, the first FSAD method that requires no model fine-tuning for novel categories, enabling a single model to be applied to all categories. Extensive experiments demonstrate the effectiveness of the proposed method. In particular, it improves the current state-of-the-art for FSAD by 11.3% and 8.3% on the MVTec and MPDD benchmarks, respectively.

L. Li, J. Qiu and M. W. Spratling (2025) AROID: improving adversarial robustness through online instance-wise data augmentation. International Journal of Computer Vision (IJCV), 133: 929-50.

PDF

Code

Abstract

Deep neural networks are vulnerable to adversarial examples. Adversarial training (AT) is an effective defense against adversarial examples. However, AT is prone to overfitting which degrades robustness substantially. Recently, data augmentation (DA) was shown to be effective in mitigating robust overfitting if appropriately designed and optimized for AT. This work proposes a new method to automatically learn online, instance-wise, DA policies to improve robust generalization for AT. A novel policy learning objective, consisting of Vulnerability, Affinity and Diversity, is proposed and shown to be sufficiently effective and efficient to be practical for automatic DA generation during AT. This allows our method to efficiently explore a large search space for a more effective DA policy and evolve the policy as training progresses. Empirically, our method is shown to outperform or match all competitive DA methods across various model architectures (CNNs and ViTs) and datasets (CIFAR10, SVHN and Imagenette). Our DA policy reinforced vanilla AT to surpass several state-of-the-art AT methods (with baseline DA) in terms of both accuracy and robustness. It can also be combined with those advanced AT methods to produce a further boost in robustness.

J. Ning, M. W. Spratling and L. Gionfrida (2024) Improving the accuracy of tiny object detection by negative sample copy-paste. Proceedings of the 31st International Conference on Neural Information Processing (ICONIP).

Abstract

Detecting tiny objects is an essential task in the field of computer vision but poses a considerable challenge for existing detectors. One issue is that task-irrelevant objects or non-object background patches can be mistakenly detected as objects of interest, which significantly impairs detector precision. To tackle this issue we include an online image augmentation technique, NegCopyPaste, in the training process. This method copies regions of training images that have been falsely identified as target objects in one epoch and pastes them into the training images to be used in the next epoch. By training the model to reject false-positive predictions made in previous epochs, the proposed method effectively decreases the proportion of false-positive predictions compared to the baselines, making the network more selective in picking out the target objects. NegCopyPaste reduces the number of false-positive predictions during inference and achieves new state-of-the-art results on TinyPerson, WiderFace and DOTA, notably improving mAPtiny by 1.58% over the previous best method on TinyPerson.
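The copy-and-paste step described in this abstract can be sketched as follows. Images are plain nested lists, boxes are (top, left, height, width) tuples, and all function names and paste positions are hypothetical; the abstract does not say how positions are chosen, and the sketch assumes every patch fits inside the target image.

```python
def crop(image, box):
    # Extract the region given by (top, left, height, width).
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]

def paste(image, patch, top, left):
    # Return a copy of the image with the patch written at (top, left).
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out

def negative_copy_paste(source, false_positive_boxes, target, positions):
    # Paste each falsely detected region of `source` into `target`.
    # The pasted regions carry no object label, so they act as negatives.
    result = target
    for box, (top, left) in zip(false_positive_boxes, positions):
        result = paste(result, crop(source, box), top, left)
    return result

source = [[r * 4 + c for c in range(4)] for r in range(4)]
target = [[0] * 4 for _ in range(4)]
augmented = negative_copy_paste(source, [(0, 0, 2, 2)], target, [(2, 2)])
```

In a training loop, the boxes would come from the false-positive detections of the previous epoch and the augmented images would be used in the next one.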

M. Fontana, M. W. Spratling and M. Shi (2024) When multi-task learning meets partial supervision: a computer vision review. Proceedings of the IEEE, 112(6): 516-543.

PDF

Abstract

Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but they can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge in between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.

N. Manchev and M. W. Spratling (2024) Learning multi-modal recurrent neural networks with target propagation. Computational Intelligence, 40(4):e12691.

PDF

Code

Abstract

Modelling one-to-many type mappings in problems with a temporal component can be challenging. Backpropagation is not applicable to networks that perform discrete sampling and is also susceptible to gradient instabilities, especially when applied to longer sequences. In this paper we propose two recurrent neural network architectures that leverage stochastic units and mixture models, and are trained with target propagation. We demonstrate that these networks can model complex conditional probability distributions, outperform backpropagation-trained alternatives, and do not rapidly degrade with increased time horizons. Our main contributions consist of the design and evaluation of the architectures that enable the networks to solve multi-modal problems with a temporal dimension. This also includes the extension of the target propagation through time algorithm to handle stochastic neurons. The use of target propagation provides an additional computational advantage, which enables the network to handle time horizons that are substantially longer compared to networks fitted using backpropagation.

L. Li, Y. Wang, C. Sitawarin and M. W. Spratling (2024) OODRobustBench: a benchmark and large-scale analysis of adversarial robustness under distribution shift. Proceedings of the 41st International Conference on Machine Learning (ICML), in Proceedings of Machine Learning Research, 235:28830-69.

PDF

Code

Abstract

Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This is a concerning omission as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions.

L. Li, H. Guan, J. Qiu and M. W. Spratling (2024) One prompt word is enough to boost adversarial robustness for pre-trained vision-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24408-19.

PDF

Code

Abstract

Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having remarkable generalization ability, are highly vulnerable to adversarial examples. This work studies the adversarial robustness of VLMs from the novel perspective of the text prompt instead of the extensively studied model weights (frozen in this work). We first show that the effectiveness of both adversarial attack and defense is sensitive to the text prompt used. Inspired by this, we propose a method to improve resilience to adversarial attacks by learning a robust text prompt for VLMs. The proposed method, named Adversarial Prompt Tuning (APT), is effective while being both computationally and data efficient. Extensive experiments are conducted across 15 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show APT's superiority over hand-engineered prompts and other state-of-the-art adaptation methods. APT demonstrated excellent abilities in terms of the in-distribution performance and the generalization under input distribution shift and across datasets. Surprisingly, by simply adding one learned word to the prompts, APT can significantly boost the accuracy and robustness over the hand-engineered prompts by +13% and +8.5% on average respectively. The improvement further increases, in our most effective setting, to +26.4% for accuracy and +16.7% for robustness.

S. N. Eddine, T. Brothers, L. Wang, M. Spratling, and G. R. Kuperberg (2024) A predictive coding model of the N400. Cognition, 246: 105755.

PDF

Abstract

The N400 event-related component has been widely used to investigate the neural mechanisms underlying real-time language comprehension. However, despite decades of research, there is still no unifying theory that can explain both its temporal dynamics and functional properties. In this work, we show that predictive coding - a biologically plausible algorithm for approximating Bayesian inference - offers a promising framework for characterizing the N400. Using an implemented predictive coding computational model, we demonstrate how the N400 can be formalized as the lexico-semantic prediction error produced as the brain infers meaning from the linguistic form of incoming words. We show that the magnitude of lexico-semantic prediction error mirrors the functional sensitivity of the N400 to various lexical variables, priming, contextual effects, as well as their higher-order interactions. We further show that the dynamics of the predictive coding algorithm provide a natural explanation for the temporal dynamics of the N400, and a biologically plausible link to neural activity. Together, these findings directly situate the N400 within the broader context of predictive coding research, and suggest that the brain may use the same computational mechanism for inference across linguistic and non-linguistic domains.

H. Guan and M. W. Spratling (2024) Query semantic reconstruction for background in few-shot segmentation. The Visual Computer, 40:799-810.

PDF

Abstract

Few-shot segmentation (FSS) aims to segment unseen classes using a few annotated samples. Typically, a prototype representing the foreground class is extracted from annotated support image(s) and is matched to features representing each pixel in the query image. However, models learnt in this way are insufficiently discriminatory, and often produce false positives: misclassifying background pixels as foreground. Some FSS methods try to address this issue by using the background in the support image(s) to help identify the background in the query image. However, the backgrounds of these images are often quite distinct, and hence, the support image background information is uninformative. This article proposes a method, QSR, that extracts the background from the query image itself, and as a result is better able to discriminate between foreground and background features in the query image. This is achieved by modifying the training process to associate prototypes with class labels including known classes from the training data and latent classes representing unknown background objects. This class information is then used to extract a background prototype from the query image. To successfully associate prototypes with class labels and extract a background prototype that is capable of predicting a mask for the background regions of the image, the machinery for extracting and using foreground prototypes is induced to become more discriminative between different classes. Experiments achieve state-of-the-art results for both 1-shot and 5-shot FSS on the PASCAL-5i and COCO-20i datasets. As QSR operates only during training, results are produced with no extra computational complexity during testing.

J. Ning and M. W. Spratling (2024) The importance of anti-aliasing in tiny object detection. Proceedings of the 15th Asian Conference on Machine Learning (ACML), in Proceedings of Machine Learning Research, 222:975-90.

PDF

Code

Abstract

Tiny object detection has gained considerable attention in the research community owing to the frequent occurrence of tiny objects in numerous critical real-world scenarios. However, convolutional neural networks (CNNs) used as the backbone for object detection architectures typically neglect Nyquist's sampling theorem during down-sampling operations, resulting in aliasing and degraded performance. This is likely to be a particular issue for tiny objects that occupy very few pixels and therefore have high spatial frequency features. This paper applies an existing anti-aliasing approach, WaveCNet, to tiny object detection. WaveCNet addresses aliasing by replacing standard down-sampling processes in CNNs with Wavelet Pooling (WaveletPool) layers, effectively suppressing aliasing. We modify the original WaveCNet to apply WaveletPool in a consistent way in the residual blocks of ResNets. We also propose a bottom-heavy version of the backbone, which further improves the performance of tiny object detection while also reducing the required number of parameters by almost a half. Experimental results on the TinyPerson, WiderFace, and DOTA datasets demonstrate the effectiveness of our method in detecting tiny objects: the proposed method achieves new state-of-the-art results on all three datasets.
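The aliasing problem mentioned in this abstract can be seen in a one-dimensional toy example. The sketch below down-samples a signal that alternates at the highest representable frequency, first naively and then after a simple [1, 2, 1]/4 low-pass filter. The blur filter is a stand-in chosen for brevity and is not the wavelet pooling used by WaveCNet; the function names are hypothetical.

```python
def downsample(signal, stride=2):
    # Keep every `stride`-th sample, as a strided layer would.
    return signal[::stride]

def blur(signal):
    # [1, 2, 1] / 4 low-pass filter with edge replication.
    last = len(signal) - 1
    return [
        (signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, last)]) / 4
        for i in range(len(signal))
    ]

signal = [1.0, -1.0] * 4                 # alternates every sample
naive = downsample(signal)               # aliases to a constant signal
filtered = downsample(blur(signal))      # high frequency is suppressed first
```

Naive down-sampling turns the alternating pattern into a constant value of 1.0, a spurious low-frequency feature. Filtering first gives values of zero away from the left edge, where edge replication leaves a residual of 0.5.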

M. W. Spratling (2023) A comprehensive assessment benchmark for rigorously evaluating deep learning image classifiers. arXiv:2308.04137.

PDF

Code

Abstract

Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier to samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates bench-marking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure as they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future.

M. W. Spratling (2023) Comprehensive assessment methods are key to progress in deep learning [commentary]. Behavioral and Brain Sciences, 46:e407.

PDF

L. Li and M. W. Spratling (2023) Data augmentation alone can improve adversarial training. Proceedings of the 11th International Conference on Learning Representations (ICLR).

PDF

Code

Abstract

Adversarial training suffers from the issue of robust overfitting, which seriously impairs its generalization performance. Data augmentation, which is effective at preventing overfitting in standard training, has been observed by many previous works to be ineffective in mitigating overfitting in adversarial training. This work proves that, contrary to previous findings, data augmentation alone can significantly boost accuracy and robustness in adversarial training. We find that the hardness and the diversity of data augmentation are important factors in combating robust overfitting. In general, diversity can improve both accuracy and robustness, while hardness can boost robustness at the cost of accuracy within a certain limit and degrade them both over that limit. To mitigate robust overfitting, we first propose a new crop transformation Cropshift with improved diversity compared to the conventional one (Padcrop). We then propose a new data augmentation scheme, based on Cropshift, with much improved diversity and well-balanced hardness. Empirically, our augmentation method achieves the state-of-the-art accuracy and robustness for data augmentations in adversarial training. Furthermore, it matches, or even exceeds when combined with weight averaging, the performance of the best contemporary regularization methods for alleviating robust overfitting.

J. Ning, H. Guan and M. W. Spratling (2023) Rethinking the backbone architecture for tiny object detection. Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), Volume 5, pp. 103-114.

PDF

Code

Abstract

Tiny object detection has become an active area of research because images with tiny targets are common in several important real-world scenarios. However, existing tiny object detection methods use standard deep neural networks as their backbone architecture. We argue that such backbones are inappropriate for detecting tiny objects as they are designed for the classification of larger objects, and do not have the spatial resolution to identify small targets. Specifically, such backbones use max-pooling or a large stride at early stages in the architecture. This produces lower resolution feature-maps that can be efficiently processed by subsequent layers. However, such low-resolution feature-maps do not contain information that can reliably discriminate tiny objects. To solve this problem we design "bottom-heavy" versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall. We also investigate if pre-training these backbones on images of appropriate size, using CIFAR100 and ImageNet32, can further improve performance on tiny object detection. Results on TinyPerson and WiderFace show that detectors with our proposed backbones achieve better results than the current state-of-the-art methods.

L. Li and M. W. Spratling (2023) Understanding and combating robust overfitting via input loss landscape analysis and regularization. Pattern Recognition, 136:109229.

PDF

Code

Abstract

Adversarial training is widely used to improve the robustness of deep neural networks to adversarial attack. However, adversarial training is prone to overfitting, and the cause is far from clear. This work sheds light on the mechanisms underlying overfitting through analyzing the loss landscape w.r.t. the input. We find that robust overfitting results from standard training, specifically the minimization of the clean loss, and can be mitigated by regularization of the loss gradients. Moreover, we find that robust overfitting becomes more severe during adversarial training partially because the gradient regularization effect of adversarial training becomes weaker due to the increase in the loss landscape's curvature. To improve robust generalization, we propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction. Our method significantly mitigates robust overfitting and achieves the highest robustness and efficiency compared to similar previous methods.

B. Gao and M. W. Spratling (2023) Explaining away results in more robust visual tracking. The Visual Computer, 39:2081-95.

PDF

Code

Abstract

Many current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background, meaning that target appearance alone is insufficient for robust tracking. In contrast, humans consider the distractor objects as additional visual cues, in order to infer the position of the target. Inspired by this observation, this paper proposes a novel tracking architecture in which not only is the appearance of the tracked object, but also the appearance of the distractors detected in previous frames, taken into consideration using a form of probabilistic inference known as explaining away. This mechanism increases the robustness of tracking by making it more likely that the target appearance model is matched to the true target, rather than similar-looking regions of the current frame. The proposed method can be combined with many existing trackers. Combining it with SiamFC, DaSiamRPN, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the underlying tracker alone. When combined with Super DiMP and ARSuper DiMP the resulting trackers produce performance that is competitive with the state-of-the-art on seven popular benchmarks.

B. Gao and M. W. Spratling (2022) Shape-texture debiased training for robust template matching. Sensors, 22(17): 6658.

PDF

Code

Abstract

Finding a template in a search image is an important task underlying many computer vision applications. This is typically solved by calculating a similarity map using features extracted from the separate images. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. Inspired by these findings, in this article we investigate if enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. By comparing features from the same CNN but trained by different shape-texture training methods, we determined a feature-space which improves the performance of most template matching algorithms. When combining the proposed method with the Divisive Input Modulation (DIM) template matching algorithm, its performance is greatly improved, and the resulting method produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.

N. Manchev and M. W. Spratling (2022) On the biological plausibility of orthogonal initialisation for solving gradient instability in deep neural networks. Proceedings of the 9th International Conference on Soft Computing and Machine Intelligence (ISCMI), pp. 47-55.

PDF

Code

Abstract

Initialising the synaptic weights of artificial neural networks (ANNs) with orthogonal matrices is known to alleviate vanishing and exploding gradient problems. A major objection against such initialisation schemes is that they are deemed biologically implausible as they mandate factorization techniques that are difficult to attribute to a neurobiological process. This paper presents two initialisation schemes that allow a network to naturally evolve its weights to form orthogonal matrices, provides theoretical analysis that pre-training orthogonalisation always converges, and empirically confirms that the proposed schemes outperform randomly initialised recurrent and feedforward networks.
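For context, the conventional factorisation-style route to an orthogonal weight matrix, the kind of procedure the abstract describes as hard to attribute to a neurobiological process, can be sketched as follows. This is not either of the paper's proposed schemes; it is a minimal Gram-Schmidt illustration, and the function name `orthogonal_init` is hypothetical.

```python
import random

def orthogonal_init(n, seed=0):
    """Return an n x n orthogonal matrix (list of rows) via Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        # Draw a random Gaussian vector as a candidate row.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Subtract its components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # skip (very unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows

W = orthogonal_init(4)
# W multiplied by its transpose should be numerically the identity matrix.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]
```

Because an orthogonal matrix preserves vector norms, repeated multiplication by `W` neither shrinks nor amplifies a signal, which is the usual intuition for why such initialisation helps with vanishing and exploding gradients.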

C. Huang, H. Guan, A. Jiang, Y. Zhang, M. W. Spratling and Y.-F. Wang (2022) Registration based few-shot anomaly detection. European Conference on Computer Vision (ECCV), In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Proceedings part 24, Lecture Notes in Computer Science, Volume 13684, pp:303-19, Springer.

PDF

Code

Abstract

This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. So far, existing FSAD studies follow the one-model-per-category learning paradigm used for standard AD, and the inter-category commonality has not been explored. Inspired by how humans detect anomalies, i.e., comparing an image in question to normal images, we here leverage registration, an image alignment task that is inherently generalizable across categories, as the proxy task, to train a category-agnostic anomaly detection model. During testing, the anomalies are identified by comparing the registered features of the test image and its corresponding support (normal) images. As far as we know, this is the first FSAD method that trains a single generalizable model and requires no re-training or parameter fine-tuning for new categories. Experimental results have shown that the proposed method outperforms the state-of-the-art FSAD methods by 3%-8% in AUC on the MVTec and MPDD benchmarks. Source code will be publicly available.

H. Guan and M. W. Spratling (2022) CobNet: cross attention on object and background for few-shot segmentation. Proceedings of the 26th International Conference on Pattern Recognition (ICPR), pp. 39-45.

PDF

Abstract

Few-shot segmentation aims to segment images containing objects from previously unseen classes using only a few annotated samples. Most current methods focus on using object information extracted, with the aid of human annotations, from support images to identify the same objects in new query images. However, background information can also be useful to distinguish objects from their surroundings. Hence, some previous methods also extract background information from the support images. In this paper, we argue that such information is of limited utility, as the background in different images can vary widely. To overcome this issue, we propose CobNet which utilises information about the background that is extracted from the query images without annotations of those images. Experiments show that our method achieves a mean Intersection-over-Union score of 61.4% and 37.8% for 1-shot segmentation on PASCAL-5i and COCO-20i respectively, outperforming previous methods. It is also shown to produce state-of-the-art performances of 53.7% for weakly-supervised few-shot segmentation, where no annotations are provided for the support images.
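The Intersection-over-Union score used to report these results can be sketched for binary masks as below. This is a generic illustration: how the benchmarks average over classes and folds is not stated in the abstract, so the simple mean in `mean_iou` is an assumption, and both function names are hypothetical.

```python
def iou(pred, target):
    """Intersection-over-Union of two binary masks (nested lists of 0/1)."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def mean_iou(scores):
    """Plain average of per-class (or per-image) IoU scores."""
    return sum(scores) / len(scores)

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)  # 2 shared pixels out of 4 in the union
```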

B. Gao and M. W. Spratling (2022) More robust object tracking via shape and motion cue integration. Signal Processing, 199 (108628).

PDF

Code

Abstract

Most current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background. This paper promotes an approach that can be combined with many existing trackers to tackle this issue and improve tracking robustness. The proposed approach makes use of two additional cues to target location: making the appearance model more sensitive to shape cues in offline training, and using the historical locations of the target to predict its future position during online inference. Combining these additional mechanisms with SiamFC, SiamFC++, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the corresponding underlying tracker alone. When combined with ARSuper DiMP the resulting tracker is shown to outperform all popular state-of-the-art trackers on three benchmark datasets (OTB-100, NFS, and LaSOT), and produce performance that is competitive with the state-of-the-art on the UAV123, Trackingnet, GOT-10K and VOT2020 datasets.

B. Gao and M. W. Spratling (2021) Robust template matching via hierarchical convolutional features from a shape biased CNN. Proceedings of the International Conference on Image, Vision and Intelligent Systems (ICIVIS), Lecture Notes in Electrical Engineering, Volume 813. Springer, Singapore.

PDF

Code

Abstract

Finding a template in a search image is an important task underlying many computer vision applications. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. In this article we investigate if enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. This investigation results in a new template matching method that produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.

M. W. Spratling (2020) Explaining away results in accurate and tolerant template matching. Pattern Recognition, 104 (107337).

PDF

Code

Abstract

Recognising and locating image patches or sets of image features is an important task underlying much work in computer vision. Traditionally this has been accomplished using template matching. However, template matching is notoriously brittle in the face of changes in appearance caused by, for example, variations in viewpoint, partial occlusion, and non-rigid deformations. This article tests a method of template matching that is more tolerant to such changes in appearance and that can, therefore, more accurately identify image patches. In traditional template matching the comparison between a template and the image is independent of the other templates. In contrast, the method advocated here takes into account the evidence provided by the image for the template at each location and the full range of alternative explanations represented by the same template at other locations and by other templates. Specifically, the proposed method of template matching is performed using a form of probabilistic inference known as "explaining away". The algorithm used to implement explaining away has previously been used to simulate several neurobiological mechanisms, and been applied to image contour detection and pattern recognition tasks. Here it is applied for the first time to image patch matching, and is shown to produce superior results in comparison to the current state-of-the-art methods.
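The traditional approach that the abstract contrasts itself with can be sketched as below: a single template is scored at every image location independently of any other template. This is a baseline illustration using normalised cross-correlation, not the explaining-away method proposed in the article; the function names and the toy image are hypothetical.

```python
import math

def ncc(patch, template):
    """Normalised cross-correlation between two equal-sized 2-D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    pm, tm = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - pm) * (b - tm) for a, b in zip(p, t))
    den = math.sqrt(sum((a - pm) ** 2 for a in p) *
                    sum((b - tm) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches get a score of 0

def match_template(image, template):
    """Return (best score, (row, col)) over all template positions."""
    th, tw = len(template), len(template[0])
    best = (-2.0, None)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            s = ncc(patch, template)
            if s > best[0]:
                best = (s, (r, c))
    return best

image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
score, location = match_template(image, template)
```

Each location's score here depends only on the local patch and the one template, which is the independence that the explaining-away formulation removes.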

N. Manchev and M. W. Spratling (2020) Target propagation in recurrent neural networks. Journal of Machine Learning Research, 21(7): 1-33.

PDF

Code

Abstract

Recurrent Neural Networks have been widely used to process sequence data, but have long been criticized for their biological implausibility and training difficulties related to vanishing and exploding gradients. This paper presents a novel algorithm for training recurrent networks, target propagation through time (TPTT), that outperforms standard backpropagation through time (BPTT) on four out of the five problems used for testing. The proposed algorithm is initially tested and compared to BPTT on four synthetic time lag tasks, and its performance is also measured using the sequential MNIST data set. In addition, as TPTT uses target propagation, it allows for discrete nonlinearities and could potentially mitigate the credit assignment problem in more complex recurrent architectures.

M. W. Spratling (2019) Fitting predictive coding to the neurophysiological data. Brain Research, 1720: 146313.

PDF

Code

Abstract

Recent neurophysiological data showing the effects of locomotion on neural activity in mouse primary visual cortex has been interpreted as providing strong support for the predictive coding account of cortical function. Specifically, this work has been interpreted as providing direct evidence that prediction-error, a distinguishing property of predictive coding, is encoded in cortex. This article evaluates these claims and highlights some of the discrepancies between the proposed predictive coding model and the neuro-biology. Furthermore, it is shown that the model can be modified so as to fit the empirical data more successfully.

I. E. Kartoglu and M. W. Spratling (2018) Two collaborative filtering recommender systems based on sparse dictionary coding. Knowledge and Information Systems, 57(3): 709-20.

PDF

Code

Abstract

This paper proposes two types of recommender systems based on sparse dictionary coding. Firstly, a novel predictive recommender system that attempts to predict a user's future rating of a specific item. Secondly, a top-n recommender system which finds a list of items predicted to be most relevant for a given user. The proposed methods are assessed using a variety of different metrics and are shown to be competitive with existing collaborative filtering recommender systems. Specifically, the sparse dictionary-based predictive recommender has advantages over existing methods in terms of a lower computational cost and not requiring parameter tuning. The sparse dictionary-based top-n recommender system has advantages over existing methods in terms of the accuracy of the predictions it makes and not requiring parameter tuning. The open-source software implemented and used for the evaluation in this paper is also provided for reproducibility.

Q. Wang and M. W. Spratling (2018) Contour detection refined by a sparse reconstruction-based discrimination method. Signal, Image and Video Processing, 12(2): 207-14.

PDF

Abstract

Sparse representations have been widely used for many image processing tasks. In this paper, a sparse reconstruction-based discrimination (SRBD) method, which was previously proposed for the classification of image patches, is utilized to improve boundary detection in colour images. This method is applied to refining the results generated by three different algorithms: a biologically-inspired method, and two state-of-the-art algorithms for contour detection. All of the contour detection results are evaluated by the BSDS300 and BSDS500 benchmarks using the quantitative measures: F-score, ODS, OIS and AP. Evaluation results show that the performance of each algorithm is improved using the proposed method of refinement with at least one of the quantitative measures increased by 0.01. In particular, even the two state-of-the-art algorithms are slightly improved by applying the SRBD method to refine their contour detection results.

M. W. Spratling (2017) A predictive coding model of gaze shifts and the underlying neurophysiology. Visual Cognition, 25(7-8): 770-801.

PDF

Code

Abstract

A comprehensive model of gaze control must account for a number of empirical observations at both the behavioural and neurophysiological levels. The computational model presented in this article can simulate the coordinated movements of the eye, head, and body required to perform horizontal gaze shifts. In doing so it reproduces the predictable relationships between the movements performed by these different degrees of freedom (DOFs) in the primate. The model also accounts for the saccadic undershoot that accompanies large gaze shifts in the biological visual system. It can also account for our perception of a stable external world despite frequent gaze shifts and the ability to perform accurate memory-guided and double-step saccades. The proposed model also simulates peri-saccadic compression: the mis-localisation of a briefly presented visual stimulus towards the location that is the target for a saccade. At the neurophysiological level, the proposed model is consistent with the existence of cortical neurons tuned to the retinal, head-centred, body-centred, and world-centred locations of visual stimuli and cortical neurons that have gain-modulated responses to visual stimuli. Finally, the model also successfully accounts for peri-saccadic receptive field (RF) remapping which results in reduced responses to stimuli in the current RF location and an increased sensitivity to stimuli appearing at the location that will be occupied by the RF after the saccade. The proposed model thus offers a unified explanation for this seemingly diverse range of phenomena. Furthermore, as the proposed model is an implementation of the predictive coding theory, it offers a single computational explanation for these phenomena and relates gaze shifts to a wider framework for understanding cortical function.

W. Muhammad and M. W. Spratling (2017) A neural model for eye-head-arm coordination. Advanced Robotics, 31(12): 650-63.

PDF

Abstract

The coordinated movement of the eyes, the head and the arm is an important ability in both animals and humanoid robots. To achieve this the brain and the robot control system need to be able to perform complex non-linear sensory-motor transformations in the forward and inverse directions between many degrees of freedom. In this article we apply an omni-directional basis function neural network to this task. The proposed network can perform 3-D coordinated gaze shifts and 3-D arm reach movements to a visual target. Particularly, it can perform direct sensory-motor transformations to shift gaze and to execute arm reach movements and can also perform inverse sensory-motor transformations in order to shift gaze to view the hand.

M. W. Spratling (2017) A hierarchical predictive coding model of object recognition in natural images. Cognitive Computation, 9(2): 151-67.

PDF

Code

Abstract

Predictive coding has been proposed as a model of the hierarchical perceptual inference process performed in the cortex. However, results demonstrating that predictive coding is capable of performing the complex inference required to recognise objects in natural images have not previously been presented. This article proposes a hierarchical neural network based on predictive coding for performing visual object recognition. This network is applied to the tasks of categorising hand-written digits, identifying faces, and locating cars in images of street scenes. It is shown that image recognition can be performed with tolerance to position, illumination, size, partial occlusion and within-category variation. The current results, therefore, provide the first practical demonstration that predictive coding (at least the particular implementation of predictive coding used here; the PC/BC-DIM) is capable of performing accurate visual object recognition.

D. Re, A. Gibaldi, S. P. Sabatini and M. W. Spratling (2017) An integrated system based on binocular learned receptive fields for saccade-vergence on visually salient targets. Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP), Volume 6, pp:204-15.

PDF

Abstract

The human visual system uses saccadic and vergence eye movements to foveate interesting objects with both eyes, and thus explore the visual scene. To mimic this biological behavior in active vision, we propose a bio-inspired integrated system able to learn a functional sensory representation of the environment, together with the motor commands for binocular eye coordination, directly by interacting with the environment itself. The proposed architecture, rather than sequentially combining different functionalities, is a robust integration of different modules that rely on a front-end of learned binocular receptive fields to specialize on different sub-tasks. The resulting modular architecture is able to detect salient targets in the scene and perform precise binocular saccadic and vergence movements on them. The performance of the proposed approach has been tested on the iCub Simulator, providing a quantitative evaluation of the computational potentiality of the learned sensory and motor resources.

M. W. Spratling (2017) A review of predictive coding algorithms. Brain and Cognition, 112: 92-7.

PDF

Abstract

Predictive coding is a leading theory of how the brain performs probabilistic inference. However, there are a number of distinct algorithms which are described by the term "predictive coding". This article provides a concise review of these different predictive coding algorithms, highlighting their similarities and differences. Five algorithms are covered: linear predictive coding which has a long and influential history in the signal processing literature; the first neuroscience-related application of predictive coding to explaining the function of the retina; and three versions of predictive coding that have been proposed to model cortical function. While all these algorithms aim to fit a generative model to sensory data, they differ in the type of generative model they employ, in the process used to optimise the fit between the model and sensory data, and in the way that they are related to neurobiology.

W. Muhammad and M. W. Spratling (2017) A neural model of coordinated head and eye movement control. Journal of Intelligent & Robotic Systems, 85(1):107-26.

PDF

Abstract

Gaze shifts require the coordinated movement of both the eyes and the head in both animals and humanoid robots. To achieve this the brain and the robot control system need to be able to perform complex non-linear sensory-motor transformations between many degrees of freedom and resolve the redundancy in such a system. In this article we propose a hierarchical neural network model for performing 3-D coordinated gaze shifts. The network is based on the PC/BC-DIM (Predictive Coding/Biased Competition with Divisive Input Modulation) basis function model. The proposed model consists of independent eyes and head controlled circuits with mutual interactions for the appropriate adjustment of coordination behaviour. Based on the initial eyes and head positions the network resolves redundancies involved in 3-D gaze shifts and produces accurate gaze control without any kinematic analysis or imposing any constraints. Furthermore, the behaviour of the proposed model is consistent with coordinated eye and head movements observed in primates.

Q. Wang and M. W. Spratling (2016) Contour detection in colour images using a neurophysiologically inspired model. Cognitive Computation, 8(6):1027-35.

PDF

Abstract

The predictive coding/biased competition (PC/BC) model of V1 has previously been applied to locate boundaries defined by local discontinuities in intensity within an image. Here it is extended to perform contour detection for colour images. The proposed extensions are inspired by neurophysiological data from single neurons in macaque primary visual cortex (V1), and the behaviour of this extended model is consistent with the neurophysiological experimental results. Furthermore, when compared to methods used for contour detection in computer vision, the colour PC/BC model of V1 slightly outperforms some recently proposed algorithms which use more cues and/or require a complicated training procedure.

M. W. Spratling (2016) A neural implementation of Bayesian inference based on predictive coding. Connection Science, 28(4):346-83.

PDF

Code

Abstract

Predictive coding is a leading theory of cortical function that has previously been shown to explain a great deal of neurophysiological and psychophysical data. Here it is shown that predictive coding can perform almost exact Bayesian inference when applied to computing with population codes. It is demonstrated that the proposed algorithm, based on predictive coding, can: decode probability distributions encoded as noisy population codes; combine priors with likelihoods to calculate posteriors; perform cue integration and cue segregation; perform function approximation; be extended to perform hierarchical inference; simultaneously represent and reason about multiple stimuli; and perform inference with multi-modal and non-Gaussian probability distributions. Predictive coding thus provides a neural network based method for performing probabilistic computation and provides a simple, yet comprehensive, theory of how the cerebral cortex performs Bayesian inference.
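One of the computations listed above, combining a prior with a likelihood to obtain a posterior, can be written directly as Bayes' rule on a discretised variable. The sketch below is only that reference computation, not the predictive coding network described in the article; the grid, the Gaussian parameters and the function names are illustrative assumptions.

```python
import math

def gaussian(x, mu, sigma):
    """Unnormalised Gaussian profile evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def posterior(prior, likelihood):
    """Pointwise product of prior and likelihood, normalised to sum to 1."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Discretise the stimulus variable on a grid from -10 to 10.
xs = [i * 0.1 for i in range(-100, 101)]
prior = [gaussian(x, 0.0, 2.0) for x in xs]   # broad prior centred on 0
like = [gaussian(x, 3.0, 1.0) for x in xs]    # sharper evidence centred on 3
post = posterior(prior, like)
post_mean = sum(x * p for x, p in zip(xs, post))

# For Gaussians the posterior mean is the precision-weighted average.
analytic_mean = (0.0 / 2.0 ** 2 + 3.0 / 1.0 ** 2) / (1 / 2.0 ** 2 + 1 / 1.0 ** 2)
```

The grid estimate `post_mean` should agree closely with `analytic_mean`, showing the posterior being pulled towards the more reliable of the two sources.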

M. W. Spratling (2016) Predictive coding as a model of cognition. Cognitive Processing, 17(3): 279-305.

PDF

Code

Abstract

Previous work has shown that predictive coding can provide a detailed explanation of a very wide range of low-level perceptual processes. It is also widely believed that predictive coding can account for high-level, cognitive, abilities. This article provides support for this view by showing that predictive coding can simulate phenomena such as categorisation, the influence of abstract knowledge on perception, recall and reasoning about conceptual knowledge, context-dependent behavioural control, and naive physics. The particular implementation of predictive coding used here (PC/BC-DIM) has previously been used to simulate low-level perceptual behaviour and the neural mechanisms that underlie them. This algorithm thus provides a single framework for modelling both perceptual and cognitive brain function.

M. W. Spratling (2016) A neural implementation of the Hough transform and the advantages of explaining away. Image and Vision Computing, 52:15-24.

PDF

Code

Abstract

The Hough Transform (HT) is widely used for feature extraction and object detection. However, during the HT individual image elements vote for many possible parameter values. This results in a dense accumulator array and problems identifying the parameter values that correspond to image features. This article proposes a new method for implementing the voting process in the HT. This method employs a competitive neural network algorithm to perform a form of probabilistic inference known as "explaining away". This results in a sparse accumulator array in which the parameter values of image features can be more accurately identified. The proposed method is initially demonstrated using the simple, prototypical, task of straight line detection in synthetic images. In this task it is shown to more accurately identify straight lines, and the parameter of those lines, compared to the standard Hough voting process. The proposed method is further assessed using a version of the implicit shape model (ISM) algorithm applied to car detection in natural images. In this application it is shown to more accurately identify cars, compared to using the standard Hough voting process in the same algorithm, and compared to the original ISM algorithm.
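The standard voting process that the abstract takes as its starting point can be sketched as follows for straight lines parameterised by angle and distance from the origin. This is the baseline Hough voting only, not the proposed explaining-away method; the bin sizes, the function name `hough_lines` and the toy points are illustrative assumptions.

```python
import math

def hough_lines(points, n_theta=180, rho_max=20):
    """Standard Hough voting; acc[theta_index][rho_index] counts votes."""
    n_rho = 2 * rho_max + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        # Each point votes once for every candidate orientation.
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = int(round(rho)) + rho_max  # shift so negative rho fits
            if 0 <= r < n_rho:
                acc[t][r] += 1
    return acc

points = [(x, 5) for x in range(10)]  # ten points on the line y = 5
acc = hough_lines(points)

peak = max(max(row) for row in acc)
true_cell = acc[90][5 + 20]           # theta = 90 degrees, rho = 5
nonzero = sum(1 for row in acc for v in row if v > 0)
```

In this toy example the true line's cell collects a vote from every point, but so do cells at neighbouring orientations, and many other cells receive stray votes; this is the dense accumulator the abstract refers to.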

Q. Wang and M. W. Spratling (2016) A simplified texture gradient method for improved image segmentation. Signal, Image and Video Processing, 10(4):679-86.

PDF

Abstract

Inspired by the probability of boundary (Pb) algorithm, a simplified texture gradient method has been developed to locate texture boundaries within grayscale images. Despite considerable simplification, the proposed algorithm's ability to locate texture boundaries is comparable with Pb's texture boundary method. The proposed texture gradient method is also integrated with a biologically inspired model, to enable boundaries defined by discontinuities in both intensity and texture to be located. The combined algorithm outperforms the current state-of-the-art image segmentation method (Pb) when this method is also restricted to using only local cues of intensity and texture at a single scale.

W. Muhammad and M. W. Spratling (2015) A neural model of binocular saccade planning and vergence control. Adaptive Behavior, 23(5):265-82.

PDF

Abstract

The human visual system uses saccadic and vergence eye movements to foveate visual targets. To mimic this aspect of the biological visual system the PC/BC-DIM neural network is used as an omni-directional basis function network for learning and performing sensory-sensory and sensory-motor transformations without using any hard-coded geometric information. A hierarchical PC/BC-DIM network is used to learn a head-centred representation of visual targets by dividing the whole problem into independent subtasks. The learnt head-centred representation is then used to generate saccade and vergence motor commands. The performance of the proposed system is tested using the iCub humanoid robot simulator.

M. W. Spratling (2014) Classification using sparse representations: a biologically plausible approach. Biological Cybernetics, 108(1):61-73.

PDF

Code

Abstract

Representing signals as linear combinations of basis vectors sparsely selected from an overcomplete dictionary has proven to be advantageous for many applications in pattern recognition, machine learning, signal processing, and computer vision. While this approach was originally inspired by insights into cortical information processing, biologically-plausible approaches have been limited to exploring the functionality of early sensory processing in the brain, while more practical applications have employed non-biologically-plausible sparse-coding algorithms. Here, a biologically-plausible algorithm is proposed that can be applied to practical problems. This algorithm is evaluated using standard benchmark tasks in the domain of pattern classification, and its performance is compared to a wide range of alternative algorithms that are widely used in signal and image processing. The results show that, for the classification tasks performed here, the proposed method is very competitive with the best of the alternative algorithms that have been evaluated. This demonstrates that classification using sparse representations can be performed in a neurally-plausible manner, and hence, that this mechanism of classification might be exploited by the brain.

M. W. Spratling (2014) A single functional model of drivers and modulators in cortex. Journal of Computational Neuroscience, 36(1): 97-118.

PDF

Code

Abstract

A distinction is commonly made between synaptic connections capable of evoking a response ("drivers") and those that can alter ongoing activity but not initiate it ("modulators"). Here it is proposed that, in cortex, both drivers and modulators are an emergent property of the perceptual inference performed by cortical circuits. Hence, it is proposed that there is a single underlying computational explanation for both forms of synaptic connection. This idea is illustrated using a predictive coding model of cortical perceptual inference. In this model all synaptic inputs are treated identically. However, functionally, certain synaptic inputs drive neural responses while others have a modulatory influence. This model is shown to account for driving and modulatory influences in bottom-up, lateral, and top-down pathways, and is used to simulate a wide range of neurophysiological phenomena including surround suppression, contour integration, gain modulation, spatio-temporal prediction, and attention. The proposed computational model thus provides a single functional explanation for drivers and modulators and a unified account of a diverse range of neurophysiological data.

M. W. Spratling (2013) Predictive coding. In Encyclopedia of Computational Neuroscience, D. Jaeger and R. Jung (Eds.), Springer, New York.

PDF

M. W. Spratling (2013) Distinguishing theory from implementation in predictive coding accounts of brain function [commentary]. Behavioral and Brain Sciences, 36(3):231-2.

PDF

M. W. Spratling (2013) Image segmentation using a sparse coding model of cortical area V1. IEEE Transactions on Image Processing, 22(4):1631-43.

PDF

Code

Abstract

Algorithms that encode images using a sparse set of basis functions have previously been shown to explain aspects of the physiology of primary visual cortex (V1), and have been used for applications such as image compression, restoration, and classification. Here, a sparse coding algorithm that has previously been used to account for the response properties of orientation tuned cells in primary visual cortex is applied to the task of perceptually salient boundary detection. The proposed algorithm is currently limited to using only intensity information at a single scale. However, it is shown to out-perform the current state-of-the-art image segmentation method (Pb) when this method is also restricted to using the same information.

K. De Meyer and M. W. Spratling (2013) A model of partial reference frame transforms through pooling of gain-modulated responses. Cerebral Cortex, 23(5):1230-9.

PDF

Code

Abstract

In multimodal integration and sensorimotor transformation areas of posterior parietal cortex (PPC), neural responses often appear encoded in spatial reference frames that are intermediate to intrinsic sensory reference frames, e.g., eye-centred for visual or head-centred for auditory stimulation. Many sensory responses in these areas are also modulated by direction of gaze. We demonstrate that certain types of mixed-frame responses can be generated by pooling gain-modulated responses - similarly to how complex cells in visual cortex are thought to pool the responses of simple cells. The proposed model simulates two types of mixed-frame responses observed in PPC: in particular, sensory responses that shift differentially with gaze in horizontal and vertical dimensions; and sensory responses that shift differentially for different start and end points along a single dimension of gaze. We distinguish these two types of mixed-frame responses from a third type in which sensory responses shift a partial yet approximately equal amount with each gaze shift. We argue that the empirical data on mixed-frame responses may be caused by multiple mechanisms, and we adapt existing reference-frame measures to distinguish between the different types. Finally, we discuss how mixed-frame responses may be revealing of the local organisation of presynaptic responses.

M. W. Spratling (2012) Predictive coding accounts for V1 response properties recorded using reverse correlation. Biological Cybernetics, 106(1):37-49.

PDF

Code

Abstract

PC/BC ("Predictive Coding/Biased Competition") is a simple computational model that has previously been shown to explain a very wide range of V1 response properties. This article extends work on the PC/BC model of V1 by showing that it can also account for V1 response properties measured using the reverse correlation methodology. Reverse correlation employs an experimental procedure that is significantly different from that used in more typical neurophysiological experiments, and measures some distinctly different response properties in V1. Despite these differences PC/BC successfully accounts for the data. The current results thus provide additional support for the PC/BC model of V1 and further demonstrate that PC/BC offers a unified explanation for the seemingly diverse range of behaviours observed in primary visual cortex.

M. W. Spratling (2012) Predictive coding as a model of the V1 saliency map hypothesis. Neural Networks, 26:7-28.

PDF

Code

Abstract

The predictive coding/biased competition (PC/BC) model is a specific implementation of predictive coding theory that has previously been shown to provide a detailed account of the response properties of orientation tuned cells in primary visual cortex (V1). Here it is shown that the same model can successfully simulate psychophysical data relating to the saliency of unique items in search arrays, of contours embedded in random texture, and of borders between textured regions. This model thus provides a possible implementation of the hypothesis that V1 generates a bottom-up saliency map. However, PC/BC is very different from previous models of visual salience, in that it proposes that saliency results from the failure of an internal model of simple elementary image components to accurately predict the visual input. Saliency can therefore be interpreted as a mechanism by which prediction errors attract attention in an attempt to improve the accuracy of the brain's internal representation of the world.

M. W. Spratling (2012) Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. Neural Computation, 24(1):60-103.

PDF

Code

Abstract

A method is presented for learning the reciprocal feedforward and feedback connections required by the predictive coding model of cortical function. Using this method feedforward and feedback connections are learnt simultaneously and independently in a biologically plausible manner. The performance of the proposed algorithm is evaluated by applying it to learning the elementary components of artificial images and of natural images. For artificial images the bars problem is employed and the proposed algorithm is shown to produce state-of-the-art performance on this task. For natural images, components resembling Gabor functions are learnt in the first processing stage and neurons responsive to corners are learnt in the second processing stage. The properties of these learnt representations are in good agreement with neurophysiological data from V1 and V2. The proposed algorithm demonstrates for the first time that a single computational theory can explain the formation of cortical RFs, and also the response properties of cortical neurons once those RFs have been learnt.

K. De Meyer and M. W. Spratling (2011) Multiplicative gain modulation arises through unsupervised learning in a predictive coding model of cortical function. Neural Computation, 23(6):1536-67.

PDF

Code

Abstract

The combination of two or more population-coded signals in a neural model of predictive coding can give rise to multiplicative gain modulation in the response properties of individual neurons. Synaptic weights generating these multiplicative response properties can be learned using an unsupervised, Hebbian, learning rule. The behaviour of the model is compared to empirical data on gaze-dependent gain modulation of cortical cells, and found to be in good agreement with a range of physiological observations. Furthermore, it is demonstrated that the model can learn to represent a set of basis functions. The current paper thus connects an often-observed neurophysiological phenomenon and important neurocomputational principle (gain modulation) with an influential theory of brain operation (predictive coding).

M. W. Spratling (2011) A single functional model accounts for the distinct properties of suppression in cortical area V1. Vision Research, 51(6):563-76.

PDF

Code

Abstract

Cross-orientation suppression and surround suppression have been extensively studied in primary visual cortex (V1). These two forms of suppression have some distinct properties which has led to the suggestion that they are generated by different underlying mechanisms. Furthermore, it has been suggested that mechanisms other than intracortical inhibition may be central to both forms of suppression. A simple computational model (PC/BC), in which intracortical inhibition is fundamental, is shown to simulate the distinct properties of cross-orientation and surround suppression. The same model has previously been shown to account for a large range of V1 response properties including orientation-tuning, spatial and temporal frequency tuning, facilitation and inhibition by flankers and textured surrounds as well as a range of other experimental results on cross-orientation suppression and surround suppression. The current results thus provide additional support for the PC/BC model of V1 and for the proposal that the diverse range of response properties observed in V1 neurons have a single computational explanation. Furthermore, these results demonstrate that current neurophysiological evidence is insufficient to discount intracortical inhibition as a central mechanism underlying both forms of suppression.

M. W. Spratling (2010) Predictive coding as a model of response properties in cortical area V1. Journal of Neuroscience, 30(9):3531-43.

PDF

Code

Abstract

A simple model is shown to account for a large range of V1 classical, and non-classical, receptive field properties including orientation-tuning, spatial and temporal frequency tuning, cross-orientation suppression, surround suppression, and facilitation and inhibition by flankers and textured surrounds. The model is an implementation of the predictive coding theory of cortical function and thus provides a single computational explanation for a diverse range of neurophysiological findings. Furthermore, since predictive coding can be related to the biased competition theory and is a specific example of more general theories of hierarchical perceptual inference the current results relate V1 response properties to a wider, more unified, framework for understanding cortical function.

M. W. Spratling (2009) Learning posture invariant spatial representations through temporal correlations. IEEE Transactions on Autonomous Mental Development, 1(4):253-63.

PDF

Abstract

A hierarchical neural network model is used to learn, without supervision, sensory-sensory coordinate transformations like those believed to be encoded in the dorsal pathway of the cerebral cortex. The resulting representations of visual space are invariant to eye orientation, neck orientation, or posture in general. These posture invariant spatial representations are learned using the same mechanisms that have previously been proposed to operate in the cortical ventral pathway to learn object representations that are invariant to translation, scale, orientation, or viewpoint in general. This model thus suggests that the same mechanisms of learning and development operate across multiple cortical hierarchies.

K. De Meyer and M. W. Spratling (2009) A model of non-linear interactions between cortical top-down and horizontal connections explains the attentional gating of collinear facilitation. Vision Research, 49(5):553-68.

PDF

Abstract

Past physiological and psychophysical experiments have shown that attention can modulate the effects of contextual information appearing outside the classical receptive field of a cortical neuron. Specifically, it has been suggested that attention, operating via cortical feedback connections, gates the effects of long-range horizontal connections underlying collinear facilitation in cortical area V1. This article proposes a novel mechanism, based on the computations performed within the dendrites of cortical pyramidal cells, that can account for these observations. Furthermore, it is shown that the top-down gating signal into V1 can result from a process of biased competition occurring in extrastriate cortex. A model based on these two assumptions is used to replicate the results of physiological and psychophysical experiments on collinear facilitation and attentional modulation.

M. W. Spratling, K. De Meyer and R. Kompass (2009) Unsupervised learning of overlapping image components using divisive input modulation. Computational Intelligence and Neuroscience, 2009(381457):1-19.

PDF

Code

Abstract

This paper demonstrates that non-negative matrix factorisation is mathematically related to a class of neural networks that employ negative feedback as a mechanism of competition. This observation inspires a novel learning algorithm which we call Divisive Input Modulation (DIM). The proposed algorithm provides a mathematically simple and computationally efficient method for the unsupervised learning of image components, even in conditions where these elementary features overlap considerably. To test the proposed algorithm, a novel artificial task is introduced which is similar to the frequently-used bars problem but employs squares rather than bars to increase the degree of overlap between components. Using this task, we investigate how the proposed method performs on the parsing of artificial images composed of overlapping features, given the correct representation of the individual components; and secondly, we investigate how well it can learn the elementary components from artificial training images. We compare the performance of the proposed algorithm with its predecessors including variations on these algorithms that have produced state-of-the-art performance on the bars problem. The proposed algorithm is more successful than its predecessors in dealing with overlap and occlusion in the artificial task that has been used to assess performance.

M. W. Spratling (2008) Predictive coding as a model of biased competition in visual attention. Vision Research, 48(12):1391-408.

PDF

Code

Abstract

Attention acts, through cortical feedback pathways, to enhance the response of cells encoding expected or predicted information. Such observations are inconsistent with the predictive coding theory of cortical function which proposes that feedback acts to suppress information predicted by higher-level cortical regions. Despite this discrepancy, this article demonstrates that the predictive coding model can be used to simulate a number of the effects of attention. This is achieved via a simple mathematical rearrangement of the predictive coding model, which allows it to be interpreted as a form of biased competition model. Nonlinear extensions to the model are proposed that enable it to explain a wider range of data.

M. W. Spratling (2008) Reconciling predictive coding and biased competition models of cortical function. Frontiers in Computational Neuroscience, 2(4):1-8.

PDF

Abstract

A simple variation of the standard biased competition model is shown, via some trivial mathematical manipulations, to be identical to predictive coding. Specifically, it is shown that a particular implementation of the biased competition model, in which nodes compete via inhibition that targets the inputs to a cortical region, is mathematically equivalent to the linear predictive coding model. This observation demonstrates that these two important and influential rival theories of cortical function are minor variations on the same underlying mathematical model.
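The kind of network referred to above can be illustrated with a minimal sketch. It assumes the textbook linear predictive coding update, in which node activity is adjusted in proportion to the error between the input and its reconstruction; the function name, step size, and example weights are hypothetical and are not taken from the paper. Read as biased competition, the error vector is simply the input after it has been inhibited by the active nodes.

```python
def linear_predictive_coding(x, weights, step=0.2, iterations=100):
    """Iterate a linear predictive coding network on the input vector x.

    weights[j] holds the weights of prediction node j, one value per input.
    Returns the node activations y and the last computed error vector e.
    (Illustrative sketch only; parameter values are assumptions.)
    """
    y = [0.0] * len(weights)
    e = list(x)
    for _ in range(iterations):
        # Reconstruction of the input generated by the current node activity.
        prediction = [sum(w[i] * yj for w, yj in zip(weights, y))
                      for i in range(len(x))]
        # Residual error: equivalently, the input after subtractive
        # inhibition targeting the inputs.
        e = [xi - pi for xi, pi in zip(x, prediction)]
        # Each node integrates the residual through its own weights.
        y = [yj + step * sum(wi * ei for wi, ei in zip(w, e))
             for w, yj in zip(weights, y)]
    return y, e


if __name__ == "__main__":
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    y, e = linear_predictive_coding([1.0, 2.0, 2.0], weights)
    print(y, e)
```

In this toy example the input is an exact combination of the two weight vectors, so the residual shrinks towards zero as the node activity settles.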

M. S. C. Thomas, G. Westermann, D. Mareschal, M. H. Johnson, S. Sirois and M. W. Spratling (2008) Studying development in the 21st century [response to commentaries]. Behavioral and Brain Sciences, 31(3):345-56.

Abstract

In this response, we consider four main issues arising from the commentaries to the target article. These include further details of the theory of interactive specialization, the relationship between neuroconstructivism and selectionism, the implications of neuroconstructivism for the notion of representation, and the role of genetics in theories of development. We conclude by stressing the importance of multidisciplinary approaches in the future study of cognitive development and by identifying the directions in which neuroconstructivism can expand in the Twenty-first Century.

S. Sirois, M. W. Spratling, M. S. C. Thomas, G. Westermann, D. Mareschal and M. H. Johnson (2008) Précis of Neuroconstructivism: how the brain constructs cognition. Behavioral and Brain Sciences, 31(3):321-31.

PDF

Abstract

Neuroconstructivism proposes a unifying framework for the study of development that brings together (1) constructivism (which views development as the progressive elaboration of increasingly complex structures), (2) cognitive neuroscience (which aims to understand the neural mechanisms underlying behaviour), and (3) computational modelling (which proposes formal and explicit specifications of information processing). The guiding principle of our approach is context dependence, within and (in contrast to Marr) between levels of organization. We propose that three mechanisms guide the emergence of representations: competition, cooperation, and chronotopy, which themselves allow for two central processes: proactivity and progressive specialization. We suggest that the main outcome of development is partial representations, distributed across distinct functional circuits. This framework is derived by examining development at the level of single neurons, brain systems, and whole organisms. We use the terms encellment, embrainment, and embodiment to describe the higher-level contextual influences that act at each of these levels of organization. To illustrate these mechanisms in operation we provide case studies in early visual perception, infant habituation, phonological development, and object representations in infancy. Three further case studies are concerned with interactions between levels of explanation: social development, atypical development and within that, the development of dyslexia. We conclude that cognitive development arises from a dynamic, contextual change in neural structures leading to partial representations across multiple brain regions and timescales.

X. Zhang and M. W. Spratling (2008) Automated learning of coordinate transformations. Proceedings of the Eighth International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems (EPIROB08).

G. Westermann, D. Mareschal, M. H. Johnson, S. Sirois, M. W. Spratling and M. S. C. Thomas (2007) Neuroconstructivism. Developmental Science, 10(1):75-83.

PDF

Abstract

Neuroconstructivism is a theoretical framework focusing on the construction of representation in the developing brain. Cognitive development is explained as emerging from the experience-dependent development of neural structures supporting mental representations. Neural development occurs in the context of multiple interacting constraints acting on different levels, from the individual cell to the external environment of the developing child. Cognitive development can thus be understood as a trajectory originating from the constraints on the underlying neural structures. This perspective offers an integrated view of normal and abnormal development as well as of development and adult processing, and it stands apart from traditional cognitive approaches in taking seriously the constraints on cognition inherent to the substrate that delivers it.

L. A. Watling, M. W. Spratling, K. De Meyer and M. Johnson (2007) The role of feedback in the determination of figure and ground: a combined behavioral and modeling study. Proceedings of the 29th Meeting of the Cognitive Science Society (CogSci07).

PDF

Abstract

Object knowledge can exert an important influence on even the earliest stages of visual processing. This study demonstrates how a familiarity bias, acquired only briefly before testing, can affect the subsequent segmentation of an otherwise ambiguous figure-ground array, in favor of perceiving the familiar shape as figure. The behavioral data are then replicated using a biologically plausible neural network model that employs feedback connections to implement the demonstrated familiarity bias.

D. Mareschal, M. H. Johnson, S. Sirois, M. W. Spratling, M. S. C. Thomas and G. Westermann (2007) Neuroconstructivism: How the Brain Constructs Cognition, Oxford University Press: Oxford, UK.

M. W. Spratling (2006) Learning image components for object recognition. Journal of Machine Learning Research, 7:793-815.

PDF

Abstract

In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Non-negative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints.
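For readers unfamiliar with non-negative matrix factorisation, the following is a minimal sketch of the standard multiplicative-update scheme of Lee and Seung for the squared-error objective. It is offered only as an illustration of the generic technique: it is not the specific set of algorithms evaluated in the article, it includes no sparseness constraints, and the function names, toy data, and parameter values are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((vij - rij) ** 2
               for v_row, r_row in zip(v, wh)
               for vij, rij in zip(v_row, r_row))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factorise non-negative V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random initialisation keeps every factor non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
        errors.append(squared_error(v, w, h))
    return w, h, errors


if __name__ == "__main__":
    # Toy data: two "bar" patterns and an image containing both of them.
    v = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
    w, h, errors = nmf(v, rank=2)
    print(errors[0], errors[-1])
```

Because the updates are purely multiplicative, factors that start non-negative stay non-negative, which is the constraint the abstract refers to.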

M. W. Spratling and M. H. Johnson (2006) A feedback model of perceptual learning and categorisation. Visual Cognition, 13(2):129-65.

PDF

Abstract

Top-down, feedback, influences are known to have significant effects on visual information processing. Such influences are also likely to affect perceptual learning. This article employs a computational model of the cortical region interactions underlying visual perception to investigate possible influences of top-down information on learning. The results suggest that feedback could bias the way in which perceptual stimuli are categorised and could also facilitate the learning of sub-ordinate level representations suitable for object identification and perceptual expertise.

M. W. Spratling (2005) Learning viewpoint invariant perceptual representations from cluttered images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):753-61.

PDF

Abstract

In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalise across changes in location, rotation and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli are presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This article proposes a simple modification to the learning method, that can overcome this limitation, and results in more robust learning of invariant representations.

M. W. Spratling (2004) Local versus distributed: a poor taxonomy of neural coding strategies [commentary]. Behavioral and Brain Sciences, 27(5):700-2.

PDF

M. W. Spratling and M. H. Johnson (2004) Neural coding strategies and mechanisms of competition. Cognitive Systems Research, 5(2):93-117.

PDF

Abstract

A long running debate has concerned the question of whether neural representations are encoded using a distributed or a local coding scheme. In both schemes individual neurons respond to certain specific patterns of pre-synaptic activity. Hence, rather than being dichotomous, both coding schemes are based on the same representational mechanism. We argue that a population of neurons needs to be capable of learning both local and distributed representations, as appropriate to the task, and should be capable of generating both local and distributed codes in response to different stimuli. Many neural network algorithms, which are often employed as models of cognitive processes, fail to meet all these requirements. In contrast, we present a neural network architecture which enables a single algorithm to efficiently learn, and respond using, both types of coding scheme.

M. W. Spratling and M. H. Johnson (2004) A feedback model of visual attention. Journal of Cognitive Neuroscience, 16(2):219-37.

PDF

Abstract

Feedback connections are a prominent feature of cortical anatomy and are likely to have a significant functional role in neural information processing. We present a neural network model of cortical feedback that successfully simulates neurophysiological data associated with attention. In this domain our model can be considered a more detailed, and biologically plausible, implementation of the biased competition model of attention. However, our model is more general as it can also explain a variety of other top-down processes in vision, such as figure/ground segmentation and contextual cueing. This model thus suggests that a common mechanism, involving cortical feedback pathways, is responsible for a range of phenomena and provides a unified account of currently disparate areas of research.

M. W. Spratling and M. H. Johnson (2003) Exploring the functional significance of dendritic inhibition in cortical pyramidal cells. Neurocomputing, 52-54:389-95.

PDF

Abstract

Inhibitory synapses contacting the soma and axon initial segment are commonly presumed to participate in shaping the response properties of cortical pyramidal cells. Such an inhibitory mechanism has been explored in numerous computational models. However, the majority of inhibitory synapses target the dendrites of pyramidal cells, and recent physiological data suggests that this dendritic inhibition affects tuning properties. We describe a model that can be used to investigate the role of dendritic inhibition in the competition between neurons. With this model we demonstrate that dendritic inhibition significantly enhances the computational and representational properties of neural networks.

M. W. Spratling and M. H. Johnson (2002) Exploring the functional significance of dendritic inhibition in cortical pyramidal cells. Proceedings of the 11th Computational Neuroscience Meeting (CNS02). (Reprinted in the journal Neurocomputing, 2003; see above)

M. W. Spratling (2002) Cortical region interactions and the functional role of apical dendrites. Behavioral and Cognitive Neuroscience Reviews, 1(3):219-28.

PDF

Abstract

The basal and distal apical dendrites of pyramidal cells occupy distinct cortical layers and are targeted by axons originating in different cortical regions. Hence, apical and basal dendrites receive information from distinct sources. Physiological evidence suggests that this anatomically observed segregation of input sources may have functional significance. This possibility has been explored in various connectionist models that employ neurons with functionally distinct apical and basal compartments. A neuron in which separate sets of inputs can be integrated independently has the potential to operate in a variety of ways which are not possible for the conventional model of a neuron in which all inputs are treated equally. This article thus considers how functionally distinct apical and basal dendrites can contribute to the information processing capacities of single neurons and, in particular, how information from different cortical regions could have disparate effects on neural activity and learning.

M. W. Spratling and M. H. Johnson (2002) Pre-integration lateral inhibition enhances unsupervised learning. Neural Computation, 14(9):2157-79.

PDF

Abstract

A large and influential class of neural network architectures use post-integration lateral inhibition as a mechanism for competition. We argue that these algorithms are computationally deficient in that they fail to generate, or learn, appropriate perceptual representations under certain circumstances. An alternative neural network architecture is presented in which nodes compete for the right to receive inputs rather than for the right to generate outputs. This form of competition, implemented through pre-integration lateral inhibition, does provide appropriate coding properties and can be used to efficiently learn such representations. Furthermore, this architecture is consistent with both neuro-anatomical and neuro-physiological data. We thus argue that pre-integration lateral inhibition has computational advantages over conventional neural network architectures while remaining equally biologically plausible.

M. W. Spratling and M. H. Johnson (2001) Dendritic inhibition enhances neural coding properties. Cerebral Cortex, 11(12):1144-9.

PDF

Abstract

The presence of a large number of inhibitory contacts at the soma and axon initial segment of cortical pyramidal cells has inspired a large and influential class of neural network models which use post-integration lateral inhibition as a mechanism for competition between nodes. However, inhibitory synapses also target the dendrites of pyramidal cells. The role of this dendritic inhibition in competition between neurons has not previously been addressed. We demonstrate, using a simple computational model, that such pre-integration lateral inhibition provides networks of neurons with useful representational and computational properties which are not provided by post-integration inhibition.

S. J. Grice, M. W. Spratling, A. Karmiloff-Smith, H. Halit, G. Csibra, M. de Haan and M. H. Johnson (2001) Disordered visual processing and oscillatory brain activity in autism and Williams Syndrome. NeuroReport, 12(12):2697-700.

PDF

Abstract

Two developmental disorders, autism and Williams Syndrome, are both commonly described as having difficulties in integrating perceptual features, i.e., binding spatially separate elements into a whole. It is already known that healthy adults and infants display electroencephalographic (EEG) gamma band bursts (around 40Hz) when the brain is required to achieve such binding. Here we explore gamma band EEG in autism and Williams Syndrome and demonstrate differential abnormalities in the two phenotypes. We show that despite putative processing similarities at the cognitive level, binding in Williams Syndrome and autism can be dissociated at the neurophysiological level by different abnormalities in underlying brain oscillatory activity. Our study is the first to identify that binding related gamma EEG can be disordered in humans.

M. W. Spratling and M. H. Johnson (2001) Activity-dependent processes in regional cortical specialization [commentary]. Developmental Science, 4(2):153-4.

HTML

G. Csibra, G. Davis, M. W. Spratling and M. H. Johnson (2000) Gamma oscillations and object processing in the infant brain. Science, 290(5496):1582-5.

PDF

Abstract

An enduring controversy in neuroscience concerns how the brain binds together separately coded stimulus features to form unitary representations of objects. Recent evidence has indicated a close link between this binding process and 40Hz (gamma-band) oscillations generated by localized neural circuits (1). In a separate line of research, the ability of young infants to perceive objects as unitary and bounded has become a central focus for debates about the mechanisms of perceptual development (2). However, to date these infant studies have been behavioural, and there have been few, if any, paradigms involving direct measures of neural function. Here we demonstrate for the first time that binding-related 40Hz oscillations are evident in the infant brain around 8 months of age, the same age as some behavioral studies indicate the onset of perceptual binding of spatially separated static visual features. The discovery of binding-related gamma in infants opens up a new vista for experiments on postnatal functional brain development in infants.

M. W. Spratling and G. M. Hayes (2000) Learning synaptic clusters for non-linear dendritic processing. Neural Processing Letters, 11(1):17-27.

PDF

Gzipped Postscript

Abstract

Nonlinear dendritic processing appears to be a feature of biological neurons and would also be of use in many applications of artificial neural networks. This paper presents a model of an initially standard linear unit which uses unsupervised learning to find clusters of inputs within which inactivity at one synapse can occlude the activity at the other synapses.

M. W. Spratling (1999) Artificial Ontogenesis: A Connectionist Model of Development. PhD Thesis, University of Edinburgh.

PDF

Abstract

This thesis suggests that ontogenetic adaptive processes are important for generating intelligent behaviour. It is thus proposed that such processes, as they occur in nature, need to be modelled and that such a model could be used for generating artificial intelligence, and specifically robotic intelligence. Hence, this thesis focuses on how mechanisms of intelligence are specified.

A major problem in robotics is the need to predefine the behaviour to be followed by the robot. This makes design intractable for all but the simplest tasks and results in controllers that are specific to that particular task and are brittle when faced with unforeseen circumstances. These problems can be resolved by providing the robot with the ability to adapt the rules it follows and to autonomously create new rules for controlling behaviour. This solution thus depends on the predefinition of how rules to control behaviour are to be learnt rather than the predefinition of rules for behaviour themselves.

Learning new rules for behaviour occurs during the developmental process in biology. Changes in the structure of the cerebral cortex underlie behavioural and cognitive development throughout infancy and beyond. The uniformity of the neocortex suggests that there is significant computational uniformity across the cortex resulting from uniform mechanisms of development, and holds out the possibility of a general model of development. Development is an interactive process between genetic predefinition and environmental influences. This interactive process is constructive: qualitatively new behaviours are learnt by using simple abilities as a basis for learning more complex ones. The progressive increase in competence, provided by development, may be essential to make tractable the process of acquiring higher-level abilities.

While simple behaviours can be triggered by direct sensory cues, more complex behaviours require the use of more abstract representations. There is thus a need to find representations at the correct level of abstraction appropriate to controlling each ability. In addition, finding the correct level of abstraction makes tractable the task of associating sensory representations with motor actions. Hence, finding appropriate representations is important both for learning behaviours and for controlling behaviours. Representations can be found by recording regularities in the world or by discovering re-occurring patterns through repeated sensory-motor interactions. By recording regularities within the representations thus formed, more abstract representations can be found. Simple, non-abstract, representations thus provide the basis for learning more complex, abstract, representations.

A modular neural network architecture is presented as a basis for a model of development. The pattern of activity of the neurons in an individual network constitutes a representation of the input to that network. This representation is formed through a novel, unsupervised, learning algorithm which adjusts the synaptic weights to improve the representation of the input data. Representations are formed by neurons learning to respond to correlated sets of inputs. Neurons thus become feature detectors or pattern recognisers. Because the nodes respond to patterns of inputs they encode more abstract features of the input than are explicitly encoded in the input data itself. In this way simple representations provide the basis for learning more complex representations. The algorithm allows both more abstract representations to be formed by associating correlated, coincident, features together, and invariant representations to be formed by associating correlated, sequential, features together.

The algorithm robustly learns accurate and stable representations, in a format most appropriate to the structure of the input data received: it can represent both single and multiple input features in both the discrete and continuous domains, using either topologically or non-topologically organised nodes. The output of one neural network is used to provide inputs for other networks. The robustness of the algorithm enables each neural network to be implemented using an identical algorithm. This allows a modular 'assembly' of neural networks to be used for learning more complex abilities: the output activations of a network can be used as the input to other networks which can then find representations of more abstract information within the same input data; and, by defining the output activations of neurons in certain networks to have behavioural consequences it is possible to learn sensory-motor associations, to enable sensory representations to be used to control behaviour.

M. W. Spratling (1999) Pre-synaptic lateral inhibition provides a better architecture for self-organising neural networks. Network: Computation in Neural Systems, 10(4):285-301.

PDF

Gzipped Postscript

Abstract

Unsupervised learning is an important property of the brain and of many artificial neural networks. A large variety of unsupervised learning algorithms have been proposed. This paper takes a different approach in considering the architecture of the neural network rather than the learning algorithm. It is shown that a self-organising neural network architecture using pre-synaptic lateral inhibition enables a single learning algorithm to find distributed, local, and topological representations as appropriate to the structure of the input data received. It is argued that such an architecture not only has computational advantages but is a better model of cortical self-organisation.

M. W. Spratling and G. M. Hayes (1998) Learning sensory-motor cortical mappings without training. Proceedings of the 6th European Symposium on Artificial Neural Networks (ESANN). M. Verleysen (ed.) pp. 339-44. D-facto Publications.

Gzipped Postscript

Abstract

This paper shows how the relationship between two arrays of artificial neurons, representing different cortical regions, can be learned. The algorithm enables each neural network to self-organise into a topological map of the domain it represents at the same time as the relationship between these maps is found. Unlike previous methods, learning is achieved without a separate training phase; the algorithm which learns the mapping is also that which performs the mapping.

M. W. Spratling and G. M. Hayes (1998) A self-organising neural network for modelling cortical development. Proceedings of the 6th European Symposium on Artificial Neural Networks (ESANN). M. Verleysen (ed.) pp. 333-8. D-facto Publications.

Gzipped Postscript

Abstract

This paper presents a novel self-organising neural network. It has been developed for use as a simplified model of cortical development. Unlike many other models of topological map formation all synaptic weights start at zero strength (so that synaptogenesis might be modelled). In addition, the algorithm works with the same format of encoding for both inputs to and outputs from the network (so that the transfer and recoding of information between cortical regions might be modelled).

M. W. Spratling (1997) Artificial Ontogenesis: Cognitive and Behavioural Development for Robots. Unpublished Departmental Discussion Paper, Department of Artificial Intelligence, University of Edinburgh.

Abstract

There are three classes of adaptive process (structural definition, structural adjustment, and parameter adjustment) which appear to underlie the development of intelligence in nature. In artificial intelligence only two of these processes are used; AI ignores development (structural adjustment). While AI attempts to predefine explicit rules for behaviour, nature's success in building complex creatures depends on predefining how rules to control behaviour can be learned. It is the developmental processes in biology through which such rules are learned. This proposal is to apply mechanisms similar to those used in biological development to robots. This will move robotics from 'development' meaning design and production, towards 'development' in its biological sense meaning a process of growth and progressive change. Defining the rules for development is design at a meta-level to that currently used. It is proposed that the long process of evolution used by nature to define these developmental processes might be supplanted by another adaptive process, that of engineering, to more quickly enable study of ontogenetic development.

This project thus aims to apply techniques inspired by animal development to engineering robot control systems. Specifically it is proposed that a hierarchical control system, based on the cerebral cortex, is used and that this develops through constructivist learning algorithms (ones in which the interaction of a situated agent with its environment guides the creation of cognitive machinery appropriate for representing and acting in that environment). Such a robot would be provided with some innate, low-level, behavioural abilities and through experience develop more complex behaviour.

M. W. Spratling and R. Cipolla (1996) Uncalibrated visual servoing. Proceedings of the 7th British Machine Vision Conference (BMVC). R. B. Fisher and E. Trucco (eds.) pp. 545-54. BMVA.

Gzipped Postscript

Abstract

Visual servoing is a process to enable a robot to position a camera with respect to known landmarks using the visual data obtained by the camera itself to guide camera motion. A solution is described which requires very little a priori information, freeing it from being specific to a particular configuration of robot and camera. The solution is based on closed loop control together with deliberate perturbations of the trajectory to provide calibration movements for refining that trajectory. Results from experiments in simulation and on a physical robot arm (camera-in-hand configuration) are presented.

M. W. Spratling (1994) Learning the Mapping Between Sensor and Motor Spaces to Produce Hand-Eye Coordination. MSc Dissertation, Department of Artificial Intelligence, University of Edinburgh.

Abstract

Coordination between sensory inputs and motor actions is essential for intelligent robotics. This dissertation considers the control of a simple manipulator using sensory information to locate the target position for the end-effector. The control mechanisms investigated all form topographic maps of possible configurations of the manipulator joints (the motor space) and the values of the sensor inputs (the sensor space). Various methods are considered for learning to relate a location on the sensor space map (which represents the target position in the world) and the location in the motor space map which will configure the manipulator to reach this target position. These methods are analysed using a computer simulation and a suitable algorithm to solve the hand-eye coordination problem is presented.

M. M. Ross, M. W. Spratling, C. B. Kirkland and P. S. Story (1994) Measurement of microfog wetness in a model steam turbine using a miniature optical spectral extinction probe. IMechE International Symposium on Optical Methods and Data Processing In Heat and Fluid Flow.
