XOTAR’s Robot Vision Systems are developed using a complex software and hardware architecture that emulates high-level human vision processing such as geometric reasoning, low-level and early vision processes such as texture discrimination and object segmentation, and cognitive processes such as memory, planning and reasoning. The architecture is organized according to the known cortical organization of cognitive, perceptual and motor processes in humans and higher primates. It consists of a cognitive executive implementing executive control functions such as attention, goal determination, long-term memory management, effector and locomotion motion planning, and spatio-temporal reasoning. This higher-level processing technology is called the Perception and Control Law Processor (PCLP), the main cognitive processing system of XOTAR’s technology. A variety of lower-level signal, image and motor processing modules, often hierarchically organized, interact with the PCLP to complete the perception-action cycle.
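The relationship between the cognitive executive and the lower-level modules can be sketched as follows. This is a minimal illustrative skeleton, not XOTAR's implementation: the class names, the toy feature extractors, and the placeholder policy are all hypothetical.

```python
# Minimal sketch (hypothetical names) of the PCLP-centered perception-action
# cycle: a cognitive executive polls lower-level processing modules, fuses
# their outputs into a world state, and selects the next action.

class Module:
    """A lower-level signal/image/motor processing module."""
    def __init__(self, name, process):
        self.name = name
        self.process = process  # raw input -> extracted features

class PCLP:
    """Cognitive executive: fuses module outputs, plans the next action."""
    def __init__(self, modules):
        self.modules = modules
        self.world_state = {}

    def perceive(self, raw_input):
        for m in self.modules:
            self.world_state[m.name] = m.process(raw_input)

    def plan(self):
        # Placeholder policy: act only when an object boundary was found.
        return "reach" if self.world_state.get("segmentation") else "scan"

    def cycle(self, raw_input):
        self.perceive(raw_input)
        return self.plan()

pclp = PCLP([Module("texture", lambda img: {"coarse": max(img) - min(img) > 5}),
             Module("segmentation", lambda img: [i for i in range(1, len(img))
                                                 if abs(img[i] - img[i - 1]) > 3])])
action = pclp.cycle([0, 1, 2, 9, 9, 8])   # boundary at the 2->9 jump -> "reach"
```

The point of the sketch is the shape of the cycle: perception modules run in parallel against the same input, and only the executive decides what the robot does next.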
This software architecture was built on a massively parallel Cray CX1 personal supercomputer with over 2100 cores, and utilizes state-of-the-art massively parallel algorithms that are dynamically and contextually configured for image processing, geometric reasoning and transformations, language processing, and high-level reasoning functions. XOTAR’s initial systems will use a specially designed compression and communications technology to stream data between the robot’s embedded image sensors and controls and the remote PCLP via a wireless protocol. Once the core algorithms and architecture are refined, ASICs will be developed and functionality moved into the embedded electronics of the cameras and robots. This process will occur in numerous steps over the following 10 years, leading to high-volume, lower-cost systems and eventually consumer humanoid robots. Many systems will never need the PCLP functions to reside in the robot’s embedded processing: these applications are either large (defense systems), fixed (security cameras), or of limited mobility (manufacturing robots). In summary, XOTAR’s technology consists of a highly sophisticated massively parallel cognitive processing architecture with flexible implementations in software and hardware. The following describes several of the technology subsystems that make up the cognitive systems architecture:
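Streaming frames from embedded sensors to a remote processor requires, at minimum, delimiting compressed frames on a byte stream. The following is a hedged sketch of that framing idea only; XOTAR's actual codec and wireless protocol are proprietary, so generic zlib compression and a length-prefix header stand in here.

```python
# Hypothetical framing sketch for streaming sensor frames to a remote PCLP:
# each frame is compressed and prefixed with a 4-byte big-endian length so
# the receiver can delimit frames on a continuous byte stream.
import struct
import zlib

def pack_frame(pixels: bytes) -> bytes:
    payload = zlib.compress(pixels)
    return struct.pack(">I", len(payload)) + payload

def unpack_frame(stream: bytes):
    """Return (pixels, remaining bytes) for the first frame in `stream`."""
    (length,) = struct.unpack(">I", stream[:4])
    payload, rest = stream[4:4 + length], stream[4 + length:]
    return zlib.decompress(payload), rest

frame = bytes(range(256)) * 64            # stand-in 16 KiB sensor frame
wire = pack_frame(frame) + pack_frame(frame)
first, rest = unpack_frame(wire)          # first frame off the wire
second, _ = unpack_frame(rest)            # second frame follows immediately
```

Length-prefixed framing is the usual design choice when frames ride over a transport, such as a wireless link, that presents data as an undifferentiated stream.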
High Dynamic Range Vision – The optical and image sensor and front-end image processing functions of XOTAR’s Robot Vision technology acquire high dynamic range images, which are required for successful image processing, especially when shadows or complex scene lighting would otherwise degrade it. Image data communicated to the remote PCLP is compressed using the new JPEG XR image coding system, designed for high dynamic range images.
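One standard way to obtain high dynamic range data is to merge bracketed exposures into a radiance estimate, weighting mid-range pixels most and clipped pixels least. The sketch below illustrates that Debevec-style weighted average under an assumed linear sensor response; it is not XOTAR's pipeline, and the JPEG XR bitstream itself is not reproduced.

```python
# Illustrative HDR merge: combine multiple exposures of the same scene into
# per-pixel radiance estimates, trusting well-exposed samples the most.

def hat_weight(z, z_min=0, z_max=255):
    # Mid-range pixel values are most reliable; clipped shadows/highlights least.
    mid = (z_min + z_max) / 2
    return (z - z_min) if z <= mid else (z_max - z)

def merge_hdr(exposures, times):
    """exposures: list of equal-length pixel lists (0-255); times: seconds."""
    hdr = []
    for i in range(len(exposures[0])):
        num = den = 0.0
        for img, t in zip(exposures, times):
            w = hat_weight(img[i])
            num += w * img[i] / t    # radiance estimate from this exposure
            den += w
        hdr.append(num / den if den else 0.0)
    return hdr

# Pixel 0 saturates (255) in the 1 s shot but reads 100 in the 1/8 s shot,
# so only the short exposure contributes: recovered radiance 100/0.125 = 800.
radiance = merge_hdr([[255, 10], [100, 1]], [1.0, 0.125])
```

The recovered radiance values are floating-point and unbounded above, which is exactly why a high dynamic range coding format is needed downstream.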
Stereo Robot Eyes – XOTAR’s robot vision system is a complex, active vision technology modeled on the human visual system, with some additional features. First, it is a binocular design that enables depth perception when observing a scene. It features vergence and accommodation eye movements and controls for vergence-accommodation interactions. It has complex oculomotor controls allowing saccadic eye movements for rapidly refocusing on and interrogating a scene. Additionally, XOTAR’s eyes will have a fairly powerful zoom capability not found in biological eyes. This feature may only be used in the smart camera technologies because of the processing complexity it introduces for mobile robots.
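The geometry behind vergence-based depth perception is simple trigonometry: for a given interocular baseline, the angle through which the two eyes converge encodes the fixation distance. The numbers below (a 65 mm baseline) are illustrative, not XOTAR specifications.

```python
# Geometric sketch of vergence control: each eye rotates inward by
# atan(baseline / (2 * depth)) to fixate a point, so the total vergence
# angle is twice that, and depth can be recovered by inverting the relation.
import math

def vergence_deg(baseline_m, depth_m):
    """Total vergence angle, in degrees, needed to fixate a point at depth_m."""
    return math.degrees(2 * math.atan(baseline_m / (2 * depth_m)))

def depth_from_vergence(baseline_m, angle_deg):
    """Invert the relation: recover fixation depth from a vergence angle."""
    return baseline_m / (2 * math.tan(math.radians(angle_deg) / 2))

near = vergence_deg(0.065, 0.5)     # ~7.4 degrees at half a metre
far = vergence_deg(0.065, 10.0)     # ~0.37 degrees at ten metres
```

Because the angle falls off rapidly with distance, vergence is a useful depth cue at close range and degrades for far targets, which is one reason active systems combine it with other cues.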
Early Vision Processing – XOTAR is developing a highly parallel early vision processing technology that will be the company’s first technology to be converted to dedicated silicon. This subsystem is organized according to known functions of early vision processing in humans and higher primates, with separate parallel modules for features like texture recognition and object boundary detection, and other processes for integrating the various low-level features into higher-level patterns. Highly proprietary learning experiments will be used to configure the low-level recognition behaviors of this system, emulating the first several years of human early vision development. XOTAR’s early vision system makes use of dynamically configured Bayesian image processing algorithms and computational group theory. The system has both passive processes (processing scene data regardless of context) and active processes (context-specific).
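To make the Bayesian flavor of such low-level processing concrete, here is a toy boundary detector, entirely illustrative and not XOTAR's algorithm: given crude likelihood models for how large an intensity gradient looks at an edge versus in a flat region, Bayes' rule yields the posterior probability that a pixel lies on an object boundary. The Gaussian-shaped likelihoods and their parameters are assumptions chosen for the example.

```python
# Toy Bayesian edge detector: P(edge | gradient) from assumed likelihoods.
import math

def posterior_edge(gradient, p_edge=0.1):
    """Posterior that a local gradient magnitude marks an object boundary."""
    # Assumed likelihoods: edges favour large gradients, flat regions small.
    like_edge = math.exp(-((gradient - 50) ** 2) / (2 * 20 ** 2))
    like_flat = math.exp(-(gradient ** 2) / (2 * 10 ** 2))
    num = like_edge * p_edge
    return num / (num + like_flat * (1 - p_edge))

row = [2, 3, 2, 60, 58, 3]                       # one row of pixel intensities
grads = [abs(b - a) for a, b in zip(row, row[1:])]
edges = [posterior_edge(g) > 0.5 for g in grads]  # boundaries at the big jumps
```

In a dynamically configured system, the priors and likelihood parameters are exactly the knobs that context, or learning, would adjust.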
High Level Vision Processing – A fundamental task of XOTAR’s vision system is to actively probe a scene, similar to the concept of feeling or touching it, to interactively acquire a three-dimensional model of the environment and update it according to dynamic events in the scene as well as changes caused by the robot’s actions. This fundamental system is called the Geometric Reasoning Module in XOTAR’s architecture. Effectively, this module transforms a scene into an internal representation that looks like a computer game to the robot. It is this mental imagery that the robot operates on; the image and signal processing functions only serve to acquire and update this model. Statistical geometric methods are used to infer and complete geometric features, including shapes. The robot uses this model for real-time operation, for mental imagery when planning near futures, and for long-term memory via a proprietary scene compression and storage design.
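A bare-bones version of such an internal model, hypothetical structure and names, is a store of objects with 3-D positions that perceptual updates overwrite and that geometric queries can interrogate, the way a game engine queries its world:

```python
# Sketch of a Geometric Reasoning Module's internal scene model: named
# objects with 3-D positions, updated by observations, queried by the planner.
import math

class SceneModel:
    def __init__(self):
        self.objects = {}                       # name -> (x, y, z) in metres

    def observe(self, name, position):
        self.objects[name] = position           # dynamic event updates model

    def nearest(self, point):
        """Geometric query: which known object is closest to `point`?"""
        return min(self.objects,
                   key=lambda n: math.dist(self.objects[n], point))

scene = SceneModel()
scene.observe("cup", (0.4, 0.1, 0.8))
scene.observe("door", (3.0, 0.0, 1.0))
scene.observe("cup", (0.5, 0.1, 0.8))           # the cup moved; model updated
target = scene.nearest((0.0, 0.0, 1.0))         # -> "cup"
```

The key property the sketch shows is that perception only mutates the model, while planning and reasoning read from it; the two sides never exchange raw pixels.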
Perception and Control Processor – XOTAR’s Cognitive Systems architecture is fundamentally designed to encode the concept of action and the underlying concept of a desirable state of affairs, or imperative future. The processing technology is an iterative perception-action cycle driven by this executive. The perceptual processes are a foundation for determining both the current state of the environment and the achievement of a desirable state of affairs through the robot’s interactions with the environment. The model for doing this in the science of Artificial Intelligence is the Belief, Desire, Intention (BDI) model. BDI models are a long-term focus of the AI community. XOTAR’s implementation is one of the first to be built around a perceptual foundation and developed using probabilistic modal logics (more on this in the future). The PCLP consists of an action and perception calculus that is interpreted by executive functions to bring about intelligent behaviors. This system also encodes contextual constraints that govern both learning and control behaviors according to human-defined rules (think of Isaac Asimov’s Three Laws of Robotics). Initially, the man-machine interface of the PCLP will use a special low-level encoding designed to eventually support natural language and speech dialogue interfaces. The high-level reasoning executive is perception-based; all actions, beliefs and desires are encoded according to perceptual states that either the robot or vision system must observe or bring about in its environment.
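The shape of a perception-grounded BDI step can be sketched in a few lines. This is an illustrative toy, not XOTAR's action and perception calculus: beliefs are perceptual predicates the system has observed, desires are target perceptual states, and the executive commits to an action expected to bring an unmet desire about, subject to human-defined constraints that can veto it.

```python
# Minimal BDI-style deliberation step over perceptual predicates.

def bdi_step(beliefs, desires, actions, constraints):
    """Return the first permitted action expected to achieve an unmet desire."""
    for desire in desires:
        if desire in beliefs:
            continue                 # desire already perceived as satisfied
        for action, achieves in actions.items():
            if desire in achieves and all(ok(action, beliefs)
                                          for ok in constraints):
                return action        # the committed intention
    return "idle"

beliefs = {"door_closed"}
desires = ["door_open", "room_lit"]
actions = {"push_door": {"door_open"}, "flip_switch": {"room_lit"}}

# A human-defined constraint in the spirit of Asimov's First Law:
no_harm = lambda action, beliefs: (action != "push_door"
                                   or "human_in_doorway" not in beliefs)

intention = bdi_step(beliefs, desires, actions, [no_harm])   # -> "push_door"
```

Note how the constraint operates on perceived state: if `human_in_doorway` were among the beliefs, `push_door` would be vetoed and the executive would fall through to the next achievable desire.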
Further detail on XOTAR’s technology will be available on the site shortly. There are several white papers and conference papers being written that elaborate on various aspects of the technology.