We work on hardware design for computing, communications, and machine intelligence. To design the most efficient and highest-performance hardware, we draw on every available tool: algorithms, architectures, circuits, and even substrates. Below is an overview of some of our most recent projects.

Neuro-Inspired Cognitive Computing

We are investigating neuro-inspired approaches to designing highly efficient sensory processing frontends. One promising approach is sparse coding, which learns an overcomplete dictionary for coding sensory inputs such that the encoding is kept sparse. This neuro-inspired approach is akin to compressive sensing, and is ideally suited to coding “big data” to extract useful features. We have designed several compact ICs that implement sparse coding in spiking recurrent neural nets. Taking advantage of sparse spikes, our designs achieve throughputs of hundreds of megapixels per second while dissipating only several mW. We are building on top of the frontend to realize more sophisticated functionalities beyond feature extraction.
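
As a rough software illustration of what "keeping the encoding sparse" means, the sketch below codes an input against a fixed overcomplete dictionary using ISTA (iterative soft thresholding). This is a standard textbook solver chosen for brevity, not the spiking recurrent network used in our chips; the dictionary `D`, the penalty `lam`, and the problem sizes are all assumptions made for the example, and the dictionary is random rather than learned.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 over a."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Hypothetical example: 16-dimensional input, 64-atom (4x overcomplete) dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a_true = np.zeros(64)
a_true[[3, 20, 41]] = [1.5, -2.0, 1.0]     # only three active atoms
x = D @ a_true
a_hat = ista_sparse_code(D, x, lam=0.05, n_iter=500)
print("nonzero coefficients:", np.count_nonzero(a_hat))
```

The soft-threshold step is what drives most coefficients to exactly zero; in a spiking implementation, the analogous effect is that most neurons stay silent for a given input.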

Acceleration of DNN Workloads on Embedded Platforms

With wide acceptance in practical applications, deep neural nets (DNNs) represent a growing portion of the compute workload. Accelerating DNN computation faces two challenges: the extreme memory bandwidth needed to feed enough parallel operations to meet the processing requirements, and the ever-changing network structures that quickly make fixed accelerators obsolete. We are investigating approaches that fundamentally reduce the storage and compute requirements, as well as a flexible compute architecture and an accompanying mapping strategy that efficiently support new network structures.

In-Memory Accelerator for Scientific Computing

In-memory compute is a strategy that merges compute and storage to reduce or eliminate costly data movement and break the “memory wall”. Scientific computing applications manipulate large (and often sparse) matrices, so they potentially stand to benefit the most from in-memory compute. However, in-memory compute is fundamentally analog compute and is severely limited in resolution, whereas scientific computing applications often require single-precision or even double-precision floating point. We are investigating ways to adapt low-resolution in-memory compute as one part of an overall optimal accelerator for scientific computing applications.
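
One well-known way to combine a low-resolution solver with a high-precision result is iterative refinement: the residual is computed in full precision, and the inexact solver only supplies corrections. The sketch below is an illustration under stated assumptions, not a description of our accelerator. The 6-bit `quantize` function is a hypothetical stand-in for a low-resolution in-memory array, and the small, strongly diagonally dominant test matrix is chosen so that the iteration is guaranteed to converge.

```python
import numpy as np

def quantize(M, bits):
    """Round M onto a uniform signed grid with 2**(bits-1)-1 positive levels."""
    scale = np.max(np.abs(M)) / (2 ** (bits - 1) - 1)
    return np.round(M / scale) * scale

def refine(A, b, bits=6, n_iter=40):
    """Solve A x = b using only a low-resolution copy of A for the solves."""
    A_lo = quantize(A, bits)               # assumed low-resolution "in-memory" copy
    x = np.zeros_like(b)
    for _ in range(n_iter):
        r = b - A @ x                      # residual in full float64 precision
        x = x + np.linalg.solve(A_lo, r)   # inexact correction from the low-res solve
    return x

# Hypothetical well-conditioned test problem.
rng = np.random.default_rng(0)
n = 8
A = 20.0 * np.eye(n) + rng.uniform(-1.0, 1.0, (n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x_lo = np.linalg.solve(quantize(A, 6), b)  # single low-resolution solve
x_ref = refine(A, b, bits=6, n_iter=40)    # low-res solves plus refinement
err_lo = np.linalg.norm(x_lo - x_true)
err_ref = np.linalg.norm(x_ref - x_true)
print(f"low-res only: {err_lo:.2e}   refined: {err_ref:.2e}")
```

The refinement converges only when the low-resolution matrix is a good enough approximation of `A` relative to its conditioning, which is one reason ill-conditioned scientific workloads are challenging for this style of compute.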

System Integration Technologies: High-Speed Interfaces and Network on Chip

Modular design is an important method for constructing large-scale integrated systems. The modular approach has its downsides: compatibility issues, the system clock tree, and limited interconnect bandwidth that constrains performance, to name a few. We are developing the building blocks needed for modular chip-scale integration, including networks on chip that enable flexible tiling of modules and seamless integration, and high-speed interfaces that deliver tens of Gbps per lane for unhindered communication between modules at pJ/b or less.

Signal Processing for Massive MIMO Wireless Communications

Massive multiple-input multiple-output (MIMO) is viewed as a key enabling technology for 5G wireless communications. A large number of antennas offers more degrees of freedom, but it also increases the signal processing workload tremendously. This work targets the interference cancellation problem in massive MIMO and develops new detectors that can handle 5G throughputs while still meeting the necessary performance under practical channel conditions. We have developed several detector ICs to demonstrate the opportunities in co-designing coding and detection, as well as in applying suitable computational transforms to achieve optimal tradeoffs between SNR performance and detector complexity.
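
To make the detection problem concrete, the sketch below implements a plain linear MMSE detector, a common baseline that suppresses inter-user interference by solving a regularized least-squares problem. It is not one of the detectors implemented in our ICs. The antenna count `M`, user count `K`, QPSK modulation, i.i.d. Rayleigh channel, and noise variance are all assumptions made for the example.

```python
import numpy as np

def mmse_detect(H, y, noise_var):
    """Linear MMSE estimate of unit-energy symbols x from y = H x + n."""
    K = H.shape[1]
    G = H.conj().T @ H + noise_var * np.eye(K)   # regularized Gram matrix (K x K)
    return np.linalg.solve(G, H.conj().T @ y)    # matched filter, then interference removal

def qpsk_slice(z):
    """Map each soft estimate to the nearest unit-energy QPSK point."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

# Hypothetical uplink: 64 base-station antennas, 8 single-antenna users.
rng = np.random.default_rng(1)
M, K, noise_var = 64, 8, 0.01
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
bits = rng.integers(0, 2, size=(K, 2))
x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
n = np.sqrt(noise_var / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = H @ x + n

x_hat = qpsk_slice(mmse_detect(H, y, noise_var))
print("symbol errors:", int(np.sum(~np.isclose(x_hat, x))))
```

The dominant cost is forming and solving the K-by-K system for every channel realization, which is the kind of workload that grows quickly with the number of antennas and users.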

Advanced Channel Coding for Communication and Storage

Channel coding is a common strategy for reliably transmitting (or storing) more bits under a limited SNR. We are investigating advanced codes, namely polar codes, LDPC codes, and nonbinary LDPC codes, to gain the last fraction of a dB in SNR without the complex decoders breaking the receiver power budget. Our work covers Gbps decoder designs, efficient decoding algorithm designs, and code designs. We have demonstrated several Gbps polar and LDPC decoder ICs that consume on the order of 10 pJ/b. Our current direction is focused on designing polar codes and low-latency, high-throughput decoders for 5G wireless communications.
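
For readers unfamiliar with polar codes, the sketch below shows the encoding side only: information bits are placed on selected positions of a length-N vector, the remaining positions are frozen to zero, and the vector is multiplied over GF(2) by the n-fold Kronecker power of the kernel [[1, 0], [1, 1]] (natural bit order, no bit-reversal permutation). The information set `{3, 5, 6, 7}` for N = 8 is an assumed illustrative choice; in practice the set comes from a code design step, and none of this reflects the decoder architectures in our ICs.

```python
def polar_transform(u):
    """Compute x = u * F^(tensor n) over GF(2) with in-place XOR butterflies."""
    x = list(u)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]        # upper branch takes the XOR of the pair
        step *= 2
    return x

def polar_encode(info_bits, info_set, n):
    """Place info bits on info_set (ascending), freeze the rest to 0, then transform."""
    positions = sorted(info_set)
    if len(positions) != len(info_bits):
        raise ValueError("need one information bit per position in info_set")
    u = [0] * n
    for pos, bit in zip(positions, info_bits):
        u[pos] = bit
    return polar_transform(u)

# Hypothetical rate-1/2 example with N = 8.
codeword = polar_encode([1, 0, 1, 1], {3, 5, 6, 7}, 8)
print(codeword)
```

Because the kernel is its own inverse over GF(2), applying `polar_transform` twice returns the original vector, which is a convenient sanity check for any implementation of the butterfly.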