#### Design of large area, pixelated ASICs for picosecond timing applications

E. Charbon *EPFL* 

edoardo.charbon@epfl.ch

# Outline

- Why large area TOA ASICs?
- TDC Basics & Architectures
- Case Studies
- ASIC vs. FPGA
- 3D Integration
- Quantum computing
- Conclusions

### Why large area TOA ASICs?

# Large Area TOA Applications

- Optical rangefinder on-pixel (3D camera)
- Fluorescence lifetime imaging microscopy (FLIM)
- Fluorescence Correlation Spectroscopy (FCS)
- Detection of a scintillation shower upon gamma photon detection in PET
- High energy physics (HEP)

# HEP

- Extremely harsh conditions
  - Large ionizing doses, Gamma
  - Protons, Neutrons
- Very demanding specs
  - TOA resolutions in ns to ps
  - Ranges of µs to ms
- Very low dead times
  - Events spaced by ns
  - Gevents/s
- Large number of points-of-measurement
  - Thousands to millions of points
  - Large surfaces

# Example (Courtesy: Artur Apresyan)

#### Precision timing for CMS in HL-LHC

- CMS Phase 2 upgrade aims to achieve high precision timing measurements
  - In ECAL barrel: new electronics to achieve ~30 ps resolution for 30 GeV photons
  - In HGCal: design to achieve 50 ps timing resolution per layer in EM showers; multiple layers can be combined
  - Additional potential capabilities: MIP timing to cover large fraction of charged particles in the event
    - A thin LYSO + SiPM layer in the barrel, LGAD layer in the endcap. ~30 ps MIP timing up to $|\eta| < 3.0$

Fermilab




### Another Example

- In time-of-flight PET one needs
  - A large number of points of measurement
  - A high timing resolution
- Synchronization is extremely important to enable coincidence computation and rejection of singles
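
The coincidence computation mentioned above can be sketched as follows. This is a minimal illustration, assuming two sorted lists of time stamps (in ps) from two opposing detectors; the function name `find_coincidences` and the window `window_ps` are hypothetical and not taken from the talk.

```python
def find_coincidences(ts_a, ts_b, window_ps):
    """Pair time stamps from two sorted lists that lie within window_ps.

    Time stamps left unpaired are 'singles' and are rejected.
    """
    pairs = []
    i = j = 0
    while i < len(ts_a) and j < len(ts_b):
        dt = ts_a[i] - ts_b[j]
        if abs(dt) <= window_ps:
            pairs.append((ts_a[i], ts_b[j]))  # coincidence found
            i += 1
            j += 1
        elif dt < 0:
            i += 1  # event in A has no partner: single, skip it
        else:
            j += 1  # event in B has no partner: single, skip it
    return pairs


# Hypothetical time stamps in ps, 200 ps coincidence window
pairs = find_coincidences([100, 5000, 9000], [150, 7000, 9100], 200)
```

The sketch only works if both detectors share a common time base, which is why synchronization across the array matters.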



#### **TDC Basics**

### **TDC** Objective



But, in most cases:



# **TDC Symbol**



### **Basic Definitions**

- Bin size or LSB  $\tau$  (sec)
  - Minimum distance between time events that can be resolved
- Accuracy & precision (sec)
  - Time-invariant offset
  - Time-varying drift
- Range (sec)
  - Maximum time difference that can be measured
- Conversion rate (MS/sec)
- Latency (sec)
- Non-linearities
  - Differential non-linearity (DNL)
  - Integral non-linearity (INL)
- Single-shot accuracy (sec)

## **Input Non-Idealities**

- Signals are non-Dirac
  - Non-zero rise time
  - Non-zero width
- START-STOP sequence is not regular
- Signals have jitter in
  - Time
  - Amplitude
- Temperature
- Supply variations

#### **TDC Non-Idealities**



# DNL, INL

- Integral non-linearity (INL) is the integral (running sum) of DNL
- Depending upon definition, starts and ends at 0



### How to Measure: Density Test

- Poisson distributed uniform START generator
- Measure statistics of TDC measurements per bin
- Normalize to average counts, differences are DNL points
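
The density test above can be sketched in a few lines. This is a minimal illustration, assuming the per-bin hit counts have already been collected; the function name and the example counts are hypothetical. DNL is expressed in LSB and INL is the running sum of DNL.

```python
def dnl_inl_from_histogram(counts):
    """Compute DNL and INL (both in LSB) from code-density test counts."""
    mean = sum(counts) / len(counts)
    # A bin that collects more hits than average is wider than 1 LSB
    dnl = [c / mean - 1.0 for c in counts]
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d  # INL is the cumulative sum of DNL
        inl.append(acc)
    return dnl, inl


# Hypothetical counts for a 4-bin TDC
dnl, inl = dnl_inl_from_histogram([100, 120, 80, 100])
```

With this normalization the DNL values sum to zero, so the INL returns to zero at the last bin.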



# Single-Shot Accuracy (SSA)

- Repeat measurement of single time-of-arrival and construct histogram
- Derive statistics by Gaussian fitting and calculation of FWHM or  $\sigma$  or  $3\sigma.$
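
As a rough sketch of the statistics step, the snippet below estimates $\sigma$ directly from the sample standard deviation rather than from a Gaussian fit (a simplification), and converts it to FWHM with the Gaussian relation FWHM $= 2\sqrt{2\ln 2}\,\sigma$. The function name, the codes and the LSB value are assumptions for illustration.

```python
import math
import statistics


def single_shot_stats(codes, lsb_ps):
    """Return (sigma, FWHM) in ps from repeated measurements of one interval."""
    times = [c * lsb_ps for c in codes]
    sigma = statistics.pstdev(times)  # moment estimate, not a Gaussian fit
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma  # about 2.355 * sigma
    return sigma, fwhm


# Hypothetical repeated codes with a 55 ps LSB
sigma, fwhm = single_shot_stats([9, 10, 10, 11], 55)
```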



## **Optical Tests**

- Density test: free running SPAD
- Single-shot experiment:
  - Histogram  $\Delta t_i$ , *i*=[1...*N*]

(time-correlated single-photon counting – TCSPC)



### Figures of Merit

- Power, LSB, DNL/INL, SSA, area
- Temperature stability
- Cross-talk



#### Architectures

### The Simplest: A Counter

- Resolution:  $\tau = 1/f_{clock}$
- Conversion rate = 1/latency
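
A behavioral sketch of the counter TDC follows. It is an idealized model, assuming START is aligned with a clock edge and the counter simply counts full clock periods until STOP; the function name and the numbers are illustrative.

```python
def counter_tdc(t_start_s, t_stop_s, f_clock_hz):
    """Quantize a time interval with a counter clocked at f_clock_hz."""
    tau = 1.0 / f_clock_hz  # resolution (LSB) of the counter
    code = int((t_stop_s - t_start_s) / tau)  # number of full periods
    return code, code * tau


# Hypothetical 1 GHz clock: a 3.7 ns interval reads as 3 counts (3 ns)
code, measured = counter_tdc(0.0, 3.7e-9, 1e9)
```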



### Counter – Register

- Advantage: fast counter can be shared among many HIT lines
- Fast registers easier to build



### **Delay Chain**



# **Delay Chain**

- Resolution:  $\tau$  = delay element
- Conversion rate = 1/latency
- Latency =  $N \times \tau$
- Need a thermometer decoder:  $N \rightarrow \log_2(N)$
- Issues: metastability, bubbles
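
The bubble issue can be illustrated with two decoders. Counting ones is one common bubble-tolerant approach; it is shown here as an assumption and not necessarily the decoder used in the designs discussed. Function names and the tap pattern are hypothetical.

```python
def first_zero_decode(taps):
    """Naive decoder: position of the first 0 in the thermometer code."""
    for i, t in enumerate(taps):
        if not t:
            return i
    return len(taps)


def ones_count_decode(taps):
    """Bubble-tolerant decoder: count the taps that have fired."""
    return sum(1 for t in taps if t)


clean = [1, 1, 1, 0, 0, 0, 0, 0]
bubble = [1, 1, 0, 1, 0, 0, 0, 0]  # taps 2 and 3 sampled out of order

naive_result = first_zero_decode(bubble)    # 2, one LSB off
robust_result = ones_count_decode(bubble)   # 3, same as the clean code
```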



### **Phase Interpolator**




### Phase Interpolator

- Resolution:  $\tau$  = delay element
- Conversion rate = 1/latency
- Latency =  $N \times \tau$
- Need a thermometer decoder:  $N \rightarrow \log_2(N)$
- <u>Issues</u>: metastability, no bubbles

### Vernier Lines

- Resolution:  $\tau = \tau_{slow} - \tau_{fast}$
- Conversion rate = 1/latency
- Latency =  $N \times \tau_{slow}$
- Need a thermometer decoder:  $N \rightarrow \log_2(N)$
- <u>Issues</u>: metastability, matching
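
A behavioral sketch of a Vernier line, assuming the usual arrangement in which START travels down the slow line and STOP down the fast line, so that STOP gains $\tau_{slow} - \tau_{fast}$ per stage. The function name and delay values are hypothetical.

```python
def vernier_tdc(dt_ps, tau_slow_ps, tau_fast_ps, n_stages):
    """Return the stage at which STOP catches up with START, or None.

    dt_ps is the initial lead of START over STOP.
    """
    step = tau_slow_ps - tau_fast_ps  # effective resolution per stage
    lead = dt_ps
    for stage in range(1, n_stages + 1):
        lead -= step
        if lead <= 0:
            return stage
    return None  # interval exceeds the range of the line


# Hypothetical 60 ps / 50 ps delays: 10 ps bins from much slower elements
stage = vernier_tdc(25, 60, 50, 16)  # 25 ps falls in the third 10 ps bin
```

The latency is still set by the slow line, since the signals must traverse up to `n_stages` elements of delay $\tau_{slow}$.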



# **Pulse Shrinking**

- Resolution:  $\tau = \tau_{rise} - \tau_{fall}$
- Conversion rate = 1/latency
- Latency =  $N \times \tau_{slow}$
- Need a thermometer decoder:  $N \rightarrow \log_2(N)$
- <u>Issues</u>: matching



# **Ring Oscillators**

- Resolution:  $\tau$  = delay element
- Conversion rate = 1/latency
- Latency =  $N \times \tau$
- Need a thermometer decoder:  $N \rightarrow \log_2(N)$
- <u>Issues</u>: metastability, matching, asymmetric load



### **Actual Implementation**

- Fully differential
- Partial propagation readout
  - lower oscillation frequency or higher resolution
  - Using both rise and fall times doubles the resolution
- Invariant load to improve linearity



## **Delay Element Implementation**

- Uniform rise/fall time
- Bias control used for feedback
- Positive feedback for speed



# Asymmetric Rise/Fall Time

- E.g. inverter starved cell
- Rise time = $V_{DD} C_{load}/I$
- Fall time: inverter delay



# Semi-Digital TDCs

 Determine time difference based on propagation through an RC line



# Time Difference Amplifier (TDA)

- Time differences are multiplied as in successive approximation ADCs
- Issues: gain stability, jitter



### **TDA Base Cell**



**Bias Circuit** 

### **TDA Base Cell**



**Bias Circuit** 

**Fast Behavior** 

### **TDA Base Cell**



# TDA in a TDC



## TDA in a TDC



# Other Composite TDCs

- Counter + Phase Interpolator + Vernier
  Niclass *et al.*, JSSC08
- Ring Oscillators + Counters
  Veerappan *et al.*, ISSCC11
- Ring Oscillators + TDA
  Mandai and Charbon, ESSCIRC11

... and many more

# **Stabilization Techniques**

 Process, Voltage supply, Temperature (PVT) variations eliminated using a delay locked-loop (DLL) in clock phase generation



# PVT Stabilization in Phase Interpolators

- DLL running in parallel as a replica of delay chain
- Distribute bias to all delay chains



# **Nested Stabilization Loops**



# Metastability in Ring Oscillators



# Case 1: Monolithic Fully Parallel TDC

# An Array of 20,480 TDCs

- Massive array of pixels comprising
  - single-photon avalanche diode (SPAD)
  - TDC (ring oscillator type)
  - Memory
- Readout
  - Frame rate: 1us
  - Fully digital

#### **TDC** Implementation

Analog techniques allow greater architecture flexibility



#### Single-gate delay means less power, faster transitions

C. Veerappan, J. Richardson, R. Walker, D.-U. Li, M. W. Fishburn, Y. Maruyama, D. Stoppa, F. Borghetti, M. Gersbach, R.K. Henderson, E. Charbon, *ISSCC2011*

#### The MEGAFRAME Pixel




# The MEGAFRAME Chip

- Format: 160x128 pixels
- Timing resolution: 55ps
- Impulse resp. fun.: 140ps
- DCR (median): 50Hz
- R/O speed: 250kfps
- Size: 11.0 x 12.3 mm<sup>2</sup>





TDC Ring oscillator (3 bits) + counter (7 bits) = 10 bits
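
The 10-bit code can be pictured as a coarse/fine combination. The sketch below assumes the 3 ring-oscillator bits form the least significant bits and the 7 counter bits count full oscillator periods; this bit packing and the function name are assumptions for illustration, while the 55 ps LSB comes from the chip data above.

```python
def megaframe_time_ps(coarse, fine, lsb_ps=55):
    """Combine a 7-bit counter value and a 3-bit ring-oscillator phase."""
    if not (0 <= coarse < 2**7 and 0 <= fine < 2**3):
        raise ValueError("coarse must fit in 7 bits, fine in 3 bits")
    code = (coarse << 3) | fine  # 10-bit time code
    return code, code * lsb_ps


# Largest code: 1023 * 55 ps, i.e. a range of roughly 56 ns
code, t_ps = megaframe_time_ps(127, 7)
```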

# The Megaframe-128 Chip



C. Veerappan, J. Richardson, R. Walker, D.-U. Li, M. W. Fishburn, Y. Maruyama, D. Stoppa, F. Borghetti, M. Gersbach, R.K. Henderson, E. Charbon, *ISSCC2011* 

# Imager Block Diagram



#### **Pixel Architecture**



# **Photon Counting**



#### Photon Time-of-Arrival



#### **TDC** Characterization



55ps resolution, 55ns range

#### System-level Timing



# **INL** Uniformity




#### **Optical Burst Detection**



# IR Drop in MEGAFRAME

- If a large number of TDCs are operating at once, then IR drop occurs
- As a result the LSB of TDCs changes in space



# Pitfalls of MEGAFRAME

- LSB changes as a function of position of the pixel
- The LSB also depends on scene brightness, which changes the current drawn by the array
- If a VCO is disrupted, the disruption will propagate through the array in unpredictable ways



#### You Can Compensate, but...

E.g., a replica of the pixel VCO can be placed in a PLL, but mismatch will dominate the error



#### Case 2: Column-Parallel TDC

#### Column-parallel TDC Idea



S. Mandai and E. Charbon, IEEE Nuc. Sci Symp. (NSS) 2012

# Column-parallel TDC Idea

- A single VCO distributing the oscillation to all TDCs in a line
- Pros
  - Picosecond skew among TDCs
  - No LSB variability
  - Good PVT control
- Cons
  - Power & buffers create skews



# **Column-parallel TDC Solutions**





# **Column-parallel TDC Solutions**




432 TDC array

# Column-parallel TDC Uniformity



# **Column-parallel TDC Uniformity**



# Column-parallel TDC Uniformity



# Column-parallel TDC with Memory



#### ASIC vs. FPGA

# FPGA vs. discrete ASIC

- An application-specific integrated circuit (ASIC) is a chip with static circuitry optimized for one task
- A field-programmable gate array (FPGA) is a chip whose configuration, specified by a hardware description language, can be changed many times

# **General Comparison**

#### **FPGA**

- Fast Development Time
- Reconfigurable
  - Lower fault risk
  - Iterate design
- Low non-recurring costs
  - Development
  - Testing

#### ASIC

- Lower power
- Faster operation
- Smaller footprint
- Better integration
- More flexibility
- Low unit costs
  - High-volume applications

## How to Build a Delay Chain



## FPGA Caveats: Clock Regions



## **Example FPGA Architecture**

Only digital techniques available with existing cells



### Virtex-6 FPGA TDC

| Parameter                                  | Min   | Typ | Max   | Unit      |
|--------------------------------------------|-------|-----|-------|-----------|
| **Implementation #1 (design on Virtex-6)** |       |     |       |           |
| Clock frequency                            |       | 200 |       | MHz       |
| Standard uncertainty                       | 7.38  |     | 14.24 | ps        |
| Resolution                                 |       | 9.8 |       | ps        |
| DNL                                        | -1    |     | 6.2   | LSB       |
| INL                                        | -2.1  |     | 13.7  | LSB       |
| Throughput                                 |       | 100 |       | MSample/s |
| **Implementation #2 (improved timing)**    |       |     |       |           |
| Clock frequency                            |       | 600 |       | MHz       |
| Standard uncertainty                       | 7.38  |     | 14.24 | ps        |
| Resolution                                 |       | 9.8 |       | ps        |
| DNL                                        | -1    | 1   | 1.5   | LSB       |
| INL                                        | -2.8  |     | 4.1   | LSB       |
| Throughput                                 |       | 300 |       | MSample/s |
| **Implementation #3 (improved position)**  |       |     |       |           |
| Clock frequency                            |       | 600 |       | MHz       |
| Standard uncertainty                       | 7.38  |     | 14.24 | ps        |
| Resolution                                 |       | 9.8 |       | ps        |
| DNL                                        | -1    |     | 1.5   | LSB       |
| INL                                        | -2.25 | 2   | 1.61  | LSB       |
| Throughput                                 |       | 300 |       | MSample/s |


### **Temperature Dependence**



| Color | Temp.         | Res.(ps) | $\mu(V)$ | $\sigma(mV)$ |
|-------|---------------|----------|----------|--------------|
|       | $10^{\circ}C$ | 9.8      | 1.0096   | 2.9          |
|       | $40^{\circ}C$ | 10.22    | 1.0034   | 1.9          |
|       | $60^{\circ}C$ | 10.48    | 0.9993   | 3.2          |
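
As a worked example, the resolution drift with temperature can be estimated from the first and last rows of the table. This is a simple two-point slope, not a fit, and the function name is hypothetical.

```python
def lsb_temperature_coefficient(t_lo_c, lsb_lo_ps, t_hi_c, lsb_hi_ps):
    """Return (slope in ps per degree C, relative drift in % per degree C)."""
    slope = (lsb_hi_ps - lsb_lo_ps) / (t_hi_c - t_lo_c)
    relative = 100.0 * slope / lsb_lo_ps
    return slope, relative


# Table values: 9.8 ps at 10 C and 10.48 ps at 60 C
slope, relative = lsb_temperature_coefficient(10, 9.8, 60, 10.48)
```

With these numbers the slope is about 0.014 ps/°C, or roughly 0.14 %/°C relative to the 10 °C resolution.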

### Location, Location, Location



#### **Chip-to-chip Variation**



# **TDC** Comparison

#### **FPGA**

- Best time uncertainty: 20ps
- Usage examples
  - High-energy physics
  - OpenPET

#### ASIC

- Best time uncertainty: <1ps
- Examples
  - Time-correlated imaging
  - Frequency synthesizers for RF

## FPGA- or ASIC-based TDC?

- Consider an FPGA-based TDC if your application:
  - Is low-volume
  - Doesn't require <20ps time uncertainty
  - Is sensitive to development time, or is being created in iterations
  - Is open source (FPGA-based TDCs are code-based)

### **3D Integration**

# 3D ICs – Hybrid Bonding



# 3D ICs – Hybrid Bonding

- Sony Corp. (?/?)
- STMicroelectronics (65/45nm)
- TSMC (45/65nm)
- Tezzaron (anything/anything)

## TSMC BSI + 3D-Stacking



- Tier 1: SPADs + microlenses
- Tier 2: quenching, recharge, TDCs, multi-core, memories, communication unit, I/O

# **TDC** Sharing



- Virtually zero skew
- Preservation of origin of pulse

## **TDC** Layer



A.R. Ximenes, P. Padmanabhan, M.J. Lee, Y. Yamashita, D.N. Yaung, E. Charbon, ISSCC 2018

## **3D-Stacked Chip Micrograph**



#### LiDAR Demonstrator



#### **Distance Measurements**



### **Interference Suppression**



## 256x256 3D Image Reconstruction



## Large TDC Arrays

Instead of a large VCO distributing the sync to a large array of TDCs... build a large array of VCOs with overall constant FOM.

Pros

- Individual FOM improved by 10 log (M)
- Synchronization is ~1ps
- PVT robust
- Robust to local disruptions
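
The $10 \log(M)$ figure can be evaluated numerically. The sketch assumes a base-10 logarithm with the result in dB, the usual convention for oscillator figures of merit; the function name and the array sizes are illustrative.

```python
import math


def fom_improvement_db(m):
    """Improvement of the individual FOM when M oscillators are coupled."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return 10.0 * math.log10(m)


# Hypothetical array sizes
gain_10 = fom_improvement_db(10)    # 10 dB
gain_100 = fom_improvement_db(100)  # 20 dB
```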



## **Mutual Coupling**

- Use injection locking for coupling VCOs
- The PLL only forces the desired frequency on the VCOs



© 2018 Edoardo Charbon

### Mutual Coupling



## **Mutual Coupling Measurements**



## **Mutual Coupling Measurements**



## Perspectives for 2020

- Sub-65nm CMOS
- Large, scalable designs (Lego<sup>™</sup> approach)
- Backside illumination (BSI) 3D IC
- Hybrid approaches (InP, GaAs, Ge, polymers)
- Cryogenic operation

### Moore's Law Will Help



# **Quantum Computing**

#### The 2012 Nobel Prize





2012 Physics Nobel Prize

Serge Haroche

**David Wineland** 

Both Laureates work in the field of quantum optics studying the fundamental interaction between light and matter, a field which has seen considerable progress since the mid-1980s. Their ground-breaking methods have enabled this field of research to take the very first steps towards building a new type of super fast computer based on quantum physics. Perhaps the quantum computer will change our everyday lives in this century in the same radical way as the classical computer did in the last century.

-Announcement 2012 Nobel Prize

#### From bits to qubits

- A quantum bit or qubit is a quantum system in which the Boolean states 0 and 1 are represented by a pair of mutually orthogonal quantum states labeled as  $|0\rangle$ ,  $|1\rangle$
- Quantum properties: superposition and entanglement
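
In standard textbook notation (not taken from the slides), superposition means that a single-qubit state is a normalized combination of the two basis states, and entanglement means that a multi-qubit state cannot be written as a product of single-qubit states, as in the Bell state below:

$$
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1
$$

$$
|\Phi^+\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right)
$$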



#### **Qubits on a Chip**



Semiconductor quantum dots



Semiconductor-superconductor hybrids



Superconducting circuits



Impurities in diamond or silicon

#### **Quantum Computer Architecture**

Quantum bits (qubits)



- Carrier frequency: 100 MHz – 15 GHz, 70 GHz
- Pulses: 10 – 100 ns

#### **Quantum Computer Architecture**



- Pulses: 10 – 100 ns

#### **A Real-life Quantum Computer**



#### **Possible Solutions**

#### Proposed solution

- Electronics at 4 K
- Only connections from 4 K to 20 mK are needed



- Ultimate solution
  - Qubits at 4 K
  - Monolithic integration

### **Electronic Readout & Control**



E. Charbon et al., IEDM 2016

### **Cooling Power Issue**



**Dilution refrigerator** 



Courtesy: Oxford instruments

## **Scalability Issue**

- Noise budget: < 0.1 nV/√Hz
- Power budget (for scalability): ...
- Physical dimensions (for scalability): 30 nm
- Bandwidth (for multiplexing): 1-12 GHz
- Kick-back avoidance

### **Cryogenic Electronics**

### **Cryo-CMOS Technologies**



### **Device Modeling (40nm)**




## BJTs and DTMOS in mK domain

- BJTs can work as bandgap reference at T>77 K
- DTMOS can be used as bandgap reference at cryo temperatures



H. Homulle, E. Charbon, F. Sebastiano, JEDS 2018

#### **Substrate Resistivity**



## **SPICE Models, Farms**

- We created models for 4K components in Verilog-AMS, BSIM6, PSP
- We are building a complete model toolkit for 40nm and 160nm CMOS technologies
- Models are tested using *cryogenic component farms*

### **Cryogenic Circuits & Systems**

## **Cryo-FPGAs**



Harald Homulle

- Artix-7 full operation down to 4K
- Other FPGAs operate only down to 30K

## **FPGA functionality**

- All FPGA components are working in the cryogenic environment down to 4K
- No modifications required

| Component | Functional   | Behavior                          |
|-----------|--------------|-----------------------------------|
| IOs       | $\checkmark$ |                                   |
| LVDS      | $\checkmark$ |                                   |
| LUTs      | $\checkmark$ | Delay change < 5%                 |
| CARRY4    | $\checkmark$ | Delay change < 2%                 |
| BRAM      | $\checkmark$ | No corruption (800 kB)            |
| MMCM      | $\checkmark$ | Jitter reduction of roughly 20%   |
| PLL       | $\checkmark$ | Jitter reduction of roughly 20%   |
| IDELAYE2  | $\checkmark$ | Delay change of up to 30%         |
| DSP48E1   | $\checkmark$ | No corruption over 400 operations |

## A/D conversion on FPGA

- Principle
  - Time stamp the cross-over of input with reference ramp
  - Use TDC for timestamping
- Bottleneck: we are bound to the CMOS technology of the FPGA
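
The principle above can be sketched as a conversion from TDC code to voltage. This minimal model assumes an ideal linear reference ramp; the function name, the ramp parameters and the LSB are hypothetical.

```python
def ramp_adc_voltage(tdc_code, lsb_ps, ramp_start_v, ramp_slope_v_per_ns):
    """Recover the input voltage from the ramp cross-over time stamp."""
    t_cross_ns = tdc_code * lsb_ps / 1000.0  # time stamp of the cross-over
    # The input equals the ramp value at the instant the two cross
    return ramp_start_v + ramp_slope_v_per_ns * t_cross_ns


# Hypothetical values: 10 ps LSB, ramp from 0 V at 0.5 V/ns, code 100
v_in = ramp_adc_voltage(100, 10, 0.0, 0.5)  # cross-over at 1 ns, 0.5 V
```

In this model the voltage resolution is the ramp slope multiplied by the TDC LSB, so the achievable A/D resolution is tied to the timing resolution of the FPGA fabric.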



## ADC on FPGA (1.2GSa/s)



### **ADC on FPGA**



## **Cryo-SPADs**





## **Cryo-LNA**



E. Charbon et al., ISSCC 2017

- Standard 160nm CMOS
- 500 MHz Bandwidth
- 0.1dB Noise figure
- 7K noise-equivalent temperature





## **2D Readout and Control**

- Use *imaging sensor* readout as inspiration
- Reduce number of transistors (ideally to zero)
- Use tunneling barriers as selectors
- (limited) use of 3D stacking



#### **Putting Things in Context**



## Conclusions

## **Take-home Messages**

- Large arrays of TDCs for TOA are necessary to a number of emerging fields
- Modularity is an important ingredient to large TDC arrays but one needs to be aware of synchronization, reliability, and uniformity issues
- 3D-stacking / 3D integration is becoming a way of life!
- Quantum Computing will need these circuits but will require cryogenic operation

# Acknowledgements

- Swiss National Science Foundation
- European Space Agency
- FP6 and FP7
- NCCR-MICS
- NWO-STW
- NIH

## http://aqua.epfl.ch