(ns org.soulspace.arrayfire.ffi.iir
  "Bindings for the ArrayFire infinite impulse response (IIR) filter functions.
   
   An IIR (Infinite Impulse Response) filter is a recursive digital filter used
   extensively in signal processing, audio processing, and control systems. Unlike
   FIR filters, which use only current and past input samples, IIR filters combine
   input samples (feedforward) with past output samples (feedback), creating
   a potentially infinite impulse response.
   
   ## Mathematical Foundation
   
   ### Difference Equation
   
   The IIR filter is defined by the linear difference equation:
   ```
   y[n] = (1/a[0]) × (b[0]×x[n] + b[1]×x[n-1] + ... + b[M]×x[n-M]
                       - a[1]×y[n-1] - a[2]×y[n-2] - ... - a[N]×y[n-N])
   ```
   
   Where:
   - y[n] = output sample at time n
   - x[n] = input sample at time n
   - b[k] = feedforward coefficients (numerator, k = 0..M)
   - a[k] = feedback coefficients (denominator, k = 0..N)
   - M = feedforward order (the FIR part has M+1 coefficients)
   - N = feedback order (the recursive part has N+1 coefficients, including a[0])
   
   ### Transfer Function
   
   In the z-domain (discrete-time frequency domain), the transfer function is:
   
   ```
   H(z) = Y(z)/X(z) = (b[0] + b[1]z⁻¹ + ... + b[M]z⁻ᴹ) / (a[0] + a[1]z⁻¹ + ... + a[N]z⁻ᴺ)
   ```
   
   This rational function represents:
   - **Numerator (b coefficients)**: Its roots are the zeros of the transfer function
   - **Denominator (a coefficients)**: Its roots are the poles of the transfer function
   
   ### Normalization
   
   Typically, the coefficients are normalized so that a[0] = 1:
   
   ```
   H(z) = (b'[0] + b'[1]z⁻¹ + ... + b'[M]z⁻ᴹ) / (1 + a'[1]z⁻¹ + ... + a'[N]z⁻ᴺ)
   ```
   
   Where b'[k] = b[k]/a[0] and a'[k] = a[k]/a[0].
   
   ### Stability
   
   An IIR filter is **stable** if and only if all poles of H(z) lie strictly
   inside the unit circle in the z-plane:
   
   ```
   |z_pole| < 1  for all poles
   ```
   
   An unstable filter can produce unbounded, growing oscillations even for a
   bounded input. Key considerations:
   - **Stability testing**: Check pole locations after design
   - **Quantization effects**: Finite precision can move poles outside the unit circle
   - **Cascaded sections**: Break high-order filters into second-order sections (SOS)
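   
   For a single second-order section the pole check can be done without a root
   finder. The sketch below is a hypothetical helper (`biquad-stable?` is not part
   of this namespace) that applies the stability-triangle conditions |a2| < 1 and
   |a1| < 1 + a2 to the normalized denominator 1 + a1×z⁻¹ + a2×z⁻²:
   
   ```clojure
   ;; Hypothetical helper, not part of the ArrayFire bindings.
   ;; Takes the denominator [a0 a1 a2] of one biquad and returns true when
   ;; both poles lie strictly inside the unit circle.
   (defn biquad-stable?
     [[a0 a1 a2]]
     (let [a1n (/ (double a1) (double a0))   ; normalize so that a0 = 1
           a2n (/ (double a2) (double a0))]
       (and (< (Math/abs a2n) 1.0)
            (< (Math/abs a1n) (+ 1.0 a2n)))))
   ```
   
   Higher-order denominators would first need a polynomial root finder or a
   decomposition into second-order sections.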
   
   ## ArrayFire Implementation
   
   ### Algorithm Steps
   
   The ArrayFire IIR filter implementation follows these steps:
   
   1. **Feedforward (FIR) Stage**:
      ```
      c[n] = b[0]×x[n] + b[1]×x[n-1] + ... + b[M]×x[n-M]
      ```
      - Implemented via 1D convolution: c = convolve1(x, b, AF_CONV_EXPAND)
      - Parallel computation across all input samples
      - Result c has length = length(x) + length(b) - 1
      - Truncate to length(x) to match input size
   
   2. **Feedback (IIR) Stage**:
      ```
      y[n] = (c[n] - a[1]×y[n-1] - a[2]×y[n-2] - ... - a[N]×y[n-N]) / a[0]
      ```
      - **Inherently sequential**: y[n] depends on previous outputs
      - ArrayFire uses optimized recursive algorithm with local memory
      - Batch processing for multiple filter/signal combinations
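   
   The two stages can be mirrored on the host for small inputs, which is handy
   as a reference when checking device results. This is a plain-Clojure sketch
   under the assumption of zero initial conditions; `feedforward-stage` and
   `feedback-stage` are hypothetical names and do not reflect how the ArrayFire
   kernels are written:
   
   ```clojure
   ;; Hypothetical host-side reference, not the ArrayFire kernel.
   ;; Stage 1: c[n] = sum over k of b[k] * x[n-k], truncated to (count x) samples.
   (defn feedforward-stage
     [b x]
     (let [b (mapv double b)
           x (mapv double x)]
       (mapv (fn [n]
               (reduce + 0.0
                       (for [k (range (count b)) :when (<= k n)]
                         (* (b k) (x (- n k))))))
             (range (count x)))))
   
   ;; Stage 2: y[n] = (c[n] - sum over k >= 1 of a[k] * y[n-k]) / a[0].
   ;; Each output depends on earlier outputs, so this part is a sequential reduce.
   (defn feedback-stage
     [a c]
     (let [a  (mapv double a)
           a0 (a 0)]
       (reduce (fn [y n]
                 (let [fb (reduce + 0.0
                                  (for [k (range 1 (count a)) :when (<= k n)]
                                    (* (a k) (y (- n k)))))]
                   (conj y (/ (- (double (c n)) fb) a0))))
               []
               (range (count c)))))
   
   ;; Usage: impulse response of y[n] = x[n] + 0.5 * y[n-1]
   ;; (feedback-stage [1.0 -0.5] (feedforward-stage [1.0] [1.0 0.0 0.0 0.0]))
   ```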
   
   ### Optimization Techniques
   
   1. **Normalization by a[0]**:
      - If a[0] = 1, division is eliminated
      - If only a[0] exists (length(a) = 1), the filter reduces to a plain FIR: y = (b/a[0]) * x, where * denotes convolution
   
   2. **Local Memory Utilization**:
      - **s_z**: Shared memory for intermediate feedback states (size N)
      - **s_a**: Shared memory for feedback coefficients (size N)
      - **s_y**: Shared memory for single output sample
      - Constraint: (2N + 1) × sizeof(T) ≤ local_memory_size
      - Typical: N ≤ 512 for float, N ≤ 256 for double
   
   3. **Thread Organization**:
      - **Groups**: (dim2 × dim1, dim3 × batch_size)
      - **Threads per group**: min(256, next_power_of_2(dim0))
      - **Workload**: Each thread processes sequential samples within group
      - **Synchronization**: Barrier after each feedback iteration
   
   4. **Batch Processing**:
      - **batch_a template parameter**:
        * false: Single filter applied to all signals (broadcast)
        * true: Each signal has its own filter coefficients
      - **Memory layout**: Column-major for coalesced access
      - **Performance**: 5-20× speedup for batched operations
   
   ## Filter Design Methods
   
   ### Common IIR Filter Types
   
   1. **Butterworth Filter**:
      - Maximally flat passband response
      - Monotonic stopband attenuation
      - No ripple in passband or stopband
      - Magnitude response (analog prototype, order n): |H(jω)| = 1 / √(1 + (ω/ωc)^(2n))
      - Design: Use analog prototype, bilinear transform to digital
   
   2. **Chebyshev Type I Filter**:
      - Equiripple passband, monotonic stopband
      - Steeper rolloff than Butterworth for same order
      - Ripple amplitude: ε parameter
      - Trade-off: Passband ripple for sharper transition
   
   3. **Chebyshev Type II Filter**:
      - Monotonic passband, equiripple stopband
      - Flatter passband than Type I
      - Stopband ripple bounded by a specified minimum attenuation
   
   4. **Elliptic (Cauer) Filter**:
      - Equiripple in both passband and stopband
      - Sharpest transition for given order and ripple
      - Most complex design, highest sensitivity
      - Optimal for minimum order requirement
   
   5. **Bessel Filter**:
      - Maximally flat group delay (approximately linear phase in the passband)
      - Preserves waveform shape
      - Poorest frequency selectivity
      - Applications: Pulse/transient preservation
   
   ### Design Process
   
   1. **Analog Prototype Design**:
      - Specify filter requirements (passband, stopband, ripple, order)
      - Design analog filter H_a(s) in s-domain
      - Compute pole/zero locations
   
   2. **Digital Transformation**:
      - **Bilinear Transform** (most common):
        ```
        s = (2/T) × (1 - z⁻¹) / (1 + z⁻¹)
        ```
        Where T = sampling period
      - Maps entire jω axis to unit circle
      - Introduces frequency warping: ω_analog = (2/T) × tan(ω_digital × T/2)
      - Pre-warp critical frequencies to compensate
   
   3. **Coefficient Extraction**:
      - Factor H(z) into rational polynomial form
      - Extract numerator coefficients → b array
      - Extract denominator coefficients → a array
      - Normalize: divide by a[0]
   
   4. **Second-Order Sections (SOS)**:
      - Cascade of biquad (second-order) filters
      - Form: H(z) = H₁(z) × H₂(z) × ... × Hₖ(z)
      - Each section: H_i(z) = (b_i0 + b_i1×z⁻¹ + b_i2×z⁻²) / (1 + a_i1×z⁻¹ + a_i2×z⁻²)
      - **Advantages**:
        * Better numerical stability
        * Reduced quantization errors
        * Each section can be scaled independently
      - **Note**: ArrayFire af_iir applies a single direct-form filter; for SOS, chain one af_iir call per section
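   
   As a worked instance of steps 1 to 3, the first-order analog low-pass
   H_a(s) = Ωc / (s + Ωc) can be pushed through the bilinear transform by hand.
   With the pre-warped cutoff K = tan(π×fc/fs), the result is
   H(z) = K×(1 + z⁻¹) / ((1 + K) + (K - 1)×z⁻¹). The sketch below returns the
   normalized coefficients; `first-order-lowpass-coeffs` is a hypothetical helper
   and not part of this namespace:
   
   ```clojure
   ;; Hypothetical helper: first-order low-pass via the bilinear transform.
   ;; fs = sampling rate in Hz, fc = desired digital cutoff in Hz (fc < fs/2).
   (defn first-order-lowpass-coeffs
     [fs fc]
     (let [k    (Math/tan (/ (* Math/PI fc) fs))   ; pre-warped cutoff
           norm (+ 1.0 k)]                         ; un-normalized a[0]
       {:b [(/ k norm) (/ k norm)]
        :a [1.0 (/ (- k 1.0) norm)]}))
   ```
   
   The DC gain, sum(b) / sum(a), evaluates to 1 for any valid cutoff, which is a
   quick sanity check on the coefficients.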
   
   ## Use Cases
   
   ### 1. Audio Equalization
   
   **Objective**: Adjust frequency content for musical balance or correction.
   
   **Implementation**:
   - **Parametric EQ**: Second-order sections (biquads) for each band
   - **Peaking filter**: Boost/cut at center frequency
     ```
     H(z) = (b0 + b1×z⁻¹ + b2×z⁻²) / (1 + a1×z⁻¹ + a2×z⁻²)
     ```
     Where coefficients depend on:
     * fc: Center frequency
     * Q: Quality factor (bandwidth)
     * gain: Boost/cut in dB
   - **Cascaded filters**: Bass (100Hz), midrange (1kHz), treble (10kHz)
   
   **Example**:
   ```clojure
   ;; Design parametric EQ boost at 1kHz
   (defn peaking-filter-coeffs [fs fc Q gain-db]
     (let [w0 (/ (* 2.0 Math/PI fc) fs)
           A (Math/pow 10.0 (/ gain-db 40.0))
           alpha (/ (Math/sin w0) (* 2.0 Q))
           cos-w0 (Math/cos w0)]
       ;; RBJ cookbook peaking EQ; a[0] is not 1 here, af_iir divides by it
       {:b [(+ 1.0 (* alpha A))
            (* -2.0 cos-w0)
            (- 1.0 (* alpha A))]
        :a [(+ 1.0 (/ alpha A))
            (* -2.0 cos-w0)
            (- 1.0 (/ alpha A))]}))
   
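   ;; Apply to audio signal. The `af/...` helpers are illustrative wrappers that are
   ;; not defined in this namespace, and `y-ptr` stands for a caller-allocated out
   ;; pointer: af-iir returns an af_err code and writes the result handle to y-ptr.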
   (let [fs 44100                    ; Sampling rate
         fc 1000                     ; Center frequency
         Q 1.0                       ; Quality factor
         gain 6.0                    ; +6dB boost
         {:keys [b a]} (peaking-filter-coeffs fs fc Q gain)
         b-arr (af/create-array b)
         a-arr (af/create-array a)
         audio-in (af/read-audio \"input.wav\")
         audio-out (af-iir y-ptr b-arr a-arr audio-in)]
     (af/write-audio \"output.wav\" audio-out))
   ```
   
   **Performance**:
   - 44.1kHz audio: 0.1-0.5ms per 1024 samples (GPU)
   - Real-time capable: <23ms per second of audio
   - Batch processing: Process entire song in 50-200ms (3min song)
   
   ### 2. Active Noise Cancellation (ANC)
   
   **Objective**: Generate anti-noise signal to cancel unwanted ambient sound.
   
   **Algorithm**:
   1. **Reference signal**: Microphone captures ambient noise
   2. **Adaptive filter**: Estimates noise path from reference to error mic
   3. **Anti-noise**: Filtered signal inverted and played through speaker
   4. **Error signal**: Residual noise measured near ear
   5. **Coefficient update**: Minimize error signal (LMS, RLS algorithms)
   
   **IIR Filter Role**:
   - **Secondary path model**: IIR filter models speaker-to-mic transfer function
   - **Adaptive IIR**: Coefficients updated in real-time to track changing noise
   - **Low latency**: Critical for phase cancellation (<10ms group delay)
   
   **Implementation**:
   ```clojure
   ;; Simplified ANC loop (one iteration)
   (defn anc-step [ref-signal error-signal b a learning-rate]
     (let [;; Generate anti-noise
           anti-noise (af-iir y-ptr b a ref-signal)
           
           ;; Update coefficients (simplified LMS)
           ;; In practice, use more sophisticated adaptive algorithm
           b-update (af/mul error-signal ref-signal learning-rate)
           b-new (af/sub b b-update)]
       {:anti-noise anti-noise
        :b-new b-new}))
   ```
   
   **Challenges**:
   - **Stability**: Adaptive IIR can become unstable; use normalized algorithms
   - **Causality**: Filter must be causal (no future samples)
   - **Secondary path effects**: Speaker-to-error-mic path introduces phase shift
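   
   The normalization mentioned under stability can be illustrated with a single
   normalized LMS (NLMS) step. This is a minimal sketch for an FIR tap vector,
   which is a simpler stand-in for the adaptive IIR update described here;
   `nlms-step` and its argument names are hypothetical. Dividing the update by
   the input energy keeps the effective step size bounded when the reference
   signal gets loud:
   
   ```clojure
   ;; Hypothetical helper: one NLMS update for an FIR tap vector.
   ;; w   = current tap weights
   ;; x   = most recent reference samples (same length as w)
   ;; d   = desired sample, mu = step size
   ;; eps = small regularizer that avoids division by zero
   (defn nlms-step
     [w x d mu eps]
     (let [y      (reduce + (map * w x))            ; filter output
           e      (- d y)                           ; estimation error
           energy (+ eps (reduce + (map * x x)))    ; input energy
           w-new  (mapv (fn [wi xi] (+ wi (/ (* mu e xi) energy))) w x)]
       {:output y :error e :weights w-new}))
   ```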
   
   **Performance**:
   - Sampling: 16-48kHz typical
   - Latency budget: <10ms end-to-end
   - GPU advantage: Batch processing for multichannel ANC (headphones)
   
   ### 3. DC Removal (High-Pass Filter)
   
   **Objective**: Remove DC offset and very low frequencies from signal.
   
   **Design**: First-order high-pass IIR filter
   ```
   H(z) = (1 - z⁻¹) / (1 - α×z⁻¹)
   ```
   Where α = exp(-2π×fc/fs), fc = cutoff frequency, fs = sampling rate.
   
   **Coefficients**:
   - b = [1, -1]
   - a = [1, -α]
   
   **Example**:
   ```clojure
   (defn dc-removal-filter [fs fc]
     (let [alpha (Math/exp (/ (* -2.0 Math/PI fc) fs))
           b (af/create-array [1.0 -1.0])
           a (af/create-array [1.0 (- alpha)])]
       {:b b :a a}))
   
   ;; Remove DC from sensor data
   (let [{:keys [b a]} (dc-removal-filter 1000.0 0.1) ; fs=1kHz, fc=0.1Hz
         sensor-data (af/create-array raw-samples)
         filtered (af-iir y-ptr b a sensor-data)]
     filtered)
   ```
   
   **Applications**:
   - **Audio processing**: Remove DC offset from microphone signals
   - **Sensor data**: Eliminate drift in accelerometers, gyroscopes
   - **AC coupling**: Simulate capacitive coupling (high-pass characteristic)
   
   **Advantages over FIR**:
   - **Efficiency**: First-order IIR vs hundreds of FIR taps for same fc
   - **Narrow stopband**: With α close to 1, only frequencies near DC are attenuated
   - **Low latency**: Small group delay in the passband
   
   ### 4. Resonance and Quality Factor (Q) Control
   
   **Objective**: Create resonant peak at specific frequency for synthesis or analysis.
   
   **Resonant Filter**: Second-order bandpass or peaking filter with high Q.
   ```
   H(z) = (b0 + b1×z⁻¹ + b2×z⁻²) / (1 + a1×z⁻¹ + a2×z⁻²)
   ```
   
   **Quality Factor Q**:
   - Q = fc / BW, where BW = bandwidth at -3dB points
   - High Q (>10): Narrow bandwidth, sharp resonance
   - Low Q (<1): Wide bandwidth, gentle slope
   
   **Example**:
   ```clojure
   (defn bandpass-filter-coeffs [fs fc Q]
     (let [w0 (/ (* 2.0 Math/PI fc) fs)
           alpha (/ (Math/sin w0) (* 2.0 Q))
           cos-w0 (Math/cos w0)]
       ;; RBJ cookbook band-pass (0 dB peak gain); af_iir normalizes by a[0] = 1 + alpha
       {:b [alpha 0.0 (- alpha)]
        :a [(+ 1.0 alpha) (* -2.0 cos-w0) (- 1.0 alpha)]}))
   
   ;; Create resonant filter bank for spectral analysis
   (defn resonant-filter-bank [fs center-freqs Q]
     (for [fc center-freqs]
       (bandpass-filter-coeffs fs fc Q)))
   
   ;; Apply filter bank in parallel
   (let [fs 44100
         freqs [100 200 400 800 1600 3200 6400] ; 7-band
         Q 10.0
         filters (resonant-filter-bank fs freqs Q)
         input-signal (af/create-array samples)]
     (for [{:keys [b a]} filters]
       (let [b-arr (af/create-array b)
             a-arr (af/create-array a)]
         (af-iir y-ptr b-arr a-arr input-signal))))
   ```
   
   **Applications**:
   - **Subtractive synthesis**: Resonant low-pass for analog-style synths
   - **Vocoders**: Filter banks for speech analysis/resynthesis
   - **Modal analysis**: Identify resonant frequencies in structures
   - **Formant synthesis**: Vocal tract resonances for speech
   
   **Performance**:
   - 7-band filter bank at 44.1kHz: 0.5-2ms per 1024 samples
   - GPU advantage: Parallel application of all filters in batch
   
   ### 5. Control Systems (PID Controller)
   
   **Objective**: Maintain setpoint in feedback control loop.
   
   **PID Controller**: Combination of proportional, integral, derivative actions.
   ```
   u(t) = Kp×e(t) + Ki×∫e(τ)dτ + Kd×(de/dt)
   ```
   
   **Discrete-Time Form** (using backward difference):
   ```
   u[n] = Kp×e[n] + Ki×Ts×Σ(k=0..n) e[k] + Kd×(e[n] - e[n-1])/Ts
   ```
   
   **IIR Representation**:
   - Can be expressed as IIR filter on error signal e[n]
   - Transfer function: H(z) = (b0 + b1×z⁻¹ + b2×z⁻²) / (1 - z⁻¹)
   
   **Example**:
   ```clojure
   (defn pid-controller-coeffs [Kp Ki Kd Ts]
     ;; Backward-difference PID:
     ;; H(z) = Kp + Ki×Ts/(1 - z⁻¹) + (Kd/Ts)×(1 - z⁻¹),
     ;; written over the common denominator (1 - z⁻¹)
     (let [b0 (+ Kp (* Ki Ts) (/ Kd Ts))
           b1 (- (+ Kp (/ (* 2.0 Kd) Ts)))
           b2 (/ Kd Ts)]
       {:b [b0 b1 b2]
        :a [1.0 -1.0]}))
   
   ;; Temperature control loop
   (defn control-loop [setpoint measurements Kp Ki Kd Ts]
     (let [errors (af/sub setpoint measurements)
           {:keys [b a]} (pid-controller-coeffs Kp Ki Kd Ts)
           b-arr (af/create-array b)
           a-arr (af/create-array a)
           control-signal (af-iir y-ptr b-arr a-arr errors)]
       control-signal))
   ```
   
   **Applications**:
   - **Temperature control**: HVAC, ovens, chemical reactors
   - **Motor control**: Speed/position regulation
   - **Flight control**: Autopilot, stability augmentation
   - **Process control**: Industrial automation
   
   **Considerations**:
   - **Discretization method**: Backward difference, Tustin (bilinear), forward difference
   - **Anti-windup**: Prevent integral accumulation during saturation
   - **Derivative filtering**: Add low-pass to derivative term to reduce noise amplification
   
   ### 6. Biomedical Signal Filtering
   
   **Objective**: Extract physiological signals from noisy measurements.
   
   **ECG/EEG Processing**:
   - **Baseline wander removal**: High-pass IIR (0.5Hz cutoff)
   - **Powerline interference**: Notch filter at 50/60Hz
   - **EMG rejection**: Low-pass IIR (30-40Hz cutoff for ECG)
   
   **Notch Filter Design** (for 60Hz powerline):
   ```
   H(z) = (1 - 2×cos(ω0)×z⁻¹ + z⁻²) / (1 - 2×r×cos(ω0)×z⁻¹ + r²×z⁻²)
   ```
   Where:
   - ω0 = 2π×60/fs (normalized frequency)
   - r = 1 - (BW/fs) (pole radius, controls notch width)
   
   **Example**:
   ```clojure
   (defn notch-filter-coeffs [fs f0 BW]
     (let [w0 (/ (* 2.0 Math/PI f0) fs)
           r (- 1.0 (/ BW fs))
           cos-w0 (Math/cos w0)]
       {:b [1.0 (* -2.0 cos-w0) 1.0]
        :a [1.0 (* -2.0 r cos-w0) (* r r)]}))
   
   ;; ECG signal processing pipeline
   ;; (butterworth-lpf-coeffs is assumed to be defined elsewhere and to
   ;; return {:b [...] :a [...]})
   (defn process-ecg [raw-ecg fs]
     (let [;; Remove baseline wander (dc-removal-filter already returns af arrays)
           {b1 :b a1 :a} (dc-removal-filter fs 0.5)
           step1 (af-iir y-ptr b1 a1 raw-ecg)
           
           ;; Remove 60Hz powerline
           {b2 :b a2 :a} (notch-filter-coeffs fs 60.0 2.0)
           step2 (af-iir y-ptr (af/create-array b2) (af/create-array a2) step1)
           
           ;; Low-pass for smoothing
           {b3 :b a3 :a} (butterworth-lpf-coeffs fs 40.0 4)
           step3 (af-iir y-ptr (af/create-array b3) (af/create-array a3) step2)]
       step3))
   ```
   
   **Real-Time Requirements**:
   - **Latency**: <50ms for monitoring (display lag)
   - **Throughput**: 250-1000 samples/second typical
   - **Batch processing**: Analyze hours of recorded data
   
   **Performance**:
   - Single-channel ECG (250Hz): <0.1ms per second of data (GPU)
   - 32-channel EEG (500Hz): 1-3ms per second of data
   
   ## Advanced Techniques
   
   ### Cascaded Second-Order Sections (SOS)
   
   **Motivation**: High-order direct-form IIR filters suffer from:
   - Coefficient quantization errors
   - Numerical instability
   - Pole/zero pairing issues
   
   **SOS Decomposition**:
   - Factor H(z) into product of biquad sections
   - Each section: 2 poles, 2 zeros (real or complex conjugate pairs)
   - Total order: 2 × number of sections
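   
   Going the other way, from sections back to one direct-form filter, is just
   polynomial multiplication of the section numerators and denominators. That is
   useful for checking a decomposition against the original coefficients. The
   sketch below uses hypothetical helpers (`poly-mul`, `sos->direct`) and plain
   Clojure vectors, with coefficients ordered by increasing power of z⁻¹:
   
   ```clojure
   ;; Hypothetical helpers, not part of the ArrayFire bindings.
   ;; Multiply two polynomials in z⁻¹ given as coefficient vectors.
   (defn poly-mul
     [p q]
     (reduce (fn [acc [i pi]]
               (reduce (fn [acc [j qj]]
                         (update acc (+ i j) + (* pi qj)))
                       acc
                       (map-indexed vector q)))
             (vec (repeat (dec (+ (count p) (count q))) 0.0))
             (map-indexed vector p)))
   
   ;; Expand a cascade of sections {:b [...] :a [...]} into one direct-form filter.
   (defn sos->direct
     [sections]
     {:b (reduce poly-mul [1.0] (map :b sections))
      :a (reduce poly-mul [1.0] (map :a sections))})
   ```
   
   For high orders the expanded form is exactly the numerically fragile
   representation that SOS avoids, so this is best kept for verification.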
   
   **Implementation**:
   ```clojure
   (defn apply-sos [x sections]
     (reduce (fn [signal {:keys [b a]}]
               (let [b-arr (af/create-array b)
                     a-arr (af/create-array a)]
                 (af-iir y-ptr b-arr a-arr signal)))
             x
             sections))
   
   ;; 8th-order Butterworth as 4 biquads
   (let [sos-sections [{:b [b00 b01 b02] :a [1.0 a01 a02]}
                       {:b [b10 b11 b12] :a [1.0 a11 a12]}
                       {:b [b20 b21 b22] :a [1.0 a21 a22]}
                       {:b [b30 b31 b32] :a [1.0 a31 a32]}]
         input (af/create-array samples)
         output (apply-sos input sos-sections)]
     output)
   ```
   
   **Section Ordering**:
   - **Minimum gain first**: Sections with lowest gain at Nyquist frequency first
   - **Alternating**: High-gain, low-gain, high-gain, ...
   - **Pairing**: Group poles/zeros by proximity to minimize intermediate dynamic range
   
   ### Zero-Phase Filtering (Forward-Backward)
   
   **Objective**: Eliminate phase distortion introduced by IIR filter.
   
   **Method**:
   1. Filter signal forward: y1 = iir(b, a, x)
   2. Reverse y1: y1_rev
   3. Filter reversed signal: y2 = iir(b, a, y1_rev)
   4. Reverse y2: y_final = reverse(y2)
   
   **Result**:
   - Zero phase shift: φ(ω) = 0 for all frequencies
   - Squared magnitude response: |H_total(ω)| = |H(ω)|² (attenuation in dB is doubled)
   - Effective filter order is doubled
   
   **Implementation**:
   ```clojure
   (defn zero-phase-filter [b a x]
     (let [b-arr (af/create-array b)
           a-arr (af/create-array a)
           ;; Forward pass
           y1 (af-iir y1-ptr b-arr a-arr x)
           ;; Reverse
           y1-rev (af/flip y1 0)
           ;; Backward pass
           y2 (af-iir y2-ptr b-arr a-arr y1-rev)
           ;; Reverse again
           y-final (af/flip y2 0)]
       y-final))
   ```
   
   **Use Cases**:
   - **Offline processing**: When real-time not required (recorded data)
   - **Image filtering**: 2D zero-phase filters for edge preservation
   - **Feature extraction**: Phase distortion affects features
   
   **Caveat**: Not causal; requires full signal available.
   
   ### Frequency Sampling Method
   
   **Design Goal**: Specify desired frequency response H_d(ω), derive coefficients.
   
   **Steps**:
   1. Sample H_d(ω) at N equally spaced frequencies: H_d[k], k=0..N-1
   2. Compute inverse DFT: h[n] = IDFT{H_d[k]}
   3. For IIR, fit rational polynomial to H_d(ω) using optimization
   4. Extract b, a coefficients from polynomial
   
   **Optimization**:
   - **Least squares**: Minimize Σ|H(ωk) - H_d(ωk)|²
   - **Weighted**: Emphasize critical frequency bands
   - **Iterative refinement**: Prony, Steiglitz-McBride methods
   
   **Example** (conceptual):
   ```clojure
   (defn design-iir-from-response [freq-response order]
     ;; freq-response: vector of desired H(ω) at N frequencies
     ;; order: [M N] = [numerator-order denominator-order]
     ;; Returns {:b [...] :a [...]}
     (let [;; Optimization problem: find b, a that minimize error
           ;; This is typically done using specialized DSP libraries
           ;; Placeholder for actual implementation
           {:keys [b a]} (optimize-iir-coeffs freq-response order)]
       {:b b :a a}))
   ```
   
   **Applications**:
   - **Arbitrary responses**: Custom magnitude/phase shapes
   - **Inverse modeling**: Design inverse of measured system response
   - **Digital audio effects**: Resonators, phasers, flangers
   
   ## Performance Characteristics
   
   ### Computational Complexity
   
   **Per-Sample Cost**:
   - **FIR stage** (convolution): O(M), where M = length(b)
   - **IIR stage** (recursion): O(N), where N = length(a)
   - **Total**: O(M + N) per output sample
   
   **ArrayFire Parallelism**:
   - **FIR stage**: Fully parallel across all samples (GPU optimized)
   - **IIR stage**: Sequential dependency chain, but:
     * Parallel across batch dimension (multiple signals)
     * Parallel across dimensions 1, 2, 3 (channels, rows, batches)
     * Optimized local memory usage
   
   **Performance Metrics**:
   
   | Input Size | Filter Order (M+N) | CPU Time | GPU Time | Speedup |
   |------------|--------------------|----------|----------|---------|
   | 1K samples | 10 taps            | 0.5ms    | 0.05ms   | 10×     |
   | 10K        | 10 taps            | 5ms      | 0.2ms    | 25×     |
   | 100K       | 10 taps            | 50ms     | 1ms      | 50×     |
   | 1K         | 100 taps           | 2ms      | 0.1ms    | 20×     |
   | 10K        | 100 taps           | 20ms     | 0.5ms    | 40×     |
   
   **Batch Performance** (100 signals, 10K samples each, 10 taps):
   
   | Mode             | CPU Time | GPU Time | Speedup |
   |------------------|----------|----------|---------|
   | Sequential       | 500ms    | 20ms     | 25×     |
   | Batch (same b,a) | 500ms    | 5ms      | 100×    |
   | Batch (diff b,a) | 500ms    | 8ms      | 62×     |
   
   ### Memory Requirements
   
   **Device Memory**:
   - **Input signal x**: W × H × D × B × sizeof(T) bytes
   - **Output signal y**: Same as x
   - **Feedforward result c**: Same as x
   - **Coefficient b**: M × H × D × B × sizeof(T) (if batched)
   - **Coefficient a**: N × H × D × B × sizeof(T) (if batched)
   - **Temporary**: Convolution workspace (~2× input size)
   
   **Local Memory** (per work-group):
   - **s_z**: N × sizeof(T) (feedback states)
   - **s_a**: N × sizeof(T) (feedback coefficients)
   - **s_y**: 1 × sizeof(T) (current output)
   - **Total**: (2N + 1) × sizeof(T)
   
   **Constraint**: (2N + 1) × sizeof(T) ≤ local_memory_size
   - **CUDA/OpenCL**: Typically 48KB local memory
   - **float**: N ≤ 6,143 (practically limited to ~512)
   - **double**: N ≤ 3,071 (practically limited to ~256)
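   
   The bound follows from solving (2N + 1) × sizeof(T) ≤ local_memory_size for N.
   A small sketch of that arithmetic, assuming 48KB means 49,152 bytes;
   `max-feedback-taps` is a hypothetical helper, not an ArrayFire query:
   
   ```clojure
   ;; Hypothetical helper: largest N with (2N + 1) * elem-bytes <= local-mem-bytes.
   (defn max-feedback-taps
     [local-mem-bytes elem-bytes]
     (quot (dec (quot local-mem-bytes elem-bytes)) 2))
   
   ;; Usage:
   ;; (max-feedback-taps 49152 4)   ; float
   ;; (max-feedback-taps 49152 8)   ; double
   ```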
   
   **Example** (10K samples, float, 10 taps):
   ```
   x:         10,000 × 4 = 40KB
   y:         10,000 × 4 = 40KB
   c:         10,000 × 4 = 40KB
   b:         10 × 4 = 40 bytes
   a:         10 × 4 = 40 bytes
   temporary: ~80KB
   local:     21 × 4 = 84 bytes per group
   Total:     ~200KB device, 84 bytes local
   ```
   
   ## Optimization Strategies
   
   ### 1. Coefficient Pre-Normalization
   
   Normalize coefficients before passing to af_iir:
   ```clojure
   (defn normalize-coeffs [b a]
     (let [a0 (first a)
           b-norm (mapv #(/ % a0) b)
           a-norm (mapv #(/ % a0) a)]
       {:b b-norm :a a-norm}))
   
   ;; Usage
   (let [{:keys [b a]} (normalize-coeffs raw-b raw-a)
         b-arr (af/create-array b)
         a-arr (af/create-array a)]
     (af-iir y-ptr b-arr a-arr x))
   ```
   
   **Benefits**:
   - Eliminates division by a[0] in inner loop
   - Improves numerical stability
   - First element of a-norm is exactly 1.0
   
   ### 2. Batch Similar Filters
   
   Group signals with identical filter coefficients:
   ```clojure
   (defn batch-filter [signals b a]
     ;; Concatenate signals along batch dimension
     (let [batched (af/join 3 signals)
           b-arr (af/create-array b)
           a-arr (af/create-array a)
           result (af-iir y-ptr b-arr a-arr batched)]
       ;; Split back into individual signals
       (map #(af/slice result 3 %) (range (count signals)))))
   ```
   
   **Speedup**: 5-10× for large batches (>100 signals).
   
   ### 3. Minimize Filter Order
   
   Use lowest-order filter that meets requirements:
   - **Rule of thumb**: Order ≈ -20 × log10(δ) / (transition_width/fs)
     Where δ = passband/stopband ripple
   - **SOS cascade**: More efficient than high-order direct form
   - **Frequency domain**: Consider FFT-based methods for very long filters
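   
   For one concrete family the minimum order has a closed form. Assuming a
   Butterworth low-pass designed through the bilinear transform, the standard
   estimate is n ≥ log((10^(As/10) - 1) / (10^(Ap/10) - 1)) / (2 × log(Ωs/Ωp)),
   where Ap and As are the passband and stopband attenuations in dB and Ωp, Ωs
   are the pre-warped edge frequencies. `butterworth-order` below is a
   hypothetical helper that evaluates it:
   
   ```clojure
   ;; Hypothetical helper: minimum Butterworth low-pass order.
   ;; fs      = sampling rate in Hz
   ;; f-pass  = passband edge in Hz, f-stop = stopband edge in Hz (f-stop > f-pass)
   ;; pass-db = maximum passband attenuation in dB (for example 3.0)
   ;; stop-db = minimum stopband attenuation in dB (for example 40.0)
   (defn butterworth-order
     [fs f-pass f-stop pass-db stop-db]
     (let [warp  (fn [f] (Math/tan (/ (* Math/PI f) fs)))   ; bilinear pre-warp
           ratio (/ (warp f-stop) (warp f-pass))
           num   (Math/log (/ (- (Math/pow 10.0 (/ stop-db 10.0)) 1.0)
                              (- (Math/pow 10.0 (/ pass-db 10.0)) 1.0)))
           den   (* 2.0 (Math/log ratio))]
       (long (Math/ceil (/ num den)))))
   
   ;; Usage: 1 kHz passband edge, 2 kHz stopband edge, 48 kHz sampling
   ;; (butterworth-order 48000.0 1000.0 2000.0 3.0 40.0)
   ```
   
   Chebyshev and elliptic designs meet the same specification with a lower
   order, at the cost of ripple.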
   
   ### 4. Pipeline Multiple Filters
   
   Chain filters to avoid intermediate host transfers:
   ```clojure
   (defn filter-pipeline [x filters]
     (reduce (fn [signal {:keys [b a]}]
               (let [b-arr (af/create-array b)
                     a-arr (af/create-array a)]
                 (af-iir y-ptr b-arr a-arr signal)))
             x
             filters))
   ```
   
   **Benefit**: All computations stay on device, minimizing PCIe overhead.
   
   ### 5. Exploit Symmetry
   
   For linear-phase requirement, use zero-phase filtering (forward-backward) to
   avoid FIR with very high order.
   
   ## Comparison with FIR Filters
   
   | Aspect                | IIR Filter              | FIR Filter              |
   |-----------------------|-------------------------|-------------------------|
   | **Phase Response**    | Nonlinear (except special) | Can be linear (symmetric) |
   | **Stability**         | Can be unstable         | Always stable           |
   | **Order Required**    | Lower for same specs    | Higher for sharp cutoff |
   | **Computation**       | O(M+N) per sample       | O(M) per sample         |
   | **Memory**            | Feedback states needed  | Only input delay line   |
   | **Design Complexity** | More complex (poles/zeros) | Simpler (window/freq sampling) |
   | **Quantization**      | Sensitive to coeff errors | Less sensitive          |
   | **Feedback**          | Uses past outputs       | Only uses past inputs   |
   | **Applications**      | Audio EQ, control, adaptive | Linear phase, decimation |
   
   **When to Use IIR**:
   - Sharp frequency selectivity with low order
   - Computational efficiency critical
   - Phase distortion acceptable (or correctable)
   - Audio/control applications (EQ, filters, PID)
   
   **When to Use FIR**:
   - Linear phase mandatory (audio mastering, measurement)
   - Guaranteed stability essential
   - Quantization sensitivity unacceptable
   - Decimation/interpolation (polyphase structures)
   
   ## Integration Patterns
   
   ### Pattern 1: Audio Processing Chain
   
   ```clojure
   ;; low-shelf-coeffs and high-shelf-coeffs are assumed to be defined elsewhere
   ;; and to return {:b [...] :a [...]}
   (defn audio-processing-chain [input-audio fs]
     (let [;; DC removal (dc-removal-filter already returns af arrays)
           {b1 :b a1 :a} (dc-removal-filter fs 0.5)
           step1 (af-iir y-ptr b1 a1 input-audio)
           
           ;; Bass boost (low-shelf at 100Hz, +3dB)
           {b2 :b a2 :a} (low-shelf-coeffs fs 100.0 1.0 3.0)
           step2 (af-iir y-ptr (af/create-array b2) (af/create-array a2) step1)
           
           ;; Treble cut (high-shelf at 8kHz, -2dB)
           {b3 :b a3 :a} (high-shelf-coeffs fs 8000.0 1.0 -2.0)
           step3 (af-iir y-ptr (af/create-array b3) (af/create-array a3) step2)
           
           ;; Normalize amplitude
           normalized (af/div step3 (af/max step3))]
       normalized))
   ```
   
   ### Pattern 2: Real-Time Sensor Filtering
   
   ```clojure
   (defn sensor-filter-loop [sensor-stream b a buffer-size]
     (let [b-arr (af/create-array b)
           a-arr (af/create-array a)]
       (loop [samples (sensor-stream/read buffer-size)]
         (when samples
           (let [af-samples (af/create-array samples)
                 filtered (af-iir y-ptr b-arr a-arr af-samples)
                 host-result (af/get-data filtered)]
             ;; Process filtered samples
             (process-samples host-result)
             ;; Continue loop
             (recur (sensor-stream/read buffer-size)))))))
   ```
   
   ### Pattern 3: Multichannel Batch Processing
   
   ```clojure
   (defn process-multichannel [channels filters]
     ;; channels: vector of af_arrays
     ;; filters: vector of {:b [...] :a [...]} for each channel
     (let [;; Stack channels into single array (batch dimension)
           batched (apply af/join 3 channels)]
       (if (every? #(= (first filters) %) filters)
         ;; All filters identical (both b and a), use single broadcast
         (let [b-arr (af/create-array (:b (first filters)))
               a-arr (af/create-array (:a (first filters)))
               result (af-iir y-ptr b-arr a-arr batched)]
           (map #(af/slice result 3 %) (range (count channels))))
         ;; Different filters per channel, stack coefficients
         (let [b-batch (apply af/join 1 (map #(af/create-array (:b %)) filters))
               a-batch (apply af/join 1 (map #(af/create-array (:a %)) filters))
               result (af-iir y-ptr b-batch a-batch batched)]
           (map #(af/slice result 3 %) (range (count channels)))))))
   ```
   
   ## Error Handling
   
   ### Common Error Codes
   
   - **AF_ERR_ARG**: Invalid arguments
     * Mismatched b, a, x array types
     * b and a have inconsistent ndims (when batched)
     * x dimensions incompatible with b dimensions
   
   - **AF_ERR_SIZE**: Size mismatch
     * When batching: b.dims[1:3] ≠ x.dims[1:3]
     * Filter coefficients too long (N > 512 for float)
   
   - **AF_ERR_TYPE**: Unsupported data type
     * Only f32, f64, c32, c64 supported
     * Mixed types between b, a, x
   
   - **AF_ERR_RUNTIME**: Runtime error
     * Insufficient device memory
     * Local memory exceeded: (2N+1)×sizeof(T) > local_mem_size
   
   - **AF_ERR_NOT_SUPPORTED**: Feature not available
     * oneAPI backend: IIR not implemented
   
   ### Debugging Strategies
   
   1. **Check Coefficient Validity**:
      ```clojure
      (defn validate-coeffs [a]
        ;; Check stability: poles inside unit circle
        (let [poles (compute-poles a)] ; Use external DSP library
          (every? #(< (abs %) 1.0) poles)))
      ```
   
   2. **Monitor Filter Response**:
      ```clojure
      (defn plot-frequency-response [b a fs]
        (let [freqs (range 0 (/ fs 2) 1)
              response (map #(eval-transfer-function b a % fs) freqs)]
          (plot freqs (map abs response))))
      ```
   
   3. **Verify Numerical Stability**:
      - Impulse response should decay to zero
      - Step response should converge
      - Frequency response should be smooth
   
   4. **Check for Overflow/Underflow**:
      - Use f64 if f32 shows precision issues
      - Scale input signal to avoid saturation
      - Monitor output amplitude
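   
   Strategy 2 relies on an `eval-transfer-function` helper that is not defined in
   this namespace. One way to get the magnitude without complex-number support
   is to evaluate numerator and denominator on the unit circle as separate real
   and imaginary sums. `transfer-magnitude` below is a hypothetical plain-Clojure
   sketch of that idea:
   
   ```clojure
   ;; Hypothetical helper: |H(e^jw)| at frequency f (Hz) for sampling rate fs (Hz).
   ;; b and a are coefficient vectors ordered by increasing power of z⁻¹.
   (defn transfer-magnitude
     [b a f fs]
     (let [w (/ (* 2.0 Math/PI f) fs)
           ;; sum of c[k] * e^(-j k w) as a [real imag] pair
           eval-poly (fn [c]
                       (reduce (fn [[re im] [k ck]]
                                 [(+ re (* ck (Math/cos (* k w))))
                                  (- im (* ck (Math/sin (* k w))))])
                               [0.0 0.0]
                               (map-indexed vector c)))
           [nr ni] (eval-poly b)
           [dr di] (eval-poly a)]
       (/ (Math/hypot nr ni) (Math/hypot dr di))))
   
   ;; Usage: a DC blocker should read 0 at f = 0
   ;; (transfer-magnitude [1.0 -1.0] [1.0 -0.9] 0.0 1000.0)
   ```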
   
   ## Related Functions
   
   - **af_fir**: Finite impulse response filter (non-recursive)
   - **af_convolve1**: 1D convolution (used internally for FIR stage)
   - **af_fft**: FFT-based filtering alternative for long filters
   - **af_medfilt**: Median filter (nonlinear, edge-preserving)
   - **af_bilateral**: Bilateral filter (edge-preserving smoothing)
   
   ## References
   
   1. Oppenheim, A. V., & Schafer, R. W. (2009). *Discrete-Time Signal Processing* (3rd ed.). Prentice Hall.
      - Chapter 6: Structures for Discrete-Time Systems
      - Chapter 7: Filter Design Techniques
   
   2. Proakis, J. G., & Manolakis, D. G. (2006). *Digital Signal Processing* (4th ed.). Prentice Hall.
      - Chapter 8: Design of Digital Filters
   
   3. Lyons, R. G. (2011). *Understanding Digital Signal Processing* (3rd ed.). Prentice Hall.
      - Chapter 6: Digital Filters
   
   4. Parks, T. W., & Burrus, C. S. (1987). *Digital Filter Design*. Wiley.
      - IIR filter design methods
   
   5. Zölzer, U. (2011). *DAFX: Digital Audio Effects* (2nd ed.). Wiley.
      - Audio EQ and effects using IIR filters
   
   6. ArrayFire Documentation: https://arrayfire.com/docs/signal.htm
      - Official IIR filter documentation"
  (:require [coffi.ffi :as ffi :refer [defcfn]]
            [coffi.mem :as mem]
            [org.soulspace.arrayfire.ffi.loader]))

;; af_err af_iir(af_array *y, const af_array b, const af_array a, const af_array x)
(defcfn af-iir
  "Apply an infinite impulse response (IIR) filter to a signal.
   
   An IIR filter implements the difference equation:
   
     y[n] = (1/a[0]) × (b[0]×x[n] + b[1]×x[n-1] + ... + b[M]×x[n-M]
                        - a[1]×y[n-1] - a[2]×y[n-2] - ... - a[N]×y[n-N])
   
   Where:
   - b: feedforward (numerator) coefficients
   - a: feedback (denominator) coefficients
   - x: input signal
   - y: output signal
   
   The filter is applied in two stages:
   1. Feedforward (FIR): Convolve x with b
   2. Feedback (IIR): Apply recursive feedback using a
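   
   The two stages can be mirrored in plain Clojure as a CPU reference for small
   inputs. This is a minimal sketch assuming real-valued coefficients and zero
   initial history; `fir-stage`, `feedback-stage`, and `iir-reference` are
   hypothetical names, and the code illustrates the decomposition rather than
   ArrayFire's actual kernels:
   
   ```clojure
   ;; Stage 1: c[n] = sum over k of b[k] * x[n-k], with x[n] = 0 for n < 0.
   (defn fir-stage [b x]
     (mapv (fn [n]
             (reduce + 0.0
                     (for [k (range (count b)) :when (<= k n)]
                       (* (nth b k) (nth x (- n k))))))
           (range (count x))))

   ;; Stage 2: y[n] = (c[n] - sum over k >= 1 of a[k] * y[n-k]) / a[0].
   ;; Each output depends on earlier outputs, so this stage is sequential.
   (defn feedback-stage [a c]
     (let [a0 (double (first a))]
       (reduce (fn [y n]
                 (let [fb (reduce + 0.0
                                  (for [k (range 1 (count a)) :when (<= k n)]
                                    (* (nth a k) (nth y (- n k)))))]
                   (conj y (/ (- (nth c n) fb) a0))))
               []
               (range (count c)))))

   (defn iir-reference [b a x]
     (feedback-stage a (fir-stage b x)))

   ;; Impulse response of y[n] = 0.5 x[n] + 0.5 y[n-1]
   (iir-reference [0.5] [1.0 -0.5] [1.0 0.0 0.0 0.0])
   ;; => [0.5 0.25 0.125 0.0625]
   ```
   
   Comparing a few samples of the GPU result against such a reference is a
   simple way to confirm coefficient ordering and sign conventions.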
   
   Parameters:
   - y: out pointer to filtered signal array
   - b: feedforward coefficient array (length M+1)
        * 1D array for single filter
        * 2D/3D/4D for batched filters (one filter per column/plane)
        * Supported types: f32, f64, c32 (cfloat), c64 (cdouble)
   - a: feedback coefficient array (length N+1)
        * Same dimensionality rules as b
        * a[0] is the normalization factor (typically 1.0)
        * If only a[0] is provided (length 1), the filter reduces to FIR: y = (b/a[0]) * x, where * denotes convolution
        * Supported types: Must match b type
   - x: input signal array
        * 1D: Single signal, length L
        * 2D: Multiple signals (batch along columns)
        * 3D/4D: Additional batch dimensions
        * Type must match b and a
   
   Batch Processing:
   - If x.ndims == 1 and b.ndims > 1: Apply each filter in b to x
   - If x.ndims > 1 and b.ndims == 1: Apply single filter to all columns of x
   - If x.ndims == b.ndims: Element-wise (each column of x with corresponding filter in b)
   - Batch dimensions: x.dims[1:3] must match b.dims[1:3] when batching
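   
   The dispatch rules above can be written down as a small classifier. A minimal
   sketch, assuming dimensions are given as 4-element vectors in ArrayFire order
   (dims[0] is the signal or filter length); `ndims` and `batch-mode` are
   hypothetical helpers that only restate the rules listed here and do not call
   ArrayFire:
   
   ```clojure
   ;; Number of dimensions, ignoring trailing singleton dimensions.
   (defn ndims [dims]
     (max 1 (count (drop-while #(= 1 %) (reverse dims)))))

   ;; Classify how signal dims and filter dims combine.
   (defn batch-mode [x-dims b-dims]
     (let [xn (ndims x-dims)
           bn (ndims b-dims)]
       (cond
         (and (= xn 1) (= bn 1))         :single
         (= bn 1)                        :one-filter-many-signals
         (= xn 1)                        :many-filters-one-signal
         (= (rest x-dims) (rest b-dims)) :pairwise
         :else                           :mismatch)))

   (batch-mode [1000 10 1 1] [5 1 1 1])   ; => :one-filter-many-signals
   (batch-mode [1000 1 1 1]  [5 10 1 1])  ; => :many-filters-one-signal
   (batch-mode [1000 10 1 1] [5 10 1 1])  ; => :pairwise
   (batch-mode [1000 10 1 1] [5 4 1 1])   ; => :mismatch
   ```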
   
   Output:
   - y has same dimensions as x
   - Length preserved: length(y) = length(x)
   - The first M output samples form a start-up transient: input history before x[0] is treated as zero
   
   Constraints:
   - Feedforward order M: Limited by device memory (typically M ≤ 512)
   - Feedback order N: Limited by local memory, (2N+1)×sizeof(T) ≤ local_mem_size
     * float: N ≤ ~512 (practical limit)
     * double: N ≤ ~256 (practical limit)
   - All arrays must have same data type (f32, f64, c32, c64)
   - b.ndims must equal a.ndims (both 1D or same batch structure)
   
   Stability:
   - Filter is stable if all poles (roots of the denominator) lie strictly inside the unit circle
   - Unstable filters produce unbounded outputs
   - Use second-order sections (SOS) for high-order filters to improve numerical stability
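   
   For a single second-order section the pole condition has a closed form, so no
   root finder is needed. A minimal sketch, assuming real coefficients
   `[a0 a1 a2]` with nonzero `a0`; `biquad-stable?` is a hypothetical helper.
   After normalizing by `a0`, both poles lie strictly inside the unit circle
   exactly when |a2| < 1 and |a1| < 1 + a2 (the stability triangle):
   
   ```clojure
   (defn biquad-stable? [[a0 a1 a2]]
     (let [a1n (double (/ a1 a0))   ; normalized a1
           a2n (double (/ a2 a0))]  ; normalized a2
       (and (< (Math/abs a2n) 1.0)
            (< (Math/abs a1n) (+ 1.0 a2n)))))

   (biquad-stable? [1.0 -1.6 0.81])  ; => true  (complex poles, radius 0.9)
   (biquad-stable? [1.0 -2.1 1.1])   ; => false (poles at 1.0 and 1.1)
   (biquad-stable? [1.0 -2.0 1.0])   ; => false (double pole at z = 1)
   ```
   
   Applied to every section of an SOS cascade, this check covers the whole
   filter, since the cascade is stable when each section is stable.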
   
   Special Cases:
   - If a = [1.0], filter reduces to FIR (no feedback)
   - If b = [1.0], filter is pure IIR (all-pole)
   - If a[0] ≠ 1.0, coefficients are normalized by a[0] internally
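   
   Normalizing on the host before the arrays are created makes a[0] = 1
   explicit and keeps the coefficients easy to inspect. A minimal sketch,
   assuming real coefficients in Clojure vectors and a nonzero a[0];
   `normalize-coeffs` is a hypothetical helper, not part of this namespace:
   
   ```clojure
   ;; Divide every coefficient by a[0] so that the leading feedback term is 1.
   (defn normalize-coeffs [b a]
     (let [a0 (double (first a))]
       {:b (mapv #(/ % a0) b)
        :a (mapv #(/ % a0) a)}))

   (normalize-coeffs [1.0 2.0 1.0] [2.0 -1.0 0.5])
   ;; => {:b [0.5 1.0 0.5], :a [1.0 -0.5 0.25]}
   ```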
   
   Performance:
   - FIR stage: Fully parallel convolution, O(M×L) complexity
   - IIR stage: Sequential recursion, O(N×L) complexity
   - GPU acceleration: roughly 10-50× speedup over CPU for typical filter orders (indicative; depends on hardware and signal length)
   - Batch mode: roughly a further 5-10× speedup when processing multiple signals (indicative)
   
   Examples:
   
   1. Simple low-pass filter:
      ```clojure
      (let [b (af/create-array [0.25 0.5 0.25])  ; Feedforward
            a (af/create-array [1.0])              ; No feedback (FIR)
            x (af/randu 1000)                      ; Random signal
            y-ptr (mem/alloc-pointer)]
        (af-iir y-ptr b a x)
        (let [y (mem/deref-pointer y-ptr)]
          ;; y contains smoothed signal
          ))
      ```
   
   2. DC removal filter (high-pass):
      ```clojure
      (let [alpha 0.95
            b (af/create-array [1.0 -1.0])         ; Differentiator
            a (af/create-array [1.0 (- alpha)])    ; Feedback
            x (af/create-array sensor-data)
            y-ptr (mem/alloc-pointer)]
        (af-iir y-ptr b a x)
        ;; Removes DC offset and low frequencies
        )
      ```
   
   3. Batch filtering (multiple signals, same filter):
      ```clojure
      (let [b (af/create-array [0.1 0.2 0.4 0.2 0.1])
            a (af/create-array [1.0 -0.5])
            x (af/randu 1000 10)  ; 10 signals, each 1000 samples
            y-ptr (mem/alloc-pointer)]
        (af-iir y-ptr b a x)
        ;; All 10 signals filtered in parallel
        )
      ```
   
   4. Resonant bandpass filter:
      ```clojure
      (let [fc 1000.0    ; Center frequency (Hz)
            fs 44100.0   ; Sampling rate (Hz)
            Q 10.0       ; Quality factor
            w0 (/ (* 2.0 Math/PI fc) fs)
            alpha (/ (Math/sin w0) (* 2.0 Q))
            b (af/create-array [alpha 0.0 (- alpha)])
            ;; a0 = 1 + alpha (RBJ cookbook band-pass); the filter divides by a[0]
            a (af/create-array [(+ 1.0 alpha) (* -2.0 (Math/cos w0)) (- 1.0 alpha)])
            x (af/create-array audio-samples)
            y-ptr (mem/alloc-pointer)]
        (af-iir y-ptr b a x)
        ;; Narrow resonance at 1kHz
        )
      ```
   
   5. Cascaded second-order sections (SOS):
      ```clojure
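      (defn apply-sos [x sections]
        ;; Feed the output of each biquad into the next one.
        (reduce (fn [signal {:keys [b a]}]
                  (let [y-ptr (mem/alloc-pointer)]
                    ;; :b and :a are plain vectors; upload them before the call
                    (af-iir y-ptr (af/create-array b) (af/create-array a) signal)
                    (mem/deref-pointer y-ptr)))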
                x
                sections))
      
      (let [;; 8th-order Butterworth as 4 biquads
            sos [{:b [b00 b01 b02] :a [1.0 a01 a02]}
                 {:b [b10 b11 b12] :a [1.0 a11 a12]}
                 {:b [b20 b21 b22] :a [1.0 a21 a22]}
                 {:b [b30 b31 b32] :a [1.0 a31 a32]}]
            x (af/create-array signal-data)
            y (apply-sos x sos)]
        y)
      ```
   
   Returns:
   AF_SUCCESS (0) on success, error code otherwise:
   - AF_ERR_ARG: Invalid arguments (type mismatch, incompatible dimensions)
   - AF_ERR_SIZE: Dimension mismatch or filter too long
   - AF_ERR_TYPE: Unsupported data type (must be f32, f64, c32, c64)
   - AF_ERR_RUNTIME: Insufficient memory or local memory exceeded
   - AF_ERR_NOT_SUPPORTED: Backend doesn't support IIR (oneAPI)
   
   Note:
   - For linear phase response, use FIR filter (af_fir) instead
   - For very long filters (M > 512), consider FFT-based convolution
   - Always verify filter stability before deployment
   - Use normalized coefficients (a[0] = 1.0) for best performance
   - For high-order filters, use second-order sections (SOS) cascade
   
   See also:
   - af_fir: Finite impulse response filter (non-recursive)
   - af_convolve1: 1D convolution (FIR implementation)
   - af_fft: FFT for frequency-domain filtering"
  "af_iir" [::mem/pointer ::mem/pointer ::mem/pointer ::mem/pointer] ::mem/int)
