Performance Considerations

This guide covers performance optimization techniques for OnlineStatsChains.jl.

Algorithmic Complexity

Core Operations

OnlineStatsChains is designed for efficiency:

OperationComplexityDescription
Cycle DetectionO(V + E)DFS-based algorithm
Topological SortO(V + E)Kahn's algorithm
PropagationO(V + E)Per update in eager mode
Add NodeO(1)Amortized constant time
Connect NodesO(V + E)Includes cycle check

Where:

  • V = number of vertices (nodes)
  • E = number of edges (connections)

Memory Usage

Memory overhead per node is minimal:

  • Node struct: ~100 bytes
  • Topological cache: O(V)
  • Dirty set (lazy mode): O(V)

For large graphs (1000+ nodes), memory footprint remains manageable.

Choosing Evaluation Strategies

Eager Evaluation

Best for:

  • Real-time processing
  • Small to medium DAGs
  • Streaming data
  • When all nodes need updates

Characteristics:

  • Immediate propagation
  • No deferred computation
  • Predictable latency
  • Higher throughput for dense updates
dag = StatDAG(strategy=:eager)
# Good for: streaming sensors, real-time dashboards

Lazy Evaluation

Best for:

  • Large DAGs
  • Sparse queries
  • Batch processing
  • When only some results are needed

Characteristics:

  • Deferred computation
  • Compute only what's requested
  • Lower memory pressure
  • Better for selective access
dag = StatDAG(strategy=:lazy)
# Good for: large analytics pipelines, selective queries

Partial Evaluation

Best for:

  • Large DAGs with independent branches
  • When updates affect limited subgraphs
  • Avoiding unnecessary recomputation

Characteristics:

  • Selective propagation
  • Optimized for sparse updates
  • Reduced computation for unchanged branches
dag = StatDAG(strategy=:partial)
# Good for: multi-branch analytics, parallel streams

Performance Tips

1. Batch Updates

Process data in batches when possible:

# Slow - many small updates
for x in data
    fit!(dag, :source => x)
end

# Fast - batch update
fit!(dag, :source => data)

Speedup: 10-100x for large datasets

2. Minimize Connections

Fewer edges = faster propagation:

# Slower - deeply nested chain
connect!(dag, :a, :b)
connect!(dag, :b, :c)
connect!(dag, :c, :d)
# ... many levels

# Faster - flatter structure when possible
connect!(dag, :input, :output1)
connect!(dag, :input, :output2)

3. Use Lazy for Selective Access

When querying only some nodes:

# Eager: computes all 100 nodes
dag = StatDAG(strategy=:eager)
for i in 1:100
    add_node!(dag, Symbol("n", i), Mean())
    i > 1 && connect!(dag, Symbol("n", i-1), Symbol("n", i))
end
fit!(dag, :n1 => data)  # Updates all 100

# Lazy: computes only what's needed
dag = StatDAG(strategy=:lazy)
# ... same structure ...
fit!(dag, :n1 => data)  # No propagation
value(dag, :n100)  # Computes only path to n100

4. Reuse DAGs

Creating DAGs has overhead - reuse when possible:

# Initialize once
dag = create_analysis_pipeline()

# Reuse many times
for batch in data_stream
    fit!(dag, :input => batch)
    results = value(dag, :output)
end

5. Pre-allocate for Known Patterns

Build DAG structure before data processing:

# Good: structure first, data later
dag = StatDAG()
build_dag_structure!(dag)

for data in stream
    fit!(dag, :input => data)
end

# Avoid: mixing structure changes with data
for data in stream
    add_node!(dag, random_name(), Mean())  # Bad!
    fit!(dag, :input => data)
end

Benchmarking

Measuring Performance

using BenchmarkTools

# Create DAG
dag = StatDAG()
add_node!(dag, :source, Mean())
add_node!(dag, :sink, Mean())
connect!(dag, :source, :sink)

data = randn(1000)

# Benchmark
@benchmark fit!($dag, :source => $data)

Comparing Strategies

using BenchmarkTools

function benchmark_strategies(n_nodes::Int, data_size::Int)
    data = randn(data_size)

    for strategy in [:eager, :lazy, :partial]
        dag = StatDAG(strategy=strategy)

        # Build chain
        for i in 1:n_nodes
            add_node!(dag, Symbol("n", i), Mean())
            i > 1 && connect!(dag, Symbol("n", i-1), Symbol("n", i))
        end

        # Benchmark
        println("Strategy: $strategy")
        @btime fit!($dag, :n1 => $data)
        @btime value($dag, Symbol("n", $n_nodes))
    end
end

benchmark_strategies(10, 1000)

Memory Optimization

1. Clear Caches Periodically

For long-running processes:

# After processing many batches
dag = StatDAG()
# ... process lots of data ...

# Manually trigger garbage collection if needed
GC.gc()

2. Lazy for Memory-Constrained Environments

Lazy evaluation reduces memory pressure:

# Eager: all nodes cache values
dag_eager = StatDAG(strategy=:eager)

# Lazy: only computed nodes cache
dag_lazy = StatDAG(strategy=:lazy)
fit!(dag_lazy, :input => data)
# Memory used: only :input and its stat

3. Streaming for Large Datasets

Process in chunks to control memory:

dag = StatDAG()
add_node!(dag, :processor, Mean())

chunk_size = 1000
for chunk in Iterators.partition(huge_dataset, chunk_size)
    fit!(dag, :processor => collect(chunk))
end

Profiling

Identify bottlenecks:

using Profile

dag = create_complex_dag()
data = generate_test_data()

@profile for _ in 1:100
    fit!(dag, :input => data)
end

Profile.print()

Common Performance Pitfalls

❌ Avoid

# 1. Many small DAG modifications
for data_point in stream
    dag = StatDAG()  # Don't recreate!
    add_node!(dag, :temp, Mean())
    fit!(dag, :temp => data_point)
end

# 2. Unnecessary value() calls in loops
for data in batches
    fit!(dag, :input => data)
    v = value(dag, :output)  # Don't query every iteration
end

# 3. Deep chains when parallel structure works
# a → b → c → d → e  (slow)
# Better: a → [b, c, d, e]  (fast)

✅ Do

# 1. Reuse DAG structure
dag = StatDAG()
add_node!(dag, :processor, Mean())

for data_point in stream
    fit!(dag, :processor => data_point)
end

# 2. Batch queries
for data in batches
    fit!(dag, :input => data)
end
result = value(dag, :output)  # Query once

# 3. Use appropriate structure
dag = StatDAG()
add_node!(dag, :input, Mean())
for i in 1:n
    add_node!(dag, Symbol("out", i), Mean())
    connect!(dag, :input, Symbol("out", i))
end

Scaling to Large Graphs

Handling 1000+ Nodes

# Lazy evaluation is crucial
dag = StatDAG(strategy=:lazy)

# Batch node creation
function build_large_dag(n::Int)
    dag = StatDAG(strategy=:lazy)

    # Add nodes efficiently
    for i in 1:n
        add_node!(dag, Symbol("n", i), Mean())
    end

    # Add connections
    for i in 1:n-1
        connect!(dag, Symbol("n", i), Symbol("n", i+1))
    end

    return dag
end

large_dag = build_large_dag(5000)

Parallel Processing (Future)

Current version is single-threaded. Future versions may support:

  • Parallel branch execution
  • Multi-threaded propagation
  • Distributed computing

Optimization Checklist

  • [ ] Use appropriate evaluation strategy
  • [ ] Batch data updates when possible
  • [ ] Reuse DAG structures
  • [ ] Minimize graph depth
  • [ ] Profile before optimizing
  • [ ] Consider memory vs. speed tradeoffs
  • [ ] Use lazy for large selective queries
  • [ ] Pre-build DAG structure

Next Steps