User Guide
This guide covers installing and using OnlineResamplers.jl to resample financial market data, from core concepts through performance tuning and troubleshooting.
Table of Contents
- Installation
- Core Concepts
- Basic Usage
- Advanced Features
- Real-World Examples
- Performance Optimization
- Integration with OnlineStats
- Troubleshooting
Installation
using Pkg
Pkg.add(url="https://github.com/femtotrader/OnlineResamplers.jl")
Development Installation
using Pkg
Pkg.develop(url="https://github.com/femtotrader/OnlineResamplers.jl")
Pkg.test("OnlineResamplers")
Core Concepts
Market Data Structure
Market data is represented using the MarketDataPoint{T,P,V}
structure, which provides type safety and flexibility:
using OnlineResamplers, Dates
# Basic usage with default types (DateTime, Float64, Float64)
data = MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0), 100.50, 1000.0)
# Explicit type construction
data_explicit = MarketDataPoint{DateTime, Float64, Float64}(
DateTime(2024, 1, 1, 9, 30, 0),
100.50,
1000.0
)
# Custom types for high precision
using FixedPointDecimals
precise_data = MarketDataPoint{DateTime, FixedDecimal{Int64,4}, FixedDecimal{Int64,2}}(
DateTime(2024, 1, 1, 9, 30, 0),
FixedDecimal{Int64,4}(100.5012),
FixedDecimal{Int64,2}(1000.50)
)
Time Windows
Data is aggregated into time windows defined by start time and period. Understanding time windows is crucial for effective resampling:
using Dates
# Create a 5-minute window
window = TimeWindow{DateTime}(DateTime(2024, 1, 1, 9, 30, 0), Minute(5))
# The window includes data from [start_time, start_time + period)
println("Window start: $(window.start_time)") # 2024-01-01T09:30:00
println("Window end: $(window_end(window))") # 2024-01-01T09:35:00
# Check if timestamps belong to window
test_times = [
DateTime(2024, 1, 1, 9, 29, 59), # Before window -> false
DateTime(2024, 1, 1, 9, 30, 0), # Start of window -> true
DateTime(2024, 1, 1, 9, 32, 30), # Middle of window -> true
DateTime(2024, 1, 1, 9, 35, 0) # Next window -> false
]
for ts in test_times
belongs = belongs_to_window(ts, window)
println("$(ts): $(belongs)")
end
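The half-open rule `[start_time, start_time + period)` can be sketched with the Dates standard library alone. The helper `in_window` below is hypothetical and written only for illustration; it is not the package's `belongs_to_window`, just the same interval test spelled out:

```julia
using Dates

# Hypothetical helper (not part of OnlineResamplers): half-open interval test
in_window(ts::DateTime, win_start::DateTime, period::Period) =
    win_start <= ts < win_start + period

win_start = DateTime(2024, 1, 1, 9, 30, 0)
period = Minute(5)

inside_start = in_window(DateTime(2024, 1, 1, 9, 30, 0), win_start, period)   # start is included
inside_last  = in_window(DateTime(2024, 1, 1, 9, 34, 59), win_start, period)  # last second is included
at_end       = in_window(DateTime(2024, 1, 1, 9, 35, 0), win_start, period)   # end is excluded
```

Because the end is excluded, a tick stamped exactly 09:35:00 belongs to the next window, so consecutive windows never overlap.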
Basic Usage
OHLC Resampling
OHLC (Open, High, Low, Close) resampling produces the bars used in candlestick charts and technical analysis:
using OnlineResamplers, OnlineStatsBase, Dates
# Create OHLC resampler (this is the default)
ohlc_resampler = MarketResampler(Minute(1), price_method=:ohlc)
# Sample market data within one minute
base_time = DateTime(2024, 1, 1, 14, 30, 0)
market_data = [
MarketDataPoint(base_time + Second(0), 100.00, 1000.0), # Open
MarketDataPoint(base_time + Second(15), 102.50, 800.0), # High point
MarketDataPoint(base_time + Second(30), 97.75, 1200.0), # Low point
MarketDataPoint(base_time + Second(45), 101.25, 900.0) # Close
]
# Process all data points
for data in market_data
fit!(ohlc_resampler, data)
end
# Extract results
result = value(ohlc_resampler)
ohlc = result.price.ohlc
println("Open: $(ohlc.open)") # 100.00 (first price)
println("High: $(ohlc.high)") # 102.50 (highest price)
println("Low: $(ohlc.low)") # 97.75 (lowest price)
println("Close: $(ohlc.close)") # 101.25 (last price)
println("Volume: $(result.volume)") # 3900.0 (total volume)
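As a cross-check on the expected values, the same bar can be computed by hand from the four ticks with Base functions only. This is a sketch of what an OHLC bar means, not of how the package computes it internally:

```julia
# Prices and volumes of the four ticks used in the example above
prices  = [100.00, 102.50, 97.75, 101.25]
volumes = [1000.0, 800.0, 1200.0, 900.0]

bar = (
    open   = first(prices),     # first price in the window
    high   = maximum(prices),   # highest price in the window
    low    = minimum(prices),   # lowest price in the window
    close  = last(prices),      # last price in the window
    volume = sum(volumes),      # total traded volume
)
```

An online resampler reaches the same result without keeping the `prices` vector, by updating the bar one tick at a time.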
Mean Price Resampling
Use mean price resampling when you need smoothed prices or an average price per time interval:
# Create mean price resampler
mean_resampler = MarketResampler(Minute(5), price_method=:mean)
# Process the same data
for data in market_data
fit!(mean_resampler, data)
end
result = value(mean_resampler)
mean_price = result.price.mean_price
println("Mean Price: $(mean_price)") # 100.375 ((100+102.5+97.75+101.25)/4)
println("Volume: $(result.volume)") # 3900.0
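A mean can be maintained one observation at a time without storing the prices. The sketch below uses the standard incremental update `m += (x - m) / n`; the function name `running_mean` is hypothetical, and the package may compute its mean differently:

```julia
# Incremental mean: each new value moves the mean by 1/n of its distance from it
function running_mean(xs)
    m = 0.0
    for (n, x) in enumerate(xs)
        m += (x - m) / n
    end
    return m
end

avg = running_mean([100.00, 102.50, 97.75, 101.25])   # approximately 100.375
```

Only the current mean and the count are kept, which is what makes this style of aggregation usable on unbounded streams.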
Advanced Features
Custom Numeric Types
OnlineResamplers fully supports custom numeric types commonly used in financial applications:
using FixedPointDecimals, NanoDates
# Define high-precision types
PriceType = FixedDecimal{Int128, 8} # 8 decimal places for prices
VolumeType = FixedDecimal{Int64, 2} # 2 decimal places for volumes
# Create high-precision resampler
precision_resampler = MarketResampler{NanoDate, PriceType, VolumeType}(
Nanosecond(1_000_000_000), # 1 second intervals
price_method=:ohlc
)
# Create high-precision market data
nano_data = MarketDataPoint{NanoDate, PriceType, VolumeType}(
NanoDate(2024, 1, 1, 9, 30, 0, 123, 456, 789), # milliseconds, microseconds, nanoseconds given separately
PriceType(100.12345678),
VolumeType(1000.50)
)
fit!(precision_resampler, nano_data)
result = value(precision_resampler)
println("High-precision OHLC: $(result.price.ohlc)")
println("High-precision Volume: $(result.volume)")
Parallel Processing
OnlineResamplers supports efficient merging for parallel data processing:
# Function to process a chunk of data
function process_chunk(data_chunk::Vector, period::Period)
chunk_resampler = OHLCResampler{DateTime, Float64, Float64}(period)
for data in data_chunk
fit!(chunk_resampler, data)
end
return chunk_resampler
end
# Generate large dataset (offsets use Second(i) because DateTime's seconds field only accepts 0-59)
large_dataset = [
MarketDataPoint(DateTime(2024, 1, 1, 9, 0, 0) + Second(i), 100.0 + sin(i/100), Float64(rand(500:1500)))
for i in 1:10000
]
# Split into chunks for parallel processing
chunk_size = 2500
chunks = [large_dataset[i:min(i+chunk_size-1, end)] for i in 1:chunk_size:length(large_dataset)]
# Process chunks (in real applications, use @distributed or threading)
chunk_resamplers = [process_chunk(chunk, Minute(1)) for chunk in chunks]
# Merge all results
final_resampler = chunk_resamplers[1]
for i in 2:length(chunk_resamplers)
merge!(final_resampler, chunk_resamplers[i])
end
merged_result = value(final_resampler)
println("Merged OHLC: $(merged_result.ohlc)")
println("Total observations: $(nobs(final_resampler))")
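To make the merge step concrete, here is a sketch of how two partial bars for the same window can be combined. The named-tuple layout and the `merge_bars` function are hypothetical and are not the package's `merge!`; tracking the first and last tick times is one assumed way to decide which chunk supplies the open and the close:

```julia
using Dates

# Hypothetical partial bar: first_ts / last_ts are the earliest and latest tick times seen
merge_bars(a, b) = (
    first_ts = min(a.first_ts, b.first_ts),
    last_ts  = max(a.last_ts, b.last_ts),
    open     = a.first_ts <= b.first_ts ? a.open : b.open,   # open comes from the earlier chunk
    close    = a.last_ts >= b.last_ts ? a.close : b.close,   # close comes from the later chunk
    high     = max(a.high, b.high),
    low      = min(a.low, b.low),
    volume   = a.volume + b.volume,
)

a = (first_ts = DateTime(2024, 1, 1, 9, 0, 0), last_ts = DateTime(2024, 1, 1, 9, 0, 20),
     open = 100.0, close = 101.0, high = 101.5, low = 99.5, volume = 3000.0)
b = (first_ts = DateTime(2024, 1, 1, 9, 0, 30), last_ts = DateTime(2024, 1, 1, 9, 0, 50),
     open = 101.2, close = 100.8, high = 102.0, low = 100.6, volume = 2000.0)

merged = merge_bars(a, b)
```

With these inputs the merged bar opens at 100.0, closes at 100.8, and spans 99.5 to 102.0 with a volume of 5000.0. The result does not depend on argument order, which is the property that lets chunks be processed independently.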
Individual Resamplers
For specialized use cases, you can use individual resampler types directly:
# Pure OHLC resampler
ohlc_only = OHLCResampler{DateTime, Float64, Float64}(Minute(1))
# Mean price resampler
mean_only = MeanResampler{DateTime, Float64, Float64}(Minute(5))
# Sum resampler (for volume or other additive metrics)
volume_sum = SumResampler{DateTime, Float64, Float64}(Second(30))
# Process sample data
sample_data = MarketDataPoint(DateTime(2024, 1, 1, 10, 0, 0), 100.0, 1000.0)
fit!(ohlc_only, sample_data)
fit!(mean_only, sample_data)
fit!(volume_sum, sample_data)
# Get individual results
ohlc_result = value(ohlc_only)
mean_result = value(mean_only)
volume_result = value(volume_sum)
println("OHLC only: $(ohlc_result)")
println("Mean only: $(mean_result)")
println("Volume sum: $(volume_result)")
Real-World Examples
Processing CSV Market Data
Here's a complete example processing market data from a CSV file:
using OnlineResamplers, OnlineStatsBase, Dates, CSV, DataFrames
# Load tick data from CSV file
tick_data = CSV.read("market_ticks.csv", DataFrame)
# Create 1-minute OHLC resampler
resampler = MarketResampler(Minute(1))
# Storage for completed OHLC bars
ohlc_bars = []
# Process each tick
for row in eachrow(tick_data)
# Create market data point
data_point = MarketDataPoint(
DateTime(row.timestamp),
row.price,
row.volume
)
# Capture the current bar and its window before processing the tick
old_result = value(resampler)
old_window = old_result.window
# Process the data
fit!(resampler, data_point)
# Check if we moved to a new window (completed a bar)
new_result = value(resampler)
if new_result.window != old_window && old_window !== nothing
# We completed a window, so save the bar captured before this tick
push!(ohlc_bars, (
timestamp = old_window.start_time,
open = old_result.price.ohlc.open,
high = old_result.price.ohlc.high,
low = old_result.price.ohlc.low,
close = old_result.price.ohlc.close,
volume = old_result.volume
))
end
end
# Convert to DataFrame for analysis
ohlc_df = DataFrame(ohlc_bars)
println("Generated $(nrow(ohlc_df)) OHLC bars from $(nrow(tick_data)) ticks")
# Save results
CSV.write("ohlc_1min.csv", ohlc_df)
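The loop above records a bar only when a tick arrives in a later window, so the final, still-open window is never written out. The sketch below shows the same rollover detection in plain Julia, with an explicit flush at the end. It uses tuples as stand-in ticks and a hypothetical `collect_bars` function rather than the package API, and assumes windows are aligned with `floor` from Dates:

```julia
using Dates

# Hypothetical stand-in ticks as (timestamp, price, volume) tuples
ticks = [
    (DateTime(2024, 1, 1, 9, 30, 5),  100.0, 10.0),
    (DateTime(2024, 1, 1, 9, 30, 40), 101.0, 20.0),
    (DateTime(2024, 1, 1, 9, 31, 10),  99.0, 30.0),
    (DateTime(2024, 1, 1, 9, 31, 50), 100.5, 40.0),
]

function collect_bars(ticks, period)
    bars = NamedTuple[]
    current = nothing                          # bar being built; nothing before the first tick
    for (ts, price, volume) in ticks
        wstart = floor(ts, period)             # start of the window this tick falls into
        if current !== nothing && wstart != current.timestamp
            push!(bars, current)               # window changed: the previous bar is complete
            current = nothing
        end
        current = if current === nothing
            (timestamp = wstart, open = price, high = price, low = price,
             close = price, volume = volume)
        else
            (timestamp = wstart, open = current.open, high = max(current.high, price),
             low = min(current.low, price), close = price, volume = current.volume + volume)
        end
    end
    current !== nothing && push!(bars, current)   # flush the last, still-open bar
    return bars
end

bars = collect_bars(ticks, Minute(1))
```

Whether to flush the last bar depends on the use case: for a finished historical file it is usually wanted, while for a live stream the last bar is still incomplete.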
Multi-timeframe Analysis
Analyze the same data stream across multiple timeframes simultaneously:
# Create resamplers for different timeframes
timeframes = Dict(
"1min" => MarketResampler(Minute(1)),
"5min" => MarketResampler(Minute(5)),
"15min" => MarketResampler(Minute(15)),
"1hour" => MarketResampler(Hour(1))
)
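# Generate sample data (simulating 1 hour of minute-level ticks)
base_time = DateTime(2024, 1, 1, 9, 0, 0)
sample_ticks = MarketDataPoint{DateTime, Float64, Float64}[]
price = 100.0
for i in 1:60 # 60 minutes
# Add some realistic price movement
global price += randn() * 0.1 # Random walk; `global` is needed when this runs as a script
volume = Float64(rand(500:1500)) # Float64 to match the resampler's default volume type
timestamp = base_time + Minute(i - 1) # 09:00 through 09:59, all inside one hourly window
push!(sample_ticks, MarketDataPoint(timestamp, price, volume))
end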
# Process through all timeframes
for tick in sample_ticks
for (name, resampler) in timeframes
fit!(resampler, tick)
end
end
# Display results
using Printf # provides @printf
println("Multi-timeframe Analysis:")
println("========================")
# Sort by name only; the resamplers themselves are not comparable
for (name, resampler) in sort(collect(timeframes), by = first)
result = value(resampler)
if result.price.ohlc !== nothing
ohlc = result.price.ohlc
@printf("%-8s: O=%6.2f H=%6.2f L=%6.2f C=%6.2f Vol=%8.0f\n",
name, ohlc.open, ohlc.high, ohlc.low, ohlc.close, result.volume)
end
end
Performance Optimization
Memory Efficiency
OnlineResamplers uses constant memory regardless of data volume:
# Memory usage stays constant even with millions of data points
memory_test_resampler = MarketResampler(Minute(1))
println("Processing 1 million data points...")
for i in 1:1_000_000
timestamp = DateTime(2024, 1, 1, 9, 0, 0) + Millisecond(i)
data = MarketDataPoint(timestamp, 100.0 + sin(i/1000), 1000.0)
fit!(memory_test_resampler, data)
# Memory usage remains constant due to automatic window transitions
end
result = value(memory_test_resampler)
println("Current window has $(nobs(memory_test_resampler)) observations")
println("Total memory usage is O(1) - constant regardless of data volume processed")
Type Stability
For maximum performance, use concrete types and avoid type instabilities:
# Good: Concrete types enable compiler optimizations
function high_performance_processing(
resampler::MarketResampler{DateTime, Float64, Float64},
data_stream::Vector{MarketDataPoint{DateTime, Float64, Float64}}
)
for data in data_stream
fit!(resampler, data)
end
return value(resampler)
end
# Usage
fast_resampler = MarketResampler{DateTime, Float64, Float64}(Minute(1))
typed_data = MarketDataPoint{DateTime, Float64, Float64}[]
# This will be highly optimized by the Julia compiler
result = high_performance_processing(fast_resampler, typed_data)
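How a container is created decides whether its element type is concrete. The sketch below uses plain tuples, so it makes no assumptions about the package: an empty `[]` literal produces a `Vector{Any}`, while a typed literal or a comprehension keeps a concrete element type the compiler can specialize on.

```julia
untyped = []                                  # Vector{Any}: element type is unknown to the compiler
push!(untyped, (1, 100.0))

typed = Tuple{Int, Float64}[]                 # concrete element type declared up front
push!(typed, (1, 100.0))

from_comprehension = [(i, 100.0 + i) for i in 1:3]   # element type inferred from the body

untyped_is_concrete = isconcretetype(eltype(untyped))   # false
typed_is_concrete   = isconcretetype(eltype(typed))     # true
```

The same reasoning applies to vectors of market data points: declaring the element type, as in `typed_data` above, lets such a vector match a concretely typed method signature.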
Batch Processing
Process data in batches for optimal performance:
function batch_process_ticks(resampler, ticks::Vector)
# Process all ticks without intermediate value() calls
for tick in ticks
fit!(resampler, tick)
end
# Get result only once at the end
return value(resampler)
end
# This approach is faster than calling value() after each fit!()
batch_resampler = MarketResampler(Minute(1))
batch_ticks = [
MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0) + Second(i), 100.0 + randn(), 1000.0) # Second(i) offset: DateTime's seconds field only accepts 0-59
for i in 1:1000
]
result = batch_process_ticks(batch_resampler, batch_ticks)
Performance Benchmarks
The timings in the comments below are indicative only; actual results depend on hardware and Julia version:
using BenchmarkTools
# Setup
resampler = MarketResampler(Minute(1))
data = MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0), 100.0, 1000.0)
# Single operation benchmark
@benchmark fit!($resampler, $data)
# Typical: ~50ns per operation
# Batch processing benchmark
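# Offsets use Second(i) (DateTime's seconds field only accepts 0-59); prices and volumes are converted to Float64
data_batch = [MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0) + Second(i), Float64(rand(90:110)), Float64(rand(500:1500))) for i in 1:10000]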
batch_resampler = MarketResampler(Minute(1))
@benchmark begin
for d in $data_batch
fit!($batch_resampler, d)
end
end
# Typical: ~500μs for 10,000 operations (~50ns per operation)
Expected performance characteristics:
- Single operation: ~50 nanoseconds
- Memory usage: O(1) constant
- Throughput: >2 million operations/second on modern hardware
- Memory allocations: Zero in steady state
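The figures above can be related to each other with a little arithmetic. The sketch takes the quoted ~50 ns per operation as given; it is a hardware-dependent figure from this guide, not a measurement:

```julia
ns_per_op = 50                                   # per-operation time quoted above
batch_ops = 10_000                               # size of the benchmark batch

batch_time_us  = ns_per_op * batch_ops / 1_000   # 500.0 microseconds for the batch
ops_per_second = 1_000_000_000 / ns_per_op       # 2.0e7 operations per second
```

At 50 ns per operation the implied single-threaded throughput is about 20 million operations per second, so the ">2 million" bound leaves roughly a tenfold margin for slower hardware or heavier numeric types.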
Integration with OnlineStats
OnlineResamplers seamlessly integrates with the broader OnlineStats ecosystem:
using OnlineStats
# Combine market resampling with other online statistics
# (one statistic per element of the tuple passed to fit! below)
combined_stats = Group(
MarketResampler(Minute(1)), # Market data resampling
Mean(), # Overall price mean
Variance() # Price variance
)
# Generate sample data
data_stream = [
MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0) + Second(i), 100.0 + randn(), 1000.0) # Second(i) offset: DateTime's seconds field only accepts 0-59
for i in 1:1000
]
# Process all statistics simultaneously
for data in data_stream
# The Group expects a tuple matching all statistics
fit!(combined_stats, (data, data.price, data.price))
end
# Access individual statistics
resampler_result = value(combined_stats[1]) # MarketResampler results
mean_price = value(combined_stats[2]) # Mean price
price_variance = value(combined_stats[3]) # Price variance
println("OHLC: $(resampler_result.price.ohlc)")
println("Mean price: $(mean_price)")
println("Price variance: $(price_variance)")
Custom OnlineStats Integration
You can also create custom statistics that work with market data:
using OnlineStatsBase
# Custom statistic: Price range tracker
mutable struct PriceRange <: OnlineStat{MarketDataPoint}
min_price::Float64
max_price::Float64
n::Int
PriceRange() = new(Inf, -Inf, 0)
end
function OnlineStatsBase._fit!(stat::PriceRange, data::MarketDataPoint)
stat.min_price = min(stat.min_price, data.price)
stat.max_price = max(stat.max_price, data.price)
stat.n += 1
return stat
end
function OnlineStatsBase.value(stat::PriceRange)
return (min=stat.min_price, max=stat.max_price, range=stat.max_price - stat.min_price)
end
OnlineStatsBase.nobs(stat::PriceRange) = stat.n
# Usage
price_range = PriceRange()
# Offsets use Second(i) because DateTime's seconds field only accepts 0-59
market_data = [MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0) + Second(i), 100.0 + randn() * 5, 1000.0) for i in 1:100]
for data in market_data
fit!(price_range, data)
end
range_result = value(price_range)
println("Price range: $(range_result.min) to $(range_result.max)")
println("Total range: $(range_result.range)")
Troubleshooting
Common Issues and Solutions
Type Mismatch Errors
# Problem: Type mismatch
resampler = MarketResampler{DateTime, Float64, Float64}(Minute(1))
bad_data = MarketDataPoint{DateTime, Int64, Float64}(DateTime(2024, 1, 1, 9, 30, 0), 100, 1000.0)
# This will fail:
# fit!(resampler, bad_data) # ERROR: MethodError
# Solution: Ensure consistent types
good_data = MarketDataPoint{DateTime, Float64, Float64}(DateTime(2024, 1, 1, 9, 30, 0), 100.0, 1000.0)
fit!(resampler, good_data) # Works fine
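The reason for the error is that Julia's parametric types are invariant: a point with an `Int64` price is a different type from one with a `Float64` price, even though the values are convertible. The sketch below reproduces this with a hypothetical `Tick` struct and `price_of` function standing in for the package's types, so it runs without the package:

```julia
using Dates

# Hypothetical stand-in for a parametric data point; not the package's MarketDataPoint
struct Tick{T, P, V}
    timestamp::T
    price::P
    volume::V
end

# Accepts only the all-Float64 parameterization
price_of(t::Tick{DateTime, Float64, Float64}) = t.price

int_tick = Tick(DateTime(2024, 1, 1, 9, 30, 0), 100, 1000.0)   # Tick{DateTime, Int64, Float64}
accepts_int = applicable(price_of, int_tick)                    # false: no matching method

# Convert the price before constructing the point
float_tick = Tick(int_tick.timestamp, Float64(int_tick.price), int_tick.volume)
accepts_float = applicable(price_of, float_tick)                # true
```

Integer prices and volumes commonly come from `rand(500:1500)` or from integer columns in a CSV file, so converting at the point where the data point is constructed is usually the simplest fix.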
Window Alignment Issues
# Problem: Unexpected window boundaries
resampler = MarketResampler(Minute(1))
# Data that doesn't align with minute boundaries
misaligned_data = MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 37), 100.0, 1000.0)
fit!(resampler, misaligned_data)
result = value(resampler)
println("Window starts at: $(result.window.start_time)") # 2024-01-01T09:30:00
# Solution: Understand that windows are floor-aligned
# The window will start at 9:30:00 even though data arrived at 9:30:37
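Floor alignment can be checked directly with `floor` from the Dates standard library. This sketch assumes the package's window starts coincide with what `floor` returns, which the behaviour described above suggests but which is not confirmed here:

```julia
using Dates

ts = DateTime(2024, 1, 1, 9, 30, 37)

start_1min  = floor(ts, Minute(1))    # 2024-01-01T09:30:00
start_15min = floor(ts, Minute(15))   # 2024-01-01T09:30:00
start_1hour = floor(ts, Hour(1))      # 2024-01-01T09:00:00

offset = ts - start_1min              # 37000 milliseconds into the 1-minute window
```

The window start therefore depends only on the timestamp and the period, not on when the first tick happened to arrive.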
Memory Issues with Large Datasets
# Problem: Processing very large datasets inefficiently
function inefficient_processing(large_dataset)
results = []
resampler = MarketResampler(Minute(1))
for data in large_dataset
fit!(resampler, data)
push!(results, value(resampler)) # DON'T DO THIS - stores everything
end
return results
end
# Solution: Only store what you need
function efficient_processing(large_dataset)
completed_bars = []
resampler = MarketResampler(Minute(1))
for data in large_dataset
old_result = value(resampler)
old_window = old_result.window
fit!(resampler, data)
new_result = value(resampler)
if new_result.window != old_window && old_window !== nothing
# Only store completed bars
push!(completed_bars, (
timestamp = old_window.start_time,
ohlc = old_result.price.ohlc,
volume = old_result.volume
))
end
end
return completed_bars
end
Performance Debugging
If you're experiencing performance issues:
using Profile
function profile_resampling()
resampler = MarketResampler(Minute(1))
# Offsets use Second(i) because DateTime's seconds field only accepts 0-59
data_stream = [MarketDataPoint(DateTime(2024, 1, 1, 9, 30, 0) + Second(i), 100.0, 1000.0) for i in 1:100000]
@profile begin
for data in data_stream
fit!(resampler, data)
end
end
end
profile_resampling()
Profile.print() # Analyze where time is spent
Validation and Testing
Always validate your results:
function validate_ohlc(ohlc::OHLC)
@assert ohlc.high >= ohlc.open "High should be >= Open"
@assert ohlc.high >= ohlc.close "High should be >= Close"
@assert ohlc.low <= ohlc.open "Low should be <= Open"
@assert ohlc.low <= ohlc.close "Low should be <= Close"
@assert ohlc.high >= ohlc.low "High should be >= Low"
end
# Use in your processing pipeline
resampler = MarketResampler(Minute(1))
# ... process data ...
result = value(resampler)
if result.price.ohlc !== nothing
validate_ohlc(result.price.ohlc)
println("OHLC validation passed ✓")
end
This user guide covers the essential aspects of using OnlineResamplers.jl effectively. For more detailed API information, see the API Reference, and for step-by-step learning, check out the Tutorial.