Primary intended use: Predict residential HVAC component failures 7 to 21 days before breakdown, based on vibration telemetry and energy consumption patterns. The model outputs a failure category, a confidence score (0.0 to 1.0), and a recommended action for the homeowner or service technician.
Primary intended users: AURRA platform integrators (via the REST API), HVAC service providers receiving automated dispatch alerts, and homeowners receiving maintenance recommendations through the AURRA mobile app.
Downstream applications: Alert dispatch automation (triggered at the platform default confidence threshold of 0.72), technician scheduling systems, and energy efficiency reporting dashboards.
The model uses a two-stage architecture. The first stage is a feature engineering pipeline that transforms raw sensor telemetry into time-series features: rolling window statistics (mean, standard deviation, peak-to-peak) over 1-hour, 6-hour, and 24-hour windows for both vibration and energy inputs, plus derived features including vibration trend slope, energy consumption rate-of-change, and duty cycle ratio.
The second stage is an XGBoost multi-class classifier (5 failure categories + 1 healthy class) trained on the engineered feature set. The ensemble uses 500 estimators with a maximum depth of 8, trained with a learning rate of 0.05 and early stopping at 50 rounds. Class weights are balanced to account for the natural imbalance between healthy readings and failure events.
Confidence scores are derived from the model's softmax probability for the predicted failure class. The platform default alert threshold of 0.72 was calibrated via cost-sensitive analysis where false negatives (missed failures, avg. cost $850 USD including emergency repair and secondary damage) were weighted 3.2x higher than false positives ($265 USD per unnecessary truck roll). This weighting optimizes the precision-recall trade-off for residential maintenance scheduling.
/alerts endpoint's confidence parameter. Lowering the threshold increases recall (fewer missed failures) at the cost of precision (more false alerts). See Section 06 for threshold-specific metrics.Feature importance: SHAP (SHapley Additive exPlanations) analysis of mean absolute contribution identifies current_amps (24%), amplitude_24h_mean (18%), and duty_cycle_ratio (15%) as primary failure predictors. Voltage deviation contributes 8% but shows high interaction effects with climate zone 5A (cold-weather electrical stress). Full SHAP visualizations are available via the model registry.
Geographic distribution: Training data covers ASHRAE climate zones 2A through 6A, with highest density in zones 3A (Southeast), 4A (Mid-Atlantic), and 5A (Upper Midwest). Zones 7 and 8 (subarctic) are not represented.
HVAC system types: Split-system central air (62%), heat pump (24%), packaged unit (11%), ductless mini-split (3%). Window units and portable systems are excluded.
docs.aurrahome.com/ml/changelog.The model ingests two telemetry streams from the AURRA sensor array. Raw readings are transformed into 23 engineered features before classification.
Derived features (23 total): Rolling window statistics (1h, 6h, 24h) for amplitude, frequency, and current draw. Vibration trend slope over 72-hour windows. Energy consumption rate-of-change. Duty cycle ratio (runtime / elapsed time). Voltage deviation from 30-day rolling mean. Amplitude-frequency interaction term. Current-to-runtime efficiency ratio.
Metrics evaluated on the held-out test set (most recent 3 months, 2,764 confirmed failure events). All metrics reported at the platform default alert threshold of 0.72 unless noted.
| Threshold | Precision | Recall | F1 | FP Rate | Use Case |
|---|---|---|---|---|---|
| 0.50 | 0.78 | 0.93 | 0.85 | 8.2% | High-recall: minimize missed failures |
| 0.60 | 0.84 | 0.90 | 0.87 | 5.6% | Balanced: warranty programs |
| 0.72 ◂ | 0.91 | 0.84 | 0.87 | 3.1% | Platform default |
| 0.85 | 0.95 | 0.72 | 0.82 | 1.4% | High-precision: minimize false alerts |
| 0.95 | 0.98 | 0.51 | 0.67 | 0.4% | Conservative: high-certainty only |
Performance by failure category at threshold 0.72. The model performs best on compressor and capacitor failures, which produce the most distinctive telemetry signatures. Refrigerant leak detection has the lowest recall due to subtler signal patterns.
| Failure Category | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Compressor bearing degradation | 0.94 | 0.89 | 0.91 | 842 |
| Capacitor degradation | 0.93 | 0.88 | 0.90 | 614 |
| Blower motor anomaly | 0.90 | 0.83 | 0.86 | 521 |
| Heat exchanger fouling | 0.88 | 0.80 | 0.84 | 448 |
| Refrigerant leak indicators | 0.85 | 0.74 | 0.79 | 339 |
| System Type | F1 | Training % | Note |
|---|---|---|---|
| Split-system central air | 0.89 | 62% | Highest representation, strongest signal |
| Heat pump | 0.86 | 24% | Dual-mode operation widens the feature space |
| Packaged unit | 0.83 | 11% | Lower representation limits generalization |
| Ductless mini-split | 0.76 | 3% | Underrepresented; use with caution |
Geothermal systems: Zero training data for geothermal heat pump configurations. Predictions for geothermal units will be inaccurate. The API does not reject requests from geothermal devices, so integrators must filter at the application layer.
Climate zone gaps: ASHRAE zones 7 and 8 (subarctic, arctic) are absent from training data. Units in these zones may exhibit cold-weather patterns the model has not learned. Version 1.2 extended coverage through zone 6A but not beyond.
Refrigerant leak detection: Lowest recall (0.74) because leaks produce gradual efficiency degradation rather than distinct vibration anomalies. Detection relies on energy consumption rate-of-change, which seasonal temperature shifts can confound. Supplement with direct pressure sensor data where available.
Cold-start period: Minimum 72 hours of telemetry required from a newly installed device before predictions reach rated accuracy. During cold-start, rolling window features lack sufficient history. Suppress predictions or flag as low-confidence.
Sensor calibration drift: MEMS accelerometers can drift in high-vibration environments. The model does not self-correct for drift. Implement annual calibration checks and flag devices where amplitude baseline has shifted more than 15% from installation readings.
Demographic fairness: This model operates on mechanical sensor data only. No demographic, household income, geographic, or personally identifiable information is used as a model input or training feature. The model excludes demographic features by design, preventing disparate impact based on homeowner characteristics.
Geographic equity concern: Training data concentrates in ASHRAE zones 2A through 6A, with higher density in suburban installations. Rural and extreme-climate homes are underrepresented, creating a coverage gap where those homeowners may receive less reliable predictions. The quarterly retraining cycle addresses this progressively as the fleet expands.
Economic impact of false predictions: False positives trigger unnecessary service calls with direct cost to homeowners. False negatives cause unexpected breakdowns. The 0.72 threshold was calibrated with input from the AURRA Customer Advisory Board to balance these costs for median-income residential users. Integrators serving cost-sensitive populations may raise the threshold to reduce false-positive expenses.
Automation bias risk: Homeowners and technicians may over-rely on predictions and skip manual inspections. All alert messaging includes: "This is a predictive advisory. A qualified HVAC technician should confirm this finding before repair work begins." Integrators must preserve this advisory framing.
Data retention: Raw telemetry retained 24 months for retraining, then aggregated and anonymized. Homeowners can request deletion via the AURRA Privacy Center. Deleted data is excluded from future training within 30 days.
Threshold selection: Start with 0.72. Monitor false positive rates for your user population over the first 90 days and adjust. Warranty programs may benefit from 0.60 to maximize recall. Automated dispatch should use 0.85 or higher.
Cold-start handling: Suppress predictions for the first 72 hours after device installation. Display a "Calibrating" state rather than showing low-confidence predictions that could erode trust.
Mini-split deployments: If your fleet exceeds 20% ductless mini-splits, contact AURRA ML Engineering for custom threshold calibration or a two-model ensemble strategy. The default threshold may produce unacceptable false positive rates.
Alert UX requirements: Always display confidence scores alongside predictions. Always include advisory language. Never present predictions as diagnoses. Use "AURRA detected a pattern consistent with [issue]" rather than "Your [component] is failing."
Feedback loop: Submit confirmed outcomes via POST /diagnostics/feedback. This data directly improves future versions. Participating integrators receive early access to model updates.
v1.1 (January 2026): Expanded training set from 9,800 to 14,200 devices. Improved ductless mini-split F1 from 0.68 to 0.76.
v1.0 (October 2025): Initial production release. 5 failure categories. XGBoost architecture.
About This Documentation
This model card follows emerging ML documentation standards (Google Model Card Toolkit, NeurIPS reproducibility checklists), adapted for HVAC predictive maintenance. I developed the schema to balance technical transparency for integrators with actionable risk disclosures for downstream decision-makers.