Modeling & Simulation · Completed · 2026

GPU-Accelerated Synthetic Data Generation for Crop Phenotyping

Physics-based sensor simulation using C++ ray tracing to generate perfectly labeled training data. Two virtual sensors (LiDAR and a multispectral camera) provide structural point clouds and pixel-perfect segmentation masks for machine-learning research on bean-wheat intercropping.

Problem

Field annotation and manual labeling for machine learning are prohibitively expensive and time-consuming. Overlapping canopies in intercropping systems create ambiguous segmentation boundaries. Real-world datasets lack ground-truth labels for individual plant structures, limiting supervised learning approaches for precision phenotyping.

Approach

Built fully controllable synthetic 3D environments of bean-wheat intercropping plots with the Helios C++ framework. Defined plot parameters (plant density, row spacing, growth stage), generated plant positions dynamically, and built plants from 3D geometric primitives. Rendered scenes with realistic solar positioning, directional lighting, and ray-traced shadows. Simulated two virtual sensors: (1) two LiDAR scanners emitting 1.536M rays via forward ray tracing to generate structural 3D point clouds, and (2) a nadir-view multispectral camera using backward ray tracing to capture RGB, NIR, and Red-Edge bands, with pixel-perfect segmentation masks generated automatically for every plant instance.
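Below is a minimal sketch of how plant positions can be derived from the plot parameters. It is written in Python and is independent of the Helios API. It assumes evenly spaced parallel rows that alternate between bean and wheat. Under that layout, density = 1 / (row spacing × in-row spacing), which fixes the in-row spacing once row spacing and density are chosen. The function name, the `row_pattern` layout and the optional jitter are hypothetical, not the project's actual code.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Plant:
    species: str  # "bean" or "wheat"
    x: float      # across-row position (m)
    y: float      # along-row position (m)


def generate_plant_positions(plot_width_m, plot_length_m, row_spacing_m,
                             density_per_m2, row_pattern=("bean", "wheat"),
                             jitter_m=0.0, seed=0):
    """Hypothetical helper: place plants in parallel rows at a target density.

    density = 1 / (row_spacing * in_row_spacing), so the in-row spacing is
    fixed once row spacing and density are chosen. Rows cycle through
    row_pattern, an assumed alternate-row intercropping layout.
    """
    rng = random.Random(seed)  # seeded so a scene can be regenerated exactly
    in_row_spacing = 1.0 / (density_per_m2 * row_spacing_m)
    # Small epsilon guards against floor() errors such as 0.3 / 0.1 -> 2.999...
    n_rows = math.floor(plot_width_m / row_spacing_m + 1e-9)
    n_per_row = math.floor(plot_length_m / in_row_spacing + 1e-9)
    plants = []
    for r in range(n_rows):
        species = row_pattern[r % len(row_pattern)]
        x_row = (r + 0.5) * row_spacing_m  # row centre line, half a spacing from the edge
        for i in range(n_per_row):
            y_plant = (i + 0.5) * in_row_spacing
            plants.append(Plant(
                species,
                x_row + rng.uniform(-jitter_m, jitter_m),
                y_plant + rng.uniform(-jitter_m, jitter_m),
            ))
    return plants
```

For example, a 1 m × 1 m plot with 25 cm rows at 40 plants/m² gives an in-row spacing of 10 cm, so it yields 4 rows of 10 plants. Each returned position would then be passed to whatever routine places a bean or wheat model at that point.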

Data

Synthetic intercropping scenes with parameterized plant positioning (row spacing: 12-30 cm; density: 20-60 plants/m²) and temporal growth progression (0-60 days after emergence). Output datasets include georeferenced LiDAR point clouds (.las format), multispectral orthomosaics (3-5 bands at 1 mm/pixel resolution), and COCO-format instance segmentation JSON with per-plant polygon annotations. All labels are generated automatically during rendering, at zero manual annotation cost.
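Every pixel's owning object is known at render time, so instance labels reduce to bookkeeping over an ID image. The sketch below assumes the renderer exports a 2-D array of per-pixel instance IDs, with 0 for soil. It also assumes a hypothetical lookup from instance ID to COCO category ID. To avoid contour tracing, it writes each mask as uncompressed COCO run-length encoding instead of polygons; pycocotools' `annToRLE` accepts either form. The function names are illustrative.

```python
import numpy as np


def mask_to_rle(binary_mask):
    """Uncompressed COCO RLE: run lengths over the column-major flattened
    mask, always starting with the length of the first run of zeros."""
    mask = np.asarray(binary_mask, dtype=np.uint8)
    flat = mask.flatten(order="F")  # COCO RLE is column-major
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1  # where runs start
    boundaries = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(boundaries).tolist()
    if flat.size and flat[0] == 1:
        counts = [0] + counts  # mask starts with foreground: empty zero-run first
    return {"counts": counts, "size": [int(mask.shape[0]), int(mask.shape[1])]}


def instance_mask_to_coco(instance_ids, category_of, image_id, start_ann_id=1):
    """Convert a per-pixel instance-ID image (0 = background) into COCO-style
    annotation dicts. category_of (hypothetical) maps instance ID -> category ID."""
    instance_ids = np.asarray(instance_ids)
    annotations = []
    ann_id = start_ann_id
    for inst in np.unique(instance_ids):
        if inst == 0:
            continue  # soil / background
        mask = instance_ids == inst
        ys, xs = np.nonzero(mask)
        x0, y0 = int(xs.min()), int(ys.min())
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_of[int(inst)],
            "segmentation": mask_to_rle(mask),
            "area": int(mask.sum()),
            # COCO bbox is [x, y, width, height] in pixels
            "bbox": [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1],
            "iscrowd": 0,
        })
        ann_id += 1
    return annotations
```

Overlapping canopies need no special handling here. Each pixel carries exactly one instance ID, so occlusion is already resolved by the renderer.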

Validation

Qualitative visual comparison between synthetic scenes and real field imagery from the Campus Klein-Altendorf PhenoRoam 2022 dataset. Validated NIR and Red-Edge reflectance values against published crop physiology literature to ensure biological plausibility. Identified a calibration bias: initial simulations over-represented broadleaf bean coverage (65% bean vs. 35% wheat), whereas real-world imagery showed wheat tillers dominating. Correcting this required re-calibrating sowing density to match field emergence patterns.
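Both the coverage imbalance and the spectral check can be computed directly from rendered outputs. This sketch assumes a per-pixel class mask (0 = soil, with hypothetical IDs for bean and wheat) and co-registered band images. It computes each class's share of vegetated pixels. It also computes the standard normalized-difference indices, NDVI = (NIR - Red)/(NIR + Red) and NDRE = (NIR - RedEdge)/(NIR + RedEdge), whose per-class means could then be compared with literature ranges. Function names are illustrative.

```python
import numpy as np


def class_cover_fractions(class_mask, class_names):
    """Share of vegetated (non-zero) pixels belonging to each class.

    class_mask: 2-D int array of per-pixel class IDs, 0 = soil/background.
    class_names: dict mapping class ID -> name, e.g. {1: "bean", 2: "wheat"}.
    """
    class_mask = np.asarray(class_mask)
    n_veg = np.count_nonzero(class_mask)
    if n_veg == 0:
        return {name: 0.0 for name in class_names.values()}
    return {name: np.count_nonzero(class_mask == cid) / n_veg
            for cid, name in class_names.items()}


def normalized_difference(a, b):
    """(a - b) / (a + b), NaN where a + b == 0.

    NDVI = normalized_difference(nir, red)
    NDRE = normalized_difference(nir, red_edge)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num, den = a - b, a + b
    return np.divide(num, den, out=np.full(den.shape, np.nan), where=den != 0)


def mean_index_per_class(index_img, class_mask, class_names):
    """Mean of a spectral index over each class's pixels (NaNs ignored)."""
    index_img = np.asarray(index_img, dtype=float)
    class_mask = np.asarray(class_mask)
    return {name: float(np.nanmean(index_img[class_mask == cid]))
            for cid, name in class_names.items()}
```

A bean share near 0.65 from `class_cover_fractions` on the initial renders would correspond to the imbalance described above.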

Results

Synthetic data pipeline provides unlimited, perfectly labeled training data at zero marginal annotation cost. LiDAR simulation with a 1.536M-ray budget achieves sub-centimeter structural accuracy for canopy height models and plant segmentation. Multispectral camera simulation delivers pixel-perfect instance masks for overlapping canopies, a task that is nearly impossible with manual annotation. Critical trade-off identified: GPU ray tracing requires substantial compute (NVIDIA RTX-class hardware), and the sim-to-real domain gap requires careful parameter calibration to match real-world emergence rates and spectral signatures.
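A canopy height model is typically derived from a point cloud by keeping the highest return in each horizontal grid cell. This sketch assumes points arrive as an N × 3 array in metres. It also assumes a flat ground plane at a known height; real pipelines usually fit a terrain model instead. The default 0.01 m cell matches the sub-centimeter scale discussed above. Measuring accuracy against the known scene geometry is a separate step this sketch does not cover.

```python
import numpy as np


def canopy_height_model(points, cell_size=0.01, ground_z=0.0):
    """Rasterize an (N, 3) array of x, y, z points (metres) into a canopy
    height model: the highest return per grid cell minus ground_z.

    Rows index y, columns index x; cells without returns are NaN.
    Assumes a flat ground plane at ground_z (a simplification).
    """
    pts = np.asarray(points, dtype=float)
    xy_min = pts[:, :2].min(axis=0)
    ij = np.floor((pts[:, :2] - xy_min) / cell_size).astype(int)
    n_cols, n_rows = ij.max(axis=0) + 1  # x -> column, y -> row
    chm = np.full((n_rows, n_cols), -np.inf)
    # Unbuffered in-place max: correct even when several points share a cell.
    np.maximum.at(chm, (ij[:, 1], ij[:, 0]), pts[:, 2] - ground_z)
    chm[np.isinf(chm)] = np.nan  # cells that received no returns
    return chm
```

`np.maximum.at` is used instead of fancy-index assignment. With plain assignment, `chm[rows, cols] = z` keeps only one of the duplicate writes to a cell rather than the maximum.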

My Role

Simulation Engineer. Configured Helios scene parameters, implemented sensor placement logic, executed GPU-accelerated ray-tracing simulations, analyzed spectral output for biological validity, and calibrated sowing density parameters to correct bean over-representation bias.

Next Steps

Integrate domain randomization (varied lighting conditions, soil backgrounds) to improve sim-to-real generalization. Quantify synthetic-to-real transfer learning performance on downstream segmentation tasks. Explore neural radiance fields (NeRF) as an alternative to geometric primitives for more realistic canopy microstructure. Publish the synthetic dataset as a benchmark for intercropping ML research.
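Domain randomization amounts to sampling scene parameters per render instead of fixing them. The sketch below draws one configuration per scene. The plot ranges come from the Data section. The lighting and soil-albedo ranges, and all field names, are hypothetical placeholders to be tuned against field conditions.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    row_spacing_m: float
    density_per_m2: float
    days_after_emergence: int
    sun_elevation_deg: float  # hypothetical lighting range
    sun_azimuth_deg: float
    diffuse_fraction: float   # hypothetical share of diffuse sky light
    soil_albedo: float        # hypothetical soil background reflectance


def sample_scene_config(rng):
    """Draw one randomized scene configuration.

    Plot ranges follow the dataset description (12-30 cm rows,
    20-60 plants/m^2, 0-60 days after emergence); lighting and soil
    ranges are illustrative placeholders, not calibrated values.
    """
    return SceneConfig(
        row_spacing_m=rng.uniform(0.12, 0.30),
        density_per_m2=rng.uniform(20.0, 60.0),
        days_after_emergence=rng.randint(0, 60),  # inclusive of both ends
        sun_elevation_deg=rng.uniform(20.0, 65.0),
        sun_azimuth_deg=rng.uniform(0.0, 360.0),
        diffuse_fraction=rng.uniform(0.1, 0.6),
        soil_albedo=rng.uniform(0.1, 0.35),
    )
```

Seeding a generator per scene keeps every randomized scene reproducible, for example `sample_scene_config(random.Random(scene_index))`.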

Key Outcomes

  • Unlimited perfectly labeled training data (zero annotation cost)
  • 1.536M-ray LiDAR point clouds for 3D structure
  • Pixel-perfect multispectral segmentation masks (RGB/NIR/Red-Edge)
  • Identified and corrected bean over-representation calibration issue
  • Validated spectral realism against crop physiology literature

Tech Stack

C++, Python, Helios, CUDA, OptiX, Ray-tracing

Tags

ray-tracing, sensor-simulation, synthetic-data, gpu, lidar, multispectral