In the first part of our three-part blog series on rFpro’s AV elevate™ platform, we explored how simulation is transforming the tuning of sensor systems for autonomous vehicles. In this post, we turn our focus to the next major step in AV development: training the perception system. We spoke with Josh Wreford, AV elevate Product Manager at rFpro, to learn how simulation can help train machine learning algorithms with the variety and precision they need to perform reliably in the real world.
Q: To begin with, what do we mean by training a perception system?
Training a perception system is all about teaching it to understand and interpret the world around it. It’s very similar to how children learn. You wouldn’t just show a child one dog and expect them to recognise all dogs. You expose them to lots of different examples, and over time they learn the shared characteristics.
We’re doing the exact same thing for machine learning. We present the system with a wide range of inputs, all labelled with the correct answers, so it can identify the patterns that matter. The goal is for the algorithm to understand what something is, regardless of the conditions or context it appears in.
Q: What exactly does the system need to perceive?
It needs to recognise not just physical objects like pedestrians, cyclists and vehicles, but also ‘concepts’ such as road edges, lane boundaries and pavements. And it must not just identify them but, crucially, understand what they mean. We’re not just labelling shapes; we’re teaching functional understanding. For example, two areas of tarmac might look similar, but one is the vehicle’s lane and the other is for oncoming traffic; that difference is safety critical.
We call this functional segmentation. It’s the bridge between perception and planning. You’re not just building a visual map; you’re understanding how the vehicle can behave in that free space. That’s what enables safe and logical decision-making.
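To make the idea concrete, here is a minimal sketch of functional segmentation as a post-processing step on a per-pixel label mask. The class IDs and category names are illustrative assumptions, not rFpro’s actual label schema.

```python
import numpy as np

# Illustrative semantic class IDs only; not rFpro's actual schema.
EGO_LANE, ONCOMING_LANE, PAVEMENT, PEDESTRIAN, VEHICLE = range(5)

# Functional categories the planner cares about.
DRIVABLE, ONCOMING, NON_DRIVABLE, OBSTACLE = range(4)

# Two patches of tarmac get different functional meanings.
SEMANTIC_TO_FUNCTIONAL = {
    EGO_LANE: DRIVABLE,
    ONCOMING_LANE: ONCOMING,      # looks like road, but not free space for the ego vehicle
    PAVEMENT: NON_DRIVABLE,
    PEDESTRIAN: OBSTACLE,
    VEHICLE: OBSTACLE,
}

def to_functional(semantic_mask: np.ndarray) -> np.ndarray:
    """Map a per-pixel semantic mask to functional categories via a lookup table."""
    lut = np.zeros(max(SEMANTIC_TO_FUNCTIONAL) + 1, dtype=np.uint8)
    for sem_id, func_id in SEMANTIC_TO_FUNCTIONAL.items():
        lut[sem_id] = func_id
    return lut[semantic_mask]
```

The output is the map a planner would actually consume: not “this pixel is tarmac”, but “the vehicle may drive here”.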
Q: Why is variety so important when training perception systems?
If we go back to the analogy of teaching children about dogs: if the child has only ever seen a black dog, it will assume that colour is part of what makes a dog a dog. The only way to teach a machine learning algorithm that this isn’t the case is to show it the widest possible variety of dogs. A model that latches onto these narrow patterns because it hasn’t seen enough variation is said to be overfitting.
This issue also occurs at a much more granular level. If the synthetic data fed to an algorithm only ever shows freshly painted road markings or clean road signs, for example, the model might fail the moment anything deviates from that. The system becomes too reliant on narrow patterns in the data and struggles to generalise. By exposing the perception stack to a wide range of inputs, such as dirty signs and faded paint markings, we build better confidence in the system. The aim is to train across the widest coverage possible, so the system continues to give confident outputs even when conditions are less than ideal.
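As a loose illustration of widening that coverage, the sketch below fades painted markings towards the road surface by a random amount, so worn paint also appears in the training set. The helper, the fade range and the grey road tone are assumptions made for illustration; AV elevate applies this kind of variation within the simulation itself rather than as an image post-process.

```python
import numpy as np

rng = np.random.default_rng(42)

def fade_markings(image: np.ndarray, marking_mask: np.ndarray) -> np.ndarray:
    """Blend painted road markings towards a tarmac tone by a random amount,
    so the model also sees faded, worn paint (illustrative only)."""
    fade = rng.uniform(0.0, 0.8)            # 0 = fresh paint, 0.8 = heavily faded
    road_tone = np.full_like(image, 60)     # rough grey tarmac value (assumption)
    out = image.astype(np.float32)
    m = marking_mask.astype(bool)           # True where paint is present
    out[m] = (1 - fade) * out[m] + fade * road_tone[m]
    return out.astype(image.dtype)
```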
Q: How does AV elevate help introduce that variety?
Simulation gives us an incredible amount of control. With AV elevate, we can vary almost every parameter. Lighting, weather, time of day, time of year, object appearances, pedestrian models, and vehicle types and colours can all be changed, creating a huge array of variation.
In the early days, our vehicle colours were limited to just five options; more simply wasn’t necessary for vehicle dynamics development. Now that we have AV elevate and are working with major OEMs to develop ADAS and AD systems, we allow full RGB colour control of vehicles. That’s over 16 million colour options. We’ve also introduced features like dirt masks, so cars and road surfaces can appear weathered or grimy. It’s all about creating a data-rich environment that challenges the perception system in every way.
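To show what per-frame variation can look like in practice, here is a hypothetical sampling function. The parameter names, choices and ranges are invented for illustration; they are not AV elevate’s API.

```python
import random

random.seed(0)

def sample_scene_params() -> dict:
    """Sample one randomised scene configuration per training frame
    (illustrative parameter names, not AV elevate's interface)."""
    return {
        "time_of_day_h": random.uniform(0.0, 24.0),     # lighting follows sun position
        "day_of_year": random.randint(1, 365),          # seasonal variation
        "weather": random.choice(["clear", "rain", "fog", "snow", "overcast"]),
        "vehicle_rgb": tuple(random.randint(0, 255) for _ in range(3)),  # ~16.7M colours
        "dirt_mask_strength": random.uniform(0.0, 1.0), # clean -> heavily weathered
        "pedestrian_model": random.choice(["adult_a", "adult_b", "child_a", "cyclist_a"]),
    }

print(sample_scene_params())
```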
Q: Is the same level of variation applied to the sensor models themselves?
Yes, and it’s a critical part of realistic training. Real-world sensors aren’t perfect. Camera lenses vary from unit to unit, degrade over time, or simply get dirty. AV elevate allows you to simulate this, introducing image blur and lens aberrations into the synthetic data.
You’re not just training for ideal hardware. You’re preparing the system for the inevitable imperfections that come with real-world usage, right through the life of the vehicle.
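As a rough sketch of what those imperfections amount to, the snippet below post-processes an image with a blur and a simple chromatic-aberration approximation using OpenCV. It assumes a 3-channel BGR image, and it is only an external illustration of the effect, not how AV elevate implements it.

```python
import cv2
import numpy as np

def degrade_camera(image: np.ndarray, blur_sigma: float = 1.5, ca_shift: int = 2) -> np.ndarray:
    """Approximate lens imperfections on a BGR image (illustrative sketch):
    - Gaussian blur stands in for a soft or dirty lens.
    - Shifting the red channel a couple of pixels mimics chromatic aberration."""
    blurred = cv2.GaussianBlur(image, (0, 0), blur_sigma)     # kernel size derived from sigma
    out = blurred.copy()
    out[..., 2] = np.roll(blurred[..., 2], ca_shift, axis=1)  # BGR: channel 2 is red
    return out
```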
Q: Can simulation truly replace real-world training data?
Synthetic data isn’t a complete replacement; correlation with real-world data will always be a critical step in the development process. But it’s now a vital part of the mix, and there are some things you simply can’t do with real-world data that you can with synthetic. One of simulation’s biggest advantages is access to ground truth. In AV elevate, we know the exact 3D position and classification of every object. That’s incredibly powerful for training, because it means you can generate perfectly labelled data with 100% accuracy.
For example, this enables us to provide panoptic segmentation for camera data, as well as object categorisation for each radar return and every point in the LiDAR point cloud.
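For a sense of what such ground truth might look like once exported, here is an illustrative per-frame container. The field names and layout are assumptions for the sake of the example, not AV elevate’s actual data format.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FrameGroundTruth:
    """Illustrative container for perfectly labelled synthetic data
    (field names are assumptions, not AV elevate's export format)."""
    panoptic_mask: np.ndarray        # HxW, class + instance id for every camera pixel
    radar_return_class: np.ndarray   # (N_returns,) class id per radar return
    lidar_point_class: np.ndarray    # (N_points,) class id per LiDAR point
    object_poses: dict = field(default_factory=dict)  # object id -> exact 3D pose
```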
Q: How is training data generated in AV elevate?
There are two key ways to generate training data: specific scenario and diverse variation. Specific scenario data generation allows the user to focus development. For example, the scenario might be a car cutting in ahead of the test vehicle. Once the base scenario has been developed, the Scenario Editor allows a large number of variations to be introduced, such as how quickly the test vehicle is travelling, the cut-in distance, or the weather and time of day.
With diverse variation data, the aim is to maximise the value of data generation by changing aspects such as object types, positions, behaviours and environmental conditions for every frame. This is a less focussed approach but provides much broader coverage. Combining the two methods intelligently ensures comprehensive model training for optimal performance in real-world applications.
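The difference between the two approaches can be sketched in a few lines. The scenario parameters and value ranges below are invented for illustration and do not represent the Scenario Editor’s interface.

```python
import itertools
import random

random.seed(1)

# 1) Specific scenario: sweep the parameters of a single cut-in scenario.
speeds_kph   = [60, 90, 120]
cut_in_gap_m = [5, 10, 20]
weather      = ["clear", "rain"]
scenario_variants = [
    {"ego_speed_kph": s, "cut_in_gap_m": g, "weather": w}
    for s, g, w in itertools.product(speeds_kph, cut_in_gap_m, weather)
]

# 2) Diverse variation: independently randomise each frame for broad coverage.
def random_frame() -> dict:
    return {
        "object_types": random.sample(["car", "van", "cyclist", "pedestrian"], k=2),
        "time_of_day_h": random.uniform(0.0, 24.0),
        "weather": random.choice(["clear", "rain", "fog", "snow"]),
    }

diverse_frames = [random_frame() for _ in range(1000)]
```

The first list explores one safety-relevant manoeuvre in depth; the second set spreads the data budget across as many conditions as possible.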
AV elevate is built on rFpro’s high-fidelity simulation platform and leverages its ray tracing rendering technology. This means every frame of training data includes realistic effects such as motion blur and rolling shutter, so what the sensor ‘sees’ is as close as possible to what it would see in the real world.
Q: Can synthetic data be tailored for different perception applications?
Absolutely. The best training data is always application-specific. If you’re working on human detection systems, you want lots of different human forms, poses, clothing, and movement types. If it’s navigational planning, then it’s more about varied road layouts, junctions and lane markings. We have a library of more than 180 digital twins of real-world locations. This means you can subject your machine learning algorithm to the busy streets of LA, the high-speed autobahns of Germany or the undulating country roads of the UK, all from your office.
Q: Where does perception training fit within the wider AV development process?
It sits right in the middle. First, you tune your sensors to capture the best data possible. Then you train your perception system using that data, ideally in a wide range of scenarios with lots of variation. Once that’s complete, you move on to testing the entire technology stack under realistic and challenging conditions.