Score a detector the way safety actually depends on it
You lock the operating definition of a detection (a predicted box matched to a ground-truth box at IoU 0.5, an unmatched real object is a false negative), then compute precision, recall, and the false-negative rate on vulnerable road users from a real open-AV detections-vs-ground-truth table in Python. You leave knowing why recall on pedestrians and cyclists, not a headline accuracy number, is the figure that decides whether a person is safe near the car.