The human eye has capabilities that you'd need a camera considerably more expensive than a LiDAR to replicate: no rolling shutter, very low latency, continuous signal processing instead of per-frame, peak resolution around ~70 MP equivalent in the center, an f/2.8 aperture at full-frame sensor size, servo-driven active rangefinding, continuous cleaning, etc.
A camera with those features would run somewhere around $3,000-15,000, and you'd need two.
Also, the human brain can use focus information and stereo disparity to deduce 3D structure, which is another ace up its sleeve.
It might very well be the case that reproducing all of this costs more in training, processing and hardware than LiDAR does. In fact, I'd say it's very likely: we're talking control loops that need to update hundreds of times a second and data rates in the gigabits, which our brains handle by doing a lot of processing inside the eye itself and along the optic pathway.
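A rough back-of-envelope calculation shows why the raw bandwidth lands in the gigabit range. All the numbers below are illustrative assumptions, not measurements of any particular sensor:

```python
# Back-of-envelope estimate of raw (uncompressed) sensor bandwidth
# for a stereo camera rig. All parameters are illustrative assumptions.

def raw_bandwidth_gbps(megapixels, bits_per_pixel, fps, num_cameras=2):
    """Raw data rate in gigabits per second for num_cameras sensors."""
    pixels = megapixels * 1e6
    bits_per_second = pixels * bits_per_pixel * fps * num_cameras
    return bits_per_second / 1e9

# Even a modest 8 MP sensor pair at 60 fps with 12-bit readout
# already produces over 11 Gbit/s of raw data:
print(raw_bandwidth_gbps(8, 12, 60))
```

Foveated, in-retina processing is essentially how biology avoids shipping that whole firehose down the optic nerve; a silicon pipeline has to either compress it or process it at the edge.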
To be fair, you might not have to reproduce everything.
But to give one example that is very relevant to self-driving: a camera with low-light video performance similar to the human eye's costs over $2,000.