I'd recommend running the experiment at least 20 times with each roster set, then averaging the results for each set so you can compare them fairly before choosing the one you want to use. Running the experiment only once per roster set is like giving Hank Aaron, Mario Mendoza and Dave Kingman only one at-bat each to decide who's the best hitter.
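Here's a rough sketch of what I mean, in Python. I don't know how your experiment is actually run, so `run_once`, `simulated_run`, the roster names and the `skill` numbers are all made up for illustration; swap in whatever really runs one trial and returns a score.

```python
import random
from statistics import mean


def average_result(run_once, roster, runs=20):
    """Run the experiment `runs` times for one roster set and return the mean score."""
    return mean(run_once(roster) for _ in range(runs))


def pick_best(run_once, roster_sets, runs=20):
    """Average every roster set over `runs` trials, then return (best_name, averages)."""
    averages = {
        name: average_result(run_once, roster, runs)
        for name, roster in roster_sets.items()
    }
    best = max(averages, key=averages.get)
    return best, averages


# Hypothetical stand-in for a real trial: a fixed "skill" plus random noise.
rng = random.Random(42)


def simulated_run(roster):
    return roster["skill"] + rng.gauss(0, 10)


# Made-up roster sets, only here so the sketch runs end to end.
roster_sets = {
    "set_a": {"skill": 50},
    "set_b": {"skill": 55},
    "set_c": {"skill": 45},
}

best, averages = pick_best(simulated_run, roster_sets, runs=20)
for name, avg in averages.items():
    print(f"{name}: average over 20 runs = {avg:.1f}")
print("best on average:", best)
```

With the noise that large, any single run can easily put the wrong set on top, which is the one at-bat problem. Averaging 20 runs shrinks that noise a lot, though sets that are very close together may still need more runs to separate.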
Just my $.02