Hey y’all, I had too much time on my hands so I tried making an ML model for predicting SNA selection this afternoon.
It currently predicts Y/N with 93% accuracy, and I can run your scores through it if you want.
A couple of huge caveats here with the data:
- The quality of training data is pretty shit. I could only find 3ish spreadsheets easily on the AW forums, and was too lazy to go and find others.
- The “waivers” column seems to be a relatively recent addition, so it doesn’t really exist in the previous data. That said, accuracy didn’t improve when I discarded it.
- There is a pretty big bias towards “yes” because only the die-hard pilot wannabes post on this forum and in the spreadsheets, and those who do are generally pretty likely to get yeses. There is a surprising lack of prorec-N’s in the sheets I looked at.
- The selection yes/no is purely based off of SNA applicants. I didn’t include NFO because that was too much work and it seems you’re pretty likely to get in if you select NFO as your primary.
- I only had 150ish rows of data from the sheets I found, so I duplicated it on the assumption that the spread of applicant stats is relatively similar across boards, in order to get more training data. So, the existing biases are doubled.
- Didn’t include college major b/c I didn’t want to deal with it.
- This is strictly from the spreadsheet data, so it’s not reflecting the “whole person” concept with stuff that can’t be quantified. <- biggest caveat
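Two of the caveats above (the bias towards “yes” and the duplicated rows) affect how much the 93% figure can be trusted, so here is a rough sketch of how to sanity-check them. Everything in it is made up for illustration: the column layout, the 85% yes-rate, and the helper names `majority_baseline_accuracy` and `leaked_fraction` are assumptions, not the real data or the actual model code. It shows (1) the accuracy you get by always guessing the most common answer, and (2) how many test rows have an identical twin in the training set if the data is duplicated before the train/test split.

```python
import random

def majority_baseline_accuracy(labels):
    """Accuracy of always guessing the most common label (1 = yes, 0 = no)."""
    yes = sum(labels)
    return max(yes, len(labels) - yes) / len(labels)

def leaked_fraction(rows, test_frac=0.2, seed=0):
    """Duplicate every row, shuffle, split into train/test, and return the
    fraction of test rows that also appear verbatim in the training set."""
    rng = random.Random(seed)
    doubled = rows + rows              # the "duplicate the data" step
    rng.shuffle(doubled)
    n_test = int(len(doubled) * test_frac)
    test, train = doubled[:n_test], doubled[n_test:]
    train_set = set(train)
    return sum(row in train_set for row in test) / len(test)

# Hypothetical stand-in data: (applicant_id, OAR, GPA). Not real board data.
rng = random.Random(42)
rows = [(i, rng.randint(40, 70), round(rng.uniform(2.5, 4.0), 2))
        for i in range(150)]
# Assumed 85% yes-rate, purely for illustration.
labels = [1 if rng.random() < 0.85 else 0 for _ in rows]

print("majority-class baseline accuracy:", majority_baseline_accuracy(labels))
print("test rows with a twin in train:", leaked_fraction(rows))
```

If the baseline is already close to the model's accuracy, the model is not adding much beyond "guess yes". And if most test rows have a copy in the training set, the test accuracy is partly memorization, so splitting first and duplicating only the training part (or not duplicating at all) would give a more honest number.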
So with those rather large caveats out of the way, here are some observations:
- Age is unsurprisingly a big factor. Curiously, there’s a dip between 27-29, but I think that’s mostly a dataset issue.
- PFAR is the next best predictor
- OAR and GPA are weighted pretty similarly, as is AQR
- Interestingly, the FOFAR is a better predictor than flight experience, meaning it can be said with pretty good confidence that flight exp doesn’t really matter.
- Prior service and sex are also negligible.
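For anyone who wants to reproduce a ranking like the one above: the post doesn't say how the factor weights were measured, so this is just one common approach (permutation importance), sketched in plain Python. The idea is to shuffle one column at a time and see how much accuracy drops. The toy model `toy_predict`, the age cutoff of 27, and the two-column data are all hypothetical stand-ins, not the real model or data.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled on its own."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild each row with only this one column replaced.
            X_perm = [row[:col] + (v,) + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model: says yes (1) when age is under 27, ignores column 1.
def toy_predict(row):
    age, _prior_service = row
    return 1 if age < 27 else 0

# Made-up data: (age, prior_service flag). Labels come from the toy model.
X = [(20 + i % 15, i % 2) for i in range(60)]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
print("age importance:", importances[0])
print("prior service importance:", importances[1])
```

A column the model ignores comes out at exactly zero here. Keep in mind that a low score only says this model on this dataset doesn't lean on that column; with roughly 150 rows it isn't proof that the factor doesn't matter to the board.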
Next steps: if any of you guys and gals have extra time, it would be interesting to build out a better dataset together. That means finding the past board spreadsheets, cleaning them, and feeding them into this model to train it better. That would yield better results, and I might be able to make a webapp so ppl can look up their own scores.
Using the model, I tried predicting my own prob of prorec-Y and got a Yes with 96%. LMK if you want me to try putting your scores into it and seeing what comes out. Caveats blah blah blah.
The model itself:
View attachment 34200
Trying my own scores
View attachment 34201