
WrestleStat Week 4 Rankings

Love the site, andegre.

Would love to see the algorithm tweaked to weigh recent results more heavily than volume of matches. You have, for example, Murin at #106 despite a win over #54 Happel and zero career losses. I understand that the algorithm hedges the higher ranking until the volume of matches increases, but in doing so it paints a false picture of where he likely fits rankings-wise. If one were to rank him manually, one would put him around #50--still hedging while he builds his match volume, but giving him credit for the win.

Same issue with Warner at #27 despite being 8-0 with a win over #3. Common sense would put him in the top 10 (if not top 5)--he might still be below Miklus given his limited body of work, but #27 indicates an algorithm setting that is too conservative in weighing that win.

The reason it's important is that one needs to be able to see "hot newcomers"--i.e., redshirt and true freshmen--accurately reflected in the rankings. But it also carries over a bit to sophomores like Wilcke, who is obviously under-ranked at #44 following an R12 showing at NCAAs and a 4-0 start this season.

My two cents, but I think most would agree, probably including you!

Thanks again for providing the service. I know it's a lot of work to maintain and we all appreciate it.
 
Agree 100%. I've also found a couple of examples where a newcomer is something like 3-2 yet has already moved into the top 33 (noticed when glancing over all weights) because they had 1 good win [not sure about the losses]. But with 2 losses already, that wrestler should NOT move up that fast...I'll get my algorithm guy on this to have him investigate further.

Thank you!
 

I think everybody should have their own "Algorithm Guy"
 
Agreed
 
LOL....well, I definitely need to get one now

"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

[image: Dave-Chappelle.jpg]
 
"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

Dave-Chappelle.jpg

I think someone is gonna wanna know what "shizzle" means..........

Just saying............
[image: hippie7.gif]
 
"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

Dave-Chappelle.jpg
Perhaps Recurrent Neural Networks or Long Short-Term Memory (LSTM) networks--you need to consider the time-series aspect of the model. But the problem remains: what are the true target (y_true) values? One could train on previous years' data, using each year's NCAA tournament result as the ground-truth label, then run the model on the current year to predict upcoming NCAA results for each weight class. This would be easy to code up if the feature data is accessible (WrestleStat is already using some of this data for its own model). Feature engineering would be important, but there are many methods to aid in this. Cross-validation and holdout sets for training and testing should not be a problem if enough data can be obtained. Good luck.
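
To make that concrete, here is a minimal sketch of the train-on-past-seasons setup in Python with scikit-learn. The file name and every column (win_pct, avg_margin, sos, bonus_rate, ncaa_finish) are hypothetical stand-ins, not real WrestleStat fields; real feature data would have to come from match records.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical export: one row per wrestler per season,
# with that season's NCAA finish recorded where known.
df = pd.read_csv("wrestler_seasons.csv")

features = ["win_pct", "avg_margin", "sos", "bonus_rate"]
train = df[df["season"] < 2018]   # previous seasons: NCAA finish is known
test = df[df["season"] == 2018]   # current season: the thing we want to predict

# Train against each past season's actual NCAA finish as the ground-truth label
model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["ncaa_finish"])

# Predict this year's finish and list the top projected wrestlers
test = test.assign(pred_finish=model.predict(test[features]))
print(test.sort_values("pred_finish")[["name", "weight", "pred_finish"]].head(10))
```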
 
Wow -- I think I'm a little smarter for having read that, but... was that English? It looked familiar but nothing really registered in my brain... I'm impressed... I think!
 
To simplify: the current WrestleStat rankings are the result of a hand-estimated model with no training. I see that they are now planning to look at NCAA results and possibly tweak their model so that its predictions are in better agreement. The point is that there are better, automated ways to do this that are far more accurate. Lots of people are doing this for NCAA basketball, for example; their tournament provides some great betting opportunities if you have an edge. There is no reason this can't be extended to wrestling. Different features, but similar models.
 
Oh, I could feature engineer the pants off of this prediction. Problem is, all the data I’d want to dump into the models is too closely guarded by the coaches. Who’s racking up points in practice, who’s maintaining weight with a steady rhythm, how each wrestler is ranked by their peers and teammates, timestamps from hours spent in the weight room, cardio, and watching film, number of tweets posted, hours of sleep per night, junk food consumed, alcohol consumed, mental strength survey data, coaches’ film breakdown/grades......dump it all in a rolling Bayes net or gradient boosted machine and let God sort it out.

And then it would all be completely up-ended in March by who has the heart of a champion. :)
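
For what it's worth, here is a toy version of that kitchen-sink idea using scikit-learn's gradient boosting. The data is synthetic by necessity, since the features listed above (practice points, sleep, film hours) are exactly the ones nobody outside the room can get:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Six synthetic columns standing in for practice points, weight stability,
# sleep, film hours, and so on -- all invented for illustration.
X = rng.normal(size=(n, 6))
# Pretend the label is "placed at NCAAs", driven by a couple of the features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", gbm.score(X_te, y_te))
```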
 
The beauty of the wrestlebacks system employed at NCAAs is that it makes the tournament's final outcome measure slightly more robust--that second chance to climb back up the ladder should lead to a more predictable set of outcomes than a single-elimination tournament. Take last year's results (or heck, the last five years' data) as your ground-truth (y_true) values for score prediction, then develop a model for this year based on a kitchen sink of features and how much each contributed to score prediction (or the reduction of score-prediction error) in previous years.

I would avoid regression-based approaches to the algorithm, because they don’t play nicely with high-dimensional data. The key, to me, would be to take a Big Data approach, gathering a giant swath of relevant feature data from hundreds of variables. The contributing factors are in there somewhere, if we could just get the tracking and access to start looking for them.

But that’s kinda creepy, and takes all the f*** out of it. :)
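
One way to get at "how much each feature contributed" is permutation importance on a holdout set. A minimal sketch, with invented feature names and synthetic data standing in for the real thing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
names = ["win_pct", "sos", "bonus_rate", "takedown_diff", "ride_time"]
X = rng.normal(size=(n, len(names)))
# Synthetic "score" driven mostly by the first and fourth features
y = 2 * X[:, 0] + X[:, 3] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = RandomForestRegressor(random_state=1).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure how much the score drops
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=1)
for name, imp in sorted(zip(names, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:+.3f}")
```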
 
Whether you use Gradient Boosting, Random Forests, or even an ensemble approach will only determine whether you get that last 0.1% of accuracy. What matters is getting the features and then modeling against true target values (the NCAA tournament). Most of these features are readily available from past matches. Most sports now use machine learning in at least some aspect; the current ranking methods in wrestling are relatively stone-age in comparison. Someone should make this a Kaggle competition and put up a small prize: you would have hundreds of the best data scientists competing to develop the best model. As I said, such models already exist for the NCAA basketball tournament.
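
A quick sketch of that model comparison on synthetic data (real features would come from past match results): cross-validated error for gradient boosting, a random forest, and a simple averaging ensemble via scikit-learn's VotingRegressor.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.4, size=300)

gbm = GradientBoostingRegressor(random_state=2)
rf = RandomForestRegressor(random_state=2)
models = {
    "gbm": gbm,
    "rf": rf,
    "ensemble": VotingRegressor([("gbm", gbm), ("rf", rf)]),  # averages the two
}

# 5-fold cross-validated mean absolute error for each candidate
for name, m in models.items():
    scores = cross_val_score(m, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {-scores.mean():.3f}")
```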
 