
WrestleStat Week 4 Rankings

Love the site, andegre.

Would love to see the algorithm tweaked to weigh recent results more heavily than volume of matches. You have, for example, Murin at #106 despite a win over #54 Happel and zero career losses. I understand that the algorithm hedges the higher ranking until the volume of matches increases, but in doing so it paints a false picture of where he likely fits rankings-wise. If one were to rank him manually, one would put him around #50--still hedging while he builds his match volume, but giving him credit for the win.

Same issue with Warner at #27 despite being 8-0 with a win over #3. Common sense would put him in the top 10 (if not top 5)--he might still be below Miklus given his limited body of work, but #27 indicates an algorithm setting that is too conservative in weighing that win.

The reason it's important is that one needs to be able to see "hot newcomers"--i.e., redshirt and true freshmen--accurately reflected in the rankings. But it also carries over a bit to sophomores like Wilcke, who is obviously under-ranked at #44 following an R12 showing at NCAAs and a 4-0 start this season.

My two cents, but I think most would agree, probably including you!

Thanks again for providing the service. I know it's a lot of work to maintain and we all appreciate it.
 
Agree 100%. I've also found a couple of examples where a newcomer is something like 3-2 yet has already moved into the top 33 (noticed when glancing over all weights) because they had 1 good win [not sure about the losses]. But with 2 losses already, that wrestler should NOT move up that fast...I'll get my algorithm guy on this to have him investigate further.

Thank you!
 

I think everybody should have their own "Algorithm Guy"
 
Agreed
 
LOL....well, I definitely need to get one now

"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

[image: Dave-Chappelle.jpg]
 
"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

Dave-Chappelle.jpg

I think someone is gonna wanna know what "shizzle" means..........

Just saying............
[image: hippie7.gif]
 
"Yeah man, I got your Support Vector Machines, I got your Random Forests.....I got some a them Logistic Regressions. Man, I got all that good shizzle. You want some of them new Convolutional Neural Nets? That sh*t is da Bomb!!"

Dave-Chappelle.jpg
Perhaps Recurrent Neural Networks or Long Short-Term Memory (LSTM) networks--you need to consider the time-series aspect of the model. But the problem remains: what are the true target (y_true) values? One could train on previous years' data, using each year's NCAA tournament result as the ground-truth label, then run the model on the current year to predict upcoming NCAA results for each weight class. This would be easy to code up if the feature data is accessible (WrestleStat is already using some of this data for its own model). Feature engineering would be important, but there are many methods to aid in this. Cross-validation and holdout sets for training and testing should not be a problem if enough data can be obtained. Good luck.
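
To make that concrete, here is a minimal sketch of the train-on-past-seasons setup in Python with scikit-learn. The file name and every column (win_pct, avg_margin, sos, bonus_rate, ncaa_finish) are hypothetical stand-ins, not real WrestleStat fields; real feature data would have to come from match records.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical export: one row per wrestler per season,
# with that season's NCAA finish recorded where known.
df = pd.read_csv("wrestler_seasons.csv")

features = ["win_pct", "avg_margin", "sos", "bonus_rate"]
train = df[df["season"] < 2018]   # previous seasons: NCAA finish is known
test = df[df["season"] == 2018]   # current season: the thing we want to predict

# Train against each past season's actual NCAA finish as the ground-truth label
model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["ncaa_finish"])

# Predict this year's finish and list the top projected wrestlers
test = test.assign(pred_finish=model.predict(test[features]))
print(test.sort_values("pred_finish")[["name", "weight", "pred_finish"]].head(10))
```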
 
Wow -- I think I'm a little smarter for having read that, but... was that English? It looked familiar but nothing really registered in my brain... I'm impressed... I think!
 
To simplify: the current WrestleStat rankings are the result of a hand-estimated model with no training. I see that they are now planning to look at NCAA results and possibly tweak their model so that its predictions are in better agreement. The point is that there are better, automated ways to do this that are far more accurate. Lots of people are doing this for NCAA basketball, for example; their tournament provides some great betting opportunities if you have an edge. There is no reason this can't be extended to wrestling. Different features, but similar models.
 
Oh, I could feature engineer the pants off of this prediction. Problem is, all the data I’d want to dump into the models is too closely guarded by the coaches. Who’s racking up points in practice, who’s maintaining weight with a steady rhythm, how each wrestler is ranked by their peers and teammates, timestamps from hours spent in the weight room, cardio, and watching film, number of tweets posted, hours of sleep per night, junk food consumed, alcohol consumed, mental strength survey data, coaches’ film breakdown/grades......dump it all in a rolling Bayes net or gradient boosted machine and let God sort it out.

And then it would all be completely up-ended in March by who has the heart of a champion. :)
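
For what it's worth, here is a toy version of that kitchen-sink idea using scikit-learn's gradient boosting. The data is synthetic by necessity, since the features listed above (practice points, sleep, film hours) are exactly the ones nobody outside the room can get:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Six synthetic columns standing in for practice points, weight stability,
# sleep, film hours, and so on -- all invented for illustration.
X = rng.normal(size=(n, 6))
# Pretend the label is "placed at NCAAs", driven by a couple of the features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", gbm.score(X_te, y_te))
```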
 
The beauty of the wrestlebacks system employed at NCAAs is that it makes the tournament's final outcome measure slightly more robust--that second chance to climb back up the ladder should lead to a more predictable set of outcomes than a single-elimination tournament. Take last year's results (or heck, the last five years' data) as your ground-truth (y_true) values for score prediction, then develop a model for this year based on a kitchen sink of features and how much each contributed to score prediction (or the reduction of score-prediction error) in previous years.

I would avoid regression-based approaches to the algorithm, because they don’t play nicely with high-dimensional data. The key, to me, would be to take a Big Data approach, gathering a giant swath of relevant feature data from hundreds of variables. The contributing factors are in there somewhere, if we could just get the tracking and access to start looking for them.

But that’s kinda creepy, and takes all the f*** out of it. :)
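
One way to get at "how much each feature contributed" is permutation importance on a holdout set. A minimal sketch, with invented feature names and synthetic data standing in for the real thing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
names = ["win_pct", "sos", "bonus_rate", "takedown_diff", "ride_time"]
X = rng.normal(size=(n, len(names)))
# Synthetic "score" driven mostly by the first and fourth features
y = 2 * X[:, 0] + X[:, 3] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = RandomForestRegressor(random_state=1).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure how much the score drops
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=1)
for name, imp in sorted(zip(names, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:+.3f}")
```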
 
Whether you use Gradient Boosting, Random Forests, or even an ensemble approach will only determine whether you get that last 0.1% of accuracy. What matters is getting the features and then modeling against true target values (the NCAA tournament). Most of these features are readily available from past matches. Most sports now use machine learning in at least some aspect; the current ranking methods in wrestling are relatively stone-age in comparison. Someone should make this a Kaggle competition and put up a small prize: you would have hundreds of the best data scientists competing to develop the best model. As I said, such models already exist for the NCAA basketball tournament.
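
A quick sketch of that model comparison on synthetic data (real features would come from past match results): cross-validated error for gradient boosting, a random forest, and a simple averaging ensemble via scikit-learn's VotingRegressor.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.4, size=300)

gbm = GradientBoostingRegressor(random_state=2)
rf = RandomForestRegressor(random_state=2)
models = {
    "gbm": gbm,
    "rf": rf,
    "ensemble": VotingRegressor([("gbm", gbm), ("rf", rf)]),  # averages the two
}

# 5-fold cross-validated mean absolute error for each candidate
for name, m in models.items():
    scores = cross_val_score(m, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE {-scores.mean():.3f}")
```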
 