
Kirk was on 670 The Score here in Chicago today

There IS such a thing as good data versus bad data. If data is collected in an exceedingly careful fashion and is meticulously annotated ... then YES, it would be a mistake to disregard such data out of hand. In fact, many interesting results in science have come from devising new models to account for such "interesting" (and seemingly anomalous) data points.

On the flip side, if data is collected in a relatively haphazard fashion and is poorly annotated, then it can actually lead to incorrect conclusions and skew results. For example, in microbiology, when doing careful phylogenetic analysis, skilled bioinformaticians usually have to exclude Craig Venter's big data dumps because they're so poorly annotated. If you're unfamiliar with Venter, he's the same guy who wanted to patent the human genome (thankfully, public entities "beat him" and were able to block his attempts). Later in life, he rode around the ocean shotgun-sequencing the crap out of water samples. However, there are all sorts of problems with his data ... like the quality of the filters he was using (it's likely his samples also included genomes of giant viruses, but you wouldn't know it from his data). He was of the opinion that the more data he could get, the better. Unfortunately, so much of his data was acquired in a sloppy fashion and wasn't properly annotated that it's still not clear whether his data dumps did more harm than good.

Valid. I stand corrected. However, in our scenario here, "bad data" would be something like win-loss percentage while coaching at a previous university, or while not the head coach. Would you agree that removing any of the data collected while KF has been the HC at Iowa would discredit the data set?

I should have been ahead of this. I once conducted a study on the impact of sunlight on college athletes' 40 times. I had to throw out the data collected around midday, as many of my participants had a full belly from lunch.
 
The data needs to be included in order to be reliable. If you want to say looking at the most recent 5 years is more valid than his entire body of work, you might have a stance, but you're sacrificing reliability for validity at that point. Having all the data included makes the information more reliable; you have to decide whether it's worth trading that reliability for what you feel is more validity. Your case study would be crushed by anyone with a working knowledge of stats.
So you're saying the first few seasons weren't relevant after the '02 (4th) season, but they are now. Got it.
 
He averages 7.5 wins and 5.1 losses a year without the bowls included.

Wrong. Those numbers include his bowls. They are 7.53 and 5.11.

If you average 7.5 wins and 5.1 losses during the regular season, and you play an average of 12 games during a regular season ...

You can't average 7.53 wins and 5.11 losses by averaging 12 games. We have not averaged 12 games.

Except he isn't closer to 8 than he is to 7.

Yes, he is. This is clear to almost everyone

You don't get to round and then accuse someone of skewing data.

His average win number is 7.53. You are the one attempting to round it down to 7. This is important in figuring out why people don't agree with you

Show me how you get to 8 without rounding.

No one in this thread has ever denied rounding when they say he is closer to 8 wins than 7.

A coach averages 7.5 wins and 5.1 losses for every 12 games played. If that coach plays 12 games how many wins does he have?

No one will answer this question because it proves absolutely nothing. It provides no valid information, and is nothing more than speculation and diversion.

The bowl results actually hurt his numbers.

Again, the bowl results are included in his 143-97 record, or 7.53 wins per year.
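For reference, the arithmetic behind the 7.53 figure, taking the 143-97 career record over 19 seasons at face value:

```python
# Career record claimed in the thread: 143-97 across 19 seasons, bowls included.
wins, losses, seasons = 143, 97, 19

avg_wins = wins / seasons          # average wins per season
avg_losses = losses / seasons      # average losses per season
win_pct = wins / (wins + losses)   # overall winning percentage

print(round(avg_wins, 2))    # 7.53
print(round(avg_losses, 2))  # 5.11
print(round(win_pct, 3))     # 0.596
```

So the per-season average really is 7.53, which sits closer to 8 than to 7, while the winning percentage lands just under .600.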

I'm not a KF hater, but some of the stances the KF homers take are just asinine.

lol

It's easier just to say he is a 7-5 regular season coach.

for a hater......

He is 7-8 in bowls; if I really wanted to slam him on that, I would use those numbers.

Again, the bowls are included. I made the mistake of taking you at your word at first, but then I looked it up. You've been yakking about this incessantly, and you've been giving us bad info the whole time.

You cannot remove data and claim the numbers are valid.

True. Stop rounding down to 7

7.65 does not mean he averages 8 wins. It means he averages 7 wins and is more likely to win 8 than 6, but his number is still 7 until it is 8.

Wrong again. He averages 7.53 wins. That does not mean that 7 is his average

Yes, it's closer, but "closer" here shows you that if he doesn't win 7 he is much more likely to win 8 than 6 ... his average is still 7.

Wrong again. Average is 7.53, not 7

that's how statistics work, not "rounding" ya dunce.

Then stop rounding down to 7?

He is closer to 8 than 6 with those numbers, but his number is still 7, so he is "closer" to 7.

Still not right even though you keep repeating it.

If his win total were closer to 8 than to 7, he would have a win percentage above .615 (in a 13-game season).
Wrong again. His average is 7.53 wins per year. 7.53 is closer to 8 than 7.
His win percentage is closer to that of a 7-5 coach than that of an 8-5 coach though, I can give you that....

Number of times we have played 13 regular season games in the last 19 years =1.
Wrong and invalid


As I said earlier, half a win doesn't mean piss. You have 7 until you have 8.
Your rules, not mine. He has 7.53

Those of us smart enough to realize we only play 13 games in bowl years, and that even then he averages 7 wins, will know you're not very smart.
lol

"Rounds," "goes up to," "gets moved to" ... all the same.

Wrong again. "Rounds" is not synonymous with "goes up to". It means we change it to the closer number.

If a coach averages 7.5 wins and 5 losses for every 12 games he coaches, then after 12 games, how many has he actually won?
The fact that you think this answer would prove anything is baffling.

at least I understand how statistics work.
Ok......

It isn't rounding when you use the actual number ...
Then stop rounding down to 7 if you're so against it.

Now I want a legendary thread to exist so all who visit this site can see your lack of intelligence.
lol

You're truly not that dumb.......... there is no way you're this dumb.
It's true. He is not that dumb

No it doesn't, because you have an actual data set, and only 4th graders round.
lol

So if you want to say "closer," draw a 7 on a piece of paper and an 8 on a piece of paper, and put your finger on the "7" of 7.5. Until that number is "8," it is a 7. What number is your finger "closer" to?

Actually draw a 7 and an 8 on a piece of paper and then put your finger on 7.53. Which number is your finger closer to? 7 or 8? 7 you say? The rest of us have our finger closer to 8.....

The only analyzing you may have done was data entry.

that's probably not accurate either, but why start being accurate now?

No one with any respect for numbers would do what you're attempting to do with this information.

lol

KF is a 7-5 coach for life. I would love to be wrong, but that is basically "perfect."

Good news! It turns out that you are wrong......

There is no way to be an 8-5 coach when you play 12 regular season games a year.

True. Invalid since we have not averaged 12 games, but true.

For the love of God, can we please acknowledge that when discussing statistics you do not round.

Stop rounding down to 7?

A coach at 7.99 is not more likely to win 8 than 7. He is more likely to win 8 than 6, but his most likely total is still 7.

again, this "more likely" stuff. Invalid

The level of heel digging by the "not 7-5" crowd is amazing.

Ironic post of the year?


Your case study would be crushed by anyone with a working knowledge of stats.

We can't all have your greater level of understanding....

Never once in the history of science has a study had more reliability due to removing data.

No way you can know this to be true. Sure didn't stop you from claiming it though.....
 
My apologies, I am not going to make it personal. If you don't get it, you don't get it. Best of luck.
 
My comment wasn't a criticism of your comment - rather, you're basing your argument on the assumption that the data is good. As for discerning the "goodness" of the data - I think many folks discount the early years because they view the first few years of Ferentz's tenure as characterizing a culture shift within the program. Furthermore, there might be similar comments about whether the personnel on the roster matched Ferentz's schemes, and whether there was a perceived "transition" while the guys learned the language of the new schemes and the adjustments within them.

However, it becomes hard to justify removing such data points - when the Hawks have had similar cultural transient issues through a number of years (2005-2007) and the "blip" we saw in 2014. We wouldn't remove 2012 from consideration, even though it too was a "transition" year in terms of schemes, would we?

Also, if you compare performances across different coaches - you saw that O'Brien got surprisingly good performances out of Penn State's passing game, despite their program reeling from the whole Sandusky mess. There was plenty of negativity surrounding that program then, there were scholarship limitations, and there was most certainly a schematic "transition." All the same, their passing game was surprisingly formidable. Or look at scenarios like Charlie Weis at Notre Dame or Brady Hoke at Michigan ... both guys started off comparatively "on fire" at those programs ... but both ultimately "flamed out."

I fail to see how Ferentz's poor start can justifiably be ignored when other coaches manage to fare better early on. On the flip side, Ferentz deserves a lot of credit for not letting the program slide like Hoke or Weis did. As other B1G coaches have anonymously stated ... in an "ordinary" year, a bad Iowa season still usually lands with the Hawks netting 6 or 7 wins, whereas in a good year, the Hawks can contend for a high-level bowl. Now, there is some interesting subtlety in that statement ... it is rather intuitive, something we, as fans, kinda expect from the Hawks. However, does any mathematics back it up?

The answer is YES! This again touches upon omitting data points ... but not in a targeted fashion. The idea that I'm about to explain relates to what is known as the "bootstrap" in statistics. Rather than treat the entire data set as a whole with a single analysis, you instead sample the full data set several times. Say, out of Ferentz's career, you sample 12 of his 19 seasons ... but you do this sampling 10 separate times. For each of these sub-samples, you extract means, variances, etc. Why create this "ensemble" of sub-samples of the data? The idea is that we likely don't have an accurate causal model that we trust from which to discern "error" in our data. Consequently, the variation in the results among the sub-samples lets us "bootstrap" an error analysis onto our data. Thus, it allows us to assess (to some extent) the reliability of our data.

Of course, the art of the bootstrap is that you have to be careful not to oversample or undersample the full data set when creating your ensemble of sub-samples. If you oversample your data, then you're likely to "overfit" it ... you might find yourself fitting "noise" rather than the pattern you're looking for. If you undersample, you might not be looking at a fine enough scale to "see" the pattern either.
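A minimal sketch of the resampling idea just described, in Python. The per-season win totals below are illustrative placeholders, not Ferentz's actual year-by-year record:

```python
import random
import statistics

# Placeholder per-season win totals for 19 seasons; substitute the real record.
seasons = [4, 3, 7, 11, 11, 10, 7, 6, 6, 9, 11, 8, 7, 4, 8, 7, 12, 8, 8]

random.seed(42)
n_resamples = 10   # size of the "ensemble" of sub-samples
sample_size = 12   # seasons drawn per sub-sample

means = []
for _ in range(n_resamples):
    subsample = random.choices(seasons, k=sample_size)  # sample WITH replacement
    means.append(statistics.mean(subsample))

# The spread of the sub-sample means is the bootstrapped "error" estimate.
print("mean of means:", statistics.mean(means))
print("std of means: ", statistics.stdev(means))
```

The standard deviation across the ten sub-sample means plays the role of the error bar that a trusted causal model would otherwise supply.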
 
As I wrote this, it got me curious - somebody should normalize Ferentz's data (probably look at win percentage rather than the raw number of wins), then bootstrap the data set to build the ensemble of win percentages ... then we can find some of the basic moments of the distribution and the variances across the ensemble of sub-samples.
 
When running a bootstrap analysis would you still run a t-test?


PS, for those of you who didn't understand what Homer just said: omitting data is not correct. You might be able to find different or more usable information by looking at segments of the data (that's the whole reliability-vs-validity conversation), but in order to establish a yearly win total, you need to include all the data.

Again, contrary to popular belief, I think KF is a good coach and an outstanding brand ambassador; the data is just data.
 
Dude, I would gladly help with the data collection. I have no idea what test you're going to run or even what program you would use to run it (is SPSS still the go-to?). I love stats and numbers, though, and would gladly help with the research.
 
The beauty of stats is that if you know what you're doing, a spreadsheet can still do it (it's just that programs like S can automate the calculation for you).

The only thing that's bugging me is that correlations across years, in principle, should impact how you draw your sub-samples when doing your bootstrap analysis. However, if you don't mind doing a naive first pass - you could sample the years using a uniform distribution (which sounds like that is what you'd prefer to do anyhow).

In a naive first pass, you list out Ferentz's 19 win percentages. The advantage of treating this as the relevant observable is that we then don't have to quibble about the number of games not being the same every year. Then, as I mentioned before, select a subsample of the 19 percentages. The interesting thing about the subsample is that you can possibly select the same year several times (typically in the bootstrap, you allow repeats). Suppose you sample the set of 19 percentages some number of times (like 12 or 15 times, or something like that). Then you repeat this process some number of times, say 10 or so. The idea here is that we then have 10 lists of 12 (or 15) win percentages. For each of these lists, we can calculate the mean and standard deviation with any statistics tool of your choice. Then, from this list of moments, you can calculate the moments of the moments. For example, the standard deviation of the 10 means can supply us with some sense of "error" in the data. Mind you, I'm winging the numbers (subsampling the original data 12 or 15 times, and repeating it 10 times). Usually for a given data set, you iteratively adjust those numbers until you get some sort of "convergence" in your results.

Regardless, the above gives you a conceptual "feel" for some of the ideas behind the bootstrap. There's plenty of stuff I'm brushing under the rug - and the details are most certainly important. However, at the very least, I'd view the calculations as a fun little distraction.
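For concreteness, a rough Python pass at the win-percentage version of the recipe above. The (wins, games) pairs are placeholders rather than the actual records, and `bootstrap_moments` is a name invented for this sketch:

```python
import random
import statistics

# Placeholder (wins, games) per season; replace with the actual 19-season records.
records = [(4, 12), (3, 12), (7, 12), (11, 13), (11, 13), (10, 12), (7, 12),
           (6, 13), (6, 13), (9, 13), (11, 13), (8, 13), (7, 13), (4, 12),
           (8, 13), (7, 13), (12, 14), (8, 13), (8, 13)]
pcts = [w / g for w, g in records]  # normalize: win percentage per season

random.seed(0)

def bootstrap_moments(data, sample_size, n_resamples):
    """Mean of the resampled means, and their spread (a bootstrap error estimate)."""
    means = [statistics.mean(random.choices(data, k=sample_size))
             for _ in range(n_resamples)]
    return statistics.mean(means), statistics.stdev(means)

# Iteratively bump the resample count and watch for the estimate to stabilize.
for n in (10, 100, 1000):
    m, se = bootstrap_moments(pcts, sample_size=12, n_resamples=n)
    print(f"{n:5d} resamples: mean={m:.3f}, spread={se:.3f}")
```

The "moments of the moments" step is the last line of the helper: the standard deviation of the resampled means stands in for the error bar, and increasing the resample count is the convergence check mentioned above.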
 
Homer - while I have a lot of respect for your posts here and your reasoning in general, I think you're wrong here. We're not analyzing a stand-alone set of data points here. We have the benefit of additional information with which to interpret the data...information the content of which paints the early data points in a different light. The two years before KF took over we went 7-5, then 3-8. A serious downward trend. We know that Hayden was having trouble recruiting those last few years because opposing coaches were telling prospects Hayden wouldn't be around long. At the time, everyone knew we were going to suck in '99 no matter who the coach was. In short, the cupboard was bare and Kirk had zero culpability in that. He inherited a program in disrepair.

Kirk has never had a season like 99 since. It's clearly an outlier with a clear explanation. It's not even a minor stretch to see that '00 was the beginning of the recovery, but still reeling from the crash at the end of Hayden's career. Again, Kirk has not had a season that bad since. '12 is close, but still better. Why not throw out '12 too? Because of the other key variable that you're ignoring: '99 wasn't just a transition year, it was a time when the roster was full of players that were either not ready for prime time or not B10 caliber at all. Again...not Kirk's fault. In '12, that was Kirk's team and he has to own that.

You compare Iowa's first year under Ferentz to PSU's first year under O'Brien. PSU, prior to O'Brien, went 7-6 then 9-4 the two preceding years. They're also a blue-blood program that historically recruits much better than Iowa. Given those differences, comparing Iowa to them isn't really an apples-to-apples comparison.

You can try to compartmentalize the KF years and look at them without considering the many other known factors, but you'll get skewed data as a result. '99 and '00 don't fit with the others...they're a hangover from the decline at the end of Hayden's tenure. Throwing out those data points may not fit with standard data analysis, but it certainly fits with common sense.
 
I think there are two ways to consider the Ferentz era record.

The first would be to simply disregard YEAR 1 (some say years 1 & 2, but at least year 1 would be fair, since any new coach shouldn't be held 100% accountable for what was left in the cupboard to start his tenure. This cuts both ways: if he had been left a full cupboard, he shouldn't get 100% credit for that either). Throw YEAR 1 out and he's at 142-87, .620 ... 7.9 wins per season, 4.8 losses. That sure looks like 8-5 to me.

The other way to dissect Kirk's record at Iowa is to break it into halves: the first 8 years (minus YEAR 1) vs. the past 10 years. This splits it at the end of the Jake Christensen era in 2007, with the second half starting with the Ricky Stanzi era.

00-07: 60-39, .606 ...7.5 wins, 4.9 losses
08-17: 82-48, .631 ...8.2 wins, 4.8 losses

Pretty safe to say that if you correctly round it, you have Kirk at 8-5 either way.
Still feel either of these are the way to go.
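Taking the quoted records at face value, the splits can be verified directly (note the year-1-removed average comes out to 7.9 wins when rounded to one decimal):

```python
# (label, wins, losses, seasons) as quoted in the post.
eras = [
    ("minus year 1", 142, 87, 18),
    ("2000-07",       60, 39,  8),
    ("2008-17",       82, 48, 10),
]

for label, w, l, n in eras:
    pct = w / (w + l)  # winning percentage for the era
    print(f"{label}: {pct:.3f} ... {w / n:.1f} wins, {l / n:.1f} losses")
# minus year 1: 0.620 ... 7.9 wins, 4.8 losses
# 2000-07:      0.606 ... 7.5 wins, 4.9 losses
# 2008-17:      0.631 ... 8.2 wins, 4.8 losses
```

Either split puts the per-season average at or above 7.5 wins, which is the basis for the 8-5 claim.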
 
If you're going to throw away outliers, you would throw away '15 with '99, and '02 with '00. "Common sense" is a good way to end up with numbers someone might like, but they are not accurate numbers. Homer, I'll have the raw data for you today.
 
Iowa's 1997 squad was tremendously talented - that's part of the reason why the season was so disappointing to the fan-base. I agree that the 1998 season was a turd ... but that was also because, in Fry's own words, he remained longer than he intended - he had a plan for succession, but illness (of the successor) undermined the exchange.

Thus, the 1998 season resulted more from a failed exchange of power than an actual downturn of the program. However, in my prior post, I do agree that there was a significant culture shift in the program. That certainly impacted things in those early years. Is that enough to omit those years? I don't know.

However, if you read the post about bootstrapping the statistics to empirically ascertain error - when resampling the data-set ... those early years will, most likely, be sampled less often than seasons where the Hawks do well. The point being that even without curating the data before calculating statistical quantities - you can still arrive at similar results. Through my long history of posting on Iowa football boards - I myself similarly curate the data in my mind OR explicitly annotate the data as outlier data (explained away due to the transition of power and turnover on the roster). However, I'm not certain that it is necessary to do so. Furthermore, even if we conceptually curate the data ... Iowa's early performances were also part of an upward trend that started in 1999 and culminated in the 2002 season. Given that there are such strong correlations in that trend - I'm not certain that it would be fair to prune that data.
 
Homer,

With the data, would it be best to randomize it before pulling your sets so you could avoid picking "bad runs" or "good runs"? Or since the numbers will be run against themselves several times, will that remove the error itself?

Thanks man.
 
I'm not proposing to throw away all outliers...just the first couple seasons after inheriting a program in disrepair. The year-over-year climb from '99 through '02 supports the assertion that the poor performance the first couple years was likely not due to poor coaching. That context makes it sensible to omit them...not simply that they were outliers.

This isn't hard to see or understand.
 
It shouldn't be this hard to understand that when evaluating someone's body of work while at that school, you need to include all data from said body of work while at that school, or that during any given season the coach is only scheduled to play 12 games.
 
When you sample from your original data - you should pull the data at random. Myself, I used random numbers from random.org because, unlike computer-generated random numbers, they're pulled from atmospheric data that is thought to be truly random (due to turbulence). Computer-generated random numbers are just pseudo-random numbers (although their periodicity is extremely large).

Anyhow, before I even did the analysis, I noted that we only had 19 data points. All things considered, that's not a lot of data. Furthermore, there is definitely a bit of spread in the data - thus, my intuition was that our results would carry some definite error.

This is what I found (using the naïve choice of 12 sample values per sub-sample - and creating an ensemble of 10 sub-sampled data sets). People can fairly attack these choices - because they're far from optimized for the given data-set ... but they seem adequately reasonable.

Anyhow, here is what I found:

For the original data set of 19 data points: mean = 0.5887 standard dev = 0.2065
sub-sample 1: mean = 0.5926 stdev = 0.2400
sub-sample 2: mean = 0.6206 stdev = 0.2576
sub-sample 3: mean = 0.7319 stdev = 0.1283
sub-sample 4: mean = 0.5479 stdev = 0.2310
sub-sample 5: mean = 0.6550 stdev = 0.2330
sub-sample 6: mean = 0.5810 stdev = 0.1403
sub-sample 7: mean = 0.6101 stdev = 0.2345
sub-sample 8: mean = 0.5880 stdev = 0.0878
sub-sample 9: mean = 0.6333 stdev = 0.1757
sub-sample 10: mean = 0.5985 stdev = 0.1818

Combining everything together: We can take the mean of the means now - and that will give us another "measure" of the mean. However, the standard deviation in these means can then conceptually be viewed as a measure of the ERROR in the mean.

mean (of the mean values) = 0.615879
Ironically, this win percentage, for a 13-game season, ends up landing right at 8 wins (0.615879 × 13 ≈ 8.01).

standard deviation in the sub-sampled mean-values = 0.050349
The first thing to notice is that this measure is telling us something DIFFERENT than the standard deviation of the original full data set. The original standard deviation is telling us about the spread in the distribution. This standard deviation is more explicitly addressing the reliability of our calculated mean value. The implication of this is rather important.

We could expect a mean value drawn from the underlying distribution to be as high as 0.666228 (the mean plus one standard error). This would correspond (in a 13-game season) to an expectation value of 8.66 wins. We could also expect a mean value drawn from the underlying distribution to be as low as 0.565530 (the mean minus one standard error). This would correspond (in a 13-game season) to an expectation value of 7.35 wins.

Notice how the calculated mean from the full data set nicely fits within the "error bars" supplied above. The value of 0.5887 falls within accepted limits. However, it is interesting to note that it certainly is much closer to the LOWER limit than the HIGHER limit.

Mind you, the above supplies us with a spread in MEAN values. This isn't saying anything (yet) about the variability in the spread of the distribution itself.

mean of the standard deviations = 0.190988 (this is rather close to the value drawn from the full data set)

standard deviation of the standard deviations = 0.057315 (as in the case of the mean, this supplies us with information about the ERROR in the spread of the empirical distribution)

Relative to the mean standard deviation of 0.190988, the calculated value of 0.057315 is telling us that we have around 30% error in our measure of how spread out the distribution is. Again, given 19 data points and the variability in the data, this shouldn't really surprise us.

Given the quality of this data (or lack thereof) - on first blush, I'd likely conclude that we wouldn't have to look much beyond the conceptual statistics here to draw conclusions.

What the data tells us is that in any given year, the Hawks are likely to average anywhere around 7 wins to 9 wins. The spread in the data isn't terribly helpful ... but it indicates the obvious that sometimes we can do much better ... and sometimes much worse (kind of a "duh" statement - but one that must be uttered).
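For anyone who wants to poke at this themselves, here's a rough Python sketch of the sub-sampling procedure described above. The win percentages in the list are placeholders I made up, NOT the actual 19 season values - swap in the real numbers (and your own random source, e.g. random.org draws) if you want to reproduce Homer's figures:

```python
import random
import statistics

# Placeholder season win percentages -- NOT the actual 19 KF seasons.
# Replace with the real values to reproduce the numbers above.
win_pcts = [0.273, 0.250, 0.583, 0.917, 0.769, 0.833, 0.583, 0.500,
            0.462, 0.692, 0.846, 0.615, 0.308, 0.615, 0.538, 0.571,
            0.857, 0.615, 0.615]

random.seed(42)  # pseudo-random stand-in for the random.org draws

SAMPLE_SIZE = 12    # 12 seasons drawn per sub-sample
N_SUBSAMPLES = 10   # ensemble of 10 sub-sampled data sets

sub_means, sub_stdevs = [], []
for _ in range(N_SUBSAMPLES):
    sub = random.sample(win_pcts, SAMPLE_SIZE)  # sample without replacement
    sub_means.append(statistics.mean(sub))
    sub_stdevs.append(statistics.stdev(sub))

# Mean of the means: another "measure" of the mean.
grand_mean = statistics.mean(sub_means)
# Spread of the means: a rough measure of the ERROR in the mean.
mean_error = statistics.stdev(sub_means)

print(f"mean of means = {grand_mean:.4f}")
print(f"error in mean = {mean_error:.4f}")
print(f"13-game season expectation: {13 * grand_mean:.2f} "
      f"wins, +/- {13 * mean_error:.2f}")
```

One assumption baked in here: `statistics.stdev` is the sample (n-1) standard deviation; whether the numbers above used sample or population stdev isn't stated in the thread.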
 
Yeah, not going to lie, Homer, when we add in 30% error I would have thought the SD would be much smaller. While fun, I cannot say that test provided much info - it kind of resulted in a "duh," as you say. Do you think part of the problem is the number of games played? It skews the data that you can have a 12-game non-bowl season and a 14-game BTCG-and-bowl season. What test could we run to say that after a 12-game regular season we are most likely to have X wins? It's almost impossible with 6-6 getting a 13th game.
 
An alternative approach is to list out ALL of the games played and sub-sample off that. However, sampling from that data set would be fraught with its own set of problems. For example, sub-samples from such a data set would likely oversample the games against easier opponents. We'd then get a skewed view of the Hawks' performance being better than it really has been. Thus, there are other "variables" at play that are important to consider in order to make our results adequately representative.

You're also correct in asserting that the variability in the number of games played also adds some bias to the analysis. However, it at least helps regularize the data - and does a better job of keeping the strength of schedule closer to being "fixed."

If we were to do a really rigorous analysis, we'd do a sensitivity analysis to determine what variables we need to control. For example, we'd likely need to differentiate between home games and away games. Similarly, we'd have to do some partitioning based on the quality of competition. Upon identifying the appropriate variables, we'd be able to empirically calculate the relevant conditional probabilities (through boot-strapping). Then, based on the conditional probabilities, we would weight how we sample from the different partitioned groups. From there, we could calculate all the means and standard deviations as before - and it would be a more rigorous analysis. It would also be a far more time-consuming analysis than I'd be willing to dink around doing (unless I got paid to do it).
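To make the stratified idea concrete, here's a hedged sketch of what that weighted resampling might look like. The tiers and the win/loss records below are entirely made up for illustration - a real analysis would partition actual game results by opponent quality (and probably home/away too) before weighting the draws:

```python
import random
import statistics

random.seed(0)

# Hypothetical game-level results (1 = win, 0 = loss), partitioned by
# opponent quality. These tiers and records are illustrative only.
games_by_tier = {
    "ranked":   [1, 0, 0, 1, 0, 0],
    "unranked": [1, 1, 1, 0, 1, 1, 1, 0],
    "cupcake":  [1, 1, 1, 1],
}

# Weight each tier by its share of the schedule, so the resample
# doesn't oversample games against easier opponents.
total_games = sum(len(g) for g in games_by_tier.values())
weights = {tier: len(g) / total_games for tier, g in games_by_tier.items()}

def stratified_season(n_games=13):
    """Resample one synthetic season, drawing each game from a tier chosen
    in proportion to that tier's share of the schedule (with replacement)."""
    wins = 0
    for _ in range(n_games):
        tier = random.choices(list(weights), weights=list(weights.values()))[0]
        wins += random.choice(games_by_tier[tier])
    return wins / n_games

# Bootstrap an ensemble of synthetic seasons and summarize.
season_pcts = [stratified_season() for _ in range(1000)]
print(f"mean win pct  = {statistics.mean(season_pcts):.3f}")
print(f"error in mean = {statistics.stdev(season_pcts):.3f}")
```

This is only the "weight by group frequency" step; the full version Homer describes would estimate the conditional probabilities themselves from the data before weighting.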
 
It's all good, Homer, I have thoroughly enjoyed this exercise with you. Thank you for being willing to run the numbers. I had not considered the subset idea. I am sure there will be people that will throw this thread away as trash somewhere in the middle of the pissing contest, but I would argue this might be the most informative thread that has ever been on HR.
 
I agree. The first two years should be removed from the data, but kept aside for other data analysis. You cannot change inheritance, but you can control what it eventually becomes.
 
Good call. I was 5 when the transition happened and don't remember the name, but his bio has a gap right at 1998-2000 and then he coached at State? Wonder what the fanbase was saying during that time.
I was out of town. But the Hawk fans I knew were pissed that Bobby Stoops didn’t get the job. Elliott passed away a couple years ago after a long battle with cancer. Great guy.
 
I know it's a pipe dream, but I hope in about 3-4 years, after he has a chance to relax for a while, the U of I throws the checkbook at Bob Stoops to come be the AD. Could you imagine "big game" associated with Iowa?
 
Thank goodness we are all finally agreed on 8-5.

We didn't; you're just not understanding the math. The closest thing to that we came up with, with a pretty unique test I might add, was that with a standard deviation of .05 and 30% error, when KF plays 13 games he gets to 8 wins. That's a lot of error and a fairly wide SD. It's basically the equivalent of saying he wins 7 and is pretty good (.500) in bowl games. The dude is a 7-5 regular-season coach. Or as Homer put it: "duh."
 
Like I said, any time you want to put your numbers out for people to critique you are more than welcome; right now you're just a guy who likes to talk.

I'm not hiding anything. Go through and re read the thread if you need to. I've explained my stance quite thoroughly.

I like to talk? You are the one with 3000 posts since September....
I know I'm definitely not going to out-talk you, that's for sure....
How did a hater like you get to be the board spokesman, anyway?
 
You love labels. I'm not a hater; it's data. Aren't you the dude that created a thread trying to call me out and got told to F off?
 