Ohio Meets the 2014 World Cup

Growing up in Northeast Ohio, I do not recall ever seeing, let alone actually kicking, a soccer ball.  In those times and in that place the term “football” meant something entirely different.  It meant an oblong, leather-clad, brown inflated ball.  It meant glorious Friday nights at Mollenkopf Stadium.  It meant watching the Ohio State Buckeyes stomp on the University of the Sisters of the Poor every Saturday afternoon.  And it meant exploring new and exciting ways to express one’s displeasure and disgust at the Cleveland Browns every Sunday.  So naturally I wondered how the 2014 FIFA World Cup was playing in this nether world of chauvinistic American sport.

Through the magic of the Twitter API, R code, and a few extra moments of time on my hands, I set forth on the journey to find out.

The R scripts for this little project can be found at https://github.com/dino-fire/worldcup.

The Twitter REST API enables users to set a geographic parameter to limit searches to a specific geographical area.  The search terms were limited to #WorldCup, #worldcup2014, or #Brazil.  These terms were subsequently eliminated from the analyses, because we’re interested in what people are saying about those terms, not about counts of the terms themselves.

I started with the latitude and longitude of Columbus, Ohio, and specified a 200-mile radius.  A word cloud, of course, yields larger, more prominent displays of words with higher frequencies.  The basic word cloud of Ohioans’ tweets demonstrates some interest in the Spanish and Croatian futbol teams.  Speaker of the House John Boehner garnered a few honorable mentions as well.  What that has to do with the World Cup, I do not know.
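
A word cloud is essentially a picture of a term-frequency table. Here is a minimal base-R sketch of that counting step, assuming the tweet text is already in a character vector. The three tweets and the tiny stopword list are made up for illustration, and this is not the code in the worldcup repository:

```r
# Made-up tweets standing in for the text returned by the Twitter search
tweets <- c("Spain out already? #WorldCup",
            "Croatia looked sharp tonight #Brazil #WorldCup",
            "Spain and Croatia both on my mind #worldcup2014")

# The search terms themselves are dropped so they don't dominate the counts
search_terms <- c("worldcup", "worldcup2014", "brazil")
stop_words   <- c("and", "on", "my")  # tiny illustrative stopword list

# Lower-case, replace anything that isn't a letter, digit or space, then split on whitespace
clean <- gsub("[^a-z0-9 ]", " ", tolower(tweets))
words <- unlist(strsplit(clean, "\\s+"))
words <- words[words != "" & !(words %in% c(search_terms, stop_words))]

# Term frequencies, most frequent first: these set the word sizes in a word cloud
freq <- sort(table(words), decreasing = TRUE)
freq
```

A real run would use a proper stopword list (for example, the tm package's stopwords("english")) before handing the words and counts to a word-cloud function.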


Next I made a little side excursion that compared the tweets from the Youngstown/Warren area with those of residents of Youngstown’s sister city, Salerno, Italy.  The outcomes were predictable but nuanced. The Youngstown and Warren folks tweeted about the generic USA.  Could’ve been the soccer team, could’ve been native cuisine, like hot dogs, or could’ve been anything.  Not so with the Italians, though; the national football club was front and center.


Ohioans are people of few words, at least as far as tweeting about the World Cup is concerned.  The vast majority of Ohioans’ tweets comprised 8 to 10 unique words.  Base R provides a nice histogram.
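
Counting unique words per tweet takes only a few lines of base R. A sketch, again on made-up tweets, assuming a simple tokenizer that splits on anything other than letters, digits, and “#”:

```r
# Made-up tweets standing in for the scraped data
tweets <- c("USA USA USA #WorldCup",
            "Watching Spain and Chile at the pub tonight",
            "England out again, what a surprise")

# Number of distinct words in each tweet
n_unique <- sapply(strsplit(tolower(tweets), "[^a-z0-9#]+"), function(w) {
  length(unique(w[w != ""]))
})
n_unique

# Base R histogram of unique words per tweet
hist(n_unique, main = "Unique words per tweet", xlab = "Unique words")
```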


Before we get into the deeper statistical analysis, I should point out that THE BIG BUZZ at the time was about England getting unceremoniously booted from the tournament in the opening round.

What’s the difference between England and a teabag? The teabag stays in the Cup longer.

A hierarchical cluster analysis of Ohioans’ tweets is intended to depict how words tend to cluster together in Euclidean space.  It’s a fancy way of seeing how words correlate.  And here are the results.


One group of tweets centered on England’s demise, and another seemed to be about who was showing up in Rio de Janeiro.  Yet another group of words dealt with the Italy – Costa Rica match, while a fourth cluster seemed to inquire about who was supporting US soccer.
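
For readers who want to try this at home, here is a minimal base-R sketch of the idea: build a term-document matrix, compute Euclidean distances between the words’ occurrence profiles, and cluster them hierarchically. The six tweets, the “appears in more than one tweet” filter, and Ward linkage (ward.D2) are all assumptions for illustration; the post doesn’t say which linkage was used.

```r
# Made-up tweets; a real run would use the full set of scraped tweets
tweets <- c("england out early",      "england out again",
            "italy beat costa rica",  "costa rica shock italy",
            "who supports us soccer", "us soccer fans everywhere")

docs  <- strsplit(tweets, " ")
terms <- sort(unique(unlist(docs)))

# Term-document matrix: one row per word, one column per tweet, cells are counts
tdm <- sapply(docs, function(d) as.integer(table(factor(d, levels = terms))))
rownames(tdm) <- terms

# Keep only words that appear in more than one tweet (a simple sparsity filter)
tdm <- tdm[rowSums(tdm > 0) > 1, , drop = FALSE]

# Euclidean distances between the words' occurrence profiles, then Ward clustering
hc <- hclust(dist(tdm, method = "euclidean"), method = "ward.D2")
groups <- cutree(hc, k = 3)
groups
plot(hc, main = "Word clusters", xlab = "", sub = "")
```

Words that always show up together (here “england” and “out”) sit at distance zero and land in the same branch of the dendrogram.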

Disregarding the clustering of words, we can review the correlations themselves.  I’m proud to say that Ohioans are expert analysts of English soccer.
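
The correlations behind those clusters can be read off directly. The tm package’s findAssocs() reports them for a term-document matrix; the sketch below does the same kind of thing in plain base R on made-up tweets, correlating each word’s presence or absence across tweets with the presence of “england”:

```r
# Made-up tweets
tweets <- c("england out early", "england out again", "poor england out",
            "italy beat costa rica", "costa rica shock italy", "us soccer fans")

docs  <- strsplit(tweets, " ")
terms <- sort(unique(unlist(docs)))

# Document-term matrix: one row per tweet, one column per word, 1 = word present
dtm <- t(sapply(docs, function(d) as.integer(terms %in% d)))
colnames(dtm) <- terms

# Pearson correlation of every word's presence with the presence of "england"
assoc <- cor(dtm)[, "england"]
round(sort(assoc[names(assoc) != "england"], decreasing = TRUE), 2)
```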


Despite a seemingly infinite number of startups claiming to do social media mining better than anyone else, sentiment analysis is an iffy proposition at best.  For those who aren’t blessed with 50 unsolicited emails a day from social media mining companies, sentiment analysis refers to an evaluation of a tweet from a subjective, qualitative standpoint.  The analysis tries to classify tweets or other textual content “scraped” from various websites into “good” or “bad,”  “happy” or “sad,” or other such bipolar sentiments.  But often that’s where the problem arises.  For example, the following tweet would be classified as “good:”

Well, England, that was a good effort.

But unfortunately, so would this one:

Well, England, THAT was a good effort.

He or she who invents a sentiment algorithm that can accurately interpret sarcasm wins the prize.  Yeah, THAT will happen.  Scrape THAT, you bums.
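
To see why sarcasm is so hard, consider the simplest kind of scorer: count the positive words and subtract the negative ones. This is a minimal sketch with a tiny made-up lexicon, not the algorithm behind the scores in this post:

```r
# Tiny made-up lexicon; real lexicons hold thousands of scored words
positive <- c("good", "great", "win", "happy")
negative <- c("bad", "awful", "lose", "sad")

# Score = number of positive words minus number of negative words
score_sentiment <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  sum(words %in% positive) - sum(words %in% negative)
}

sincere   <- "Well, England, that was a good effort."
sarcastic <- "Well, England, THAT was a good effort."
sapply(c(sincere = sincere, sarcastic = sarcastic), score_sentiment)  # both score 1
```

Lower-casing throws away the capital letters that carry the sarcasm, and even if we kept them, a word list has no way of knowing what they mean.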

Nevertheless, I’ll hop on the sentiment analysis bandwagon and see how Ohioans feel about the World Cup so far. First of all, we see that no transformation of the sentiment-scored data is required.  The results reflect a fairly normal distribution, not skewing too badly one way or another.


We see that the sentiment scores are more positive than not, but as of this writing, the USA team is 1 – 0.  Those scores are subject to shift later, to be sure.


In this case, the sentiment scoring algorithm freely admits that it is clueless about the context of many of the words it encountered.  Still, it seemed predisposed to find and tag joyful comments.


The sentiment scoring algorithm output a nice comparison word cloud, which visually demonstrates the words and their respective classifications based on frequency.  Yes, I always associate the term “snapshot” with “disgust.”  Interestingly, “Redskins” got lumped into that classification as well.


So are Ohioans’ beliefs about the World Cup different from those of other, surrounding, and, some would believe, inferior types of people (based on their state of residence)?  Well, let’s see.

Sentiment scores in Ohio, Michigan, West Virginia, Pennsylvania, and Indiana lean uniformly positive.  But a careful look at the boxplots shows that Ohioans’ and Indianans’ opinions tend to cluster in the middle:  not too positive, and not too negative.  That’s not the case among Michiganders, whose scores tend to be either strongly positive or strongly negative.  Those Michigan folks represent very nicely the dangerous reality about averages: You can be standing with your feet in a bucket of ice water and your head in a roasting hot oven.  But on average you feel just fine.
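
The ice-water-and-oven problem is easy to reproduce. In this sketch the two score vectors are invented, not the actual state data: they share the same mean, but their spreads, and therefore their boxplots, look nothing alike.

```r
# Invented scores: same average, very different spreads
ohio_like     <- c(-1, 0, 0, 1, 1, 2, 2, 3)    # bunched near the middle
michigan_like <- c(-6, -5, -4, 4, 4, 5, 5, 5)  # piled up at both ends

c(ohio_like = mean(ohio_like), michigan_like = mean(michigan_like))  # both 1
c(ohio_like = sd(ohio_like), michigan_like = sd(michigan_like))      # about 1.3 vs 5.0
boxplot(list(Ohio = ohio_like, Michigan = michigan_like),
        ylab = "Sentiment score (invented)")
```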

A comparison cloud shows just how different the tweets from these separate states really are.  Michiganders seem obsessed with the Italy – Costa Rica match.  Indianans seem strangely interested in the Forza Italia political movement.  Pennsylvanians are engrossed in a game of “where’s Ronaldo?”  Ohioans are losing interest, and starting to turn their attention toward Wimbledon.  And West Virginians don’t seem to care much about the World Cup at all.


The “Class” of 2007

The neighborhood around Cooperstown, New York was upgraded considerably in 2007 when two particular men, Tony Gwynn and Cal Ripken, were enshrined into the Baseball Hall of Fame.  To say that these guys represented the “Class” of 2007 is true in any number of dimensions.

I moved to the Baltimore/Washington area in 1983, which is right about the time both Ripken and Gwynn came on the MLB scene.  Ripken, of course, soon became a minor deity in Baltimore.   Growing up in the Midwest and later living on the east coast, I did not much follow the San Diego Padres.  They might as well have been the Pluto Padres as far as I was concerned.   What I did know about them, however, was codified in the persona of Tony Gwynn.

No tribute to Tony Gwynn is complete without that hilarious commercial he and Bip Roberts did for MLB about twenty years ago.  Bip mistook the value of Robin Roberts’ rookie card for his own, until Gwynn corrected him.  As a baseball card collector, I could not stop laughing.  Besides, I had a Bip Roberts rookie card, but not a 1949 Bowman Robin Roberts one, unfortunately.

Watch, remember, and enjoy.  RIP Tony.

How Popular Are My Facebook Friends?

I knew I’d get your attention.  More precisely, how popular are my Facebook friends’ names?

A recent blog post by Allison McCann about the popularity of baby names got me thinking.  How popular was my name in 1960, the year I was born?  Well, “Dino” was ranked #404 that year, which, coincidentally (or maybe not), was the absolute high-water mark for that particular name in the past 100 years.

Many, if not most, of my Facebook friends were born in the same year.  While I don’t have all that many Facebook friends, the ones I do have served as a rich source of data about the popularity of our names in 1960.  I took the liberty of combining variations of names where it made sense to do so.  For example, “Susan” reflects the combination of Sue and Susan.  But “Marcella” reflects Marcella only, because she’d kill me if I referred to her as “Marci.”
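
Combining variants before ranking is a simple group-and-sum. A sketch, assuming rows in the name,sex,count layout of the SSA yearly files; the counts are invented, and the ranks are only among these few rows rather than among all names in 1960:

```r
# Invented counts in a name/sex/count layout (not real SSA figures)
births <- data.frame(
  name  = c("Susan", "Sue", "Marcella", "Marci", "Kim"),
  sex   = "F",
  count = c(500, 40, 30, 20, 100),
  stringsAsFactors = FALSE
)

# Variants to fold into another name; anything not listed keeps its own name
variant_of <- c(Sue = "Susan")  # Marci deliberately NOT merged into Marcella
births$combined <- ifelse(births$name %in% names(variant_of),
                          variant_of[births$name], births$name)

# Sum counts per combined name, then rank from most to least popular
totals <- aggregate(count ~ combined, data = births, FUN = sum)
totals <- totals[order(-totals$count), ]
totals$rank <- seq_len(nrow(totals))
totals
```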

And here are the results.  Where do you rank?


Among the girls, Susan, Linda, and Lisa dominated that year, and have remained pretty popular ever since.  Roxann had many variations.  There wasn’t a Kym, so the data for Kim will have to suffice.


David, Robert, and William, the usual suspects, were the most popular names among boys in 1960.  Sorry Arthur, but “Scooter” didn’t make the list.  And I find it somewhat interesting that the two lollygaggers in the bunch–me and my friend Geoff–happened to be born on the exact same day.  Coincidence? You decide.

You can kill off the better part of an afternoon playing around with this data.  Thanks to the Social Security Administration for spending our retirement dollars on such fun data mining applications.  Knock yourself out at http://www.ssa.gov/OACT/babynames.

Stated vs. Derived Importance in Key Drivers Analysis

Back by popular demand…derived importance.

A great deal of research is designed to measure the relative impact of specific features of products or services on customers’ satisfaction with those products or services.

Sometimes, surveys are designed to measure importance of those features explicitly and in isolation—no further analysis is necessary beyond an understanding of which features are more important to customers than others.

In other cases, the importance metrics will be used to determine what, if anything, could or should be changed to improve the product. That’s where key drivers analysis comes in, but more about that later.

Measuring importance through traditional Likert scales, while certainly frequently done, is not the method FGI recommends to measure importance. There are 2 fundamental reasons for this.

First, importance scales often do not provide adequate discrimination and differentiation between product features, especially when viewed in aggregate.

Q: How important is price?  
A: Oh, that’s very important.

Q: How important is product availability? 
A: Oh, that’s very important.

Q: How important are helpful store employees?  
A: Oh, that’s very important too.

Second, people use scales differently (and this problem is not limited to importance scales). Respondents tend to calibrate their responses to previous scores. For example, here’s Respondent #1, rating the 3 attributes in our survey.

Q: How important is price? 
A: Let’s give it a 9.

Q: Now, how important is product availability? 
A: Well, not as important as price, so let’s say 8.

Q: How important are helpful store employees?
A: Less important than price, but more important than availability. 8 for that one too.

But Respondent #2 may follow precisely the same response pattern—9 / 8 / 8—but start their ratings at 6 instead, yielding 6 / 5 / 5. Should we view these three features as more important for Respondent #1 than for Respondent #2?  No. Do any of Respondent #2’s answers qualify for top-2 box summaries?

No. One person’s 9 rating may be another person’s 6 rating. The very nature of scales—that the values are relative, not absolute—can cause misinterpretation of the results.
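
A quick way to see the problem: take the two respondents above, count top-2 box ratings (assumed here to mean 9 or 10 on a 0-10 scale), then center each respondent’s ratings on his or her own mean. The top-2 box counts disagree, but the centered profiles are identical. This is a sketch of the issue, not FGI’s recommended method:

```r
# The two respondents from the example: same pattern, different starting points
ratings <- rbind(
  respondent1 = c(price = 9, availability = 8, staff = 8),
  respondent2 = c(price = 6, availability = 5, staff = 5)
)

# Top-2 box on an assumed 0-10 scale: ratings of 9 or 10
rowSums(ratings >= 9)        # respondent1 = 1, respondent2 = 0

# Subtract each respondent's own mean rating from his or her ratings
centered <- ratings - rowMeans(ratings)
round(centered, 2)           # identical rows: 0.67, -0.33, -0.33
```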

There are occasions where stated importance is appropriate and useful. If this is the case, there are far better ways than Likert scales to measure it, but that’s a subject for another day.

Measuring derived importance

Key drivers analysis yields importance in a derived manner, by measuring the relative impact of product features on critical performance metrics like overall satisfaction, likelihood to purchase again, likelihood to recommend, or some combination of those. The structure of a key drivers questionnaire looks like this:

Q. This next question is about your satisfaction with XX in general. Please rate the store on how satisfied you are with them overall. 10 means you are “Completely Satisfied” and 0 means you are “Not At All Satisfied.”

This question is treated as the dependent variable for our analysis.

Q. Now, consider these specific statements. Using the same scale, how satisfied are you with XX on…

  • Variety of products and services
  • Professional appearance of staff
  • Length of wait time
  • Ease of finding things in store
  • Length of transaction time
  • Convenient parking
  • Convenient store location
  • Price

We can then do some analysis to determine to what extent each of these independent—aka predictor—variables influences overall satisfaction. This is done through Pearson’s r correlations.

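In a correlation analysis, we get a statistic called r, Pearson’s correlation coefficient, which measures the strength and direction of the linear relationship between the scores on one item and the scores on another. An r of 1.0 means a perfect, positive correlation, -1.0 reflects a perfect, negative correlation, and 0.0 means no linear correlation at all. Squaring r gives R^2 (R squared), the share of the variance in one item explained by the other, which always falls between 0 and 1.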

In a key drivers analysis, the higher the correlation between each of the specific attributes and overall satisfaction, the more influence that attribute has on satisfaction, thus the more important it is. Notice that we never have to ask the question “how important is…” since the derived importance tells us everything we need to know. But that’s only half of the equation.
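
Here is a minimal sketch of derived importance in R. The data are simulated, with an assumed “true” weighting in which wait time matters most and price second, so we can check that the correlations recover that order; the attribute names echo the questionnaire, but nothing else comes from real survey results:

```r
set.seed(42)
n <- 500

# Simulated 0-10 ratings on four attributes
attrs <- data.frame(
  variety  = sample(0:10, n, replace = TRUE),
  wait     = sample(0:10, n, replace = TRUE),
  location = sample(0:10, n, replace = TRUE),
  price    = sample(0:10, n, replace = TRUE)
)

# Assumed "true" weights: wait time matters most, then price
overall <- 0.6 * attrs$wait + 0.3 * attrs$price +
           0.1 * attrs$location + 0.05 * attrs$variety + rnorm(n, sd = 1)

# Derived importance: Pearson r between each attribute and overall satisfaction
importance <- sapply(attrs, function(x) cor(x, overall))
round(sort(importance, decreasing = TRUE), 2)
```

Nobody was asked “how important is wait time?”; the ranking falls out of how closely each rating moves with overall satisfaction.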

As a result of the question structure, we get explicit satisfaction metrics on each of the individual attributes as well. This data tells us how well we perform on each of the attributes. The resulting output looks something like this:


In our example, “helpful staff,” “coupon policy,” and “items in stock” are the most important attributes; they have the highest correlations to overall satisfaction.

Now compare those attributes to “store location.”  The correlation is still positive, but not nearly as strong as for the three attributes above. Remember, derived importance measures importance of individual attributes in relative, not absolute, terms.

The second part of our analysis shows that our store’s employees are helpful. In fact, it’s the highest performing attribute of all (while importance is viewed on the X, or horizontal, axis, performance is viewed on the Y, or vertical, axis).

This means that our store does well on this important attribute, so it is considered a core strength. That is not the case with another important attribute, having items in stock, however. Our store gets the lowest performance rating on that very important feature.

From our survey results, management can quickly see that resources should be directed toward reducing wait times (more cashiers), improving their coupon policy if they can, and especially keeping popular items in stock.
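
The quadrant logic can be sketched in a few lines. The importance and performance numbers below are invented to mirror the example (they are not the chart’s values), splitting each axis at its mean is one common convention assumed here, and the quadrant labels other than “core strength” are illustrative:

```r
drivers <- data.frame(
  attribute   = c("helpful staff", "coupon policy", "items in stock",
                  "wait time", "store location"),
  importance  = c(0.62, 0.58, 0.60, 0.55, 0.20),  # invented r values
  performance = c(8.9, 6.1, 5.2, 5.8, 8.1),       # invented mean ratings, 0-10
  stringsAsFactors = FALSE
)

# Split each axis at its mean (an assumed convention) to form four quadrants
hi_imp  <- drivers$importance  >= mean(drivers$importance)
hi_perf <- drivers$performance >= mean(drivers$performance)

drivers$quadrant <- ifelse(hi_imp & hi_perf,  "core strength",
                    ifelse(hi_imp & !hi_perf, "priority to improve",
                    ifelse(!hi_imp & hi_perf, "secondary strength",
                                              "lower priority")))
drivers[order(-drivers$importance), ]
```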

We’ve precisely identified the few items that need to be prioritized, as improvement in satisfaction with these things will have a direct and measurable impact on overall satisfaction.

What’s in a Name? MLB All-Star Analysis Part 1

The MLB All-Star game is coming up soon, so I thought I’d toss a few random analyses your way to commemorate the occasion.  Here’s one…


So you want to be an All-Star, do ya?  Then change your name to Rodriguez or Robinson.  Here are the surnames of the top 250 All-Stars, by number of All-Star game selections, going back to the dawn of the All-Star game, in 1933 Chicago.  Unfortunately, notable baseball fan Al Capone was probably not in attendance, since he had other commitments at the time in the Big House.  But I digress.  The bigger and bolder the name, the more often players with that name appeared in an All-Star uniform.   This fine graphic represents the intersection of baseball and big data. For example, Robinson refers to the Orioles’ immortal third baseman, Brooks Robinson (18 career All-Star games), Frank Robinson, player-manager for my beloved Tribe despite those gawd-awful red uniforms (14 career selections), Eddie Robinson, who represented the White Sox and Twins in 4 contests, and of course Jackie Robinson, with 6 games as a Brooklyn Dodger. Frank Robinson was an All-Star selection for 3 of the 4 teams he played for in his career–Cincinnati, Baltimore, and LA. He never made it to the All-Star game as an Indian. Of course. All of the Robinsons on this list are in Cooperstown. Rodriguez is attached to Alex, Ivan, Ellie, Francisco, and Henry.

The word cloud was created using the R wordcloud, tm, and RColorBrewer packages.  The simple R script and data file can be found at https://github.com/dino-fire/allstar-analysis.
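
Building the frequencies for a surname cloud is a group-and-sum. A sketch using only the four Robinsons and the selection counts mentioned above; the full list of 250 would come from Baseball Reference, and this is not the script in the allstar-analysis repository:

```r
# All-Star selections for the Robinsons named in the post
allstars <- data.frame(
  player     = c("Brooks Robinson", "Frank Robinson", "Eddie Robinson", "Jackie Robinson"),
  selections = c(18, 14, 4, 6),
  stringsAsFactors = FALSE
)

# Surname = everything after the last space in the player's name
allstars$surname <- sub(".* ", "", allstars$player)

# Total selections per surname; these totals set the word size in the cloud
by_surname <- aggregate(selections ~ surname, data = allstars, FUN = sum)
by_surname
```

Surnames and totals like these are what a word-cloud function such as wordcloud() from the wordcloud package takes as its words and freq arguments.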

As with all of this and my upcoming All-Star analysis, a huge shout-out goes to the data geniuses at Baseball Reference.  More baseball statistics than are fit for human consumption.   This blog has been cross-posted to the most excellent R-bloggers site as well.

An Offer You Can’t Refuse

I have a proposition for you.  It’s like one of my heroes said, it’s an offer you can’t refuse.

You stand next to a 17-inch wide rubber platter, holding a 36-inch long cylinder of ash or maple in your hands.  Another, much more athletic individual stands precisely 60 feet and 6 inches away from you, and throws a baseball in your general direction as hard as he can.  When it crosses that 17” plate, the baseball will be traveling between 85 and 100 miles per hour.  The ball may or may not hit you.  You will almost certainly not hit the ball.  (A major league hitter can connect with it one time out of three if he’s very good; what makes you think you can do better than that?) Don’t worry, it will be over in about 2.3 seconds.

For your troubles, I will now award you $11,628.  Before taxes, of course.  What a deal!  You get 11-large to stand there and get a baseball thrown at you. Once.  And for every additional time you stand there and have that ball thrown at you, I’ll give you another $11,628.  How long would you stand there?  How many pitches would you confront for that money?

Well, if your name happens to be Miguel Cabrera, and my name happens to be the Detroit Tigers, you will come to the plate 675 times, and stand there looking at 2,500 pitches between now and October. At $11,628 per pitch, I will give you $29 million. And to sweeten the deal, we will do this, you and I, for the next 10 years. Deal?

You’d smile too.

This is not a fantasy, except for the part that you are not the one collecting that cash, and I am not the one doling it out.  This is simply one way to view the $292,000,000, 10-year contract the Tigers “inflicted” on Miguel Cabrera this week.  Here are a few other fun ways to look at this princely sum.

  • A typical ball game lasts 3 hours.  Miggy shows up to work: he has played in an average of 157 of the 162 games in a season since he came to the American League from the Miami (née Florida) Marlins in 2008.  So he makes $61,571.13 per hour.  The federal minimum wage is $7.25 per hour.  At that rate, you would have to work 8,500 hours to earn what Miggy does in a single hour.  That’s 4 years, chum.
  • Cabrera is a third baseman, mostly.  He averages having 1,026 balls hit at him every year.  Do the math: that’s $28,265 per fielding chance.  He also makes 8 errors per year.  He would probably refund the Tigers the $226,000 he got for those particular misplayed chances, but the union contract won’t let him.  So you places yer bets and you takes yer chances.
  • The Detroit Tigers drew 3,083,397 fans to Comerica Park in 2013 to see their Central Division winning team.  And, Cabrera’s 2014 salary of $29,000,000 represents an $8,000,000 raise from 2013.  So each fan needs to pony up another $2.59 to cover the additional labor cost.  Put another way, the Tigers need to sell a million more $8.00 beers. Shouldn’t be too hard when you think of it that way.
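
The arithmetic in the offer and the bullets above is easy to check in R. One assumption is added that the post doesn’t state: a 2,080-hour work year (40 hours times 52 weeks) to turn hours into years.

```r
# Per-pitch offer
per_pitch  <- 11628
pitches    <- 2500
per_season <- per_pitch * pitches   # 29,070,000: about $29 million a season

# Per hour of work
salary_2014 <- 29e6
hours       <- 157 * 3              # games played per season x 3 hours per game
per_hour    <- salary_2014 / hours  # about $61,571.13
per_hour / 7.25                     # about 8,493 hours at the federal minimum wage
per_hour / 7.25 / 2080              # about 4.1 years, assuming 2,080 work hours a year

# Per fielding chance, and the cost of 8 errors
per_chance <- salary_2014 / 1026    # about $28,265
8 * per_chance                      # about $226,000

# The 2014 raise spread over 2013 attendance, or over $8 beers
raise <- 8e6
raise / 3083397                     # about $2.59 per fan
raise / 8                           # 1,000,000 beers
```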

“Professional baseball is on the wane. Salaries must come down or the interest of the public must be increased in some way. If one or the other does not happen, bankruptcy stares every team in the face.”

— Chicago White Stockings owner Albert Spalding, 1881.

The more things change, the more they stay the same.

 Sources: www.baseball-reference.com, www.SI.com, www.sportingcharts.com

Today’s Class: CP101

I got a Fitbit device a few weeks ago (stop laughing).  If you’re not familiar, these things keep track of your exercise (such as it is), and help you stay motivated to get in shape.  They do so by tracking the number of steps you take each day, among other things.

One thing it does not do is track your heart rate. Which is a good thing, because it would have exploded last night in the 9th inning of the Indians game against Kansas City. Chris Perez, we all have to die of something, and my demise will be because of YOU.

Photo courtesy of Winslow Townsend

A too-familiar CP pose.

Perez has the lofty vocation of “closer,” meaning he has one job–admittedly a difficult one–but it is one job.  Get three outs.  Specifically, the last 3 outs of a ball game, but only those games in which your team is already winning.  He did so successfully 39 times last year, but has only 23 saves thus far in 2013.  Interestingly, he actually has 5 wins this year, and 3 losses (he was 0-4 in 2012).  How does a closer acquire 5 wins, you ask?  By blowing saves, that’s how, at which point the Indians’ offense comes in and bails you out in extra innings.  It’s not something to be proud of, CP.  These are the facts, and they are not in dispute.

Perez blew 4 saves in 2012, and has blown 5 thus far in 2013.  Not a big difference statistically, but boy did they seem to hurt this year.  Most painful was the 6-5 loss to Boston at the end of May, the same game where he added injury to insult.  Perez gave up a three-run 9th-inning lead in that game.  Suspecting an injury, he was asked to throw a warm-up pitch, which he promptly launched into the general area where Kevin Costner and James Earl Jones sat in the Fenway Park scene in Field of Dreams.  I couldn’t help but think it was intentional: “See, look how hurt I am!  It wasn’t my fault!”

And what does Mr. Perez have to say for himself about that Boston game, last night’s Royals game, or any other game, for that matter?  Nothing.  Nada.  Silencio.  You see, CP has declared that he would not be talking to the media whatsoever this year, because “it is too much of a distraction to the team.”  How convenient.  Why did you walk a number 9 hitter batting 0.167 on four straight pitches? Have the consequences of having a bunch of weed shipped to your dog affected your performance? Or is it due to the substance itself? Crickets.

Talking to the media–and vicariously, to the fans that make your mediocrity possible–won’t necessarily make you Mariano Rivera overnight. But hiding behind a “no talk” clause is a poor way to absolve yourself of accountability. In case you haven’t noticed, CP, some people think the Indians actually have a chance at some October baseball this year. Last year, CP castigated fans for not coming out to Progressive Field in larger numbers. This year, you might be well advised to answer that question, at least in part, by looking in a mirror.


When one data word equals a thousand words

Being a certified propellerhead comes with certain privileges, and few are more important than the unalienable right to have other propellerhead geeks as role models. One of mine is a Yale professor by the name of Edward Tufte.

Sparklines are tiny little graphs embedded in textual analysis.  Dr. Tufte, the widely recognized guru of graphics for data-centric reporting, invented these little beasts.  He refers to them as “data-intense, word-sized graphics,” also known as “data words.”  They are useful in describing how linear or time-series data changes over time, or how one group stands out from the rest.  Sparklines are easily added to Excel spreadsheets through the “insert” ribbon, although they don’t copy and paste very well into PowerPoint or Word documents.  The high-resolution ones you see in this article were generated by the handy sparkTable R package (Kowarik, Meindl, and Templ, 2012).
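
If you’d rather not reach for Excel (or a package), a serviceable sparkline is a few lines of base R graphics: a tiny canvas, no margins, no axes. This sketch uses made-up values and is not the sparkTable API:

```r
# Made-up values to draw as a sparkline
values <- c(3, 5, 4, 6, 8, 7, 9, 12, 10, 11)

out <- file.path(tempdir(), "sparkline.pdf")
pdf(out, width = 1.5, height = 0.3)   # word-sized canvas, in inches
par(mar = c(0, 0, 0, 0))              # no margins: the line fills the box
plot(values, type = "l", axes = FALSE, xlab = "", ylab = "")
# Highlight the most recent value, a common sparkline convention
points(length(values), values[length(values)], pch = 19, cex = 0.4, col = "red")
dev.off()
```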

As an interesting and pathetic side note, Microsoft has applied for a patent for their implementation of sparkline functionality in their software, which is particularly galling to the spirit of freely-available open source applications if not downright plagiarism couched in tech-giant legalese.  A Google search on “sparklines” turns up an onslaught of search-engine-optimized content about how to use Excel to make sparklines (a good thing, too, since making sparklines in Excel requires all of the technical expertise of a contemporary third grader).  Go ahead and patent your weak excuse for sparklines, Microsoft.  I guess patents are cheap to come by in the software world.  Where do I sign up?

Taking the Confusion Out of Your Confusion Matrix

Goodness (good·ness) [good-nis]: the state or quality of being good; excellence of quality. (dictionary.com)

A good predictive model is only as good as its “goodness.”  And, fortunately, there is a well-established process for measuring the goodness of a logistic model in a way that a non-statistician—read: the senior manager end-users of your model—can understand.

There is, of course, a precise statistical meaning behind words we propeller heads throw around such as “goodness of fit.”  In the case of logistic models, we are looking for a favorable log likelihood result relative to that of the null, or empty, model.  This description isn’t likely to mean much to an end-user audience (it barely means anything to many propeller heads).

Saying that your model will predict the correct binary outcome of something 81% of the time, however, makes a lot more intuitive sense to everyone involved.

It starts with a standard hold-out sample process, where the model is trained and modified using a random part—say, half—of the available data (the learning set) until a satisfactory result is apparent.  The model is then tested on the second half of the data (the testing set) to see how “predictive” it is.  For a logistic model, a “confusion matrix” is a very clean way to see how well your model does.

Using existing historical data, say we’re trying to predict whether someone will renew their association membership when their current contract is up.  We run the model on the testing set, using the parameters determined in the initial model-building step we did on the learning set.
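
For readers following along without the original data, here is a sketch that simulates a member file, splits it in half at random, and fits the logistic model on the learning set. The data-generating rule and the single tenure predictor are assumptions for illustration; only the object names fit and testing match the code that follows.

```r
set.seed(123)
n <- 1000

# Simulated member file: tenure in years and a drop/renew outcome
members <- data.frame(tenure = sample(1:20, n, replace = TRUE))
members$status <- factor(ifelse(runif(n) < plogis(-2 + 0.15 * members$tenure),
                                "Renew", "Drop"), levels = c("Drop", "Renew"))

# A random half goes to the learning set; the other half is held out for testing
learn_rows <- sample(n, size = n / 2)
learning   <- members[learn_rows, ]
testing    <- members[-learn_rows, ]

# Fit on the learning set only; with a factor response, glm models P(Renew)
fit <- glm(status ~ tenure, family = binomial, data = learning)
summary(fit)$coefficients
```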

logit.estimate <- predict.glm(fit, newdata = testing, type = "response")

Let’s set the playing field by determining what the existing “actual” proportions of the possible outcomes are in the testing data.

# Actual churn values from testing set
testprops <- table(testing$status)  # create an object of drop/renew (actuals)
prop.table(testprops)  # express drop/renew in proportional terms

Drop   Renew
0.59   0.41

So historically, we see that 59% of people don’t renew when their membership period is up.  Houston, we have a problem!  Good thing this is a hypothetical example.

The elegance of logistic regression—like other modeling methods—is that it provides a neat little probability statistic for each person in the database.  We can pick some arbitrary cutoff for this predicted probability (say, anything greater than 50%) to indicate that someone will renew their membership when the time comes.

testing$pred.val <- 0  # Initialize the variable
testing$pred.val[logit.estimate > 0.5] <- 1 # Anyone with a pred. prob.> 50% will renew

With those results in hand, we need to know 2 things.  First, how well does the model do in pure proportional terms?  In other words, is it close to the same drop/renew proportions from the actual data?  This is knowable from a simple table.

testpreds <- table(testing$pred.val) # create an object of drop/renew (predicted)
prop.table(testpreds) # express drop/renew predictions in proportional terms

Drop   Renew
0.60   0.40

Recall that our original proportions from the “actuals” were 59%/41%…so far so good.

Second, and most importantly, how well does the model predict the same people to drop among those who actually dropped, and how does it do predicting the same people to renew among those who actually renewed?  That’s where the confusion matrix comes in.


In a perfect (but suspicious) model, cells A and D would be 100%.  In other words, everyone who dropped will have been predicted to drop, and everyone who renewed will have been predicted to renew.  In our example, the confusion matrix looks like this:

# Confusion matrix

confusion.matrix <- table(testing$status, testing$pred.val) # create the confusion matrix (rows = actual, columns = predicted)
confusion.matrix # view it

         Drop   Renew 
  Drop   310       55
  Renew   62      189

Assign each of the four confusion matrix cells a letter indicator, and run the statistics to see how well the model predicts renewals and drops.

a <- confusion.matrix[2,2]  # actual renew, predicted renew
b <- confusion.matrix[2,1]  # actual renew, predicted drop
c <- confusion.matrix[1,2]  # actual drop, predicted renew
d <- confusion.matrix[1,1]  # actual drop, predicted drop
n = a + b + c + d  # total sample size

CCC <- (a + d)/n  # cases correctly classified
[1] 0.81

CMC <- (b + c)/n # cases misclassified
[1] 0.19

CCP <- a/(a + b) # correct classification of positives (actual renew -> predicted renew)
[1] 0.75

CCN <- d/(c + d) # correct classification of negatives (actual drop -> predicted drop)
[1] 0.85

OddsRatio <- (a * d) / (b * c) # diagnostic odds ratio: odds of a renew prediction among actual renewers, divided by the same odds among actual drops
[1] 17
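
As a check on the numbers above, this sketch rebuilds the confusion matrix directly from the four printed counts and recomputes the statistics. It also adds one figure not in the original output: the share of members flagged as drops who actually renewed, which is the “spill” on any retention offer.

```r
# Confusion matrix from the printed counts (rows = actual, columns = predicted)
confusion.matrix <- matrix(c(310, 62, 55, 189), nrow = 2,
                           dimnames = list(actual    = c("Drop", "Renew"),
                                           predicted = c("Drop", "Renew")))

a <- confusion.matrix["Renew", "Renew"]  # actual renew, predicted renew
b <- confusion.matrix["Renew", "Drop"]   # actual renew, predicted drop
c <- confusion.matrix["Drop", "Renew"]   # actual drop, predicted renew
d <- confusion.matrix["Drop", "Drop"]    # actual drop, predicted drop
n <- sum(confusion.matrix)               # 616 members in the testing set

round(c(CCC = (a + d) / n, CMC = (b + c) / n,
        CCP = a / (a + b), CCN = d / (c + d),
        OddsRatio = (a * d) / (b * c)), 2)

# Share of predicted drops who actually renewed: 62 of 372
spill <- b / (b + d)
round(spill, 2)
```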

At 81%, our model does a pretty fair job of correctly classifying members overall as drops or renewals.  It correctly identifies 75% of the members who actually renew.  More importantly, it correctly identifies 85% of the members who actually drop…presumably giving us time to entice these specific individuals with a special offer, or send them a communication designed to address the particular reasons that contribute to their likelihood to drop their membership (we learn this in the model itself).  If we send this communication or special offer to everyone the model predicts will drop their membership, we will only have wasted (aka “spilled”) this treatment on about 17% of them: the 62 members, out of 372 predicted drops, who would have renewed anyway.

Now that’s information our managers can use.

1986 Topps Baseball

In the expansive world of collectible baseball cards, 1986 Topps Baseball comes cheap. In the base set, there are no classic rookie cards worth extorting people over. Barry Bonds’ rookie card came in the 1986 Topps Traded & Rookies set, which is not at all part of the base set, as it is a supplement released after the season. I bet you didn’t know that. That Bonds card used to be valuable, prior to the ‘roid rage era.

It’s been about 12 years now, but Tim—my stepson and partner in baseball card overspending crime—and I came across the opportunity to grab a vending box case of those cards for $75.  Vending boxes are literally that…in those days, distributors would go around stuffing baseball card vending machines with these.  That case held 15,000 cards, if I recollect. 15,000 essentially worthless cards, stuck in dozens of individual vending boxes containing about 500 each, totally at random.  Cards with a big black banner, a weird all-caps font.  Bad ’80s haircuts.  Minuscule statistics on the back.  780-something of the damn things in a set.

What to do with them?  For starters, let’s have a collating party. That’s right, sort those bad boys into complete sets.  Tim was unceremoniously pressed into indentured servitude on this one.  I sent a set to my nephew in Texas, who happened to be born in 1986, figuring he might appreciate it someday…a snapshot of the professional baseball scene from the time of his birth.  I wish someone had given me a set of 1960 Topps Baseball back in the day, but if wishes were fishes we’d all cast nets, as the saying goes.

So I had reduced my extensive 1986 Topps Baseball holdings to 14,220.  We made another complete set, and undertook a mission: get them signed by each of the 780+ players.  All of them.  Well, at least the ones who were 1) still alive, 2) able to write their name legibly in cursive, and 3) willing to do it for the princely sum of free. This little mission went on for many years, in fits and starts.  We were able to accumulate a couple hundred of those autographs.  Some of the highlights of this journey:

  • Pete Rose wanted something like $50 to sign his card.  For that price, I’d rather have had him sign a betting slip from Caesars Palace.  I passed.
  • Cecil Fielder—papa to Prince—was the first one to send his autographed card back.  He wins the prize.
  • Cecil Cooper (another Cecil) from the Brewers wins the “You Are Now Forever Cool” award, as he signed the card to “Dino” personally.
  • At one point, a fellow collector who knew about my quest said he was planning to attend a game in which the minor league Winston-Salem Warthogs (look it up) were a contestant.  The man with the all-time coolest name in the history of major league baseball—who was the manager of the Warthogs—signed his card in person.  My connection said that the Warthog players witnessing this signing event could not stop laughing at the player’s hairdo on the card.  That manager was Razor Shines.

Anybody want some 1986 Topps Baseball Cards?  Let’s make a deal!  Only a few thousand left…