Data Science for Cricket
Authored by Raghunathan Rengaswamy, IIT Madras
Data science, ML, and AI are
the flavors of the season all over the world, and India is no exception. Interest in these techniques is at a fever
pitch; academicians and industrial practitioners alike are exploring the
fundamental underpinnings and application potential of these techniques. As a result, data science is being evaluated
for implementation in all imaginable fields and sub-fields. Sports is one such area with tremendous
potential for data science applications.
While sports analytics has been around for a long time, sabermetrics
being a prime example, the scale of what can be achieved now with analytics is
quite stupendous. Disparate data from
various sources can be collected and analyzed towards making decisions
regarding almost all aspects of any sport.
Take cricket, which is the focus of this blog, as an example. Ball-by-ball commentary, video feed, bats
that are actual IoT devices producing fascinating information, can all be merged
together for analysis. This blog describes
a data science tool for cricket that was created in India by ESPNcricinfo, IIT
Madras, and Gyan Data Private Limited (GDPL).
While one would usually associate sports analytics with tools that can be
used to improve player performance or game outcomes, this effort targeted
enriching spectator experience. The aim was to go beyond traditional statistics
by bringing in multiple layers of nuanced analysis that heighten the
sport-lovers’ engagement with the game.
This particular effort was
a result of ESPNcricinfo’s interest in rationally addressing two talking-points
that seize spectators the most. The
first one is related to the impact of luck on the result of a game. The second is
on deciding how important a particular performance is to the outcome of a game
or equivalently, quantification of an inherent value of a performance that
takes into account as much game context as possible. Sport lovers would surely have participated
in arguments centered on these two themes.
At the end of these discussions - usually highly energetic and
contentious - one is more often than not left with a sense of dissatisfaction
or non-closure. ESPNcricinfo wanted to arm
the debaters with a little more data-based reasoning to bolster their arguments.
In other words, the effort was to mathematically frame these questions, albeit not
from a viewpoint of definitive answers (which obviously is not possible), but
from a spectator engagement perspective.
At this juncture, many of you reading the blog may probably chuckle at
the foolhardiness of such an exercise. You might wonder how one would be able
to quantify a fundamentally abstract concept such as luck, especially in a
debate where the participants have pre-conceived notions and favorite teams. Let us pause here for a moment to enunciate
the fundamental guiding principle that underlies this work. Clearly, there is no unique best approach to address
this problem. Rather, it is desirable
that any approach that one develops satisfy the following two
requirements. One, consistent
application of the same approach to all scenarios leading to “apple-to-apple”
comparisons should be possible and two, it should make “cricketing sense”. At this point, there might be dismay that we
are trading one abstraction, “quantify luck” for another, “cricketing
sense”. If one were to think a little more
carefully, it will become apparent that this is not as arbitrary a concept as
may seem at first encounter. Assume that
you make a statement about some cricketing situation to a room full of
spectators. If a large majority of the
room agree to the statement, then the statement is deemed to make cricketing
sense. Of course, how one practically realizes
this (we cannot assemble a room full of spectators and do polling) is a tricky
question and in our case, a panel of ESPNcricinfo experts painstakingly checked
if the algorithm results make cricketing sense for the large number of games
that we tested on. Essentially, the
underlying idea is some form of “majority is right”, however small the sample
may be. We apply this principle to considerably
more important things in life than cricket.
Any result of a presidential style election, however important the
country be on the world stage, cannot be viewed as the right answer (in any
mathematical or practical sense) but as an answer that is delivered by a
majority (in some cases, not even that).
Let us return to the first
problem at hand. Given the scorecard of a
game that is completed, how can we quantify the impact of luck on the game? In rather short order one would realize that
just a scorecard is not enough information.
In this project we decided to work with information at a higher level of
granularity, namely the easily available “ball-by-ball” commentary. Ball-by-ball commentary is, in general, unstructured
data; however, some structuring of this data and a resultant database was
already available with ESPNcricinfo. When the problem of quantifying luck was
broken down further, there were several more questions that we had to contend
with:
1. What are luck events?
2. Do these events affect the batsman and bowler in the same manner? In essence, is it zero-sum?
3. Is the impact of a luck event on the batting or bowling team the same as the impact on the batsman or bowler?
4. How would one quantify the impact of disparate luck events in an apple-to-apple fashion?
5. What is the cumulative impact of all the luck events on the two teams? How does one account for the luck events in both innings together?
Of course, at this point,
answering all these questions looks like a formidable endeavor, and a
comprehensive solution might be elusive.
In the search for a data science approach, the first decision that was
made was to enumerate a reasonably comprehensive list of luck events. A list of such luck events is shown in the
figure above.
It can be seen that we have
a hierarchical arrangement of the luck events. At the highest level of
hierarchy, we have dismissal and non-dismissal events. Dismissal events are
further categorized into replacement and reinstatement events. There are multiple luck events under each of
these final nodes of this classification. We report an illustrative list here
and in the actual application many more events have been considered. The non-dismissal events have a reasonably
simple logic for their run impact computations. Replacement events are ones
where the alternate situation is where a batsman has to be replaced by another
one. In contrast, reinstatement events
are ones where a batsman has been given out unluckily and one has to imagine
what would happen to the scorecard if the batsman was reinstated. This hierarchical arrangement and the
identified luck events answer our first question in the enumerated list of five
questions.
The table above is developed
for all the luck events (the table shows a subset of events). This table allows
us to answer questions 2 and 3. For each of the luck events, how (with the
correct interpretation of positive or negative luck) and if it impacts the
batsman, bowler and the respective teams is described in these tables (only one
of the multiple tables shown here). Y stands for impact and N stands for no
impact in luck computations. Using these tables, we also address differences
between luck and skill to some extent. Consider the event description “Catch
dropped”: for a regulation catch that is dropped, the bowling team is not
unlucky; rather, they have not executed a basic skill properly (N entry). Now
armed with this formalism, one can then proceed to answer question 4, which is
the identification of a quantifying metric for these luck events that will make
commensurate comparisons possible. The
most obvious quantifying metric is the run impact of the luck event. This would allow luck events to be compared
on an equal footing. This necessitated
the development of a core data science module that can predict future runs that
will be scored from any given situation in a game. This was named the
forecaster.
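To make the Y/N tables a little more concrete, here is a minimal Python sketch of how one row of such an impact table could be encoded. The class name, field names and event key are hypothetical. Only the N entry for the bowling team on a regulation dropped catch comes from the discussion above; the other three entries are assumptions made purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LuckImpact:
    """One row of a hypothetical impact table: True means 'Y', False means 'N'."""
    batsman: bool
    bowler: bool
    batting_team: bool
    bowling_team: bool


# Only bowling_team=False is taken from the text (a dropped regulation catch
# is a missed basic skill, not bad luck). The other entries are assumed.
IMPACT_TABLE = {
    "catch_dropped_regulation": LuckImpact(
        batsman=True, bowler=True, batting_team=True, bowling_team=False
    ),
}


def affected_parties(event):
    """Return the names of the parties whose luck tally this event touches."""
    row = IMPACT_TABLE[event]
    return [f.name for f in fields(row) if getattr(row, f.name)]


print(affected_parties("catch_dropped_regulation"))
```

Keeping the table as data rather than as branching logic means that a panel of experts could review and change individual Y/N entries without touching the computation.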
The basic mathematical
problem is: given the score at the end of the nth over (runs scored and
wickets fallen), how does one predict the score at the end of the (n+k)th
over? Initially, since it looked like a
nice time series problem, we used a recurrent neural network architecture for
this prediction. However, there were
difficulties with this approach, largely related to data requirements and
explainability. We could also not explore
this solution fully given the incredibly short time that we had (3 months),
starting from a blank page all the way to a deployed application in the ESPNcricinfo
website. It would be interesting to
revisit this with more data and deeper (figuratively and literally)
architectures. For this project, however, we abandoned this approach and moved on to an
approach closer to operations research, with machine learning models as required. Here, from a given situation, there are a
certain number of balls remaining to be bowled (resource) and these need to be
allocated to the remaining batsmen (allocation). We solve this resource allocation problem
based on multiple statistical parameters derived from the data. Once this problem is solved, for predicting the
score after the (n+k)th over, we need to predict the strike rates of the
batsmen who will play out the allocated balls.
Here, we use different machine learning models with self-correction
abilities trained on data for all the batsmen in the database. These models take
several factors into account, and are also conceptually extendable to include
other factors in the future. From our experience, the most accurate machine
learning model to be used depends on the format of the game (T20, ODI). This
prediction module can then be used to estimate the impact of a luck
event. The score prediction algorithm is run on both the actual situation and the
luck-removed alternate situation. The
difference between the two predicted scores quantifies the luck impact. Though not used in luck computations,
the probability of a result (win or loss) for the teams was also estimated based on
the forecaster and historical data. There are also other nuances, such as post-game
and live-game luck computations, that are not discussed here, for
reasons of brevity. Further, the computations were carefully designed so that
these impact numbers could be cumulated to address question 5 in the list of
questions.
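The idea of allocating balls, applying strike rates and differencing two forecasts can be illustrated with a toy Python sketch. This is not the deployed forecaster: the greedy allocation rule, the fixed per-batsman "expected balls" and strike rates, and all the numbers are assumptions chosen to keep the example small. It also ignores the fact that two batsmen are at the crease at once.

```python
def allocate_balls(balls_left, batsmen):
    """Greedily allocate the remaining balls to batsmen in batting order.

    `batsmen` is a list of (name, expected_balls, strike_rate) tuples, where
    expected_balls is an assumed typical stay at the crease.
    """
    allocation = []
    for name, expected_balls, strike_rate in batsmen:
        share = min(balls_left, expected_balls)
        allocation.append((name, share, strike_rate))
        balls_left -= share
        if balls_left == 0:
            break
    return allocation


def forecast_runs(current_score, balls_left, batsmen):
    """Predicted final score: current score plus runs from the allocated balls."""
    allocation = allocate_balls(balls_left, batsmen)
    return current_score + sum(balls * sr / 100 for _, balls, sr in allocation)


def luck_impact(actual, alternate):
    """Run impact of a luck event: forecast(actual) minus forecast(alternate)."""
    return forecast_runs(**actual) - forecast_runs(**alternate)


# Hypothetical replacement event: batsman A is dropped and stays in (actual),
# versus A being out and the next batsman coming in (alternate).
actual = {
    "current_score": 100,
    "balls_left": 60,
    "batsmen": [("A", 30, 150), ("B", 20, 120), ("C", 20, 100)],
}
alternate = {
    "current_score": 100,
    "balls_left": 60,
    "batsmen": [("B", 20, 120), ("C", 20, 100), ("D", 20, 80)],
}
print(luck_impact(actual, alternate))  # 19.0
```

Because every luck event is reduced to a difference in predicted runs, impacts from quite different events end up in the same unit and can be added up across an innings or a match.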
Now that the luck events are
enumerated, each delivery bowled can be annotated with a luck code. This necessitated that the database be
altered to include as many columns in the table as there are luck events. As a commentator is providing text commentary,
he or she will also score the presence or absence of luck events for each of
the deliveries. The default value is
zero, which signifies absence of the luck event; this ensures that online
scoring of luck events is simple and efficient.
Traditionally, ESPNcricinfo was not scoring these luck events and hence
considerable effort at retrospectively scoring a selected set of matches for
luck events through manual curation of the commentary had to be undertaken. In
some cases, the original match footage had to be revisited for this annotation
exercise. A set of 50-odd games was annotated
for luck events and then used to benchmark and evaluate the appropriateness of
the algorithms that were developed.
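As a rough illustration of the "one column per luck event, default zero" design, here is a small sketch using Python's built-in sqlite3 module. The table name, column names and the list of events are hypothetical and do not reflect the actual ESPNcricinfo schema.

```python
import sqlite3

# Hypothetical luck events; the real application considers many more.
LUCK_EVENTS = ["catch_dropped", "missed_run_out", "wrong_lbw_decision"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries ("
    "match_id INTEGER, innings INTEGER, over_no INTEGER, ball_no INTEGER, "
    "commentary TEXT)"
)

# Alter the existing table: one integer column per luck event, defaulting to 0
# (absence of the event), so the scorer only touches the rare non-zero cases.
for event in LUCK_EVENTS:
    conn.execute(
        f"ALTER TABLE deliveries ADD COLUMN {event} INTEGER NOT NULL DEFAULT 0"
    )

# An ordinary delivery: no luck columns need to be supplied.
conn.execute(
    "INSERT INTO deliveries (match_id, innings, over_no, ball_no, commentary) "
    "VALUES (1, 1, 3, 2, 'Driven through cover for four')"
)
# A delivery where the scorer flags a dropped catch.
conn.execute(
    "INSERT INTO deliveries "
    "(match_id, innings, over_no, ball_no, commentary, catch_dropped) "
    "VALUES (1, 1, 3, 3, 'Edged and put down at slip', 1)"
)

rows = conn.execute(
    "SELECT over_no, ball_no, catch_dropped FROM deliveries ORDER BY ball_no"
).fetchall()
print(rows)  # [(3, 2, 0), (3, 3, 1)]
```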
We also developed
algorithms for identifying the inherent value of different performances – a
suite of algorithms collectively called smartstats. Here, the key idea is to
value performances based on a notional pressure felt by a batsman or bowler when
they are performing. Performances in high pressure situations are valued more
than the ones where the pressure is minimal. The pressure that we feel (and
presumably the players also feel similar pressure) while watching the game is
directly related to the scoreboard pressure. To capture this, the difference
between the predicted score and the target is mathematically transformed into a
value for pressure. The first innings
pressure is calculated based on a notional target, akin to the par score that teams
batting first usually aim for. This instantaneous pressure is used to
appropriately increase or decrease the runs scored off every ball. Based on this, the algorithm constructs an
alternate scorecard from which smart strike rates and other smart statistics
can be derived.
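The pressure weighting can be sketched in a few lines of Python. The actual transformation used in smartstats is not described here, so the tanh form, the 0.5 weight and the scale of 30 runs below are assumptions made only to show the shape of the idea: runs scored when the predicted score trails the target count for more, and runs scored when it is well ahead count for less.

```python
import math


def pressure_multiplier(predicted_score, target, scale=30.0):
    """Map the gap between target and predicted score to a run multiplier.

    Assumed form: 1.0 when the prediction equals the target, rising towards
    1.5 when the batting side is far behind and falling towards 0.5 when far
    ahead. The functional form and constants are illustrative only.
    """
    deficit = target - predicted_score
    return 1.0 + 0.5 * math.tanh(deficit / scale)


def smart_runs(balls, target):
    """Sum pressure-adjusted runs.

    `balls` is a list of (runs_off_ball, predicted_final_score_before_ball).
    """
    return sum(
        runs * pressure_multiplier(predicted, target) for runs, predicted in balls
    )


# The same boundary is worth more when the forecast is 30 short of the target
# than when it is 30 ahead of it.
print(smart_runs([(4, 150)], target=180) > smart_runs([(4, 210)], target=180))
```

A smart strike rate would then follow by dividing smart runs by balls faced, in the same way as a conventional strike rate.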
We
will now look at some of the results from our suite of algorithms for the IPL 2019
season and the recently concluded ODI world cup. We sample some interesting
results and describe them briefly. One of the first successes of the forecaster
tool in the ODI world cup came in a game between South Africa and Bangladesh.
After 25 overs, the forecaster predicted a final score of 335 for Bangladesh, and
they went on to make 330 by the end of the innings. This was one of the early
scores greater than 300 predicted by the forecaster.
In another Bangladesh
match featuring West Indies, the forecaster gave a thumbs-up for Bangladesh by
the half-way mark with a win percentage of about 63%. At this point in the
game, Bangladesh had still about 160 runs to score with three top order batsmen
gone. It turned out that the forecaster was right and the game ended in
Bangladesh’s favor. Of course, there are also cases where the forecaster’s
predictions didn’t turn out to be as accurate.
In terms of luck
index, there were several interesting results throughout the IPL season. Here,
we point out a consolidated result in terms of the overall impact of luck as
judged by the algorithms. Below, you will see two tables. The first shows the actual
standings at the end of the league games, where MI, CSK, DC and SRH were the
top four teams and moved on to the play-offs. The second shows the standings with all
luck events removed from all the games; here, our algorithms predict that RR would have
replaced CSK and gone on to the play-offs. Whatever you make of this result, one
thing is for sure; this table will not make us popular with the CSK fans (no
lucky guesses needed here!!).
Let us look at some
results from the smartstats algorithms that were developed. We describe two
prototypical results here, one for batsmen and one for bowlers. Let us look at
what smartstats says about the performances of KL Rahul and M Agarwal in a KXIP
v MI match. The pressure was high (required run rate was over 10) when Mayank
came out to bat. Mayank scored 43 off 21 balls and turned the match in Punjab's
favor. During his partnership with Rahul, Mayank scored the bulk of the runs at
a high strike rate and reduced the pressure of the required rate on Rahul and
the other batsmen to follow. Though Rahul scored 28 runs more than Mayank,
Mayank scored 1 smart run more than Rahul in the innings, as judged by the
smartstats algorithms (shown below).
From a bowling
viewpoint, let us look at the performances of Axar Patel and Sandeep Sharma in
a KXIP vs RCB game (IPL 2017). Both Sandeep Sharma and Axar Patel took three
wickets each for Punjab. However, while Axar took the wickets of Shane Watson,
Pawan Negi and Samuel Badree, Sandeep was the bowler to derail RCB's chase with
wickets of Chris Gayle, Virat Kohli and de Villiers inside the Powerplay. Sandeep's
three wickets were worth 4.86 on smart wickets. Axar's three were worth 2.85.
One of the fun
aspects of this work has been the feedback of fans who followed IPL and ODI
world cup in the ESPNcricinfo website. Here is a representative collection of
comments from the website. The first
commenter has words of encouragement for the forecaster, and another is
impressed by the forecaster’s early, precise prediction.
In the comment “has Forecaster seen this Sri
Lanka team even bat” the commenter seems to be skeptical of the forecaster’s
prediction of a big score for Sri Lanka. It turned out that the forecaster’s
prediction in this case was quite accurate in the end. Of course, the comment following
that shows the interest of fans in wanting direct access to the forecaster
tool.
In summary, it was an
incredible experience working at the intersection of data science and cricket,
both of which are exciting domains. Let me end this blog with an answer to an
interesting question that we pondered over when we started to build these
algorithms. At what point in the game will we get the best predictions from our
algorithms? Based on the performance of our algorithms in IPL 2019 and the ODI
world cup matches, we see that for T20 games, the 11th over
predictions seem to be best and for ODI, the 25th over predictions
seem to be the best in terms of accuracy of the final score predicted. As we
can see, this is right about in the middle of the game. This might be so
because predictions towards the end are generally plagued by random errors
(with not enough overs to average them) and predictions at the beginning might
not have enough information about the current game to work with.
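For readers who want to run this kind of check on their own forecasts, one simple way is to group predictions by the over at which they were made and compare mean absolute errors against the actual final scores. The sketch below assumes a hypothetical list of (over, predicted final score, actual final score) records; the choice of mean absolute error as the accuracy measure is also an assumption, since the metric used in the project is not stated here.

```python
from collections import defaultdict


def mae_by_over(forecasts):
    """Mean absolute error of final-score predictions, keyed by over.

    `forecasts` is an iterable of (over, predicted_final, actual_final).
    """
    errors = defaultdict(list)
    for over, predicted, actual in forecasts:
        errors[over].append(abs(predicted - actual))
    return {over: sum(errs) / len(errs) for over, errs in errors.items()}


def best_over(forecasts):
    """Return the over whose predictions have the lowest mean absolute error."""
    scores = mae_by_over(forecasts)
    return min(scores, key=scores.get)
```

Calling `best_over` on records pooled across many matches of one format gives the over at which predictions were most accurate on average for that format.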
About the author -
Raghunathan Rengaswamy is an Institute Chair Professor at the Department of Chemical Engineering and a core member of the Robert Bosch Center for Data Science and AI (RBC-DSAI) at IIT Madras. He is also a co-Founder and Director of Gyan Data Pvt. Ltd. (GDPL), GITAA Pvt. Ltd., and Elicius Energy. He was elected fellow of Indian National Academy of Engineering in 2017.
Very interesting! For a nation full of cricket lovers this is like Mannah from the Heavens - bound to liven up conversations. But more importantly it shows the way to build applications that need to combine knowledge from multiple sources and predict outcomes in a complex situation. Enjoyed it.