Indie Author Insights: A Decision Tree for E-Book Sales

Christina Pierre
Jun 15, 2021

A decision tree is a method used in machine learning that creates a tree-like model to predict the value of a target variable. Basically, you feed it some data (like a spreadsheet) and it predicts what one of the values will be.

Naturally, I wanted to use this model to look into my e-book sales. I decided to use my data from Google Books to find out how the book-buying process might look as a decision tree. Amazon is by far my biggest market, but they don’t provide any data on traffic. Google offers a surprisingly detailed report that you can download from the partner center. The report contains the following columns.

I don’t have many books on Google Books, but those that are there have been live for years, so there’s a good amount of data — 2,226 rows of it! Each row represents one user’s session looking at my books.

Although Google provided this glossary, I wasn’t sure how these variables were related. And which variable should I predict, BV with Buy Clicks or Non-Unique Buy Clicks? I hopped into SPSS to build a scatterplot matrix.

This may look complicated, but it’s just a grid of scatterplots, one for each pair of variables. Each column’s variable runs along the x-axis, and each row’s variable runs along the y-axis. The cells where a variable intersects with itself are left blank.

A lot of the intersections have an L-shaped pattern, meaning one variable has a wide range while the other is mainly low. Take the bottom left square, for example. The number of pages visited comes in a wide range, while most book visits are low.

The only clear correlation that I can see is between “BV with Pages Viewed” and “Pages Viewed,” which makes sense. It’s a weak correlation, but since I’m not sure whether these two variables overlap, I’ll leave “BV with Pages Viewed” out of my analysis. “Buy Link CTR” doesn’t seem directly correlated with anything, and it’s an outcome rather than a cause of the other variables, so I’ll leave it out as well. “BV with Buy Clicks” seems to behave similarly to “Non-Unique Buy Clicks,” and after double-checking in Excel, I can see they’re equal in almost all cases. I’ll go with the latter for my analysis.
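The same double-check can be done in pandas instead of Excel. The sketch below is only an illustration: the column names come from the Google report, but the rows are made up and the variable names are my own.

```python
import pandas as pd

# Made-up rows for illustration only; the real report has 2,226 rows.
df = pd.DataFrame({
    "BV with Buy Clicks":    [0, 1, 0, 1, 0, 0],
    "Non-Unique Buy Clicks": [0, 1, 0, 2, 0, 0],
})

# Share of rows where the two columns hold exactly the same value.
share_equal = (df["BV with Buy Clicks"] == df["Non-Unique Buy Clicks"]).mean()

# Pearson correlation between the two columns.
corr = df["BV with Buy Clicks"].corr(df["Non-Unique Buy Clicks"])

print(f"equal in {share_equal:.0%} of rows, correlation {corr:.2f}")
```

If the two columns match in nearly every row, keeping both would add almost no new information, which is why only one of them needs to go into the model.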

I created the decision tree using scikit-learn in Python. “Book Visits (BV)” and “Pages Viewed” were my x-variables. “Non-Unique Buy Clicks” was my y-variable. Basically, I wanted the tree to show how many book visits and pages viewed would result in a buy click.

I allowed the decision tree to go to a depth of four, meaning any path from the root to a leaf passes through at most four splits. I didn’t set any minimum number of observations at the internal nodes or leaf nodes.
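For anyone who wants to try something similar, here is a minimal sketch of that setup. It is not the exact script behind this post: the data is synthetic, the 75/25 train/test split and the random seeds are assumptions, and I’m assuming a regression tree (`DecisionTreeRegressor`) because the predicted values are fractional.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data (assumption); the real inputs come from the Google report.
rng = np.random.default_rng(0)
n = 500
book_visits = rng.integers(1, 10, size=n)    # values 1..9
pages_viewed = rng.integers(0, 40, size=n)   # values 0..39
X = np.column_stack([book_visits, pages_viewed])
# Mostly zero buy clicks, loosely tied to the number of book visits.
y = rng.binomial(1, 0.01 * book_visits)

# The split ratio is an assumption; the post does not state it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# max_depth=4 caps every root-to-leaf path at four splits.
# min_samples_split and min_samples_leaf are left at their defaults.
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Text version of the tree, then R2 on the training and testing data.
print(export_text(tree, feature_names=["Book Visits (BV)", "Pages Viewed"]))
print("R2 train:", tree.score(X_train, y_train))
print("R2 test: ", tree.score(X_test, y_test))
```

Each leaf in the printed tree shows a value, which is the average of the target for the training rows that ended up in that leaf.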

The R² training score is 0.0355 and the R² testing score is -0.0311. In other words, the model explains only about 3.5% of the variance in the training data, and the negative testing score means it predicts unseen sessions slightly worse than always guessing the average number of buy clicks. However, what the tree tells us is still worth investigating.
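For reference, R² compares the model’s squared errors with those of a baseline that always predicts the mean: R² = 1 - SS_res / SS_tot. It drops below zero whenever the model’s errors are larger than the baseline’s. Here is a small sketch with made-up numbers (not from my report); the function name `r_squared` is my own.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up test sessions: mostly zero buy clicks.
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Baseline: always predict the mean of the observed values.
baseline = [sum(y_true) / len(y_true)] * len(y_true)
# A model whose guesses land on the wrong sessions.
off_target = [0.5, 0, 0, 0, 0, 0.5, 0, 0, 0, 0]

r2_baseline = r_squared(y_true, baseline)
r2_off = r_squared(y_true, off_target)
print(r2_baseline, r2_off)
```

The baseline scores exactly zero, and the off-target predictions score below zero because their squared errors add up to more than the baseline’s.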

If a user had <= 22 pages viewed and <= 3.5 book visits, and (breaking it down further) <= 0.5 pages viewed and <= 2.5 book visits, the predicted number of non-unique buy clicks is 0.013. This is the leftmost bottom leaf, and it accounts for 905 of the 2,226 sessions. In other words, a user with <= 0.5 pages viewed and <= 2.5 book visits clicks a buy link about 1.3 times per 100 sessions on average.

Let’s look at the darkest leaf. (The color of the leaf corresponds to the predicted value.) If the user had <= 25.5 pages viewed and had <=3 book visits, they have a predicted value of 1.0 non-unique buy clicks. But this leaf only contains one sample, so we’re talking about a single session!

The leaf to the right of that one has the second-highest value, but still only two samples. The leaf to the left, with 82 samples, offers a better insight. If a user had <= 22 pages viewed and > 3.5 book visits, then <= 8.5 book visits and > 6.5 book visits, the predicted value is 0.073 non-unique buy clicks. So, a user with <= 22 pages viewed and between 6.5 and 8.5 book visits clicks a buy link about 7.3 times per 100 sessions on average.
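To put the two larger leaves in perspective, a leaf’s predicted value is the average number of buy clicks among its sessions, so multiplying it by the sample count gives a rough total. A quick sketch using only the figures quoted above; the leaf labels are my own, and the totals are approximate because the predicted values are rounded.

```python
total_sessions = 2226

# (sessions in leaf, predicted non-unique buy clicks per session), as quoted above
leaves = {
    "leftmost leaf": (905, 0.013),
    "6.5 to 8.5 book visits": (82, 0.073),
}

summary = {}
for name, (n_sessions, mean_clicks) in leaves.items():
    share = n_sessions / total_sessions        # fraction of all sessions
    implied_clicks = n_sessions * mean_clicks  # rough total buy clicks in the leaf
    summary[name] = (share, implied_clicks)
    print(f"{name}: {share:.1%} of sessions, about {implied_clicks:.0f} buy clicks")
```

So the big leftmost leaf holds roughly 41% of all sessions but only about a dozen buy clicks, while the 82-session leaf produces about half as many clicks from under 4% of sessions.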

It seems like more page views and book visits tend to go along with a higher number of buy clicks, which makes sense: people generally want to gather more information before buying. Given how weak the model’s fit is, though, I’d treat this as a hint rather than a firm conclusion.

I wonder what this tree would look like if I broke it down by genre… but I’ll stop here for the moment!

Christina Pierre

Self-publishes e-books. Studies business insights and analytics. Writes about insights for independent authors.