The Data Hiding Inside Ebooks

by Arthur Klepchukov

I recently met with one of the founders of a startup pushing the boundaries of what’s possible in ebooks. We discussed what impact more technology and better insight into reader behavior would have on publishing. Could books be more like web sites or apps in that regard? Would that be a better experience for readers, writers, and publishers?

A powerful and sometimes scary thing about web sites is the data they collect about their users. As a site owner, it helps to know which of my pages attracted the most visitors, where those visitors came from, and how long they stayed. Given this data, I can add more content similar to what’s worked before, avoid what hasn’t, and build an audience by promoting my site on other sites that send me quality traffic. Improving my site is rarely as simple as “site A sends me more people than site B, so I’ll pander to that audience.” Regardless, some insight into my audience is better than none.

Now apply these ideas to a blog. If I know which posts attract readers and I write more on similar topics, that data is now helping me as a writer in addition to a web site owner. I get to understand my readers in a whole new way. Sure, this type of feedback isn’t as personal and human as a comment or a conversation, but it’s feedback nonetheless. The real question is: why isn’t data like this available for writers beyond their blogs? Most, if not all, e-readers have Internet access. The popular book formats (Kindle’s .mobi, Apple’s .ibooks, and everyone else’s ePub) are built on the core language of the web (HTML). In essence, what’s possible on the web today should be possible in e-readers today.

A valuable piece of web site data is the exit rate for its pages. The exit rate of a page is the percentage of its views that end a visit, meaning the visitor leaves the site from that page. A page with a high exit rate is one where lots of people drop off. Imagine if ebooks understood exit rates. A novelist could see how far readers got in a book. The chapter, section, or page with the highest exit rate would be a great candidate for revision. The writer could see where he or she loses readers without any extra effort from those readers; they would just read and stop reading whenever they felt like it. If a writer had this data and was willing to act on it, he or she could even update the ebook with revisions and measure success with new readers.
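
To make the exit-rate idea concrete for a book, here is a minimal Python sketch. It assumes an e-reader could report the last chapter each reader reached and that people read chapters in order. The function name, the reader labels, and the sample numbers are all hypothetical; no real e-reader exposes this data in this form.

```python
from collections import Counter


def chapter_exit_rates(last_chapter_by_reader, chapter_count):
    """Map each chapter to the share of readers who reached it and stopped there.

    Assumes linear reading: a reader whose last chapter is N reached 1..N.
    """
    stopped = Counter(last_chapter_by_reader.values())
    reached = len(last_chapter_by_reader)  # every reader reaches chapter 1
    rates = {}
    for chapter in range(1, chapter_count + 1):
        rates[chapter] = stopped[chapter] / reached if reached else 0.0
        reached -= stopped[chapter]  # only those who continued reach the next one
    return rates


# Hypothetical data: reader label -> last chapter reached in a 5-chapter book.
last_chapter = {"r1": 1, "r2": 3, "r3": 3, "r4": 5,
                "r5": 5, "r6": 2, "r7": 3, "r8": 5}
rates = chapter_exit_rates(last_chapter, chapter_count=5)

# The final chapter is skipped: finishing the book also "exits" there.
revision_candidate = max(range(1, 5), key=lambda ch: rates[ch])

for chapter, rate in rates.items():
    print(f"Chapter {chapter}: {rate:.0%} of readers who reached it stopped here")
print("Candidate for revision: chapter", revision_candidate)
```

In this made-up sample, half of the readers who reached chapter 3 stopped there, so chapter 3 is the one to look at first. Each rate is measured against the readers who actually reached that chapter, not against everyone who opened the book, which keeps late chapters from looking better than they are.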

Even basic data could help answer valuable questions about any given ebook for the writer and publisher:

About the free sample:

  • Of the people who downloaded the free sample, how many finished?
    Suggests how successful the sample is.
  • Where did most people stop reading the free sample?
    Suggests if the sample needs work or if a new sample is needed.
  • At which point in the free sample did they buy?
    Suggests how successful the sample is.

About the book as a whole:

  • How many people started? Finished?
    Suggests how successful the story is overall. Can show the author improving when compared with previous work.
  • How many bookmarks, notes, annotations, and shares did readers make?
    Suggests how much impact the story had on the reader.
  • How often did people read?
    Suggests how engaged the readers are.
  • How long did it take to finish?
    Suggests how engaged the readers are.
  • How long was a typical reading session?
    Suggests how engaged the readers are.

About the reading experience:

  • What did they read before this?
    Suggests opportunities for marketing and cross-promotion.
  • What did they read after?
    Suggests opportunities for marketing and cross-promotion.

Some of this data would push at the privacy boundaries of readers, so I would make it all aggregated, anonymized, and opt-in.
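
As a rough sketch of what "aggregated, anonymized, and opt-in" could look like in practice, the Python below answers two of the questions above (how many readers finished, and how long a typical session was) using only readers who opted in. The `ReadingRecord` type, its fields, and the sample values are hypothetical. Records carry no reader identity, and only summary numbers come out.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReadingRecord:
    # Hypothetical record: deliberately has no name, account, or device id.
    opted_in: bool
    finished: bool
    session_minutes: list


def summarize(records):
    """Return aggregate figures for opted-in readers only, or None if there are none."""
    shared = [r for r in records if r.opted_in]
    if not shared:
        return None
    sessions = [m for r in shared for m in r.session_minutes]
    return {
        "readers": len(shared),
        "completion_rate": sum(r.finished for r in shared) / len(shared),
        "median_session_minutes": median(sessions) if sessions else None,
    }


# Made-up sample: the third reader did not opt in and is left out entirely.
records = [
    ReadingRecord(True, True, [30, 45]),
    ReadingRecord(True, False, [10]),
    ReadingRecord(False, True, [60, 60]),
    ReadingRecord(True, True, [20, 25, 40]),
    ReadingRecord(True, False, [15]),
]
summary = summarize(records)
print(summary)
```

A real system would need more care than this, since small groups of readers can still be identifiable from aggregates, but the shape is the point: filter on consent first, then report totals rather than individuals.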

As a reader, would you be willing to opt in and passively share this data about your interactions with a book?

As a writer, editor, or publisher, would you use this data as part of your process of editing and gathering feedback?