DIY HTML e-book formatting

I’ve mentioned that there are three main ways to format an e-book. In this post, I’ll share what I know about hands-on HTML formatting – the most direct and most powerful way, but also potentially the most difficult.

If you want an easy way to format your manuscript for self-publishing on Amazon, Kindle Create provides a decent (if limited) set of options that will be enough for most authors. You can look at my review/guide here.

The good and the bad

Now, let’s have a look at the advantages and disadvantages of HTML hand-coding.

The pros

  • Can do almost anything
  • Real-time preview
  • Most HTML/CSS guides on the internet apply
  • Works for any output format
  • Further runs get faster each time

If you know HTML well, you can do small miracles with your e-book formatting. If you know some extended basics and can follow a guide to do a bit more, you can still do some fancy stuff.

The strongest point here is that you can create output in most of the common e-book formats, as well as print-ready PDF.

The cons

  • Learning curve (especially from zero)
  • Some fancy stuff takes a lot of work
  • Output format confusion/limitations
  • Requires a few more tools
  • Small edits can take a lot of time (full file re-build necessary)

This is the time to be totally honest. If you’ve never worked with HTML, learning on your debut isn’t the best option. You’ll need a few more pieces of software, you’ll need to understand advanced search-and-replace methods, and you may need several attempts full of trial-and-error to get the look you want. And sometimes, a hidden checkbox or a missing semicolon will give you a sleepless night.

The guide

Now, before I start, here’s a rundown of what you’ll need:

  • Any text editor (such as MS Word)
  • A ‘coding’ text editor (I use Notepad++), preferably one that highlights HTML syntax
  • Calibre for conversion (free)
  • Any web browser
  • A dedicated PDF viewer (optional, since most web browsers can open PDFs these days)
  • Kindle Previewer (free) or any e-reader (optional, for checking the conversion results)

Note: my process is based on Guido Henkel’s guide, with several personal adjustments. If you want to just follow his detailed guide, it’s there for you. However, the guide is from 2010 and, as you’ll see in this post, some things might’ve changed a bit.

Stage one: manuscript preparation

Okay, the first step is to write the manuscript and get it close to a publishing-ready state. The best time to start working on conversion is late beta: it gives you time to experiment with the look and learn the process, which you can then simply run through again for the final file.

For now, you’re okay with just the manuscript – you can do the first few test runs without the front/end matter or cover.

In this step, you may be tempted to just have Word export your file into HTML. Not only are there several export options, but many of them stuff the file with a ton of ballast in the form of extra code that’ll make your head hurt. I’ve tried.

So, let’s start with the most important step. Save an extra copy of your manuscript and back up all your files, preferably at least twice. The extra copy will serve as your sandbox, especially for the first few tries, and you need to do the prep in that separate file. You’ll soon see why.

So, the Word prep. Since copy-pasting the text into a plain-text editor strips all formatting, you need to mark any bold and italic text with HTML tags before creating the HTML file. This caused me nightmares in the beginning, because I couldn’t find the right syntax. It’s done with Word’s search-and-replace tool: leave the ‘search’ field empty and, with the cursor in it, hit Ctrl-I (for italics) or Ctrl-B (for bold) so Word searches for the formatting itself. Then type <i>^&</i> into the ‘replace’ field for italics, or <b>^&</b> for bold. Since ^& stands for the found text, this puts the HTML tags around every italic and bold passage.

Here comes the first trap. Go through your file and make sure that any italic or bold text spanning multiple paragraphs (such as letters, longer internal thoughts, etc.) has opening and closing tags in each separate paragraph – and if not, add them manually.
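If you’d rather not hunt for these by eye, the fix can also be scripted. Below is a minimal Python sketch (not part of Guido’s guide) that closes any <i> or <b> still open at the end of a line and reopens it at the start of the next one. It assumes one paragraph per line and bare tags without attributes, as produced by the search-and-replace above; rebalance_tags is a hypothetical helper name.

```python
import re

# Matches a bare opening or closing italic/bold tag: <i>, </i>, <b>, </b>.
TAG_RE = re.compile(r"</?([ib])>")


def rebalance_tags(text: str) -> str:
    """Give every paragraph (one per line) its own opening and closing tags.

    Tags still open at the end of a line are closed there and reopened at
    the start of the next non-empty line. Blank lines are left untouched.
    """
    out_lines = []
    open_tags: list[str] = []  # tags carried over from earlier lines, outermost first
    for line in text.split("\n"):
        if not line.strip():
            out_lines.append(line)
            continue
        # Reopen whatever was still open when the previous paragraph ended.
        prefix = "".join(f"<{t}>" for t in open_tags)
        for match in TAG_RE.finditer(line):
            tag = match.group(1)
            if match.group(0).startswith("</"):
                if tag in open_tags:
                    # Close the innermost open tag of this kind.
                    idx = len(open_tags) - 1 - open_tags[::-1].index(tag)
                    del open_tags[idx]
            else:
                open_tags.append(tag)
        # Close anything still open, innermost first.
        suffix = "".join(f"</{t}>" for t in reversed(open_tags))
        out_lines.append(prefix + line + suffix)
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = "Normal text.\n<i>Dear John,\nI miss you.\nLove, Jane</i>"
    print(rebalance_tags(sample))
```

Each line of the letter comes out as its own <i>…</i> paragraph, while lines that were already balanced are left as they were.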

Also, make sure every ellipsis (…) is a single character, not three periods. Word should do this automatically as you write, but in case you turned that off… make sure they’re replaced as well.
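If AutoCorrect was off while you wrote, a single replacement catches the leftovers. A minimal Python sketch, assuming the manuscript is available as plain text; fix_ellipses is a hypothetical name, and spaced ellipses (". . .") are deliberately not handled.

```python
def fix_ellipses(text: str) -> tuple[str, int]:
    """Replace each run of three periods with the ellipsis character U+2026.

    Returns the fixed text and the number of replacements, so you can see
    how many slipped through. str.count and str.replace both scan
    left to right without overlaps, so the two always agree.
    """
    count = text.count("...")
    return text.replace("...", "\u2026"), count


if __name__ == "__main__":
    fixed, n = fix_ellipses("Wait... what...")
    print(n, fixed)
```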

Stage two: HTML import

Now, open your ‘HTML editor’ (Notepad++ in my case). Copy-paste your manuscript into the file.

The next step is to turn your manuscript paragraphs into HTML paragraphs, with another run of search-and-replace that simply puts the appropriate tags at the beginning and the end of each line. In Notepad++, switch the search mode to ‘Regular expression’ (and leave ‘. matches newline’ unticked), then put ^(.+)$ in the ‘find’ box and <p>$1</p> in the ‘replace’ box.
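For reference, here is the same replacement as a short Python sketch. Python’s re module writes the back-reference as \1 rather than $1, and re.MULTILINE makes ^ and $ match at every line break. wrap_paragraphs is a hypothetical name, and normalizing Windows line endings first is an assumption about text copied out of Word.

```python
import re


def wrap_paragraphs(text: str) -> str:
    """Wrap every non-empty line in <p>...</p>, like ^(.+)$ -> <p>$1</p>.

    Windows line endings are normalized first; otherwise the carriage
    return before each line feed would end up inside the closing </p>.
    """
    text = text.replace("\r\n", "\n")
    # "." does not match a line feed, so (.+) never spans more than one line,
    # and empty lines (zero characters) are left alone.
    return re.sub(r"^(.+)$", r"<p>\1</p>", text, flags=re.MULTILINE)
```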

And then comes the next somewhat controversial step. Guido suggests replacing all special characters, such as quotation marks, with their HTML equivalents (see picture), while Amazon states that it’s unnecessary. This approach has its pros and cons.

First, Word can bug the hell out of you with its smart styling. If you can’t get rid of something in Word (because of its persistent auto-replacement or your regional settings), the HTML equivalents may work.

On the other hand, using the HTML shortcodes for special characters BREAKS the Kindle X-ray feature (for terms with an apostrophe or any other special character), which is probably why Amazon prefers the standard characters. Since I found nothing on the matter during my prep, I learned the hard way and had to do another round of search-and-replace to revert the HTML shortcodes into standard characters (good thing I uploaded two weeks ahead and had time to fix anything that popped up). Whether you want to use Kindle X-ray is a topic for a different day (and will come with its own guide to set it up), but I wanted to mention this. And, in case you wonder, the impact on file size is minimal.
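If you end up in the same situation, the revert can be scripted instead of done entity by entity. A minimal Python sketch using the standard library’s html.unescape, under two assumptions: the file is saved as UTF-8, and entities that stand for &, < or > (named or numeric) must stay escaped because they protect the markup itself. revert_entities is a hypothetical name.

```python
import html
import re

# Named (&rsquo;), decimal (&#8217;) and hexadecimal (&#x2019;) entities.
ENTITY_RE = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def revert_entities(markup: str) -> str:
    """Turn character entities back into plain (UTF-8) characters,
    leaving the ones that escape &, < and > untouched."""

    def replace(match: re.Match) -> str:
        entity = match.group(0)
        char = html.unescape(entity)  # unknown entities come back unchanged
        # Keep markup-critical characters escaped so the HTML stays valid.
        return entity if char in ("&", "<", ">") else char

    return ENTITY_RE.sub(replace, markup)
```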

Next, you’ll need to add some mumbo-jumbo (the document header) that ensures the code is read properly.

Guido’s guide has a version you can copy-paste: the style part (more on that soon) is at the bottom of part V, and the first two rows are in part VII. Your manuscript must sit between the ‘body’ tags, so it takes two copy-pastes (the first for the upper half and the second for the last line).
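To give an idea of what such a header does, here is a minimal Python sketch that wraps the tagged paragraphs in a generic HTML5 skeleton. It is not the code from Guido’s guide; build_document, the lang attribute and the title are assumptions for illustration. The important parts are the charset declaration, which tells browsers and Calibre to read the file as UTF-8, and the style block, where the styles from the next stage go.

```python
import html


def build_document(title: str, body_html: str, css: str = "") -> str:
    """Wrap already-tagged paragraphs in a minimal HTML5 document."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        f"<style>\n{css}\n</style>\n"
        "</head>\n"
        "<body>\n"
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(build_document("My Book", "<p>It was a dark and stormy night.</p>"))
```

Whatever header you use, save the file with UTF-8 encoding so it matches its own charset declaration.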

Stage three: styles

Now is the time to beautify your book. As you’ve probably noticed, your HTML file now treats headings just like any other paragraph, which you have to change. Give the chapter headers a different style and customize them to your heart’s desire in the ‘style’ section of the code.
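If your chapter headings are simple, assigning the style can be one more search-and-replace instead of manual work. A minimal Python sketch, assuming every heading is a whole paragraph reading ‘Chapter’ plus a number after the paragraph step; the class name chapterhead is hypothetical.

```python
import re

# Hypothetical convention: a heading is a whole paragraph reading "Chapter <number>".
HEADING_RE = re.compile(r"^<p>(Chapter \d+)</p>$", re.MULTILINE)


def mark_chapter_headings(markup: str, css_class: str = "chapterhead") -> str:
    """Give every 'Chapter N' paragraph its own class so CSS can style it."""
    return HEADING_RE.sub(
        lambda m: f'<p class="{css_class}">{m.group(1)}</p>', markup
    )
```

The matching rule, for example p.chapterhead { text-align: center; font-size: 2em; }, then goes into the style section. Paragraphs that merely start with ‘Chapter’ are left alone because the pattern must span the whole paragraph.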

At this point, if you don’t understand CSS much, you may want to open a guide, and the HTML file, in your web browser. I won’t tell you what to do here, because only you know how you want your book to look. Most of the beautification lies in changing font size and positioning (left/right/centered, margins, …). Experiment, hit refresh in the web browser to reload the file, see what’s changed, adjust. Repeat, possibly for hours, until you’re satisfied.

You’ll likely need several styles: one for the ‘normal’ text and one for chapter headers are just the start. Anything that uses formatting other than bold or italic should have its own style for easier mass-editing. This applies, for example, to scene breaks, letters, poems, quotes, or excerpts, if your book has them. Drop caps require a lot of extra coding (I skipped them altogether). You may want another style for the title and copyright pages (which tend to be centered, unlike the rest of the book), and another if you plan to use images.

In the preview screenshot above (post-conversion), you can see that a message written by one character to another has different positioning to set it apart from the ‘regular’ text – done using a separate style.

Bonus: Extra beautification

As I said, HTML allows you to do a lot of cool stuff, such as a large chapter number instead of the ‘Chapter X’ text. This can also be done in Kindle Create, but there it messes up the Table of Contents.

To do this, you need three styles: one for the number, one for the chapter name, and one for the ToC entry. It also means replacing the standard way of building the ToC with a manual approach – more on that soon.

A good practice is to name the styles in a way you’ll understand (no need to copy mine – or anyone else’s – code to the letter). In my case, cmark is the ToC entry, cnumber is the large number, and chapter is the chapter name.

When it comes to font size, always use relative sizes instead of fixed ones – this ensures that when readers change the font size on their e-readers, all text is rescaled proportionally. For example, “10em” means that particular text is 10 times the font size of its parent element, which in a simple layout is the standard paragraph text size.
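The arithmetic is simple, but it compounds: an em is relative to the font size of the parent element, so nested em sizes multiply. A small Python sketch (em_to_px is a hypothetical helper, and the pixel values are examples rather than what any particular e-reader uses):

```python
def em_to_px(size_em: float, parent_px: float) -> float:
    """Computed font size of an element set in em inside a parent of parent_px pixels."""
    return size_em * parent_px


if __name__ == "__main__":
    # The same 10em chapter number follows whatever base size the reader picks...
    for base_px in (12, 16, 24):
        print(f"base {base_px}px -> 10em number is {em_to_px(10, base_px):g}px")
    # ...and nesting multiplies: 2em inside a 1.5em element is 3x the base.
    print(em_to_px(2, em_to_px(1.5, 16)))
```

A fixed size such as 160px would stay 160px no matter what the reader chooses, which is exactly what relative sizes avoid.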

This is another stage for you to go all ape on it. Play with the size ratios and positioning to achieve what you want your chapter headers to look like. In my case, the resulting code in the actual text looks like this:

Make sure the ToC entry comes first and that it’s the style with the forced chapter break, so the ToC works properly.

I guess this is a good place to point out one thing: the ‘visibility: hidden;’ part of the cmark style. Yes, that bit of text is invisible. The cmark element contains exactly what you want the ToC entry to say. You could possibly avoid it with a complex XPath expression that combines static text with the ‘entrails’ of the cnumber and chapter elements, but just the idea of putting that together makes my head hurt. Keep reloading the HTML file in your web browser to see how the look changes as you poke at the ‘source code’ until you’re satisfied.
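A hypothetical sketch of how the three elements could fit together, written as a small Python generator so the markup can be produced for every chapter at once. The class names come from above; the sizes, margins and the exact ToC text are placeholders. Putting page-break-before on cmark is one way to make it the style with the forced chapter break. Keep in mind that visibility: hidden hides the text but still reserves its space on the page.

```python
import html

# Placeholder CSS for the three styles; tune sizes and margins to taste.
CHAPTER_CSS = """\
p.cmark { visibility: hidden; page-break-before: always; margin: 0; }
p.cnumber { font-size: 6em; text-align: center; margin: 0.5em 0 0 0; }
p.chapter { font-size: 1.5em; text-align: center; margin: 0 0 2em 0; }
"""


def chapter_heading(number: int, name: str) -> str:
    """Return the three paragraphs that make up one chapter heading.

    cmark comes first: it carries the page break and the text the ToC
    should show, while staying invisible on the page.
    """
    name = html.escape(name)
    return (
        f'<p class="cmark">Chapter {number}: {name}</p>\n'
        f'<p class="cnumber">{number}</p>\n'
        f'<p class="chapter">{name}</p>'
    )


if __name__ == "__main__":
    print(chapter_heading(1, "The Beginning"))
```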

Stage four: Conversion

Once you have your styles exactly as you want them (or at least as close to that as your skills let you), it’s time to convert the HTML into an actual e-book. At this point, you’ll need the cover, but you can use any placeholder image for testing. The image will be compressed to something around half a MB (more than enough for an e-reader display).

And here comes another trap: e-book formats. EPUB is an open format readable by most e-readers (Kindles being the notable exception). MOBI and AZW3 are Amazon’s formats.

Calibre, which I use for the conversion, can convert to all three formats and a bunch of others (including supposedly print-ready PDF). In my experience and testing, EPUB output has reflowable text (thus supporting automatic hyphenation and other cool features), while the MOBI and AZW3 output produced by Calibre doesn’t. It’s possible that uploading to KDP fixes MOBI and AZW3 files to reflowable text, but I wasn’t willing to take that risk when I could see the result right away with EPUB.

This is what Kindle Previewer is for – it opens your book as an e-reader would, so you can check how the final book will look without loading it onto an e-reader (which I still suggest doing after the final conversion, just in case). Unlike Calibre’s e-book reader, Kindle Previewer supports automatic hyphenation, which is the key difference when previewing the final book file.

Now, there are a few traps for the conversion, which you should be aware of:

  • Preserve cover aspect ratio in EPUB output (doesn’t apply to other formats) should be on, otherwise you risk having your cover distorted.
  • Disable font size rescaling in Look and feel -> Fonts should be ticked. This is especially important if you do fancy stuff like my trick with large chapter numbers – if you leave rescaling enabled, it’ll squish anything larger than 4x the base font size down to 4x.
  • Structure detection: first, clear the Insert page break before field if you handle page breaks via styles. Then, if you use anything other than h1 or h2 for chapter headers (such as my fancy headings), you’ll need to adjust the Detect chapters at field, which builds the ToC. In my case, I had to change it to use the cmark element. The syntax is //h:p[re:test(@class, "cmark", "i")] – just replace cmark with the name of your own style. And, finally, Start reading at lets you force the e-reader to open the file wherever you want (often used to skip the front matter straight to the first chapter). As I said, XPath is black magic to me, and I failed to make this work.
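Before converting, you can check which elements that expression should pick up. Python’s built-in ElementTree supports only a small subset of XPath and no re:test, so the minimal sketch below emulates it by hand: it finds every XHTML <p> whose class attribute matches the pattern, ignoring case. It assumes the file is well-formed XHTML in the standard XHTML namespace (which the h: prefix stands for), and it assumes re:test performs a regex search rather than a whole-string match, so a class like cmarker would match too and ^cmark$ is the stricter pattern. chapter_marks is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def chapter_marks(xhtml: str, class_pattern: str = "cmark") -> list[str]:
    """Emulate //h:p[re:test(@class, "cmark", "i")] and return the text of each hit."""
    root = ET.fromstring(xhtml)
    pattern = re.compile(class_pattern, re.IGNORECASE)
    return [
        "".join(p.itertext())  # includes text inside nested <i>/<b>
        for p in root.iter(f"{{{XHTML_NS}}}p")
        if pattern.search(p.get("class", ""))
    ]
```

The returned strings are roughly what should show up as ToC entries, which makes it easy to spot a missing or misnamed class before running the conversion.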

Then you wait a few seconds for Calibre to create the e-book file for you. If you want a print-ready PDF, run the conversion again with PDF as the output format; there you can adjust all the print-related stuff like margins, page size, etc.

Stage five: wrap-up and upload

Once you have your e-book file, run it through Kindle Previewer and/or your e-reader. Check that everything works as you wanted it – a few times if you want to be sure.

When you’re sure that there are no issues, you’re ready to upload your e-book file to whichever retailer(s) you choose. Amazon’s KDP will let you preview the book again, online this time, which works exactly like the Kindle Previewer software. And once you fill in all the forms in the upload UI and click Publish, your book is ready to wait for its first readers.

This is it. I hope this gave you an overview of the process and its complexity. If you have questions, feel free to ask, but keep in mind that I’m no HTML expert and, in some cases, I can only point you to my sources (mostly the guide mentioned above). What you can ask me about (and what I’ll gladly answer) is how the process felt to me and how complex it was – or about the key differences between HTML formatting and Kindle Create.

Also, I welcome your own experience and opinions, so feel free to share. How have you formatted your book? Why did you choose that way, and what were the advantages and disadvantages? Sharing this helps other people make better decisions, so your experience is more than welcome.
