For Caplin Dev Week in July this year I ran a proof-of-concept migration of our developer website to a static-site generator.
Over the past year I have developed an interest in the ‘Docs Like Code’ approach to technical writing, and Caplin Dev Week was a great opportunity to try building our developer website from source files rather than serving it dynamically from a CMS.
Docs like code
I first heard of the Docs Like Code approach at conferences last year, but it was reading Anne Gentle’s Docs Like Code and Andrew Etter’s Modern Technical Writing that really developed my interest.
The central idea of the Docs Like Code approach is that software firms can improve their documentation process by using the same tools for documentation that their engineers use for code. Instead of being prepared in closed systems, documentation is written as source files in a lightweight markup language. In this format, documentation is compatible with the tools that developers use every day to create, version, and review code. This lets documentation and development teams align their workflows, and creates opportunities to automate the building, testing, and publishing of documentation.
Aside from future automation, I could see two immediate benefits for Caplin: improved collaboration and a more semantic documentation format.
Improving collaboration was one of the motivations for Caplin’s move to an online CMS in 2012, but it hasn’t provided the increase in collaboration we wanted. Although every developer now has access to authoring tools, the CMS is not a tool that developers use every day, and its workflows are easily forgotten without regular use. Storing documentation in Git and reviewing changes in a repository manager could be the change we need.
A change in documentation format would also be beneficial. With the move to an online CMS, we began writing our documentation in HTML. While this gave us an advantage in online presentation, HTML’s flexibility and lack of semantic structure make migrating and converting content difficult. Moving to a more structured source format would improve the portability of our content.
Content migration
I began the task of migrating our documentation to new formats as a side-project in late 2016. I wrote my own routines to extract our content and document hierarchy from our CMS’s database, and then used a DOM parser and XPath queries to clean our HTML prior to conversion.
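As a rough illustration of the XPath clean-up step, here is a sketch using Python’s standard-library ElementTree, which supports a limited subset of XPath. It assumes the CMS HTML is well-formed XHTML (real CMS output often is not, and would need a more tolerant parser), and the clean-up rules shown (stripping inline style and class attributes, dropping empty paragraphs) are hypothetical examples, not the actual rule set.

```python
import xml.etree.ElementTree as ET

# Presentation-only attributes assumed (hypothetically) to be CMS editor noise.
NOISE_ATTRIBUTES = ("style", "class")


def clean_fragment(xhtml: str) -> str:
    """Return a cleaned copy of a well-formed XHTML fragment."""
    # Wrap the fragment so that ElementTree sees a single root element.
    root = ET.fromstring(f"<div>{xhtml}</div>")

    # Strip noisy attributes; [@attr] is an XPath predicate ElementTree supports.
    for attr in NOISE_ATTRIBUTES:
        for element in root.findall(f".//*[@{attr}]"):
            del element.attrib[attr]

    # Remove empty paragraphs. ElementTree elements have no parent pointer,
    # so select the parents of <p> elements with './/p/..' and remove from there.
    for parent in root.findall(".//p/.."):
        for p in parent.findall("p"):
            is_empty = len(p) == 0 and not (p.text or "").strip()
            # Only remove when no text follows the paragraph, so nothing is lost.
            if is_empty and not (p.tail or "").strip():
                parent.remove(p)

    # Serialise the wrapper's contents (tostring includes each child's tail).
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
```

Cleaning the HTML into a small, predictable set of elements like this before conversion is what makes the later Markdown output tidy.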
I decided to convert our content to Markdown first, despite having reservations about its limited feature set and portability. The lightweight markup I really wanted to try was Asciidoc (specifically Asciidoctor). Its syntax is more complete than Markdown’s, but I still felt that strong support for Markdown would be essential for the success of this project. Markdown is more widely known among developers than Asciidoc, and I didn’t want to replace one barrier to collaboration with another.
By the start of Dev Week, I had discovered that most of Caplin’s developer content could be converted to Markdown without recourse to non-portable HTML. I was now more optimistic that Markdown might actually work for us, but some annoying issues remained.
The limitation with the biggest impact is Markdown’s table syntax. It’s cumbersome to work with and doesn’t support table captions or block content in table cells. Arguably we could use fewer tables on our website, but Markdown’s poor support for tables remains a frustrating limitation.
Not far behind, in terms of impact, is Markdown’s lack of support for image dimensions. Many of our images would have to be manually resized in a photo editor, including all screenshots taken on HiDPI screens.
Lastly, Markdown has no syntax for embedding video, so there is no choice but to sacrifice portability and resort to the HTML <video> tag.
Dev Week
With our content converted to Markdown, and our menus exported to data files, it was time to choose a site generator. I wasn’t short of options. Netlify’s StaticGen site was helpful in choosing between generators, and John MacFarlane’s Babelmark2 site was invaluable in analysing the behaviour of different Markdown parsers. I chose the Jekyll generator and the kramdown parser as offering the best chance of success for a quick proof-of-concept project.
I got off to a good start: the kramdown parser immediately solved two outstanding issues I had with Markdown. Kramdown supports attribute lists for all elements, which allowed me to specify dimensions for images and to style paragraphs as table captions. Universal attribute lists are only supported by a few Markdown parsers at the moment, but the document converter Pandoc already supports attribute lists for images and links, and it looks likely that Pandoc will extend this support to all elements in future.
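To make the attribute-list syntax concrete, here is a small Python sketch of conversion helpers that emit kramdown markup. In kramdown, a span-level inline attribute list (IAL) must immediately follow the element it applies to, and a block-level IAL on the line after a paragraph applies to that paragraph. The helper names and the table-caption class are hypothetical.

```python
from typing import Optional


def kramdown_image(src: str, alt: str,
                   width: Optional[int] = None,
                   height: Optional[int] = None) -> str:
    """Return a kramdown image, with an inline attribute list for any dimensions."""
    attrs = []
    if width is not None:
        attrs.append(f'width="{width}"')
    if height is not None:
        attrs.append(f'height="{height}"')
    image = f"![{alt}]({src})"
    if not attrs:
        return image
    # A span-level IAL must follow the element immediately, with no space between.
    return image + "{: " + " ".join(attrs) + "}"


def kramdown_caption(text: str, css_class: str = "table-caption") -> str:
    """Return a paragraph followed by a block-level IAL that assigns it a class."""
    # 'table-caption' is a hypothetical class name for styling captions in CSS.
    return f"{text}\n{{: .{css_class}}}"
```

For a screenshot taken on a 2x HiDPI screen at 1600 pixels wide, kramdown_image("shot.png", "Screenshot", width=800) produces ![Screenshot](shot.png){: width="800"}, halving its display size without editing the image file.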
After these early wins, I moved on to the key challenges in creating any static site: menus, breadcrumbs, and search.
Rendering hierarchical menus was easier in Jekyll than I expected because Jekyll’s templating system supports recursive, parameterised includes. I stored the menu as a nested data structure in a YAML data file, then created an include file that accepted and rendered an array of menu items. Because the menu and its submenus had the same repeating data structure, the include file could include itself recursively to render submenus to any depth.
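The Jekyll include was written in Liquid, but the recursion is easier to see in a general-purpose language. This Python sketch renders the same kind of nested menu; it assumes each menu item has title and url keys and an optional children list of the same shape, which are hypothetical field names standing in for the YAML data file.

```python
from html import escape
from typing import Any, Dict, List

MenuItem = Dict[str, Any]


def render_menu(items: List[MenuItem]) -> str:
    """Render a nested menu as nested HTML lists, recursing into submenus."""
    parts = ["<ul>"]
    for item in items:
        parts.append(
            f'<li><a href="{escape(item["url"])}">{escape(item["title"])}</a>'
        )
        # Submenus share the same structure, so the function calls itself,
        # just as the include file includes itself, to any depth.
        children = item.get("children")
        if children:
            parts.append(render_menu(children))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)
```

Because each level has the same shape, no special case is needed for depth: a submenu is simply another call with a shorter list.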
Generating breadcrumbs from the menu’s YAML file was not so straightforward. The breadcrumbs for a page had to be derived by searching the menu for the route to the page’s menu entry; more a job for a Jekyll plugin than for template tags. Having no knowledge of Ruby and limited time in which to learn it, I decided instead to use JavaScript in the browser to generate the breadcrumbs from the rendered menu’s DOM. After Dev Week, I found that it was a relatively simple task to create a Jekyll ‘Generator’ plugin that traverses a menu’s data structure and appends breadcrumb data to the site’s pages.
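The traversal that such a Generator plugin performs can be sketched in a few lines. Jekyll plugins are written in Ruby, so this Python version only illustrates the logic, using the same hypothetical title, url and children fields: a depth-first walk that carries the trail of ancestors down the tree and records it against each page’s URL.

```python
from typing import Any, Dict, List, Tuple

MenuItem = Dict[str, Any]
Crumb = Tuple[str, str]  # (title, url)


def build_breadcrumbs(items: List[MenuItem],
                      trail: Tuple[Crumb, ...] = ()) -> Dict[str, List[Crumb]]:
    """Map each menu URL to the route of menu entries leading to it."""
    result: Dict[str, List[Crumb]] = {}
    for item in items:
        # Extend the ancestors' trail with this entry.
        crumbs = trail + ((item["title"], item["url"]),)
        result[item["url"]] = list(crumbs)
        # Walk the submenu depth-first, passing the trail down.
        result.update(build_breadcrumbs(item.get("children", []), crumbs))
    return result
```

A plugin would then attach each trail to the matching page’s data so that templates can render it; pages absent from the menu get no breadcrumbs. If a URL appears in the menu more than once, the last occurrence wins in this sketch.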
Search on static sites is commonly implemented either with external services, such as Swiftype and Algolia, or with pure JavaScript solutions such as Lunr.js. However, for Dev Week I didn’t want to consider new solutions; I only wanted to determine whether we could continue to use our current external provider. To implement site search, I added a PHP script that queried our search provider and returned the results as a JSON object, which the browser then rendered dynamically.
By the end of Dev Week, I could declare the project a success. I had created a static version of our developer website, with working navigation and search, from source files that used a minimum of HTML.
Future work
Now that I know I can create a static site, I can look in more detail at the logistics of maintaining one.
URLs: My proof-of-concept static site mirrored our current URL scheme, but is that the best scheme for a static site? Should the directory layout match the hierarchy in the menu? If the URL scheme changes, what webserver redirect directives would be required to honour legacy URLs?
Workflows: How would common workflows look in a Docs Like Code model? How would we publish pages for internal audiences only? How would we version and archive product documentation?
Git: What are the best practices for working with written content in Git? What Git branching model should we use? How can we keep merge conflicts to a minimum?
Asciidoc: A lot of our HTML tables need to be refactored away to increase compatibility with Markdown. Would Asciidoc be a better choice for complex layouts? Should we mix Markdown and Asciidoc files in our site, choosing the best format for each page, or would that mixture make future migration of our content more difficult?