Published on: December 7, 2016
17 min read
How we built the new GitLab Docs portal from the ground up

We recently rebuilt docs.gitlab.com from scratch. Where previously the site was generated with a simple Ruby script, we now use a proper static site generator.
Check out the improvements we made and the structure we now use to deploy from specific directories in multiple repositories to a single website, built with GitLab CI and deployed with GitLab Pages. Now our documentation has a nicer look and feel, is more pleasant to read through, and is simpler and quicker to maintain.
The old documentation website was pretty much just an HTML file, a stylesheet, and a Ruby script called generate.rb. While it worked, it was hard to update and not very flexible. It mostly lay dormant, only occasionally being touched by developers. The docs team really wanted to update the site to use a static site generator and take better advantage of GitLab Pages.
We chose Nanoc because it’s fast, it comes with a number of built-in helpers and filters (as well as the ability to create custom ones), and it’s built with Ruby. Overall, we think this was definitely the right choice. The author was very responsive and addressed anything we brought up. Kudos to him on the great project!
Other improvements include syntax highlighting with Rouge (no syntax highlighting was used at all on the old site), breadcrumbs for navigating between pages, and an improved overall design – especially on mobile.
Our documentation site has some unique requirements that I haven’t seen mentioned or solved in any other companies’ blog posts. We have a few products with documentation we want to include in the site: Community Edition, Enterprise Edition, Omnibus GitLab, and GitLab Runner. In the future we’ll likely add more.
Each product has its own repository with its own documentation directory. This allows developers to add documentation in the same merge request in which they add a new feature or change some behavior, which prevents documentation from becoming outdated.
The site also needed to be flexible enough that we could add versioning to it in the future. Eventually, our goal is to replace the Help section in CE/EE with this Docs site, so we need to maintain older versions of the documentation on the Docs site for users on older versions of GitLab.
Given the requirements and separate repositories, we decided we’d just need to clone the repositories as part of the build process.
Inside Nanoc's config file (nanoc.yaml), we have defined a hash of each of our products containing all the data we need. Here's an excerpt:
products:
  ce:
    full_name: 'GitLab Community Edition'
    short_name: 'Community Edition'
    abbreviation: 'CE'
    slug: 'ce'
    index_file: 'README.*'
    description: 'Browse user and administration documentation and guides for GitLab Community Edition.'
    repo: 'https://gitlab.com/gitlab-org/gitlab-ce.git'
    dirs:
      temp_dir: 'tmp/ce/'
      dest_dir: 'content/ce'
      doc_dir:  'doc'
...
  runner:
    full_name: 'GitLab Runner'
    short_name: 'Runner'
    abbreviation: 'RU'
    slug: 'runner'
    index_file: 'index.*'
    description: 'Browse installation, configuration, maintenance, and troubleshooting documentation for GitLab Runner.'
    repo: 'https://gitlab.com/gitlab-org/gitlab-runner.git'
    dirs:
      temp_dir: 'tmp/runner/'
      dest_dir: 'content/runner'
      doc_dir:  'docs'
We then have the Rakefile where the repos are cloned and the directories that
Nanoc needs are created:
desc 'Pulls down the CE, EE, Omnibus and Runner git repos and merges the
content of their doc directories into the nanoc site'
task :pull_repos do
  require 'yaml'
  # By default won't delete any directories, requires all relevant directories
  # be empty. Run `RAKE_FORCE_DELETE=true rake pull_repos` to have directories
  # deleted.
  force_delete = ENV['RAKE_FORCE_DELETE']
  # Parse the config file and create a hash.
  config = YAML.load_file('./nanoc.yaml')
  # Pull products data from the config.
  ce = config["products"]["ce"]
  ee = config["products"]["ee"]
  omnibus = config["products"]["omnibus"]
  runner = config["products"]["runner"]
  products = [ce, ee, omnibus, runner]
  dirs = []
  products.each do |product|
    dirs.push(product['dirs']['temp_dir'])
    dirs.push(product['dirs']['dest_dir'])
  end
  if force_delete
    puts "WARNING: Are you sure you want to remove #{dirs.join(', ')}? [y/n]"
    exit unless STDIN.gets.index(/y/i) == 0
    dirs.each do |dir|
      puts "\n=> Deleting #{dir} if it exists\n"
      FileUtils.rm_r("#{dir}") if File.exist?("#{dir}")
    end
  else
    puts "NOTE: The following directories must be empty otherwise this task " +
      "will fail:\n#{dirs.join(', ')}"
    puts "If you want to force-delete the `tmp/` and `content/` folders so \n" +
      "the task will run without manual intervention, run \n" +
      "`RAKE_FORCE_DELETE=true rake pull_repos`."
  end
  dirs.each do |dir|
    unless "#{dir}".start_with?("tmp")
      puts "\n=> Making an empty #{dir}"
      FileUtils.mkdir("#{dir}") unless File.exist?("#{dir}")
    end
  end
  products.each do |product|
    temp_dir = File.join(product['dirs']['temp_dir'])
    puts "\n=> Cloning #{product['repo']} into #{temp_dir}\n"
    `git clone #{product['repo']} #{temp_dir} --depth 1 --branch master`
    temp_doc_dir = File.join(product['dirs']['temp_dir'], product['dirs']['doc_dir'], '.')
    destination_dir = File.join(product['dirs']['dest_dir'])
    puts "\n=> Copying #{temp_doc_dir} into #{destination_dir}\n"
    FileUtils.cp_r(temp_doc_dir, destination_dir)
  end
end
The pull_repos task inside the Rakefile is pretty self-explanatory if you know some Ruby, but here's what it does:
nanoc.yaml is loaded since it contains the information we need for the various products:
config = YAML.load_file('./nanoc.yaml')
The products data are pulled from the config:
ce = config["products"]["ce"]
ee = config["products"]["ee"]
omnibus = config["products"]["omnibus"]
runner = config["products"]["runner"]
The directories that need to be created (or deleted) are collected in an array:
products = [ce, ee, omnibus, runner]
dirs = []
products.each do |product|
  dirs.push(product['dirs']['temp_dir'])
  dirs.push(product['dirs']['dest_dir'])
end
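As a variation on the step above, the product list could be read straight from the config instead of naming each product, so that adding a product only means adding a key under products. This is a minimal sketch, not what the Rakefile does: the inline YAML stands in for the Nanoc config and holds only the keys used here, and product_dirs is a hypothetical helper name.

```ruby
require 'yaml'

# Inline stand-in for the `products` section of the Nanoc config
# (assumption: only the keys needed for this example are included).
CONFIG_YAML = <<~YAML
  products:
    ce:
      dirs:
        temp_dir: 'tmp/ce/'
        dest_dir: 'content/ce'
        doc_dir:  'doc'
    runner:
      dirs:
        temp_dir: 'tmp/runner/'
        dest_dir: 'content/runner'
        doc_dir:  'docs'
YAML

# Return every temp_dir and dest_dir, in config order, for all products.
def product_dirs(config)
  config.fetch('products').values.flat_map do |product|
    product.fetch('dirs').values_at('temp_dir', 'dest_dir')
  end
end

dirs = product_dirs(YAML.safe_load(CONFIG_YAML))
puts dirs.join(', ')
```

Using fetch rather than [] makes a missing products or dirs key fail loudly instead of surfacing later as a nil error.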
The empty directories are created:
dirs.each do |dir|
  unless "#{dir}".start_with?("tmp")
    puts "\n=> Making an empty #{dir}"
    FileUtils.mkdir("#{dir}") unless File.exist?("#{dir}")
  end
end
Finally, each repository is cloned into its temp_dir under tmp/, and the contents of its documentation directory (defined by doc_dir) are copied from tmp/ to content/:
products.each do |product|
  temp_dir = File.join(product['dirs']['temp_dir'])
  puts "\n=> Cloning #{product['repo']} into #{temp_dir}\n"
  `git clone #{product['repo']} #{temp_dir} --depth 1 --branch master`
  temp_doc_dir = File.join(product['dirs']['temp_dir'], product['dirs']['doc_dir'], '.')
  destination_dir = File.join(product['dirs']['dest_dir'])
  puts "\n=> Copying #{temp_doc_dir} into #{destination_dir}\n"
  FileUtils.cp_r(temp_doc_dir, destination_dir)
end
content/ is where Nanoc looks for the actual site’s Markdown files. To prevent the tmp/ and content/ subdirectories from being pushed after testing the site locally, they’re excluded by .gitignore.
In the future we may speed this up further by caching the tmp folder in
CI. The task would need to be updated to check if the local repository is
up-to-date with the remote, only cloning if they differ.
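The up-to-date check described above could look roughly like the following. It is only a sketch of the idea, not part of the current Rakefile: remote_sha, clone_needed? and stale? are hypothetical helper names, and the comparison assumes the shallow clone's HEAD is the tip of the branch that was cloned.

```ruby
# Extract the commit SHA for a branch from `git ls-remote` output.
# Each output line has the form "<sha>\t<ref>".
def remote_sha(ls_remote_output, branch = 'master')
  line = ls_remote_output.each_line.find do |l|
    l.chomp.end_with?("\trefs/heads/#{branch}")
  end
  line && line.split("\t").first
end

# A fresh clone is needed when there is no local checkout yet, or when
# the local HEAD differs from the tip of the remote branch.
def clone_needed?(local, remote)
  local.nil? || local.empty? || local != remote
end

# Hypothetical glue for one product: shells out to git (not run here).
def stale?(repo_url, temp_dir, branch = 'master')
  local = nil
  if Dir.exist?(File.join(temp_dir, '.git'))
    local = `git -C #{temp_dir} rev-parse HEAD`.strip
  end
  remote = remote_sha(`git ls-remote #{repo_url} refs/heads/#{branch}`, branch)
  clone_needed?(local, remote)
end
```

Inside pull_repos, the git clone call would then be skipped for any product where stale? returns false.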
Now that all the needed files are in order, we run nanoc to build the static site. Nanoc runs each Markdown file through a series of filters defined by rules in the Rules file. We currently use Redcarpet as the Markdown parser along with Rouge for syntax highlighting, as well as some custom filters. We plan on moving to Kramdown as our Markdown parser in the future, as it provides some nice features such as a user-defined Table of Contents.
We also define some filters inside the lib/filters/ directory, including one that replaces any .md extension with .html.
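The post does not show that filter, so the following is only a guess at the kind of substitution involved, written as a plain method so it runs without Nanoc. In a real Nanoc filter the same logic would sit in the run method of a Nanoc::Filter subclass. The regular expression and the decision to leave absolute URLs alone are assumptions.

```ruby
# Matches href="....md" with an optional #anchor after the extension.
MD_LINK = /href="([^"]*?)\.md(#[^"]*)?"/

# Rewrite relative links to Markdown files so they point at the compiled
# HTML pages. Links with a URL scheme (http://, https://, ...) are kept.
def md_links_to_html(html)
  html.gsub(MD_LINK) do
    whole  = Regexp.last_match(0)
    path   = Regexp.last_match(1)
    anchor = Regexp.last_match(2).to_s
    next whole if path.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

    %(href="#{path}.html#{anchor}")
  end
end

puts md_links_to_html('<a href="ci/README.md#jobs">CI</a>')
```

Running the substitution on the rendered HTML rather than on the Markdown source keeps it independent of the Markdown parser in use.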
The Table of Contents (ToC) is generated for each page except when it's named index.md or README.md, as we usually use these as landing pages to index other documentation files and we don't want them to have a ToC. All this and some other options that Redcarpet provides are defined in the Rules file.
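The ToC exception can be expressed as a small predicate. This is a sketch of the rule as described, not the code in the Rules file; toc? and NO_TOC_BASENAMES are made-up names, and the match is assumed to be case-sensitive.

```ruby
# Landing pages that index other docs and should not get a ToC.
NO_TOC_BASENAMES = %w[index README].freeze

# True when a Markdown file at the given path should get a ToC.
def toc?(path)
  !NO_TOC_BASENAMES.include?(File.basename(path, '.md'))
end

puts toc?('ci/yaml/README.md')        # landing page
puts toc?('install/installation.md')  # regular page
```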
For more on the specifics of building a site with Nanoc, see the Nanoc tutorial.
The new docs portal is hosted on GitLab.com at https://gitlab.com/gitlab-org/gitlab-docs. In that project we create issues, discuss things, open merge requests in feature branches, iterate on feedback, and finally merge things into the master branch.
Again, the documentation source files are not stored in this repository. If you want to contribute, you'd have to open a merge request to the respective project.
There are three key things we use to test, build, deploy, and host the Nanoc site, all built into GitLab: GitLab CI, Review Apps, and GitLab Pages.
Let's break it down into pieces.
GitLab CI is responsible for all the stages that we go through to publish new documentation: test, build, and deploy.
Nanoc has a built-in system of Checks,
including HTML/CSS and internal/external link validation. With GitLab CI we
test with the internal link checker (set to allow failure)
and also verify that the site compiles without errors. We also run a SCSS
Linter to make sure our SCSS looks
uniform.
Our full .gitlab-ci.yml file looks like this. We'll break it down to make it clear what it is doing:
image: ruby:2.3
## Cache the vendor/ruby directory
cache:
  key: "ruby-231"
  paths:
  - vendor/ruby
## Define the stages
stages:
  - test
  - deploy
## Before each job's script is run, run the commands below
before_script:
  - ruby -v
  - bundle install --jobs 4 --path vendor
## Make sure the site builds successfully
verify_compile:
  stage: test
  script:
    - rake pull_repos
    - nanoc
  artifacts:
    paths:
      - public
    expire_in: 1w
  except:
    - master
  tags:
    - docker
## Check for dead internal links using Nanoc's built-in tool
internal_links:
  stage: test
  script:
    - rake pull_repos
    - nanoc
    - nanoc check internal_links
  allow_failure: true
  tags:
    - docker
## Make sure our SCSS stylesheets are correctly defined
scss_lint:
  stage: test
  script:
    - npx sass-lint '**/*.scss' -v
  tags:
    - docker
## A job that deploys a review app to a dedicated server running Nginx.
review:
  stage: deploy
  variables:
    GIT_STRATEGY: none
  before_script: []
  cache: {}
  script:
    - rsync -av --delete public /srv/nginx/pages/$CI_BUILD_REF_NAME
  environment:
    name: review/$CI_BUILD_REF_NAME
    url: http://$CI_BUILD_REF_NAME.$APPS_DOMAIN
    on_stop: review_stop
  only:
    - branches@gitlab-org/gitlab-docs
  except:
    - master
  tags:
    - nginx
    - review-apps
## Stop the review app
review_stop:
  stage: deploy
  variables:
    GIT_STRATEGY: none
  before_script: []
  artifacts: {}
  cache: {}
  dependencies: []
  script:
    - rm -rf public /srv/nginx/pages/$CI_BUILD_REF_NAME
  when: manual
  environment:
    name: review/$CI_BUILD_REF_NAME
    action: stop
  only:
    - branches@gitlab-org/gitlab-docs
  except:
    - master
  tags:
    - nginx
    - review-apps
## Deploy the static site to GitLab Pages
pages:
  stage: deploy
  environment:
    name: production
    url: https://docs.gitlab.com
  script:
    - rake pull_repos
    - nanoc
    # Symlink all README.html to index.html
    - for i in `find public -name README.html`; do ln -sf README.html $(dirname $i)/index.html; done
  artifacts:
    paths:
    - public
    expire_in: 1h
  only:
    - master@gitlab-org/gitlab-docs
  tags:
    - docker
To better visualize how the jobs are run, take a look at what the pipeline graph looks like for one of the pipelines.
Let's see what all these settings mean.
For more information, you can read the documentation on
.gitlab-ci.yml.
Define the Docker image to be used:
image: ruby:2.3
Cache the vendor/ruby directory so that we don't have to install the
gems for each job/pipeline:
cache:
  key: "ruby-231"
  paths:
  - vendor/ruby
Define the stages the jobs will run in:
stages:
  - test
  - deploy
Before each job's script is run, run the commands that are defined in the
before_script. Display the Ruby version and install
the needed gems:
before_script:
  - ruby -v
  - bundle install --jobs 4 --path vendor
In the verify_compile job we make sure the site builds successfully. It first pulls the repos locally, then runs nanoc to compile the site. The public/ directory, where the static site is built, is uploaded as an artifact so that it can be passed between stages. We define an expiry time of one week. The job runs on all refs except master. The docker tag ensures that this job is picked up by the shared Runners on GitLab.com:
verify_compile:
  stage: test
  script:
    - rake pull_repos
    - nanoc
  artifacts:
    paths:
      - public
    expire_in: 1w
  except:
    - master
  tags:
    - docker
In the internal_links job we check for dead internal links using Nanoc's built-in functionality. We first need to pull the repos and compile the static site. We allow the job to fail since the source of the dead links is in a different repository, largely unrelated to the current one. The docker tag ensures that this job is picked up by the shared Runners on GitLab.com:
internal_links:
  stage: test
  script:
    - rake pull_repos
    - nanoc
    - nanoc check internal_links
  allow_failure: true
  tags:
    - docker
The scss_lint job makes sure our SCSS stylesheets are correctly defined by running a linter on them. The docker tag ensures that this job is picked up by the shared Runners on GitLab.com:
scss_lint:
  stage: test
  script:
    - npx sass-lint '**/*.scss' -v
  tags:
    - docker
Next, we define the Review Apps.
When opening a merge request for the docs site we use a new feature called Review Apps to test changes. This lets us test new features, style changes, new sections, etc., by deploying the updated static site to a test domain. On every merge request where all jobs have finished successfully, we can see a link with the URL of the temporarily deployed docs site.
We define two additional jobs for that purpose in .gitlab-ci.yml:
review:
  stage: deploy
  variables:
    GIT_STRATEGY: none
  before_script: []
  cache: {}
  script:
    - rsync -av --delete public /srv/nginx/pages/$CI_BUILD_REF_NAME
  environment:
    name: review/$CI_BUILD_REF_NAME
    url: http://$CI_BUILD_REF_NAME.$APPS_DOMAIN
    on_stop: review_stop
  only:
    - branches@gitlab-org/gitlab-docs
  except:
    - master
  tags:
    - nginx
    - review-apps
review_stop:
  stage: deploy
  variables:
    GIT_STRATEGY: none
  before_script: []
  artifacts: {}
  cache: {}
  dependencies: []
  script:
    - rm -rf public /srv/nginx/pages/$CI_BUILD_REF_NAME
  when: manual
  environment:
    name: review/$CI_BUILD_REF_NAME
    action: stop
  only:
    - branches@gitlab-org/gitlab-docs
  except:
    - master
  tags:
    - nginx
    - review-apps
Both jobs run on all branches except master, since master is deployed straight to production. Once someone with write access to the repository pushes a branch and creates a merge request, and the jobs in the test stage finish successfully, the review job deploys the code of that particular branch to a server. The server is set up to use Nginx with Review Apps, and it uses the artifacts from the previous verify_compile job, which contain the public/ directory with the HTML files Nanoc compiled.
Notice that both jobs rely on dynamic environments and with
the review/ prefix we can group them under the Environments
page.
The review_stop job depends on the review one and is called whenever we
want to clear up the review app. By default it is called every time the related
branch is deleted, but you can also manually call it with the buttons that can
be found in GitLab.
The trick with this particular setup is that we use the shared Runners provided on GitLab.com to test and build the docs site (using Docker containers), whereas for the Review Apps we use a specific Runner that is set up on the server that hosts them and is configured with the shell executor. GitLab CI knows which Runner to use each time from the tags we give each job.
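The tag-based selection can be pictured with a simplified model: a Runner is eligible for a job when it carries every tag the job asks for. This is an illustration only, not GitLab's actual scheduling code; the Runner names and the eligible_runners helper are made up, and the tag sets are inferred from the jobs in .gitlab-ci.yml.

```ruby
require 'set'

# Hypothetical Runners and the tags assigned to them.
RUNNERS = {
  'shared-docker'      => Set['docker'],
  'review-apps-server' => Set['nginx', 'review-apps']
}.freeze

# Names of the Runners whose tags include every tag the job requires.
def eligible_runners(job_tags, runners = RUNNERS)
  wanted = job_tags.to_set
  runners.select { |_name, tags| wanted.subset?(tags) }.keys
end

puts eligible_runners(%w[docker]).inspect             # test and pages jobs
puts eligible_runners(%w[nginx review-apps]).inspect  # review jobs
```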
The review job also has some other things specified:
variables:
  GIT_STRATEGY: none
before_script: []
cache: {}
In this case, GIT_STRATEGY is set to none since we don't need to check out the repository for this job. We only use rsync to copy the artifacts that were passed from the previous job over to the server where Review Apps are deployed. We also turn off before_script since we don't need it to run, and the same goes for cache. Both are defined globally, so you need to pass an empty array and an empty hash respectively to disable them at the job level.
On the other hand, setting GIT_STRATEGY to none is necessary in the review_stop job so that the GitLab Runner won't try to check out the code after the branch is deleted. We also define one additional thing in it:
dependencies: []
Since this is the last job performed in the lifecycle of a merge request (after it's merged and the branch deleted), we opt not to download any artifacts from the previous stage by passing an empty array in dependencies.
See our blog post on Review Apps for
more information about how they work and their purpose. Be sure to also check
the Review Apps documentation as well as how dynamic environments work
since they are the basis of the Review Apps.
The final step after the site gets successfully built is to deploy to
production which is under the URL everybody knows: https://docs.gitlab.com.
For that purpose, we use GitLab Pages.
GitLab Pages hosts static websites and can be used with any Static Site Generator, including Jekyll, Hugo, Middleman, Pelican, and of course Nanoc.
GitLab Pages allows us to create the static site dynamically since it just
deploys the public directory after the GitLab CI task is done. The job
responsible for this is named pages.
A production environment is set with the URL of the docs portal. The script pulls the repos and runs nanoc to compile the static site. The public/ directory, where the static site is built, is uploaded as an artifact so that it can be deployed to GitLab Pages. We define an expiry time of one hour, and the job runs only on the master branch.
The docker tag ensures that this job is picked by the shared Runners
on GitLab.com.
pages:
  stage: deploy
  environment:
    name: production
    url: https://docs.gitlab.com
  script:
    - rake pull_repos
    - nanoc
    # Symlink all README.html to index.html
    - for i in `find public -name README.html`; do ln -sf README.html $(dirname $i)/index.html; done
  artifacts:
    paths:
    - public
    expire_in: 1h
  only:
    - master@gitlab-org/gitlab-docs
  tags:
    - docker
GitLab Pages deploys our documentation site whenever a commit is made to the master branch of the gitlab-docs repository; the pages job runs only on the master branch of the gitlab-docs project.
Since the documentation content itself is not hosted in the gitlab-docs repository, we rely on a CI job in each of the products we build the docs site from. Specifically, we make use of triggers: a build of the docs site is triggered whenever CI runs successfully on the master branch of CE, EE, Omnibus GitLab, or Runner. If you go to the pipelines page of the gitlab-docs project, you will notice the word "triggered" next to the pipelines that were re-run because a trigger was initiated.
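For illustration, a trigger call could be built as in the sketch below. The exact call the product repositories make is not shown in this post, so this is an assumption: it targets the pipeline trigger endpoint of the v4 API found in recent GitLab versions, and the project ID (1234) and token (TRIGGER_TOKEN) are placeholders. The request is built but not sent.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a pipeline trigger request for a project.
def trigger_request(host, project_id, token, ref = 'master')
  uri = URI("https://#{host}/api/v4/projects/#{project_id}/trigger/pipeline")
  request = Net::HTTP::Post.new(uri)
  request.set_form_data('token' => token, 'ref' => ref)
  [uri, request]
end

# Placeholder values; a real call needs the docs project's ID and a
# trigger token stored as a secret CI variable.
uri, request = trigger_request('gitlab.com', 1234, 'TRIGGER_TOKEN')
puts uri
puts request.body

# To actually send it:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```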
How we specifically use triggers for gitlab-docs is briefly described in the
We also use a hack to symlink all README.html files into index.html so that they can be viewed without the extension. Notice how the following links point to the same document:
The line responsible for this is:
for i in `find public -name README.html`; do ln -sf README.html $(dirname $i)/index.html; done
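The same step could also be written in Ruby, for example as a Rake task body. This is an equivalent sketch rather than what the pipeline runs; symlink_readmes is a hypothetical helper, and the temporary directory only exists to demonstrate it.

```ruby
require 'fileutils'
require 'tmpdir'

# For every README.html under root, create a sibling index.html symlink
# pointing at it. The link is relative and replaces an existing index.html.
def symlink_readmes(root)
  Dir.glob(File.join(root, '**', 'README.html')).each do |readme|
    FileUtils.ln_sf('README.html', File.join(File.dirname(readme), 'index.html'))
  end
end

# Demonstration on a throwaway directory tree.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'public', 'ce'))
  File.write(File.join(dir, 'public', 'ce', 'README.html'), '<h1>CE</h1>')
  symlink_readmes(File.join(dir, 'public'))
  puts File.read(File.join(dir, 'public', 'ce', 'index.html'))
end
```

Because the link target is the bare file name, the symlinks stay valid when the public/ directory is moved or unpacked elsewhere.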
The artifacts are set to expire in an hour: once they are deployed to the GitLab Pages server, we don't need them lingering in GitLab forever.
It’s worth noting that GitLab Pages is a GitLab Enterprise Edition-only feature, but it’s also available for free on GitLab.com.
Hopefully this shows some of GitLab's power and how having everything integrated into one cohesive product simplifies one's workflow. If you have a complex documentation site you’d like to put together from specific directories in multiple Git repositories, the process described above is the best we've been able to come up with. If you have any ideas to make this system better, let us know!
The documentation website is open source, available under the MIT License. You’re welcome to take a look at it, submit a merge request, or even fork it to use it with your own project.
Thanks for reading! If you have any questions, we’d be happy to answer them in the comments.