NBDev Tutorial

A tutorial on nbdev, its use cases, and how to build libraries with it

nbdev is a notebook-driven development platform. Simply write notebooks with lightweight markup and get high-quality documentation, tests, continuous integration, and packaging for free!

nbdev makes debugging and refactoring your code much easier than in traditional programming environments, since you always have live objects at your fingertips. nbdev also promotes software engineering best practices because tests and documentation are first class. In a minute or two, you can provide installable libraries to anyone. Even a small piece of code can be turned into a library that is useful to someone.

Tests, docs, and code are part of the same context and are co-located. This not only makes the code more approachable; having to write docs also forces you to think more carefully about your code.

What you can do using NBDev

  • Searchable hyperlinked documentation - Documentation is generated automatically using Quarto and hosted on GitHub Pages via the workflows created when you initialize a repository with nbdev. The docs support LaTeX, are searchable, and are automatically hyperlinked to other parts of your code.

  • Two-way sync between notebooks and editors - Using simple commands, nbdev exports Python scripts from your notebooks, which you can edit in an editor such as VS Code; it then syncs those changes back into the notebooks.

  • Pip and Conda installers - Publish packages to PyPI and Conda directly from your notebook code. nbdev creates Python modules and provides tools that simplify package releases. Python best practices for releasing packages are followed automatically, which can be tedious to get right manually in dead coding environments.

  • Testing - Tests are written in notebook cells alongside your main code. With a single command, nbdev runs all the tests in parallel when you prepare your packages. Keeping tests next to the main code ensures they are updated when the code changes and keeps them easily accessible, unlike in dead coding environments where tests are stored separately and require a lot of context switching to understand which test belongs to which function.

  • Continuous Integration - Automatically creates GitHub Actions workflows that run the tests, rebuild the docs, and host them on GitHub Pages.

  • Git friendly - Provides Jupyter/Git hooks that clean unwanted metadata, making git diffs easy to compare. In case of a merge conflict, instead of failing with an error like “can’t open notebook”, nbdev renders a clean merge conflict in a human-readable format.

  • Easy Updates - Your README, PyPI page, and Conda page always stay up to date with what you write in index.ipynb.
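To make the testing point concrete, here is a minimal sketch of a test in an nbdev notebook. It uses plain assert statements and a made-up function name; nbdev projects often use fastcore's test_eq helpers instead, but the idea is the same:

```python
# A cell marked for export: this function becomes part of the library module.
#| export
def say_hello(to):
    "Say hello to somebody."
    return f"Hello {to}!"

# A later cell in the same notebook is the test: nbdev_test simply runs it,
# and a failing assertion fails the test suite.
assert say_hello("Hamel") == "Hello Hamel!"
assert say_hello("world").startswith("Hello")
```

Because the test cell sits directly below the function it exercises, any change to the function that breaks the test is caught on the next nbdev_test run.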

“I Like Notebooks” - Jeremy Howard

FastAI has built a lot of amazing tools directly from notebooks, and most of its libraries are built using nbdev. In his video I Like Notebooks, Jeremy Howard explains why he likes notebooks and argues that it is time to start rethinking software engineering principles. He gives some great examples of how Jupyter notebooks coupled with nbdev follow best practices and are a great way to teach, write technical blogs, share code, and create reproducible results and issues.

Summary

  • Literate Programming - Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more easily maintained, and arguably more fun to write. The main idea is to treat the program as a piece of literature, addressed to human beings rather than to a computer. Notebooks support this by default and act like a journal you can read from top to bottom, following the developer’s thought process along with the code and its outputs.

  • Fewer chances of errors - Because you can run a small piece of code, see its output, make plots, and visualize images, it is easy to debug issues and verify that the inputs are correct. In dead coding environments there is no comparably easy way to inspect inputs and outputs, which often leads to errors.

  • Easily shareable - With notebooks, you can easily share results and issues with others, and they can reproduce them using something like a Colab environment. Most importantly, you can share not only text but also images, videos, plots, etc. With Software 2.0 we are not just working with text; we have varied kinds of data that need a lot of exploration.

  • Tests live along with code - In dead coding environments, it is easy to miss tests entirely because they live separately from the main code. With nbdev, and notebooks in general, the tests live alongside the main code.

  • Better suggestions - Jupyter notebooks can give more accurate function suggestions. A static editor like VS Code does not know the output of the previous line, but Jupyter does, because you have already run the code.

Examples

  • FastAI Documentation - The entire documentation is generated from notebooks. The benefit is that the documentation and the tests always stay up to date with changes in the library.

  • Fastpages - Create technical blogs with LaTeX, images, videos, plots, and code snippets directly from your notebooks.

  • Fastdoc - Create publication-quality books directly from Jupyter notebooks. The biggest example is the book Deep Learning for Coders, written entirely in notebooks; its GitHub repository contains the exact notebooks used to create the publication-ready book. The best part of writing a book in notebooks is that the example code is real code that runs and produces the shown output, unlike many books that suffer from errors or dependency issues. The book is available on Amazon.

Steps to use NBDev

  1. Initialize a GitHub repository.

  2. Clone it to the system.

  3. Install nbdev using

    pip install nbdev

    or

    conda install -c fastai nbdev
  4. Run the nbdev_new command, which:

    • Initializes the repository with nbdev environment files and sample Jupyter notebooks.
    • Sets up GitHub Actions workflow scripts to test notebooks, and to build and deploy Quarto docs to GitHub Pages.
    • Configures Quarto for publication-grade technical documentation.
    • Streamlines publishing Python packages to PyPI and Conda.
  5. Run nbdev_install_hooks.

    • This installs three hooks that ease Jupyter-git integration.
    • nbdev_merge: Handles merge conflicts so that notebook-loading errors don’t pop up.
    • nbdev_clean: On save, cleans up metadata for clean git commits and pull requests.
    • nbdev_trust: Automatically trusts all notebooks so you don’t have to do it manually every time.
  6. Run nbdev_preview to preview the docs generated from your notebooks. You can see changes in the docs live as you save the notebooks.

  7. Before committing your changes to GitHub, run nbdev_prepare in the terminal, which bundles the following commands:

    • nbdev_export: Builds the .py modules from Jupyter notebooks
    • nbdev_test: Tests your notebooks
    • nbdev_clean: Cleans your notebooks to get rid of extraneous output for GitHub
    • nbdev_readme: Updates README.md from your index notebook.

    You can also run these commands individually.

  8. Push to GitHub to see the workflows in action. Essentially, two workflows run as part of CI:

    • Running all the tests in your notebooks
    • Building the documentation and publishing it to GitHub Pages
  9. Setting up the PyPI environment for publishing

    • To publish to PyPI, you’ll have to register an account on its account registration page.
    • Install twine, which is required for publishing to PyPI:
    pip install twine
    • Create a file ~/.pypirc in the following format:
    [pypi]
    username = your_pypi_username
    password = your_pypi_password
    • Now you’re all set to publish to PyPI.
  10. Setting up the Conda environment for publishing

    • As with PyPI, you’ll have to register an account on the Anaconda account registration page.
    • If you’re using Miniconda, anaconda-client won’t be installed. To install it:
    conda install conda-build anaconda-client
    • Log in to Anaconda using
    anaconda login
    • The settings.ini file generated by nbdev_new doesn’t create placeholders for the Conda variables, so you’ll have to add a conda_user variable yourself; its value can be your username or your organization’s name. Add conda_user = <username> to the file.
    • Now you’re all set to publish to your Conda account.
  11. Once everything is set up, you can push the packages to PyPI and Conda using the single command nbdev_release_both.

    • To publish only to PyPI: nbdev_pypi
    • To publish only to Conda: nbdev_conda
    • If you’ve already pushed the packages and want to publish a new version, run nbdev_release_both again. It will report that the current version already exists and then bump the version number in your settings.ini file. You can also bump the version manually by editing the file. Run nbdev_release_both once more and the new version of the library will be published.
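As a rough sketch of what steps 10 and 11 touch in settings.ini, the relevant entries look something like the fragment below (lib_name and the values are placeholders; your generated file will contain many more fields):

```ini
[DEFAULT]
lib_name = my_library
; bumped automatically by nbdev_release_both on each release
version = 0.0.2
; added manually for Conda publishing (step 10)
conda_user = my_conda_username
```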

If you want to know about the other functions, refer to the documentation or run the nbdev_help command to see everything that is available. Here is the output of the command:

!nbdev_help
nbdev_bump_version              Increment version in settings.ini by one
nbdev_changelog                 Create a CHANGELOG.md file from closed and labeled GitHub issues
nbdev_clean                     Clean all notebooks in `fname` to avoid merge conflicts
nbdev_conda                     Create a `meta.yaml` file ready to be built into a package, and optionally build and upload it
nbdev_create_config             Create a config file.
nbdev_docs                      Create Quarto docs and README.md
nbdev_export                    Export notebooks in `path` to Python modules
nbdev_filter                    A notebook filter for Quarto
nbdev_fix                       Create working notebook from conflicted notebook `nbname`
nbdev_help                      Show help for all console scripts
nbdev_install                   Install Quarto and the current library
nbdev_install_hooks             Install Jupyter and git hooks to automatically clean, trust, and fix merge conflicts in notebooks
nbdev_install_quarto            Install latest Quarto on macOS or Linux, prints instructions for Windows
nbdev_merge                     Git merge driver for notebooks
nbdev_migrate                   Convert all markdown and notebook files in `path` from v1 to v2
nbdev_new                       Create an nbdev project.
nbdev_prepare                   Export, test, and clean notebooks, and render README if needed
nbdev_preview                   Preview docs locally
nbdev_proc_nbs                  Process notebooks in `path` for docs rendering
nbdev_pypi                      Create and upload Python package to PyPI
nbdev_readme                    None
nbdev_release_both              Release both conda and PyPI packages
nbdev_release_gh                Calls `nbdev_changelog`, lets you edit the result, then pushes to git and calls `nbdev_release_git`
nbdev_release_git               Tag and create a release in GitHub for the current version
nbdev_sidebar                   Create sidebar.yml
nbdev_test                      Test in parallel notebooks matching `path`, passing along `flags`
nbdev_trust                     Trust notebooks matching `fname`
nbdev_update                    Propagate change in modules matching `fname` to notebooks that created them

Important Files

  1. settings.ini - You can set up the project config directly from here: description, repository name, author name, etc.
  2. index.ipynb - This is the most important notebook. The documentation generated from it becomes your README and the description on your PyPI and Conda pages, and it serves as the homepage of your documentation.
  3. 00_core.ipynb - This is an example notebook in which you can write the functions for your library. As pre-configured in the notebook, it gets exported to the core.py module. You can add more such notebooks; the 00_ naming convention is not required, but Jeremy Howard suggests using it so the notebooks read like a journal of the developer’s thought process.

Directives

  1. #| default_exp <name>: Names the module to which cells with the #| export directive are exported by default.
  2. #| export: Exports the items in the cell to the generated module and includes them in the documentation.
  3. #| hide: Hides the cell from both the generated module and the documentation. I used this for import statements that need not be part of the generated module.
  4. If you don’t pass any directive to a cell, it becomes part of the documentation but not of the generated module.
  5. #| echo: <true|false>: Toggles the visibility of the code cell in the documentation. I used this to hide the code cell that embeds the YouTube video in the documentation.
  6. #| output: <true|false|asis>: Toggles the visibility of the cell’s output in the documentation. I used this to keep print-statement output out of the documentation.
  7. #| filter_stream <space-separated keywords>: Hides output lines containing the given keywords. I used this to suppress the noisy warnings printed when using the sklearn library.

Many more useful directives are available, refer to the documentation for more.
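As a sketch of how the directives above combine in practice, the cells of a hypothetical 00_core.ipynb might look like this (the function and module names are made up for illustration):

```python
# Cell 1: cells marked "#| export" below are written to my_library/core.py
#| default_exp core

# Cell 2: exported to the module and rendered in the docs
#| export
def double(x):
    "Return twice the input."
    return 2 * x

# Cell 3: no directive, so it is rendered in the docs but NOT exported
print(double(21))

# Cell 4: hidden from both the generated module and the docs
#| hide
import warnings
warnings.filterwarnings("ignore")
```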

Extra Features

  1. nbdev supports most Quarto features. One that I’ve used in my documentation is the Mermaid flowchart, which is very simple to make from a notebook. Refer to the Quarto diagrams documentation for other chart types you can use in your documentation.

  2. nbdev supports equations (via Quarto). You can include math in your notebook’s documentation using $$. Example: $$\sum_{i=1}^{k+1}i$$

  3. Useful Jupyter extensions:

    • Collapsible headings: This lets you fold and unfold each section in your notebook, based on its markdown headings.
    • TOC2: This adds a table of contents to your notebooks, which you can navigate either with the Navigate menu item it adds to your notebooks or the TOC sidebar it adds. These can be modified and/or hidden using its settings.
  4. If you already have a project, you can migrate it to nbdev using the library built by Novetta.

    I haven’t explored this library yet; I’m including it here for reference.

Issues I faced while running for the first time

  • The nbdev documentation doesn’t mention that you need to create accounts on the PyPI and Conda registration pages.
  • settings.ini by default doesn’t create variables for the Conda environment, which I had to figure out on my own: manually add a conda_user variable to the config. Its value can be either your username or your organization’s name; for example, FastAI uses fastai.
  • I was using Miniconda and didn’t know that anaconda-client has to be installed to log in. I found the solution in a resolved issue in the nbdev repository.
  • When you run nbdev_release_both, it updates the config and bumps the version number. Something in the updated config leads to issues when deploying the docs to GitHub Pages. To avoid this, don’t commit the updated config; instead, commit the old config with only the version number bumped. This is a hacky workaround, and more investigation is needed to identify the exact issue.

References