NBDev Tutorial
nbdev is a notebook-driven development platform. Simply write notebooks with lightweight markup and get high-quality documentation, tests, continuous integration, and packaging for free!
nbdev makes debugging and refactoring your code much easier than in traditional programming environments since you always have live objects at your fingertips. nbdev also promotes software engineering best practices because tests and documentation are first class. In a minute or two, you can provide installable libraries to anyone. Even the smallest piece of code can be turned into a library that may be useful to someone.
Tests, docs, and code are part of the same context and are co-located. nbdev not only makes the code more approachable; writing the docs alongside it also forces you to think more carefully about the code.
What you can do using NBDev
Searchable hyperlinked documentation - Documentation is generated automatically using quarto and gets hosted on GitHub Pages using the automatic workflows designed when you initialize a repository with nbdev. Documents support LaTeX, are searchable, and also support automatic hyperlinking with other parts of your code.
Two-way sync between notebooks and editors - Using simple commands, the notebook prepares the Python scripts, which you can also edit in editors like VSCode, and then syncs the notebooks with the new changes.
Pip and Conda installers - Publish packages to PyPI and Conda directly from your notebook code. Creates python modules and provides tools to simplify package releases. Python best practices for releasing packages are automatically followed, which are sometimes very difficult to do manually in dead coding environments.
Testing - Written as part of the notebook cells along with your main code. Using a single command, nbdev runs all the tests in parallel when you prepare your packages. Having tests as part of your main code makes sure they are updated when the code changes. They are also easily accessible, unlike in dead coding environments, where they are stored separately and need a lot of context switching to understand which test case belongs to which function.
Continuous Integration - Automatically creates Github Actions workflows that run the tests, rebuild the docs, and host them on Github pages
Git friendly - Provides Jupyter/Git hooks that clean unwanted metadata, making git diffs easy to compare. Also, in case of merge conflicts, instead of failing with an error like "can't open notebook", the notebook renders a clean merge conflict in a human-readable format.
Easy Updates - Your README, PyPI page, and Conda page always stay updated based on what you write in index.ipynb.
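To make the Git-friendly point concrete, here is a minimal sketch of the kind of cleanup such a hook might perform on a notebook's JSON. This is not nbdev's implementation: the function name strip_volatile_fields and the choice of fields to reset (execution counts and cell metadata) are assumptions made purely for illustration.

```python
import json


def strip_volatile_fields(nb: dict) -> dict:
    """Return a copy of a notebook dict without fields that change on every run."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}  # drop per-cell editor metadata
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # run counters create noisy diffs
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    return cleaned


# A tiny hand-written notebook structure (illustrative, not a real file)
raw_nb = {
    "cells": [
        {"cell_type": "code", "execution_count": 7,
         "metadata": {"collapsed": False}, "outputs": [], "source": ["1 + 1"]},
        {"cell_type": "markdown", "metadata": {"id": "abc"}, "source": ["# Title"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}

clean_nb = strip_volatile_fields(raw_nb)
print(json.dumps(clean_nb["cells"][0], indent=1))
```

With fields like these reset, two runs of the same notebook produce the same file, so a git diff shows only real changes.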
“I Like Notebooks” - Jeremy Howard
FastAI has built a lot of amazing tools purely out of notebooks, and most of its libraries are built using nbdev. In his video I Like Notebooks, Jeremy Howard explains why he likes notebooks and argues that it's time to start rethinking software engineering principles. He gives some great examples of how Jupyter Notebooks coupled with nbdev follow best practices and are a great way to teach, write technical blogs, share code, and create reproducible results and issues.
Summary
Literate Programming - Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more easily maintained, and arguably more fun to write. The main idea is to treat the program as a piece of literature, addressed to human beings rather than to a computer. Notebooks support this by default and act like a journal you can go through from top to bottom, understanding the thought process of the developer along with code and its outputs.
Fewer chances of errors - Since you can run a small part of the code, see its output, make plots, and visualize images, it is easy to debug issues and make sure the inputs are correct. In dead coding environments, inputs and outputs are much harder to visualize, which often leads to errors.
Easily sharable - With notebooks, you can easily share results and issues with others, and they can reproduce them using something like a Colab environment. Most importantly, you can share not only text but also images, videos, plots, etc. With Software 2.0, we are not just working with text but with varied kinds of data that need a lot of exploration.
Tests live along with code - In dead coding environments, it can be very easy to miss out on tests completely, because they live separately from the main code. In nbdev, or in notebooks in general, the tests live along with the main code.
Better suggestions - Jupyter's autocomplete suggestions tend to be more accurate because it works with live objects. A static editor like VSCode doesn't know the output of the previous line, but Jupyter does, since you have already run the code.
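As a small illustration of tests living next to the code, a notebook might hold a function in one cell and its checks in the very next one. The function slugify below is a hypothetical example, not part of nbdev or any library mentioned here.

```python
def slugify(title: str) -> str:
    """Turn a heading into a lowercase, hyphen-separated slug."""
    words = title.lower().split()  # split() also collapses repeated spaces
    return "-".join(words)


# In a notebook, these asserts would sit in the cell right after the function,
# so a reader sees the behaviour and the test runner checks it.
assert slugify("Hello World") == "hello-world"
assert slugify("  NBDev   Tutorial ") == "nbdev-tutorial"
```

Because the checks are plain assert statements, a failing one stops the notebook at the exact cell where the behaviour broke.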
Examples
FastAI Documentation - The whole documentation is written from notebooks. The good thing about this is that the documentation and the tests always stay up to date with new changes in the library.
Fastpages - Create technical blogs with LaTeX, images, videos, plots, and code snippets directly from your notebooks.
Fastdoc - Create publication-quality books directly from Jupyter Notebooks. The biggest example of this is the book Deep Learning for Coders, written completely from notebooks. This GitHub repository has the exact notebooks that were used for creating the publication-ready book. The best part of writing a book from a notebook is that the example code in the book is actual code that runs and gives the correct output, unlike books whose examples have errors or dependency issues. The book is available on Amazon.
Steps to use NBDev
Initialize a GitHub repository.
Clone it to the system.
Install nbdev using
  pip install nbdev
or
  conda install -c fastai nbdev
Run the nbdev_new command. It:
- Initializes the repository with nbdev environment files and sample Jupyter notebooks.
- Sets up GitHub Actions workflow scripts to test notebooks, and build and deploy Quarto docs to GitHub Pages.
- Configures Quarto for publication-grade technical documentation.
- Streamlines publishing Python packages to PyPI and Conda.
nbdev also provides three hooks to ease Jupyter-Git integration:
- nbdev_merge: Handles merge conflicts so that a notebook loading error doesn't pop up.
- nbdev_clean: On saving, cleans up the metadata for clean git commits and pull requests.
- nbdev_trust: Automatically trusts all the notebooks instead of trusting them manually every time.
Run nbdev_preview to preview the docs generated from your notebooks. You can see live changes in the docs when you save a change in the notebooks.
Before committing your changes to GitHub, run nbdev_prepare in the terminal, which bundles the following commands:
- nbdev_export: Builds the .py modules from Jupyter notebooks
- nbdev_test: Tests your notebooks
- nbdev_clean: Cleans your notebooks to get rid of extraneous output for GitHub
- nbdev_readme: Updates README.md from your index notebook.
You can also run these commands individually.
Push to GitHub to see the workflows in action. Essentially two workflows are made as part of CI:
- Running all the tests in your notebook
- Building the documentation page and publishing to GitHub pages
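The idea of bundling several commands into one, as nbdev_prepare does, can be sketched in a few lines of Python. This is not how nbdev implements it. The order simply follows the list above, and the names PREPARE_STEPS and run_steps are hypothetical.

```python
import subprocess

# Command names taken from the list above; the order is an assumption
PREPARE_STEPS = ["nbdev_export", "nbdev_test", "nbdev_clean", "nbdev_readme"]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first one that fails."""
    completed = []
    for step in steps:
        result = runner([step], check=False)
        if result.returncode != 0:
            raise RuntimeError(f"{step} failed with exit code {result.returncode}")
        completed.append(step)
    return completed


# Inside an nbdev project with nbdev installed, you could call:
# run_steps(PREPARE_STEPS)
```

The runner argument is there so the sequence can be exercised with a stand-in function, without invoking the real commands.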
Setting up the pip environment for publishing
- For publishing to PyPi, you’ll have to register your account on the account registration page
- Install twine, as it is required for publishing to PyPI:
  pip install twine
- Create a file ~/.pypirc in the following format:
  [pypi]
  username = your_pypi_username
  password = your_pypi_password
- Now you're all set to publish to your PyPI account.
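The ~/.pypirc format above is plain INI, so Python's standard configparser can generate and read it. The sketch below only builds the text in memory with placeholder credentials; the helper name render_pypirc is hypothetical, and writing the result to ~/.pypirc is left to you.

```python
import configparser
import io


def render_pypirc(username: str, password: str) -> str:
    """Build the text of a minimal .pypirc file with a single [pypi] section."""
    # interpolation=None keeps characters such as '%' in a password literal
    config = configparser.ConfigParser(interpolation=None)
    config["pypi"] = {"username": username, "password": password}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


text = render_pypirc("your_pypi_username", "your_pypi_password")
print(text)

# Read it back to confirm the file would parse as expected
parsed = configparser.ConfigParser(interpolation=None)
parsed.read_string(text)
print(parsed["pypi"]["username"])
```

Since this file stores credentials in plain text, it is worth keeping it readable only by your own user.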
Setting up the Conda environment for publishing
- Similar to pip, you’ll have to register your account on the account registration page
- If you're using miniconda, then anaconda-client won't be installed. To install it:
  conda install conda-build anaconda-client
- Log in to Anaconda using
  anaconda login
- The settings.ini file generated by running nbdev_new doesn't create placeholders for the Conda variables. You'll have to add a variable conda_user, which can be your username or the organization name: add conda_user = <username> to the file.
- Now you're all set to publish to your Conda account.
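If you prefer to script the conda_user change instead of editing the file by hand, a sketch with configparser could look like this. It assumes the settings live under a [DEFAULT] section; the sample keys and values, and the helper name add_conda_user, are made up for illustration. Note that rewriting a file through configparser drops any comments it contained.

```python
import configparser
import io

# Illustrative stand-in for a settings.ini file (assumed layout, not a real project)
SAMPLE_SETTINGS = """[DEFAULT]
lib_name = my_lib
version = 0.0.1
user = my_github_user
"""


def add_conda_user(settings_text: str, conda_user: str) -> str:
    """Return the settings text with a conda_user entry added to [DEFAULT]."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(settings_text)
    config["DEFAULT"]["conda_user"] = conda_user
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


updated = add_conda_user(SAMPLE_SETTINGS, "my_conda_user")
print(updated)
```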
Once everything is set, you can push the packages to PyPI or Conda using a simple command
  nbdev_release_both
- To publish only to PyPI
  nbdev_pypi
- To publish only to Conda
  nbdev_conda
- If you've already pushed the packages and want to push a new version, run the same command nbdev_release_both. It will show an error that the version already exists and then bump the version number set in your settings.ini file. You can also bump the version number manually by making the change in the file. Run nbdev_release_both again and a new version of the library will be published.
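Bumping a version number is simple enough to sketch. The function below is a hypothetical helper, not nbdev's own code. It increments one part of a dotted version string and resets the parts after it.

```python
def bump_version(version: str, part: int = 2) -> str:
    """Increment one part of a dotted version (0=major, 1=minor, 2=patch)."""
    numbers = [int(piece) for piece in version.split(".")]
    numbers[part] += 1
    # Parts to the right of the bumped one start again from zero
    for i in range(part + 1, len(numbers)):
        numbers[i] = 0
    return ".".join(str(n) for n in numbers)


print(bump_version("0.0.1"))          # 0.0.2
print(bump_version("0.0.9", part=1))  # 0.1.0
```

Whichever way you bump it, the version in settings.ini has to be one that has not been published before.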
If you want to know about other functions, refer to the documentation or run the nbdev_help command to see the available functions. Here is the output of the command:
!nbdev_help
nbdev_bump_version Increment version in settings.ini by one
nbdev_changelog Create a CHANGELOG.md file from closed and labeled GitHub issues
nbdev_clean Clean all notebooks in `fname` to avoid merge conflicts
nbdev_conda Create a `meta.yaml` file ready to be built into a package, and optionally build and upload it
nbdev_create_config Create a config file.
nbdev_docs Create Quarto docs and README.md
nbdev_export Export notebooks in `path` to Python modules
nbdev_filter A notebook filter for Quarto
nbdev_fix Create working notebook from conflicted notebook `nbname`
nbdev_help Show help for all console scripts
nbdev_install Install Quarto and the current library
nbdev_install_hooks Install Jupyter and git hooks to automatically clean, trust, and fix merge conflicts in notebooks
nbdev_install_quarto Install latest Quarto on macOS or Linux, prints instructions for Windows
nbdev_merge Git merge driver for notebooks
nbdev_migrate Convert all markdown and notebook files in `path` from v1 to v2
nbdev_new Create an nbdev project.
nbdev_prepare Export, test, and clean notebooks, and render README if needed
nbdev_preview Preview docs locally
nbdev_proc_nbs Process notebooks in `path` for docs rendering
nbdev_pypi Create and upload Python package to PyPI
nbdev_readme None
nbdev_release_both Release both conda and PyPI packages
nbdev_release_gh Calls `nbdev_changelog`, lets you edit the result, then pushes to git and calls `nbdev_release_git`
nbdev_release_git Tag and create a release in GitHub for the current version
nbdev_sidebar Create sidebar.yml
nbdev_test Test in parallel notebooks matching `path`, passing along `flags`
nbdev_trust Trust notebooks matching `fname`
nbdev_update Propagate change in modules matching `fname` to notebooks that created them
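The commands listed above are console scripts, so Python's standard importlib.metadata can also discover them from inside a notebook. This is only a sketch: the helper name console_scripts_with_prefix is hypothetical, and the result is an empty list if nbdev is not installed in the current environment.

```python
from importlib.metadata import entry_points


def console_scripts_with_prefix(prefix: str) -> list:
    """List installed console-script names that start with the given prefix."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a select() method
        scripts = eps.select(group="console_scripts")
    else:
        # Older versions return a plain dict keyed by group name
        scripts = eps.get("console_scripts", [])
    return sorted({ep.name for ep in scripts if ep.name.startswith(prefix)})


for name in console_scripts_with_prefix("nbdev_"):
    print(name)
```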
Important Files
settings.ini - You can set up the project config directly from here, such as the description, repository name, author name, etc.
index.ipynb - This is the most important notebook. The documentation generated from this notebook becomes part of your README, PyPI, and Conda description pages. It is also the homepage of your documentation.
00_core.ipynb - This is an example notebook in which you can write the functions for your library. As pre-written in the notebook, it gets exported to the core.py Python module. You can add more such notebooks, not necessarily with a naming convention like 00_, but Jeremy Howard suggests using it because the notebooks then act like a journal showing the developer's thought process.
Directives
#|default_exp <name>: Name of the module where cells with the #|export directive will be exported by default.
#| export: Exports the items in the cell into the generated module and documentation.
#| hide: Hides the code cell from the generated module as well as the documentation. I used this for import statements that need not be part of the generated module.
- If you don't pass any directive to a cell, it will be part of the documentation but not of the generated module.
#|echo: <true|false>: Toggles the visibility of the code cell in the documentation. I used this to hide the code cell that embedded the YouTube page in the documentation.
#| output: <true|false|asis>: Toggles the visibility of the output from the code cell in the documentation. I used this to keep the output of print statements out of the documentation.
#| filter_stream <space separated list of keywords>: Hides lines containing the keywords from the output of the code cell. I used this to hide the irritating warnings printed from the code cell when using the sklearn library.
Many more useful directives are available; refer to the documentation for more.
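To show how these directives sit in practice, here is a sketch of a few notebook cells written out as one script. The cell boundaries are marked with comments, and the module name core and the function circle_area are illustrative choices, not anything nbdev requires. Since directives are ordinary comments, the code also runs as plain Python.

```python
# --- cell 1: choose the module that exported cells go to ---
#| default_exp core

# --- cell 2: hidden from both the module and the docs ---
#| hide
import random  # only used for quick experiments in the notebook

# --- cell 3: exported to the module and shown in the docs ---
#| export
import math

def circle_area(radius: float) -> float:
    "Area of a circle with the given radius."
    return math.pi * radius ** 2

# --- cell 4: no directive, so it appears in the docs but is not exported ---
assert abs(circle_area(1.0) - math.pi) < 1e-9
print(round(circle_area(2.0), 2))
```

Keeping the import that the function needs inside the exported cell matters, since anything in a hidden cell does not reach the generated module.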
Extra Features
NBDev supports most of the Quarto features. One that I've used in my documentation is the mermaid flowchart, which is very simple to make from a notebook. Refer to the Quarto diagrams documentation to use other types of charts as part of your documentation.
NBDev supports equations (using Quarto). You can include math in your notebook's documentation using $$. Example: $$\sum_{i=1}^{k+1}i$$
Useful Jupyter extensions:
- Collapsible headings: This lets you fold and unfold each section in your notebook, based on its markdown headings.
- TOC2: This adds a table of contents to your notebooks, which you can navigate either with the Navigate menu item it adds to your notebooks or the TOC sidebar it adds. These can be modified and/or hidden using its settings.
If you already have a project, you can migrate it to nbdev using the library built by Novetta. I haven't explored this library yet; I'm just adding it here for information.
Issues I faced while running for the first time
- The NBDev documentation didn't mention that you need to create an account on the PyPI and Conda registration pages.
- settings.ini by default doesn't create variables for the Conda environment, which I had to figure out on my own. Manually add the conda_user variable to the config. This name can be either your username or the name of the organization. For example, FastAI uses fastai.
- I was using miniconda and didn't know that I had to install the anaconda client to log in. I found the solution in a resolved issue in the nbdev repository.
- When you run nbdev_release_both, it updates the config and bumps the version number. There is some problem with the updated config that leads to issues in deploying the docs to GitHub Pages. To avoid this, don't upload the updated config; instead, upload the old config with the bumped version number. This is a hacky solution, and more investigation is needed to identify the exact issue.