Pre-Commit¶
Git pre-commit hooks are client side events that are raised and handled by simple scripts whenever a commit is made to a git
repository.
Simple pre-commit scripts were originally based around executable shell scripts placed in a hidden .git/hooks/pre-commit
file inside a git
repository. These scripts would be invoked whenever a commit was made into the local repository
Writing pre-commit scripts can be a complicated affair, but the pre-commit
Python package provides a much simpler and more powerful framework for writing and installing your own automation scripts.
Motivation¶
One of the problems associated with running quality assurance tools over a set of files is that it’s easy to not get round to it.
In a live editing environment, if the editing tool provides “live” error checking, such as code linting or inline spell-checking, you are often more likely to pick up on issues at the time they are created. However, when errors do make it through to the saved version of a file, as they inevitably will, they can often build up over time. When a quality process is invoked, there may be a significant backlog of issues. What’s worse is that with multiple errors or issues in a document, whether it’s a text document or a code file, fixing one may have consequential side effects on others, or on the rest of the document.
If files are being managed under a version control system such as git
, files are often committed to the repository on a regular basis. In some workflows, each substantive change may be committed as a separate check-in or commit with a comment describing the nature of the change. In other workflows, large numbers of undistinguished and independent changes may be committed at the same time. The latter approach may be less than useful when trying to track down a very specific change, but it comes without the overhead of trying to comment on each particular change, each with its own change note, at the time the change is made.
In workflows producing text documents, it may be unclear as to what makes a sensible commit chunk. In such a case, commits may be made at natural breaks in the workflow, such as at the end of a particular section of the document, or at the end a particular work session, such as at the end of the day.
Check-ins may also reflect the completion of a particular task, such as completing the editing a particular section, or running a spell-checker over a complete document.
Pre-commit Tasks¶
Pre-commit workflows are established by running a pre-commit install
command to configure the .git/hooks/pre-commit
file. The pre-commit tasks themselves are defined via a .pre-commit-config.yaml
configuration file. This file specifies what tasks should be performed on the set of files that are submitted as part of the commit process.
The first time a commit is made using a particular action, there may be some delay as the code implementing the action is downloaded from the specified action repository and then cached locally.
Example markdownlint
pre-commit Action¶
We can use a simple GitHub Action to check a Markdown file using the markdownlint-cli
:
- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.30.0
hooks:
- id: markdownlint
args:
[
"--disable=MD013",
"--disable=MD047"
]
Files to be ignored should be specified using .gitignore
style paths in a .markdownlintignore
file.
Alternatively, using the Ruby markdownlint
package.
- repo: https://github.com/markdownlint/markdownlint
rev: v0.11.0
hooks:
- id: markdownlint
name: Markdownlint
description: Run markdownlint on your Markdown files
entry: mdl -r ~MD013,~MD026
language: ruby
files: \.(md|mdown|markdown|myst)$
In this case, specific lint error codes can be ignored by passing them prefixed with a ~
character via the mdl
command-line command -r
switch.
Example codespell
pre-commit Action¶
The following example of a .pre-commit-config.yaml
file shows how to define a codespell
pre-commit action that will spell check all committed files.
repos:
- repo: https://github.com/codespell-project/codespell
rev: v2.1.0 # CURRENT_TAG/COMMIT_HASH
hooks:
- id: codespell
name: codespell
description: Checks for common misspellings in text files.
entry: codespell --skip="*.js,*.html,*.css, *.svg" --ignore-words=.codespell-ignore.txt
language: python
types: [text]
Note that for this action, the commit is prevented if the spell-check fails.
The output report shows which files contained detected spelling errors, and what those errors were. It is then up to the user to fix those errors before trying to commit the file again.
Note
This idea of blocking a commit until the error is addressed is reminiscent of the jidoka idea in the Toyota Production System / lean manufacturing process.
Example nbqa black
Example¶
The nbqa
repository provides many examples of how to configure pre-commit
to apply various nbqa
actions over checked in notebooks.
For example, the following configuration example shows how to use nbqa
to apply the black
formatter to notebook code cells:
- repo: https://github.com/nbQA-dev/nbQA
rev: 1.1.1
hooks:
- id: nbqa-black
name: nbqa-black
description: "Run 'black' on a Jupyter Notebook"
entry: nbqa black
language: python
require_serial: true
types: [jupyter]
additional_dependencies: [black]
Here’s an example of the report that is generated by trying to commit a notebook file that contains at least one code cell that black
has seen fit to correct:
The commit is halted as the black
formatter runs over the file that we initially committed. The file is updated as a result of the failed commit action and if we now try to commit it again, it will pass.
Note that under this workflow, we take it on trust that the black
formatter has updated the code cell(s) (on our behalf) in a way that we approve of, but aren’t necessarily able to bear witness to.
Danger
The user experience using the GitHub application and the jupytext
commit hook is horrible, and virtually unusable, compared to the original, and now deprecated, jupytext --pre-commit
workflow.
In that original case, evoked using a simple jupytext --from ipynb --to .md//markdown --pre-commit
line in a .git/hooks/pre-commit
file, paired notebook files would be automatically synched if either one as committed.
Using the pre-commit
framework, trying to commit files in the Github application seems to be to be a painful, confusing and unreliable experience. (I may be doing something wrong, of course…)
I get the feeling this a wont-fix
because the pre-commit
framework folk think everyone should be typing git
commands on the command line whenever they use git
…
So I just hope that the old --pre-commit
jupytext flag continues to work because the approved pre-commit route doesn’t work in any useable sense at all for me.