Code review
Software developers review each other's code routinely; scientific data analysts and programmers can, too!
Process
- The code author documents the context and aims to write efficient, well-commented code that follows agreed-upon lab specifications to the best of their ability.
- The code reviewer, ideally someone who did not write the code:
  - verifies that the code runs and produces the intended numbers
  - provides feedback on elements including, but not limited to, modularity, math and logic, and the overall code (e.g., formatting, typos, comments, documentation)
- The code author is responsible for implementing feedback
Implementation
- Reviewing code throughout the project (i.e., early and often!) is more manageable than a single review at the end and catches errors before they propagate.
- Reviewing code in small chunks (at most about 200 lines at a time) helps prevent reviewer burnout.
- Code review can be done synchronously in a face-to-face meeting with groups of varying sizes; it can also be done asynchronously. Either way, opportunities to ask questions can be helpful.
- Software tools (e.g., linters and formatters) and AI assistants can help to an extent, especially with formatting consistency and catching common errors.
- The reviewer may need to re-create the author's computing environment (software and package versions) to run the code. (Link to resources: )
- Culture matters. Sharing code can feel vulnerable, but it reduces errors and improves reproducibility. It is easier in a culture where the expectation is kindness rather than perfection.
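As a minimal sketch of the environment point above, assuming a Python-based analysis, the author could record the interpreter version and installed package versions, and the reviewer could compare that record with their own. The helper names `snapshot_environment` and `diff_environments` and the file name `environment_snapshot.json` are hypothetical; established tools such as `pip freeze`, `conda env export`, or `renv` (for R) serve the same purpose and are usually preferable in practice.

```python
"""Sketch: record the Python environment so a reviewer can re-create it.

Assumption: the analysis is written in Python. The helper names and the
output file name are hypothetical, not part of any lab standard.
"""
import json
import platform
from importlib import metadata


def snapshot_environment():
    """Return the Python version, OS platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def diff_environments(author, reviewer):
    """Map each package whose version differs to (author_version, reviewer_version).

    None means the package is not installed on that side.
    """
    names = set(author["packages"]) | set(reviewer["packages"])
    return {
        name: (author["packages"].get(name), reviewer["packages"].get(name))
        for name in sorted(names)
        if author["packages"].get(name) != reviewer["packages"].get(name)
    }


if __name__ == "__main__":
    snapshot = snapshot_environment()
    # Hypothetical file name; the author would share this file with the reviewer.
    with open("environment_snapshot.json", "w", encoding="utf-8") as f:
        json.dump(snapshot, f, indent=2)
    print(f"Recorded {len(snapshot['packages'])} packages for Python {snapshot['python']}")
```

A reviewer would load the author's JSON file with `json.load`, call `snapshot_environment()` locally, and pass both to `diff_environments` to see which package versions differ before trying to reproduce the numbers.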
Example checklist
For a code reviewer
- Functionality
  - Running the code generates identical output
  - The code implements the intended functionality
- Readability
  - The code is easy to follow
  - The overall script is organized sensibly
  - Line by line, the code is logical and accurate
  - The code is formatted in agreed-upon ways (or is internally consistent)
  - Folder names, file names, variable names, and outputs are formatted in agreed-upon ways (or are internally consistent), and names are sufficiently informative
- Code structure and design
  - The code is modular
  - There are no repetitive blocks of code
  - Functions and classes are of reasonable size and complexity
- Error handling
  - The code handles potential errors appropriately (e.g., checks inputs and fails with informative messages)
  - Logging is implemented for debugging and troubleshooting
- Security
  - Authentication and authorization mechanisms are implemented correctly
- Code reuse
  - Raw data files are not overwritten or altered in any way
  - Dependencies are managed correctly and kept up to date
- Testing and coverage
  - There are unit tests that check specific functionality
  - The tests consider potential edge cases
- Documentation
  - The script is included in the GitHub repository (if using one)
  - There is a clear top-level description of the script’s purpose
  - Functions, methods, and classes have descriptive comments or docstrings
- Performance
  - The code runs efficiently
  - Memory usage is reasonable
  - Algorithms and data structures are appropriate and efficient
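Two checklist items, identical output on re-running and untouched raw data, can be checked mechanically by comparing file hashes. The sketch below assumes Python; the helper names and the directory layout mentioned in the comments (`data/raw`, `results`) are hypothetical. Byte-for-byte comparison is strict: outputs that embed timestamps, or numbers that differ only in the last floating-point digits, will show up as "changed" and need a tolerance-based comparison instead.

```python
"""Sketch: check that a re-run reproduces outputs and leaves raw data untouched.

Assumptions (hypothetical): raw data live in a folder such as "data/raw",
outputs in "results", and hashes are compared with SHA-256.
"""
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root, with forward slashes) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_hashes(expected, actual):
    """Report files that are missing, unexpected, or whose contents changed."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(k for k in shared if expected[k] != actual[k]),
    }
```

For example, a reviewer could call `hash_tree("data/raw")` before and after running the analysis and expect `compare_hashes` to report no missing, unexpected, or changed files; comparing `hash_tree("results")` against hashes recorded by the author checks that the outputs match.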
After code review
- The code complies with coding standards and guidelines (Derived data), including the formatting documentation to complete after code review
Resources
Rokem, A. (2024). Ten simple rules for scientific code review.
Link: Manuscript in press; direct link TBD. In the meantime, see https://uwescience.github.io/neuroinformatics/2017/10/08/code-review.html
Pew Research Center code review process
Link: https://www.pewresearch.org/decoded/2023/04/05/how-we-review-code-at-pew-research-center/
Swimm Team code review checklist
Link: https://swimm.io/learn/code-reviews/ultimate-10-step-code-review-checklist#Functionality