Code review

software developers check their code a lot; scientific data analysts and programmers can, too!

Process

The code author documents the context and aims to write efficient, well-commented code that follows agreed-upon lab specifications to the best of their ability.
The code reviewer is ideally done with someone who did not write the code who
- verifies that the code works and produces the intended numbers
- provides feedback on elements including but not limited to the modularity, math and logic, and overall code (e.g., formatting, typos, comments, documentation)
The code author is responsible for implementing feedback

Code review alongside the process (i.e., early!) is more manageable and catches errors along the way.
Small chunks of code review (max 200 lines at a time) can prevent reviewer burnout.
Code review can be done synchronously in a face-to-face meeting with groups of varying sizes; it can also be done asynchronously. Either way, opportunities to ask questions can be helpful.
Software and AI may be helpful to an extent, especially for formatting consistency and errors.
A need to re-create environments may come up. (Link to resources: )
Culture matters. Sharing code can be vulnerable but will reduce errors and improve reproducibility. It is easier in a culture where perfection is not the expectation, but rather kindness.

Functionality *Running the code generates the identical output
- Code implements the intended functionality
Readability
- The code is easy to follow
- The overall script is organized sensibly
- Line by line, the code is logical and accurate
- The code is formatted in agreed-upon ways (or is internally consistent).
- The folder, file, variable names, and outputs are formatted in agreed-upon ways (or are internally consistent) and are names are sufficiently informative
Code structure and design
- The code is modular
- There are no repetitive blocks of code
- Functions and classes are reasonable size and complexity
Error handling
- Code includes proper potential error handling mechanisms
- Logging is implemented for debugging and troubleshooting purposes
Security
- The authentication and authorization mechanisms should implement correctly
Code reuse
- Raw data files are not overwritten or altered in any way
- Dependencies are managed correctly and up-to-date
Testing and coverage
- There are some unit test to check specific functionality
- Do the tests consider potential edge cases?
Documents
- The script is included in the GitHub repository (if using)
- There is a clear top-level description of the script’s purpose
- Functions, methods, and classes have descriptive comments or strings
Performance
- Performance is efficient
- Memory usage is optimized
- Algorithms and data structures are efficient

Code comply with coding standards and guidelines (Derived data) —> some formatting documentation for after code review

Rokem, A. (2024). Ten simple rules for scientific code review.
Link: Manuscript in press, direct link tbd. In the meantime, see https://uwescience.github.io/neuroinformatics/2017/10/08/code-review.html

Pew Research Center Process
Link: https://www.pewresearch.org/decoded/2023/04/05/how-we-review-code-at-pew-research-center/

Swimm Team Link: https://swimm.io/learn/code-reviews/ultimate-10-step-code-review-checklist#Functionality