rmrf: A Safer rm -rf
We need to make our work lives safer. rmrf is intended to demonstrate how we can build safer tools for operating complex systems and make operators' daily lives more predictable and less stressful.
tl;dr
I’ve created a tool called rmrf, a safer version of the well-known and extremely useful command rm -rf. However, rm -rf is a tool with extremely sharp edges (ask me, I know!), and it can be dangerous to run, especially in production. What’s more, many other tools and systems are just as dangerous. With this in mind, rmrf is not intended to replace rm -rf; instead, it’s an example of how we could build safer tools for operating complex systems.
Operational Risk
Most of the time, somewhat unbelievably, computers work. Even more surprisingly, complex systems built on top of them work too. However, hidden beneath that veneer of working software is the invisible labour of human operators, holding it all together with proverbial duct tape, eternal vigilance, and lost weekends.
While we develop software in small groups, writing many tests and releasing often, we typically do the opposite when running complex systems: one or two operators work alone at night with no tests or safety harnesses, and release updates only once a year, if they’re lucky.
We need to focus more on operational safety.
This is the greatest time in the history of computing to build your own tools and software. Operators now have the ability to be the designers of their own fate.
rmrf
A workflow can be made safer by moving it through a series of explicit stages. rmrf implements the stages below. This is not a comprehensive list; other stages could be added or altered, depending on the system and its operators.
| Stage | Description |
|---|---|
| No Stage | Initial state when first starting up interactive mode; there is no active plan yet |
| Plan | Define what we want to delete; this creates an auditable record of intent |
| Validate | Check the plan against the policies of the target environment |
| Near-Miss | Reached when an issue is detected before execution, or when the user exits ‘rmrf’ before applying the plan; the near-miss is recorded for learning |
| Stage | Prepare to apply; here, this means backing up the files so they can be recovered |
| Apply | Perform the actual deletion |
| Verify | Confirm that the files were deleted and that the operation completed successfully |
| Rollback | If something went wrong, restore the data from the backup |
| Closeout | Finalize the plan, remove the backups (or schedule them for later removal), and mark the lifecycle complete |
| Learn | Capture notes and insights for future operations, helping to build a learning organization |

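One way to picture these stages is as a small state machine that refuses out-of-order moves. The Python sketch below is an illustration under assumptions, not rmrf’s actual implementation: the `Stage` enum, the `TRANSITIONS` map, and the `Lifecycle` class are hypothetical names, and the allowed transitions (Near-Miss reachable from any stage before Apply, Rollback only after Verify) are inferred from the descriptions in the table.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages from the table (hypothetical names, not rmrf's code)."""
    NO_STAGE = "no stage"
    PLAN = "plan"
    VALIDATE = "validate"
    NEAR_MISS = "near-miss"
    STAGE = "stage"
    APPLY = "apply"
    VERIFY = "verify"
    ROLLBACK = "rollback"
    CLOSEOUT = "closeout"
    LEARN = "learn"


# Assumed transition map: which stages may directly follow each stage.
# Exiting before Apply leads to NEAR_MISS; a failed Verify leads to ROLLBACK.
TRANSITIONS = {
    Stage.NO_STAGE: {Stage.PLAN},
    Stage.PLAN: {Stage.VALIDATE, Stage.NEAR_MISS},
    Stage.VALIDATE: {Stage.STAGE, Stage.NEAR_MISS},
    Stage.STAGE: {Stage.APPLY, Stage.NEAR_MISS},
    Stage.APPLY: {Stage.VERIFY},
    Stage.VERIFY: {Stage.CLOSEOUT, Stage.ROLLBACK},
    Stage.ROLLBACK: {Stage.CLOSEOUT},
    Stage.NEAR_MISS: {Stage.LEARN},
    Stage.CLOSEOUT: {Stage.LEARN},
    Stage.LEARN: set(),
}


class Lifecycle:
    """Tracks the current stage and rejects transitions not in TRANSITIONS."""

    def __init__(self) -> None:
        self.current = Stage.NO_STAGE
        self.history = [Stage.NO_STAGE]

    def advance(self, nxt: Stage) -> None:
        if nxt not in TRANSITIONS[self.current]:
            raise ValueError(f"cannot move from {self.current.value} to {nxt.value}")
        self.current = nxt
        self.history.append(nxt)
```

With this shape, an attempt to jump from Plan straight to Apply raises a `ValueError` instead of deleting anything, which is the kind of guard rail the stages are meant to provide. The recorded `history` also doubles as a simple audit trail of how a given run progressed.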
Using rmrf
Here’s an example of using rmrf to delete the /tmp/old-logs directory via the golden path, taking us through all the stages.
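The golden path can be approximated with the Python standard library alone. This sketch is hypothetical and is not how rmrf itself works: `Plan`, `ALLOWED_PREFIXES`, `golden_path`, and the per-stage functions are illustrative names; the policy (only directories strictly inside the system temp directory may be deleted) is an assumed example; and the backup is a plain `shutil.copytree` copy. It targets a throwaway directory rather than the real /tmp/old-logs and needs Python 3.9+ for `Path.is_relative_to`.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Plan:
    """Hypothetical auditable record of intent: what to delete, and why."""
    target: Path
    reason: str
    backup: Optional[Path] = None


# Assumed example policy: only directories strictly inside the temp dir may be deleted.
ALLOWED_PREFIXES = (Path(tempfile.gettempdir()).resolve(),)


def validate(plan: Plan) -> None:
    """Validate: reject plans that break the environment's policy."""
    target = plan.target.resolve()
    if not target.is_dir():
        raise ValueError(f"{target} is not an existing directory")
    if not any(target != p and target.is_relative_to(p) for p in ALLOWED_PREFIXES):
        raise ValueError(f"policy forbids deleting {target}")


def stage(plan: Plan, backup_root: Path) -> None:
    """Stage: copy the target aside so Rollback has something to restore."""
    plan.backup = backup_root / plan.target.name
    shutil.copytree(plan.target, plan.backup)


def apply(plan: Plan) -> None:
    """Apply: perform the actual deletion."""
    shutil.rmtree(plan.target)


def verify(plan: Plan) -> bool:
    """Verify: the target should no longer exist."""
    return not plan.target.exists()


def rollback(plan: Plan) -> None:
    """Rollback: copy the backup over whatever is left of the target."""
    shutil.copytree(plan.backup, plan.target, dirs_exist_ok=True)


def closeout(plan: Plan) -> None:
    """Closeout: here the backup is deleted immediately rather than scheduled."""
    shutil.rmtree(plan.backup)
    plan.backup = None


def golden_path(target: Path, backup_root: Path, reason: str) -> Plan:
    """Run Plan, Validate, Stage, Apply, Verify, and Closeout in order."""
    plan = Plan(target=target, reason=reason)
    validate(plan)
    stage(plan, backup_root)
    try:
        apply(plan)
        ok = verify(plan)
    except OSError:
        ok = False
    if not ok:
        rollback(plan)
        raise RuntimeError(f"deletion of {plan.target} failed; restored from backup")
    closeout(plan)
    return plan
```

Calling `golden_path(Path(work) / "old-logs", Path(backups), "free disk space")` with two temporary directories walks through the stages in order. If the deletion raises an error or verification does not pass, the backup is copied back before a `RuntimeError` is raised, so the operator is never left with a half-deleted directory and no copy.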
Also, if you exit after creating a plan but before applying it, a near-miss is recorded for analysis.
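Near-miss recording can be as simple as appending a line to a log whenever a session ends with a plan that was never applied. In this sketch, `record_near_miss`, `on_exit`, `load_near_misses`, and the JSON Lines record format are all assumptions made for illustration, not rmrf’s actual storage format.

```python
import json
import time
from pathlib import Path
from typing import List, Optional


def record_near_miss(log_path: Path, target: str, last_stage: str, note: str) -> dict:
    """Append one near-miss as a JSON line (hypothetical record format)."""
    record = {
        "timestamp": time.time(),
        "target": target,
        "last_stage": last_stage,
        "note": note,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def on_exit(log_path: Path, plan_target: Optional[str], last_stage: str, applied: bool) -> bool:
    """Record a near-miss if a plan exists but was never applied; return True if recorded."""
    if plan_target is not None and not applied:
        record_near_miss(log_path, plan_target, last_stage, "exited before apply")
        return True
    return False


def load_near_misses(log_path: Path) -> List[dict]:
    """Read all recorded near-misses back, e.g. for review in the Learn stage."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]
```

An append-only, one-record-per-line log keeps each near-miss independent and easy to review later, which fits the goal of treating near-misses as learning material rather than failures.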
Conclusion
We need to make our operational lives safer, and rmrf shows how we can build safer tools for operating complex systems. There has never been a better time to be an operator and build amazing, safe tools.
Please check out the rmrf repository and give it a try.