Finding Plotholes in AI-Assisted Code: Missing or Incomplete Code, TODOs, and 'XXX Fix Later'

Identify AI-assisted code artefacts that require fixing.


tl;dr

I am trying out new ways of testing AI-assisted code generation. LLMs will introduce all kinds of strange elements into code, or simply leave features unimplemented. One detection approach is to use another LLM to identify TODOs, incomplete or unimplemented features, and any other weird artefacts left behind by those lazy LLMs. I do this with a tool I’ve written called Plothole.

Tech stack:

  • Golang CLI
  • Mistral’s Devstral model
  • NVIDIA RTX 3090
  • Ollama

Analysing text with LLMs

I’m fascinated by the idea of using AI to analyse text and identify issues such as security vulnerabilities and bugs in code. I’ve written several tools for this purpose and run into the same problems each time: wrangling regular expressions, extracting JSON from Large Language Model output, and so on. Perhaps I have some kind of mental block, as I can’t think of anything else to do with AI.

Having said that, I suppose that if you write enough variations of the same tool, you’ll eventually find something useful.
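
One of those recurring problems is worth showing. Models rarely return clean JSON on request, so every one of these tools ends up with some variant of the same helper. Here’s a minimal Go sketch (my illustration, not Plothole’s actual code) that pulls the first JSON object out of a chatty model response:

package llmjson

import (
	"encoding/json"
	"errors"
	"strings"
)

// ExtractJSON pulls the first top-level JSON object out of an LLM
// response, tolerating prose or markdown fences wrapped around it.
func ExtractJSON(raw string) (map[string]any, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start == -1 || end == -1 || end < start {
		return nil, errors.New("no JSON object found in model output")
	}
	var out map[string]any
	if err := json.Unmarshal([]byte(raw[start:end+1]), &out); err != nil {
		return nil, err
	}
	return out, nil
}

It’s crude, but it survives markdown fences and the polite preamble that models love to add.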

Incomplete or missing code

Over the last few months, I have been writing a lot of AI-assisted code. One issue I’ve found with LLMs is that they produce incomplete code and leave out sections, adding ‘fix later’ comments that lead to serious bugs. These aren’t obvious unless you read every line of code, which I don’t think most people coding with AI assistance will do. Some will, but not everyone.

I run a number of tests on AI-assisted code, such as looking for duplicate and dead code — i.e. static code checks. Sometimes, AI will leave duplicate code or code that is never executed. In addition to that, I want to look for incomplete code, TODOs and ‘XXX fix later’ comments. These should not be present in production code.
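
The TODO and ‘fix later’ part can be approximated with plain pattern matching. A naive Go sketch of that baseline (my illustration, nothing to do with Plothole’s internals):

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Naive marker scan: flags TODO/FIXME/XXX/HACK/STUB comments line by line.
var marker = regexp.MustCompile(`\b(TODO|FIXME|XXX|HACK|STUB)\b`)

func main() {
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		scanner := bufio.NewScanner(f)
		for n := 1; scanner.Scan(); n++ {
			if marker.MatchString(scanner.Text()) {
				fmt.Printf("%s:%d: %s\n", path, n, scanner.Text())
			}
		}
		f.Close()
	}
}

A scan like this catches explicit markers, but it can’t spot code that merely looks finished, or a comment that quietly contradicts the code beneath it.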

That’s where plothole comes in.

Mistral’s Devstral

Recently, I also found a solution to a problem that didn’t actually exist: I wanted to run more models at home on my NVIDIA RTX 3090, which should be capable of running some great open-source models. One of these is Mistral’s Devstral, which has been specifically trained to write code. But what could I use it for?

I decided to combine my desire to run models like Devstral with my need to catch incomplete code and leftover AI comments. The result is a tool called ‘Plothole’, which finds incomplete code and comments such as ‘TODO’ or ‘XXX fix later’ that AI-assisted coding leaves behind and that you might not catch with other tools.

So Plothole uses Devstral via a simple Ollama interface.
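
I haven’t released Plothole yet, so treat the following as a sketch of the idea rather than its actual internals. Ollama exposes a local HTTP API, and calling Devstral through it amounts to a single POST to /api/generate. This assumes the devstral model has already been pulled locally, and the prompt wording is mine:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

// checkFile asks Devstral (via a local Ollama server) to review source
// code for incomplete implementations and placeholder comments.
func checkFile(source string) (string, error) {
	prompt := "Review the following code for TODOs, stubs, and incomplete " +
		"implementations. Report each finding with a line number, a severity " +
		"(high/medium/low), and a suggestion:\n\n" + source

	body, err := json.Marshal(generateRequest{
		Model:  "devstral",
		Prompt: prompt,
		Stream: false,
	})
	if err != nil {
		return "", err
	}

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	findings, err := checkFile("// TODO: implement auth\nfunc Login() {}\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(findings)
}

The real tool also has to turn the model’s answer into structured findings with severities and line numbers, which is where a JSON-extraction helper like the one sketched earlier earns its keep.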

Using Plothole and Example Results

🛠️ Note: Plothole is still under development. Features and results may change as the tool evolves.

Example 1

Here’s an example of running it on one of my projects.

$ make plothole-check 
~/bin/plothole check ./cmd ./internal ./pkg ./tests --recursive --severity=high --fail-on-findings
Checking: [./cmd ./internal ./pkg ./tests]
Checking file: cmd/cli/main.go [CACHED]
Checking file: cmd/server/main.go [CACHED]
Checking file: internal/assets/embed_test.go [CACHED]
Checking file: internal/assets/static/js/app.js [CACHED]
Checking file: internal/assets/static/js/dashboard.js [CACHED]
Checking file: internal/assets/static/js/login.js [CACHED]
Checking file: internal/assets/static/js/profile.js [CACHED]
Checking file: internal/assets/static/js/users.js [CACHED]
Checking file: internal/assets/static.go [CACHED]
Checking file: internal/cli/commands/root.go [CACHED]
Checking file: internal/cli/root.go [CACHED]
Checking file: internal/server/api/handler.go [CACHED]
Checking file: internal/server/api/handler_test.go [CACHED]
Checking file: internal/server/api/response.go [CACHED]
Checking file: internal/server/api/router.go [CACHED]
Checking file: internal/server/auth/auth.go [CACHED]
Checking file: internal/server/auth/auth_test.go [CACHED]
Checking file: internal/server/middleware/auth.go [CACHED]
Checking file: internal/server/server.go [CACHED]
Checking file: internal/server/store/store.go [CACHED]
Checking file: internal/server/store/store_test.go [CACHED]
Checking file: internal/server/web/embed_test.go [CACHED]
Checking file: internal/server/web/handlers.go [CACHED]
Checking file: pkg/api/types.go [CACHED]
Checking file: pkg/config/config.go [CACHED]
Checking file: pkg/config/config_test.go [CACHED]
Checking file: pkg/logger/logger.go [CACHED]
Checking file: pkg/logger/logger_test.go [CACHED]
Checking file: pkg/version/version.go [CACHED]
Checking file: pkg/version/version_test.go [CACHED]
Plothole Analysis Results
==================================================
Files analyzed: 30
Files with issues: 1
Total findings: 1
Cache hits: 30, misses: 0 (100.0% hit rate)
  [HIGH] High: 1
  [MED]  Medium: 0
  [LOW]  Low: 0

File: internal/server/middleware/auth.go [CACHED]
   [HIGH]:30 Incomplete API key validation implementation
      Suggestion: Implement proper API key validation instead of just checking for non-empty values

make: *** [Makefile:216: plothole-check] Error 1

As you can see, it found a high-severity issue: the API key validation is incomplete.

This is what that code looks like:

// Check API key authentication if allowed
if !authenticated && config.AllowAPIKey {
	apiKey := r.Header.Get("Authorization")
	if apiKey != "" {
		// Remove "Bearer " prefix if present
		if len(apiKey) > 7 && apiKey[:7] == "Bearer " {
			apiKey = apiKey[7:]
		}

		// For debugging: ensure we have a non-empty API key
		if len(apiKey) > 0 {
			// Validate API key with store
			user, err := am.store.GetUserByAPIKey(apiKey)
			if err == nil && user != nil {
				authenticated = true
				if role, ok := user["role"].(string); ok {
					isAdmin = (role == "admin")
				}
			}
		}
	}
}

Interestingly, the comment doesn’t match the code.

The comment says:

// For debugging: ensure we have a non-empty API key

But the code does:

if len(apiKey) > 0 {
    // Validate API key with store
    user, err := am.store.GetUserByAPIKey(apiKey)
    // ... actual authentication logic
}

This is not debugging code; it’s core authentication logic. The comment is misleading, and it suggests the check was perhaps originally added for debugging but evolved into the actual implementation.

This is a real-life example of a kind of code-comment bug… I guess? Very interesting.
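
If the store lookup really is the intended validation, the fix is mostly about honesty: strip the misleading debug comment and make the guard say what it does. A possible cleanup (my sketch, assuming a strings import, not the project’s actual patch):

// Check API key authentication if allowed
if !authenticated && config.AllowAPIKey {
	// Strip the "Bearer " prefix, if present, from the header value
	apiKey := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
	if apiKey != "" {
		// Validate the API key against the store; this is the real
		// authentication path, not a debug check
		user, err := am.store.GetUserByAPIKey(apiKey)
		if err == nil && user != nil {
			authenticated = true
			if role, ok := user["role"].(string); ok {
				isAdmin = (role == "admin")
			}
		}
	}
}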

Example 2

This is the output from a project in which I examined the entire set of files from top to bottom.

Plothole Analysis Results
==================================================
Files analyzed: 98
Files with issues: 8
Total findings: 11
Cache hits: 8, misses: 90 (8.2% hit rate)
  [HIGH] High: 1
  [MED]  Medium: 10
  [LOW]  Low: 0

File: pkg/logger/logger.go
   [MED] :73 Function WithContext contains placeholder comment indicating unimplemented or incomplete functionality
      Suggestion: Implement proper context handling instead of returning the same logger
   [MED] :80 Function WithFields contains placeholder comment indicating unimplemented or incomplete functionality
      Suggestion: Implement child logger with embedded fields instead of returning the same logger

File: internal/agent/execution.go
   [MED] :39 Potential unimplemented or incomplete code path for Windows scripts without shebang
      Suggestion: Ensure that fileExtension and shebang are correctly set for all scenarios

File: internal/agent/registration.go
   [MED] :30 Comment indicating a placeholder for registration endpoint
      Suggestion: Consider implementing a dedicated registration endpoint instead of using the heartbeat endpoint.
   [MED] :67 Simple placeholder IP address in getLocalIP function
      Suggestion: Implement more robust IP detection mechanism.

File: internal/cli/commands/format_integration_test.go
   [MED] :85 "NotImplementedError" indicates a stub or placeholder that needs to be implemented.
      Suggestion: Implement the missing functionality.

File: internal/server/api/handler_test.go
   [MED] :13 Comment suggests this is a placeholder function which might not be fully implemented.
      Suggestion: Verify and implement any necessary functionality.

File: internal/server/store/agents.go
   [MED] :39 Placeholder pattern indicating missing business logic
      Suggestion: Add actual business logic instead of placeholder comment

File: internal/server/store/sqlite_test.go
   [MED] :189 Unquoted 'STUB' keyword found, indicating placeholder or unfinished code.
      Suggestion: Replace STUB with actual implementation.
   [MED] :246 'NotImplementedError' is raised in the code. This indicates that some functionality is not yet implemented.
      Suggestion: Implement the missing functionality to replace NotImplementedError.

File: internal/server/web/handlers.go
   [HIGH]:39 Explicit TODO/FIXME comment: // Get dashboard data
      Suggestion: Ensure that all necessary functions and logic are implemented

To which Claude Code retorts:

● Looking at these static analysis results, I see a mix of technical debt and incomplete implementations. Here's my assessment:

  HIGH Priority Issue

  The "// Get dashboard data" comment in handlers.go:28 is flagged as a TODO/FIXME, but it's actually just a descriptive comment for implemented code. This seems like a false positive from the analysis tool.

  MEDIUM Priority Issues - Legitimate Concerns

  Most Critical:
  1. Logger incomplete implementation (pkg/logger/logger.go) - WithContext and WithFields methods are placeholders that should be properly implemented for production use
  2. IP detection placeholder (internal/agent/registration.go:67) - Simple placeholder IP detection needs robust implementation
  3. Windows script execution (internal/agent/execution.go:39) - Potential gaps in Windows script handling

  Development/Testing Issues:
  4. Test stubs in multiple files - NotImplementedError and STUB placeholders in tests should be completed
  5. Business logic gaps in internal/server/store/agents.go - Missing actual business logic implementation

  Recommendations

  For Production Readiness:
  - Fix the logger implementation first (highest impact)
  - Implement proper IP detection for agent registration
  - Complete the Windows script execution logic

  For Development Quality:
  - Replace test stubs with actual test implementations
  - Complete business logic placeholders
  - Consider implementing the dedicated registration endpoint suggestion

  Overall Assessment: The codebase has solid foundations but contains technical debt typical of active development. The core functionality works (tests pass), but production deployment would benefit from addressing the logger and networking placeholders first.

The results are interesting. Mistral’s Devstral looks at the code generated by Claude Code; then Claude Code analyses the results from Plothole. A bit circular, but it seems I’d get better code out of it in the end.

LLM as Judge

The term LLM as judge (usually written as “LLM-as-a-Judge”) refers to using a large language model (LLM) to evaluate or assess outputs, such as the responses produced by another AI model or system. This process involves prompting an LLM to act as a “judge” based on explicitly defined criteria, allowing it to assign scores, classify outputs, or choose the best result among options. - Source Perplexity.ai

In a sense, that’s what Plothole does: it uses Mistral’s Devstral to evaluate Claude Code’s generated code. Having said that, I’m not tied to Devstral. You could try any model that Ollama will run with Plothole, although I haven’t tried others yet.

Future Step: Evaluations

…evals (short for “evaluations”) are structured tests or frameworks used to systematically measure and assess how well an AI system performs on specific tasks or criteria. Evals help developers, researchers, and organizations determine if their models are accurate, reliable, and fit for real-world use. - Source Perplexity.ai

I should create evals for Plothole.

Evals, evals, evals.
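
A first eval could be as simple as planting a known defect and asserting that Plothole flags it. Here’s a sketch of that harness, using only the CLI flags shown earlier (the fixture content and test layout are mine):

package evals

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

// TestPlantedTodoIsFlagged writes a file with a deliberate placeholder
// and expects `plothole check --fail-on-findings` to exit non-zero.
func TestPlantedTodoIsFlagged(t *testing.T) {
	dir := t.TempDir()
	src := "package fixture\n\n" +
		"// TODO: implement validation, XXX fix later\n" +
		"func Validate(s string) bool { return true }\n"
	if err := os.WriteFile(filepath.Join(dir, "fixture.go"), []byte(src), 0o644); err != nil {
		t.Fatal(err)
	}

	cmd := exec.Command("plothole", "check", dir, "--recursive", "--fail-on-findings")
	out, err := cmd.CombinedOutput()
	if err == nil {
		t.Fatalf("expected the planted TODO to fail the check, got:\n%s", out)
	}
}

Run enough fixtures like this (planted TODOs, stubs, misleading comments, plus clean files that should produce no findings) and you have a crude recall-and-precision measure for whichever model sits behind Plothole.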

Conclusion

In the end, Claude Code gave Plothole a glowing review when I was trying to understand why a fake function I added wasn’t flagged by deadcode analysis.

Even staticcheck doesn’t flag exported functions. The plothole scanner would be more likely to catch this kind of issue since it does semantic analysis of code patterns and can identify functions that appear to be placeholders or incomplete implementations (based on the “fix later” comment). This is why having plothole analysis in the pipeline is valuable - it catches issues that traditional static analysis tools miss. - Claude Code

I have a lot more testing to do with Plothole, but I think it might turn out to be useful. I will certainly run it on all of my AI-assisted code, and I hope it finds more than just documentation bugs.

Using a local model like Devstral means this is inexpensiveive to run; companies could easily apply it to their codebases at little cost.

I’m going to test Plothole on my own for a while, then release it as open source.