Configuration

similarity-go supports configuration files in YAML, TOML, and JSON formats. The tool automatically searches for configuration files in the following order:

  1. .similarity-go.yaml
  2. .similarity-go.yml
  3. .similarity-go.toml
  4. .similarity-go.json
  5. similarity-go.yaml
  6. similarity-go.yml
  7. similarity-go.toml
  8. similarity-go.json

.similarity-go.yaml
# Analysis settings
analysis:
  threshold: 0.8 # Similarity threshold (0.0 - 1.0)
  min_lines: 5 # Minimum lines to consider for analysis
  max_file_size: "10MB" # Maximum file size to analyze

# Output settings
output:
  format: "detailed" # Output format: detailed, json, yaml, csv
  file: "" # Output file (empty = stdout)
  colors: true # Enable colored output
  quiet: false # Suppress non-essential output

# File patterns
patterns:
  include:
    - "**/*.go"
  exclude:
    - "**/*_test.go"
    - "vendor/**"
    - "*.pb.go"
    - ".git/**"

# Performance settings
performance:
  parallel: 0 # Number of parallel workers (0 = auto)
  cache_enabled: true # Enable result caching
  cache_dir: "" # Cache directory (empty = auto)
  memory_limit: "1GB" # Maximum memory usage

# Algorithm settings
algorithms:
  ast_comparison:
    enabled: true
    weight: 0.7
  token_comparison:
    enabled: true
    weight: 0.2
  structure_comparison:
    enabled: true
    weight: 0.1

# Reporting settings
reporting:
  group_by: "file" # Group results: file, function, similarity
  sort_by: "similarity" # Sort by: similarity, file, function
  show_context: true # Show code context in output
  context_lines: 3 # Number of context lines to show

.similarity-go.toml
[analysis]
threshold = 0.8
min_lines = 5
max_file_size = "10MB"

[output]
format = "detailed"
file = ""
colors = true
quiet = false

[patterns]
include = ["**/*.go"]
exclude = [
  "**/*_test.go",
  "vendor/**",
  "*.pb.go",
  ".git/**"
]

[performance]
parallel = 0
cache_enabled = true
cache_dir = ""
memory_limit = "1GB"

[algorithms.ast_comparison]
enabled = true
weight = 0.7

[algorithms.token_comparison]
enabled = true
weight = 0.2

[algorithms.structure_comparison]
enabled = true
weight = 0.1

[reporting]
group_by = "file"
sort_by = "similarity"
show_context = true
context_lines = 3

.similarity-go.json
{
  "analysis": {
    "threshold": 0.8,
    "min_lines": 5,
    "max_file_size": "10MB"
  },
  "output": {
    "format": "detailed",
    "file": "",
    "colors": true,
    "quiet": false
  },
  "patterns": {
    "include": ["**/*.go"],
    "exclude": [
      "**/*_test.go",
      "vendor/**",
      "*.pb.go",
      ".git/**"
    ]
  },
  "performance": {
    "parallel": 0,
    "cache_enabled": true,
    "cache_dir": "",
    "memory_limit": "1GB"
  },
  "algorithms": {
    "ast_comparison": {
      "enabled": true,
      "weight": 0.7
    },
    "token_comparison": {
      "enabled": true,
      "weight": 0.2
    },
    "structure_comparison": {
      "enabled": true,
      "weight": 0.1
    }
  },
  "reporting": {
    "group_by": "file",
    "sort_by": "similarity",
    "show_context": true,
    "context_lines": 3
  }
}
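
For readers who consume the same file from their own tooling, here is a minimal Go sketch that decodes part of the JSON layout with the standard `encoding/json` package. The `Config` type and `parseConfig` function are illustrative names chosen for this example, not similarity-go's own types, and only three of the sections are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors part of the documented file layout. The type and its
// field names are illustrative, not the tool's internal types.
type Config struct {
	Analysis struct {
		Threshold   float64 `json:"threshold"`
		MinLines    int     `json:"min_lines"`
		MaxFileSize string  `json:"max_file_size"`
	} `json:"analysis"`
	Output struct {
		Format string `json:"format"`
		File   string `json:"file"`
		Colors bool   `json:"colors"`
		Quiet  bool   `json:"quiet"`
	} `json:"output"`
	Patterns struct {
		Include []string `json:"include"`
		Exclude []string `json:"exclude"`
	} `json:"patterns"`
}

// parseConfig decodes JSON configuration bytes into a Config.
// Sections that are not modeled above are silently ignored.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	sample := []byte(`{
  "analysis": {"threshold": 0.8, "min_lines": 5, "max_file_size": "10MB"},
  "output": {"format": "detailed", "file": "", "colors": true, "quiet": false},
  "patterns": {"include": ["**/*.go"], "exclude": ["**/*_test.go", "vendor/**"]}
}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Analysis.Threshold, cfg.Output.Format, cfg.Patterns.Exclude)
}
```
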

Analysis settings

Option         Type    Default  Description
threshold      float   0.8      Similarity threshold (0.0-1.0). Functions with similarity above this value are reported.
min_lines      int     5        Minimum number of lines for a function to be analyzed.
max_file_size  string  "10MB"   Maximum file size to analyze. Supports KB, MB, GB suffixes.
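
Size values such as "10MB" have to be turned into byte counts before they can be compared with a file's size. The sketch below shows one way to do that in Go. The function name `parseSize` is hypothetical, and the sketch assumes binary multiples (1KB = 1024 bytes); the documentation does not say whether the tool uses 1024 or 1000.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "10MB" into a byte count.
// Assumption: suffixes are binary multiples (1KB = 1024 bytes).
func parseSize(s string) (int64, error) {
	units := []struct {
		suffix string
		factor int64
	}{
		{"GB", 1 << 30},
		{"MB", 1 << 20},
		{"KB", 1 << 10},
	}
	s = strings.ToUpper(strings.TrimSpace(s))
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			n, err := strconv.ParseInt(num, 10, 64)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("invalid size %q", s)
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("size %q has no KB, MB, or GB suffix", s)
}

func main() {
	for _, in := range []string{"10MB", "1GB", "512KB", "ten"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```
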

Output settings

Option  Type    Default     Description
format  string  "detailed"  Output format: detailed, json, yaml, csv
file    string  ""          Output file path. Empty means stdout.
colors  bool    true        Enable colored output in terminal.
quiet   bool    false       Suppress non-essential output.

File patterns

Option   Type      Default      Description
include  []string  ["**/*.go"]  File patterns to include in analysis.
exclude  []string  []           File patterns to exclude from analysis.

Common exclude patterns:

  • **/*_test.go - Test files
  • vendor/** - Vendor dependencies
  • *.pb.go - Protocol buffer generated files
  • .git/** - Git directory
  • **/.* - Hidden files and directories
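
To make the pattern syntax concrete, here is a small Go sketch of a glob matcher that handles `**`. It is not similarity-go's matcher. It assumes that `**` spans any number of directories, that `*` stays within one path segment, and that a pattern without a slash (such as `*.pb.go`) is compared against the file's base name. The names `globToRegexp`, `matches`, and `excluded` are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// globToRegexp translates a glob into an anchored regular expression.
func globToRegexp(glob string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(glob); {
		switch {
		case strings.HasPrefix(glob[i:], "**/"):
			b.WriteString("(?:.*/)?") // zero or more leading directories
			i += 3
		case strings.HasPrefix(glob[i:], "**"):
			b.WriteString(".*") // anything, including slashes
			i += 2
		case glob[i] == '*':
			b.WriteString("[^/]*") // anything within one segment
			i++
		case glob[i] == '?':
			b.WriteString("[^/]")
			i++
		default:
			b.WriteString(regexp.QuoteMeta(glob[i : i+1]))
			i++
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matches reports whether relPath (slash-separated) matches glob.
// Assumption: a pattern without "/" is compared against the base name.
func matches(glob, relPath string) bool {
	target := relPath
	if !strings.Contains(glob, "/") {
		target = path.Base(relPath)
	}
	return globToRegexp(glob).MatchString(target)
}

// excluded reports whether relPath matches any of the patterns.
func excluded(relPath string, patterns []string) bool {
	for _, p := range patterns {
		if matches(p, relPath) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"**/*_test.go", "vendor/**", "*.pb.go", ".git/**"}
	for _, f := range []string{"cmd/main.go", "pkg/util_test.go", "vendor/x/y.go", "api/v1/svc.pb.go"} {
		fmt.Println(f, excluded(f, patterns))
	}
}
```

Under these assumptions only `cmd/main.go` survives the four example patterns. The tool's actual matching rules may differ, particularly for patterns without a slash.
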

Performance settings

Option         Type    Default  Description
parallel       int     0        Number of parallel workers. 0 means auto-detect CPU cores.
cache_enabled  bool    true     Enable caching of analysis results.
cache_dir      string  ""       Cache directory. Empty means OS-specific cache directory.
memory_limit   string  "1GB"    Maximum memory usage before triggering garbage collection.

Algorithm settings

Enable or disable each similarity detection algorithm and set its weight in the final score:

  • ast_comparison.enabled: Enable Abstract Syntax Tree (AST) comparison
  • ast_comparison.weight: Relative weight in the final similarity score
  • token_comparison.enabled: Enable token-level comparison
  • token_comparison.weight: Relative weight in the final similarity score
  • structure_comparison.enabled: Enable structural comparison
  • structure_comparison.weight: Relative weight in the final similarity score
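
One plausible reading of "relative weight" is a weighted average over the enabled algorithms. The Go sketch below implements that reading. The `combine` function and `algorithm` type are hypothetical, and renormalizing by the sum of the enabled weights is an assumption; the tool may combine scores differently.

```go
package main

import "fmt"

// algorithm holds one entry of the algorithms section plus a sample score.
type algorithm struct {
	name    string
	enabled bool
	weight  float64
	score   float64 // per-algorithm similarity in [0, 1]
}

// combine returns the weighted average of the enabled algorithms' scores.
// Assumption: weights are renormalized over the enabled algorithms only.
func combine(algos []algorithm) float64 {
	var sum, total float64
	for _, a := range algos {
		if !a.enabled {
			continue
		}
		sum += a.weight * a.score
		total += a.weight
	}
	if total == 0 {
		return 0 // nothing enabled, or all weights are zero
	}
	return sum / total
}

func main() {
	algos := []algorithm{
		{"ast_comparison", true, 0.7, 0.9},
		{"token_comparison", true, 0.2, 0.8},
		{"structure_comparison", true, 0.1, 0.5},
	}
	// 0.7*0.9 + 0.2*0.8 + 0.1*0.5 = 0.84 with the default weights.
	fmt.Printf("%.2f\n", combine(algos))
}
```

With the default weights and the sample scores shown, the combined score is 0.84, which would clear the default threshold of 0.8.
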

Reporting settings

Option         Type    Default       Description
group_by       string  "file"        Group results by: file, function, similarity
sort_by        string  "similarity"  Sort results by: similarity, file, function
show_context   bool    true          Include code context in detailed output.
context_lines  int     3             Number of context lines to show around matches.
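
The effect of sort_by can be pictured with a short Go sketch. The `Match` record and `sortMatches` function are hypothetical, since the tool's result type is not documented here, and sorting similarity from highest to lowest is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical result record used only for this illustration.
type Match struct {
	File       string
	Function   string
	Similarity float64
}

// sortMatches orders ms in place according to a sort_by value.
// Assumption: "similarity" sorts highest first; the others sort A to Z.
func sortMatches(ms []Match, sortBy string) {
	sort.SliceStable(ms, func(i, j int) bool {
		switch sortBy {
		case "file":
			return ms[i].File < ms[j].File
		case "function":
			return ms[i].Function < ms[j].Function
		default: // "similarity"
			return ms[i].Similarity > ms[j].Similarity
		}
	})
}

func main() {
	ms := []Match{
		{"b.go", "Parse", 0.82},
		{"a.go", "Load", 0.95},
		{"c.go", "Emit", 0.88},
	}
	sortMatches(ms, "similarity")
	for _, m := range ms {
		fmt.Printf("%.2f %s %s\n", m.Similarity, m.File, m.Function)
	}
}
```
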

Configuration values can be overridden using environment variables with the prefix SIMILARITY_GO_:

Terminal window
export SIMILARITY_GO_ANALYSIS_THRESHOLD=0.9
export SIMILARITY_GO_OUTPUT_FORMAT=json
export SIMILARITY_GO_PERFORMANCE_PARALLEL=8

All configuration options can be overridden using command-line flags:

Terminal window
similarity-go analyze ./ \
  --threshold 0.9 \
  --format json \
  --parallel 8 \
  --ignore "**/*_test.go"

Validate your configuration file:

Terminal window
similarity-go config validate

Generate a default configuration file:

Terminal window
similarity-go config init

Best practices

  1. Start with defaults: Use the default configuration and adjust as needed.

  2. Exclude irrelevant files: Add test files, generated code, and vendor dependencies to the exclude list.

  3. Tune the threshold: Start with 0.8 and adjust based on your codebase characteristics.

  4. Enable caching: Keep cache enabled for better performance on large projects.

  5. Use an appropriate output format: Use detailed for human review and json for automation.

  6. Monitor memory usage: Set memory_limit to a value that fits the memory available on your system.