Configuration
Configuration File
similarity-go supports configuration files in YAML, TOML, and JSON formats. The tool automatically searches for configuration files in the following order:

1. .similarity-go.yaml
2. .similarity-go.yml
3. .similarity-go.toml
4. .similarity-go.json
5. similarity-go.yaml
6. similarity-go.yml
7. similarity-go.toml
8. similarity-go.json
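The lookup order above can be pictured as a first-match search. The following is only a sketch of that idea, not the tool's actual implementation: the function name `find_config` is hypothetical, and the assumption that only a single directory is searched is not stated in the documentation.

```python
from pathlib import Path
from typing import Optional

# Candidate file names, in the order the documentation lists them.
CONFIG_CANDIDATES = [
    ".similarity-go.yaml",
    ".similarity-go.yml",
    ".similarity-go.toml",
    ".similarity-go.json",
    "similarity-go.yaml",
    "similarity-go.yml",
    "similarity-go.toml",
    "similarity-go.json",
]


def find_config(directory: str = ".") -> Optional[Path]:
    """Return the first candidate that exists in `directory`, or None.

    Assumption: only one directory is searched; the real tool may also
    look in parent or home directories.
    """
    base = Path(directory)
    for name in CONFIG_CANDIDATES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None
```

Under this reading, a hidden `.similarity-go.yml` would take precedence over a visible `similarity-go.json` in the same directory.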
Configuration Structure
YAML Example

    # Analysis settings
    analysis:
      threshold: 0.8            # Similarity threshold (0.0 - 1.0)
      min_lines: 5              # Minimum lines to consider for analysis
      max_file_size: "10MB"     # Maximum file size to analyze

    # Output settings
    output:
      format: "detailed"        # Output format: detailed, json, yaml, csv
      file: ""                  # Output file (empty = stdout)
      colors: true              # Enable colored output
      quiet: false              # Suppress non-essential output

    # File patterns
    patterns:
      include:
        - "**/*.go"
      exclude:
        - "**/*_test.go"
        - "vendor/**"
        - "*.pb.go"
        - ".git/**"

    # Performance settings
    performance:
      parallel: 0               # Number of parallel workers (0 = auto)
      cache_enabled: true       # Enable result caching
      cache_dir: ""             # Cache directory (empty = auto)
      memory_limit: "1GB"       # Maximum memory usage

    # Algorithm settings
    algorithms:
      ast_comparison:
        enabled: true
        weight: 0.7

      token_comparison:
        enabled: true
        weight: 0.2

      structure_comparison:
        enabled: true
        weight: 0.1

    # Reporting settings
    reporting:
      group_by: "file"          # Group results: file, function, similarity
      sort_by: "similarity"     # Sort by: similarity, file, function
      show_context: true        # Show code context in output
      context_lines: 3          # Number of context lines to show

TOML Example
    [analysis]
    threshold = 0.8
    min_lines = 5
    max_file_size = "10MB"

    [output]
    format = "detailed"
    file = ""
    colors = true
    quiet = false

    [patterns]
    include = ["**/*.go"]
    exclude = [
      "**/*_test.go",
      "vendor/**",
      "*.pb.go",
      ".git/**"
    ]

    [performance]
    parallel = 0
    cache_enabled = true
    cache_dir = ""
    memory_limit = "1GB"

    [algorithms.ast_comparison]
    enabled = true
    weight = 0.7

    [algorithms.token_comparison]
    enabled = true
    weight = 0.2

    [algorithms.structure_comparison]
    enabled = true
    weight = 0.1

    [reporting]
    group_by = "file"
    sort_by = "similarity"
    show_context = true
    context_lines = 3

JSON Example
    {
      "analysis": {
        "threshold": 0.8,
        "min_lines": 5,
        "max_file_size": "10MB"
      },
      "output": {
        "format": "detailed",
        "file": "",
        "colors": true,
        "quiet": false
      },
      "patterns": {
        "include": ["**/*.go"],
        "exclude": [
          "**/*_test.go",
          "vendor/**",
          "*.pb.go",
          ".git/**"
        ]
      },
      "performance": {
        "parallel": 0,
        "cache_enabled": true,
        "cache_dir": "",
        "memory_limit": "1GB"
      },
      "algorithms": {
        "ast_comparison": {
          "enabled": true,
          "weight": 0.7
        },
        "token_comparison": {
          "enabled": true,
          "weight": 0.2
        },
        "structure_comparison": {
          "enabled": true,
          "weight": 0.1
        }
      },
      "reporting": {
        "group_by": "file",
        "sort_by": "similarity",
        "show_context": true,
        "context_lines": 3
      }
    }

Configuration Options
Analysis Settings

| Option | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | `0.8` | Similarity threshold (0.0-1.0). Functions with similarity above this value are reported. |
| `min_lines` | int | `5` | Minimum number of lines for a function to be analyzed. |
| `max_file_size` | string | `"10MB"` | Maximum file size to analyze. Supports KB, MB, GB suffixes. |
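Size strings such as `"10MB"` have to be converted to a byte count before they can be compared against a file's size. The sketch below shows one way to do that; `parse_size` is a hypothetical helper, and the choice of binary multiples (1 KB = 1024 bytes) is an assumption, since the documentation does not say whether the tool uses binary or decimal units.

```python
import re

# Assumption: binary multiples. The tool may use decimal ones (1 KB = 1000 bytes).
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string like "10MB" or "1GB" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def within_limit(file_size_bytes: int, max_file_size: str) -> bool:
    """True if a file of the given size would pass the max_file_size check."""
    return file_size_bytes <= parse_size(max_file_size)
```

The same parsing would apply to `memory_limit`, which uses the same string format.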
Output Settings
| Option | Type | Default | Description |
|---|---|---|---|
| `format` | string | `"detailed"` | Output format: detailed, json, yaml, csv |
| `file` | string | `""` | Output file path. Empty means stdout. |
| `colors` | bool | `true` | Enable colored output in terminal. |
| `quiet` | bool | `false` | Suppress non-essential output. |
Pattern Settings
| Option | Type | Default | Description |
|---|---|---|---|
| `include` | []string | `["**/*.go"]` | File patterns to include in analysis. |
| `exclude` | []string | `[]` | File patterns to exclude from analysis. |

Common exclude patterns:

- `**/*_test.go` - Test files
- `vendor/**` - Vendor dependencies
- `*.pb.go` - Protocol buffer generated files
- `.git/**` - Git directory
- `**/.*` - Hidden files and directories
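To see how include and exclude patterns interact, it can help to test a few paths by hand. The sketch below implements one common interpretation of these globs, in which `**` spans directories and `*` stays within a single path segment. Both helper names are hypothetical, and the exact matching rules of similarity-go are an assumption here; for example, the tool may match `*.pb.go` at any depth rather than only at the root.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with `**` support into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_selected(path: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """A path is analyzed if it matches an include pattern and no exclude pattern."""
    included = any(glob_to_regex(p).fullmatch(path) for p in include)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in exclude)
    return included and not excluded


include = ["**/*.go"]
exclude = ["**/*_test.go", "vendor/**", "*.pb.go", ".git/**"]
for path in ["main.go", "pkg/util.go", "pkg/util_test.go", "vendor/lib/x.go"]:
    print(path, is_selected(path, include, exclude))
```

Under these assumed rules, `main.go` and `pkg/util.go` are selected, while the test file and the vendored file are filtered out.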
Performance Settings
| Option | Type | Default | Description |
|---|---|---|---|
| `parallel` | int | `0` | Number of parallel workers. 0 means auto-detect CPU cores. |
| `cache_enabled` | bool | `true` | Enable caching of analysis results. |
| `cache_dir` | string | `""` | Cache directory. Empty means OS-specific cache directory. |
| `memory_limit` | string | `"1GB"` | Maximum memory usage before triggering garbage collection. |
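The "0 means auto" convention for `parallel` can be read as follows. This is a sketch only: `resolve_workers` is a hypothetical name, and both the use of one worker per detected core and the rejection of negative values are assumptions about behavior the documentation does not spell out.

```python
import os


def resolve_workers(parallel: int) -> int:
    """Map the `parallel` setting to a concrete worker count."""
    if parallel < 0:
        # Assumption: negative values are invalid rather than "auto".
        raise ValueError("parallel must be 0 (auto) or a positive integer")
    if parallel == 0:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return parallel
```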
Algorithm Settings
Configure whether each similarity detection algorithm is enabled and how much weight it carries:

AST Comparison

- `enabled`: Enable Abstract Syntax Tree comparison
- `weight`: Relative weight in final similarity score

Token Comparison

- `enabled`: Enable token-level comparison
- `weight`: Relative weight in final similarity score

Structure Comparison

- `enabled`: Enable structural comparison
- `weight`: Relative weight in final similarity score
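One plausible reading of "relative weight" is a weighted average over the enabled algorithms, normalized by the sum of their weights. The sketch below illustrates that reading; the function name `combined_score` and the normalization step are assumptions, not a description of how similarity-go actually combines scores.

```python
from typing import Any, Mapping


def combined_score(
    scores: Mapping[str, float],
    algorithms: Mapping[str, Mapping[str, Any]],
) -> float:
    """Weighted average of per-algorithm scores, skipping disabled algorithms."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name, settings in algorithms.items():
        if not settings.get("enabled", True):
            continue
        weight = float(settings["weight"])
        weighted_sum += weight * scores[name]
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no enabled algorithm has a positive weight")
    # Assumption: weights are renormalized, so they need not sum to 1.
    return weighted_sum / total_weight


algorithms = {
    "ast_comparison": {"enabled": True, "weight": 0.7},
    "token_comparison": {"enabled": True, "weight": 0.2},
    "structure_comparison": {"enabled": True, "weight": 0.1},
}
# Hypothetical per-algorithm scores for one pair of functions.
scores = {"ast_comparison": 0.9, "token_comparison": 0.6, "structure_comparison": 0.8}
print(round(combined_score(scores, algorithms), 4))
```

With the default weights and these hypothetical scores the result is about 0.83, which would clear the default `threshold` of 0.8. Under this reading, disabling an algorithm shifts its share of the weight onto the remaining ones.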
Reporting Settings
| Option | Type | Default | Description |
|---|---|---|---|
| `group_by` | string | `"file"` | Group results by: file, function, similarity |
| `sort_by` | string | `"similarity"` | Sort results by: similarity, file, function |
| `show_context` | bool | `true` | Include code context in detailed output. |
| `context_lines` | int | `3` | Number of context lines to show around matches. |
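The effect of `group_by` and `sort_by` can be pictured with a small sketch. The record shape used here (dictionaries with `file`, `function`, and `similarity` keys) is hypothetical, as is the choice to sort similarity in descending order; the tool's real output structure is not documented in this section.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Any, Dict, List


def organize(
    matches: List[Dict[str, Any]],
    group_by: str = "file",
    sort_by: str = "similarity",
) -> Dict[Any, List[Dict[str, Any]]]:
    """Group match records by one field and sort each group by another."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for match in matches:
        groups[match[group_by]].append(match)
    # Assumption: highest similarity first; names sort alphabetically.
    descending = sort_by == "similarity"
    return {
        key: sorted(items, key=itemgetter(sort_by), reverse=descending)
        for key, items in groups.items()
    }


# Hypothetical match records for illustration only.
matches = [
    {"file": "a.go", "function": "Parse", "similarity": 0.82},
    {"file": "b.go", "function": "Load", "similarity": 0.91},
    {"file": "a.go", "function": "Decode", "similarity": 0.95},
]
for file_name, items in organize(matches).items():
    print(file_name, [m["function"] for m in items])
```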
Environment Variable Override
Configuration values can be overridden using environment variables with the prefix SIMILARITY_GO_:

    export SIMILARITY_GO_ANALYSIS_THRESHOLD=0.9
    export SIMILARITY_GO_OUTPUT_FORMAT=json
    export SIMILARITY_GO_PERFORMANCE_PARALLEL=8

Command Line Override
All configuration options can be overridden using command-line flags:

    similarity-go analyze ./ \
      --threshold 0.9 \
      --format json \
      --parallel 8 \
      --ignore "**/*_test.go"

Configuration Validation
Validate your configuration file:

    similarity-go config validate

Generate a default configuration file:

    similarity-go config init

Best Practices
- Start with defaults: Use the default configuration and adjust as needed.
- Exclude irrelevant files: Add test files, generated code, and vendor dependencies to the exclude list.
- Tune the threshold: Start with 0.8 and adjust based on your codebase characteristics.
- Enable caching: Keep cache enabled for better performance on large projects.
- Use appropriate output format: Use `detailed` for human review, `json` for automation.
- Monitor memory usage: Set `memory_limit` appropriately for your system.