Configuration

Configuration File

similarity-go supports configuration files in YAML, TOML, and JSON formats. The tool automatically searches for configuration files in the following order:

.similarity-go.yaml
.similarity-go.yml
.similarity-go.toml
.similarity-go.json
similarity-go.yaml
similarity-go.yml
similarity-go.toml
similarity-go.json

Configuration Structure

YAML Example

# Analysis settings
analysis:
  threshold: 0.8              # Similarity threshold (0.0 - 1.0)
  min_lines: 5               # Minimum lines to consider for analysis
  max_file_size: "10MB"      # Maximum file size to analyze

# Output settings
output:
  format: "detailed"         # Output format: detailed, json, yaml, csv
  file: ""                   # Output file (empty = stdout)
  colors: true               # Enable colored output
  quiet: false               # Suppress non-essential output

# File patterns
patterns:
  include:
    - "**/*.go"
  exclude:
    - "**/*_test.go"
    - "vendor/**"
    - "*.pb.go"
    - ".git/**"

# Performance settings
performance:
  parallel: 0                # Number of parallel workers (0 = auto)
  cache_enabled: true        # Enable result caching
  cache_dir: ""              # Cache directory (empty = auto)
  memory_limit: "1GB"        # Maximum memory usage

# Algorithm settings
algorithms:
  ast_comparison:
    enabled: true
    weight: 0.7

  token_comparison:
    enabled: true
    weight: 0.2

  structure_comparison:
    enabled: true
    weight: 0.1

# Reporting settings
reporting:
  group_by: "file"           # Group results: file, function, similarity
  sort_by: "similarity"      # Sort by: similarity, file, function
  show_context: true         # Show code context in output
  context_lines: 3           # Number of context lines to show

TOML Example

[analysis]
threshold = 0.8
min_lines = 5
max_file_size = "10MB"

[output]
format = "detailed"
file = ""
colors = true
quiet = false

[patterns]
include = ["**/*.go"]
exclude = [
  "**/*_test.go",
  "vendor/**",
  "*.pb.go",
  ".git/**"
]

[performance]
parallel = 0
cache_enabled = true
cache_dir = ""
memory_limit = "1GB"

[algorithms.ast_comparison]
enabled = true
weight = 0.7

[algorithms.token_comparison]
enabled = true
weight = 0.2

[algorithms.structure_comparison]
enabled = true
weight = 0.1

[reporting]
group_by = "file"
sort_by = "similarity"
show_context = true
context_lines = 3

JSON Example

{
  "analysis": {
    "threshold": 0.8,
    "min_lines": 5,
    "max_file_size": "10MB"
  },
  "output": {
    "format": "detailed",
    "file": "",
    "colors": true,
    "quiet": false
  },
  "patterns": {
    "include": ["**/*.go"],
    "exclude": [
      "**/*_test.go",
      "vendor/**",
      "*.pb.go",
      ".git/**"
    ]
  },
  "performance": {
    "parallel": 0,
    "cache_enabled": true,
    "cache_dir": "",
    "memory_limit": "1GB"
  },
  "algorithms": {
    "ast_comparison": {
      "enabled": true,
      "weight": 0.7
    },
    "token_comparison": {
      "enabled": true,
      "weight": 0.2
    },
    "structure_comparison": {
      "enabled": true,
      "weight": 0.1
    }
  },
  "reporting": {
    "group_by": "file",
    "sort_by": "similarity",
    "show_context": true,
    "context_lines": 3
  }
}

Configuration Options

Analysis Settings

Option	Type	Default	Description
`threshold`	`float`	`0.8`	Similarity threshold (0.0-1.0). Functions with similarity above this value are reported.
`min_lines`	`int`	`5`	Minimum number of lines for a function to be analyzed.
`max_file_size`	`string`	`"10MB"`	Maximum file size to analyze. Supports KB, MB, GB suffixes.

Output Settings

Option	Type	Default	Description
`format`	`string`	`"detailed"`	Output format: `detailed`, `json`, `yaml`, `csv`
`file`	`string`	`""`	Output file path. Empty means stdout.
`colors`	`bool`	`true`	Enable colored output in terminal.
`quiet`	`bool`	`false`	Suppress non-essential output.

Pattern Settings

Option	Type	Default	Description
`include`	`[]string`	`["*/.go"]`	File patterns to include in analysis.
`exclude`	`[]string`	`[]`	File patterns to exclude from analysis.

Common exclude patterns:

**/*_test.go - Test files
vendor/** - Vendor dependencies
*.pb.go - Protocol buffer generated files
.git/** - Git directory
**/.* - Hidden files and directories

Performance Settings

Option	Type	Default	Description
`parallel`	`int`	`0`	Number of parallel workers. 0 means auto-detect CPU cores.
`cache_enabled`	`bool`	`true`	Enable caching of analysis results.
`cache_dir`	`string`	`""`	Cache directory. Empty means OS-specific cache directory.
`memory_limit`	`string`	`"1GB"`	Maximum memory usage before triggering garbage collection.

Algorithm Settings

Configure the weight and enablement of different similarity detection algorithms:

AST Comparison

enabled: Enable Abstract Syntax Tree comparison
weight: Relative weight in final similarity score

Token Comparison

enabled: Enable token-level comparison
weight: Relative weight in final similarity score

Structure Comparison

enabled: Enable structural comparison
weight: Relative weight in final similarity score

Reporting Settings

Option	Type	Default	Description
`group_by`	`string`	`"file"`	Group results by: `file`, `function`, `similarity`
`sort_by`	`string`	`"similarity"`	Sort results by: `similarity`, `file`, `function`
`show_context`	`bool`	`true`	Include code context in detailed output.
`context_lines`	`int`	`3`	Number of context lines to show around matches.

Environment Variable Override

Configuration values can be overridden using environment variables with the prefix SIMILARITY_GO_:

export SIMILARITY_GO_ANALYSIS_THRESHOLD=0.9
export SIMILARITY_GO_OUTPUT_FORMAT=json
export SIMILARITY_GO_PERFORMANCE_PARALLEL=8

Command Line Override

All configuration options can be overridden using command-line flags:

similarity-go analyze ./ \
  --threshold 0.9 \
  --format json \
  --parallel 8 \
  --ignore "**/*_test.go"

Configuration Validation

Validate your configuration file:

similarity-go config validate

Generate a default configuration file:

similarity-go config init

Best Practices

Start with defaults: Use the default configuration and adjust as needed.
Exclude irrelevant files: Add test files, generated code, and vendor dependencies to the exclude list.
Tune the threshold: Start with 0.8 and adjust based on your codebase characteristics.
Enable caching: Keep cache enabled for better performance on large projects.
Use appropriate output format: Use detailed for human review, json for automation.
Monitor memory usage: Set memory_limit appropriately for your system.