# GitLab CI/CD Best Practices (2025)

Comprehensive guide to building secure, efficient, and maintainable GitLab CI/CD pipelines based on current industry standards and GitLab recommendations.

## Table of Contents

1. [Pipeline Performance](#pipeline-performance)
2. [Security Best Practices](#security-best-practices)
3. [Caching Strategy](#caching-strategy)
4. [Artifacts Management](#artifacts-management)
5. [Stage Organization](#stage-organization)
6. [Job Design](#job-design)
7. [Docker Best Practices](#docker-best-practices)
8. [Deployment Strategies](#deployment-strategies)
9. [Monitoring and Observability](#monitoring-and-observability)
10. [Common Pitfalls](#common-pitfalls)
11. [Performance Benchmarks](#performance-benchmarks)
12. [References](#references)

## Pipeline Performance

### 1. Enable FastZip Compression

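The `FF_USE_FASTZIP` runner feature flag switches GitLab Runner to the fastzip archiver, which is generally faster than the default for caches and artifacts. The compression-level variables only take effect when the flag is enabled: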

```yaml
variables:
  FF_USE_FASTZIP: "true"
  ARTIFACT_COMPRESSION_LEVEL: "fastest"
  CACHE_COMPRESSION_LEVEL: "fastest"
```

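**Impact**: Up to 70% faster artifact handling in favorable cases; the actual gain depends on archive size and content. `fastest` trades compression ratio for speed, so archives can be larger to transfer.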

### 2. Optimize Cache Configuration

Use cache policies to avoid unnecessary uploads:

```yaml
cache:
  key:
    files:
      - package-lock.json  # Cache invalidates when this changes
  paths:
    - node_modules/
  policy: pull  # Default: pull-push, use pull for read-only jobs
```

**Best Practice**: Use `pull-push` only in the job that installs dependencies, and use `pull` in all other jobs.
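
One way to apply this without repeating the cache definition is a YAML anchor, sketched below. The hidden key `.node_cache` and the job names are illustrative, and `npm install` is assumed so that a restored `node_modules/` is reused (`npm ci` would delete it before installing):

```yaml
# Shared cache definition; keys starting with a dot are not run as jobs
.node_cache: &node_cache
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

install:
  stage: build
  script:
    - npm install
  cache:
    <<: *node_cache
    policy: pull-push  # The only job that uploads the cache

lint:
  stage: test
  script:
    - npm run lint
  cache:
    <<: *node_cache
    policy: pull  # Downloads the cache, never uploads it
```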

### 3. Parallel Job Execution

Run independent jobs in parallel:

```yaml
test_unit:
  stage: test
  script: npm run test:unit

test_integration:
  stage: test
  script: npm run test:integration

lint:
  stage: test
  script: npm run lint
```

Jobs in the same stage run in parallel, so all three start together as long as enough runners are available, reducing total pipeline time.
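
When a single job is itself the bottleneck, the `parallel` keyword can split it into several instances. The sketch below assumes the `test:unit` npm script runs Jest 28 or later, whose `--shard` option accepts an `index/total` pair; other test runners need their own splitting mechanism:

```yaml
test_unit:
  stage: test
  parallel: 3  # Creates three instances of this job
  script:
    # CI_NODE_INDEX (1-based) and CI_NODE_TOTAL are set by GitLab for parallel jobs
    - npm run test:unit -- --shard="$CI_NODE_INDEX/$CI_NODE_TOTAL"
```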

### 4. Use `needs` for DAG Pipelines

Skip unnecessary stage waiting with Directed Acyclic Graphs (DAG):

```yaml
build:
  stage: build
  script: npm run build

deploy:
  stage: deploy
  needs: ["build"]  # Starts immediately after build completes
  script: deploy.sh
```

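**Impact**: Can reduce pipeline time by 30-50% for complex workflows with long independent branches, because a job starts as soon as the jobs it needs have finished rather than when the whole previous stage has. The gain is smaller for short or strictly linear pipelines.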
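
In a pipeline with independent tracks, each track can progress on its own. A minimal sketch with hypothetical frontend and backend jobs (the job names and npm scripts are assumptions):

```yaml
stages: [build, test]

build_frontend:
  stage: build
  script: npm run build:frontend

build_backend:
  stage: build
  script: npm run build:backend

test_frontend:
  stage: test
  needs: ["build_frontend"]  # Does not wait for build_backend
  script: npm run test:frontend

test_backend:
  stage: test
  needs: ["build_backend"]  # Does not wait for build_frontend
  script: npm run test:backend
```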

### 5. Limit Artifact Size

Only include necessary files in artifacts:

```yaml
artifacts:
  paths:
    - dist/  # Only built files, not source
  expire_in: 1 week  # Auto-cleanup
  exclude:
    - "**/*.map"  # Exclude source maps if not needed
```

## Security Best Practices

### 1. Use Protected Variables for Secrets

**Never** hardcode secrets in `.gitlab-ci.yml`:

```yaml
# ❌ BAD
variables:
  API_KEY: "sk-1234567890"

# ✅ GOOD
script:
  - deploy --api-key $API_KEY  # $API_KEY from CI/CD variables
```

Configure sensitive variables in the GitLab UI under **Settings → CI/CD → Variables**:
- Check "Masked" to hide the value in job logs
- Check "Protected" to expose the variable only to pipelines on protected branches and tags

### 2. Pin Docker Image Versions

**Never** use `latest` tags:

```yaml
# ❌ BAD
image: node:latest

# ✅ GOOD
image: node:20.11-alpine  # Specific version
```

**Even Better**: Use SHA digests for immutability:

```yaml
image: node:20.11-alpine@sha256:abc123...
```

### 3. Use Specific Refs for Includes

When including external configurations:

```yaml
# ❌ BAD
include:
  - project: 'my-group/my-project'
    file: '/templates/.gitlab-ci.yml'

# ✅ GOOD
include:
  - project: 'my-group/my-project'
    ref: 'v1.2.3'  # Pinned to specific tag
    file: '/templates/.gitlab-ci.yml'
```

### 4. Enable Security Scanning

Include GitLab's security scanning templates:

```yaml
include:
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml
  - template: Jobs/Container-Scanning.gitlab-ci.yml

variables:
  AST_ENABLE_MR_PIPELINES: "true"  # Run in MRs
  GITLAB_ADVANCED_SAST_ENABLED: "true"  # Use improved SAST (Ultimate)
```

### 5. Implement Manual Approval for Production

```yaml
deploy_production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - deploy.sh
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual  # Requires manual trigger; the pipeline stays blocked until it runs
```

### 6. Use Container Scanning Before Push

Scan images for vulnerabilities before pushing to registry:

```yaml
scan_image:
  stage: test
  script:
    - trivy image --severity HIGH,CRITICAL $IMAGE_NAME:$TAG
    - trivy image --severity CRITICAL --exit-code 1 $IMAGE_NAME:$TAG
```

## Caching Strategy

### Cache vs Artifacts

- **Cache**: Speed up subsequent pipelines (dependencies, build cache)
- **Artifacts**: Pass files between jobs in the same pipeline

```yaml
# Example: Combining both
install:
  script:
    # `npm ci` deletes node_modules/ before installing, so cache npm's
    # download cache (.npm/) instead of node_modules/
    - npm ci --cache .npm --prefer-offline
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
    policy: pull-push
  artifacts:
    paths:
      - node_modules/  # Pass to other jobs in this pipeline
    expire_in: 1 hour

test:
  needs: ["install"]  # Downloads the node_modules/ artifact from install
  script:
    - npm test
```

### Cache Key Strategies

**File-based keys** (recommended):

```yaml
cache:
  key:
    files:
      - Gemfile.lock
      - package-lock.json
    prefix: $CI_COMMIT_REF_SLUG  # Per-branch cache
  paths:
    - vendor/ruby
    - node_modules/
```

### Shared Cache Across Branches

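A fixed key shares one cache across branches. By default, GitLab still separates cache between protected and non-protected branches regardless of the key; to share between them as well, clear **Use separate caches for protected branches** under **Settings → CI/CD → General pipelines**: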

```yaml
cache:
  key: shared-cache  # Same key for all branches
  paths:
    - node_modules/
```

**Use Case**: When there's no security reason to separate caches.
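
A middle ground between one shared cache and fully isolated per-branch caches is a per-branch key with a fallback, using `cache:fallback_keys`. The sketch below assumes the default branch name is already a valid slug, such as `main`, so that it matches the key that branch writes:

```yaml
cache:
  key: "$CI_COMMIT_REF_SLUG"      # One cache per branch
  fallback_keys:
    - "$CI_DEFAULT_BRANCH"        # Tried when the branch has no cache yet
  paths:
    - node_modules/
```

Whether the fallback is found can also depend on the protected/non-protected cache separation mentioned above, so verify the behavior on a feature branch before relying on it.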

### Multiple Caches

Use multiple caches for different dependency types:

```yaml
cache:
  - key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  - key:
      files:
        - requirements.txt
    paths:
      - venv/
```

## Artifacts Management

### Set Expiration Times

Avoid storage bloat:

```yaml
artifacts:
  paths:
    - dist/
  expire_in: 1 week  # Any duration works, e.g. 30 mins, 1 hr, 1 day, 1 week, or never
```

**Guideline**:
- Development builds: 1-3 days
- Release candidates: 1 week
- Production releases: never (or 1 year)
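
These guidelines can be encoded per job. A sketch, assuming branch pipelines produce development builds and tag pipelines produce releases (the job names are illustrative):

```yaml
build_dev:
  script: npm run build
  rules:
    - if: '$CI_COMMIT_BRANCH'   # Branch pipelines
  artifacts:
    paths:
      - dist/
    expire_in: 3 days

build_release:
  script: npm run build
  rules:
    - if: '$CI_COMMIT_TAG'      # Tag pipelines
  artifacts:
    paths:
      - dist/
    expire_in: 1 year
```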

### Use Artifact Reports

GitLab can parse and display specific artifact types:

```yaml
test:
  script:
    # --cov enables coverage measurement (pytest-cov); --cov-report=xml writes coverage.xml
    - pytest --junitxml=report.xml --cov --cov-report=xml
  artifacts:
    reports:
      junit: report.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
```

**Benefits**:
- Test results in merge request UI
- Coverage tracking over time
- Failed test identification

### Conditional Artifacts

Only save artifacts when needed:

```yaml
test:
  script:
    - npm test
  artifacts:
    when: on_failure  # Only save on failure for debugging
    paths:
      - screenshots/
      - logs/
```

Options: `on_success` (default), `on_failure`, `always`

## Stage Organization

### Standard Stage Structure

```yaml
stages:
  - dependencies    # Install/download dependencies
  - quality        # Linting, formatting, static analysis
  - test          # Unit, integration, e2e tests
  - security      # Security scanning
  - build         # Compile, bundle, package
  - deploy        # Deploy to environments
```

### Environment-Specific Stages

For complex deployments:

```yaml
stages:
  - build
  - test
  - deploy_dev
  - deploy_staging
  - deploy_production
  - rollback
```

## Job Design

### Use `before_script` for Setup

```yaml
default:
  before_script:
    - echo "Setting up environment..."
    - export PATH="$PATH:/custom/bin"

test:
  script:
    - npm test  # before_script runs first
```
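
A job can replace the default when it needs different setup, or opt out of it. A brief sketch with illustrative job names:

```yaml
default:
  before_script:
    - echo "Setting up environment..."

test:
  script:
    - npm test                      # Runs after the default before_script

docs:
  before_script:
    - echo "Docs-specific setup"    # Replaces the default for this job only
  script:
    - npm run docs

notify:
  inherit:
    default: false                  # Skips everything under default:, including before_script
  script:
    - echo "Pipeline finished"
```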

### Handle Failures Gracefully

```yaml
security_scan:
  script:
    - trivy fs --exit-code 1 .  # Exits non-zero when vulnerabilities are found
  allow_failure: true  # Job shows a warning, pipeline continues
```

**Use Cases**:
- Optional security checks
- Performance tests
- Experimental features
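
If only certain outcomes should be tolerated, `allow_failure:exit_codes` limits the exception to specific exit codes. The sketch below assumes a hypothetical `./perf_test.sh` that exits with 2 when results are degraded but usable:

```yaml
performance_test:
  script:
    - ./perf_test.sh        # Hypothetical script
  allow_failure:
    exit_codes:
      - 2                   # Exit code 2 is a warning; any other failure fails the job
```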

### Use `rules` Instead of `only/except`

Modern syntax with more flexibility:

```yaml
# ❌ OLD
deploy:
  only:
    - main
  except:
    - schedules

# ✅ NEW
deploy:
  rules:
    - if: '$CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE != "schedule"'
```
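
`rules` are evaluated in order and the first match decides, which makes combinations easy to express. A sketch with an illustrative `src/` path:

```yaml
deploy:
  script: deploy.sh
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: never                       # Never run in scheduled pipelines
    - if: '$CI_COMMIT_BRANCH == "main"'
      changes:
        - src/**/*                      # Only when files under src/ changed
    - if: '$CI_COMMIT_TAG'
      when: manual                      # Tags deploy after a manual trigger
```

If no rule matches, the job is not added to the pipeline.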

### Retry on Transient Failures

```yaml
test:
  script:
    - npm test
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```

## Docker Best Practices

### Docker-in-Docker Configuration

```yaml
build:
  image: docker:24-cli
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"  # Enable TLS
    DOCKER_DRIVER: overlay2       # Performance
  script:
    - docker build -t $IMAGE .
```

### Use BuildKit for Better Performance

```yaml
variables:
  DOCKER_BUILDKIT: "1"
  BUILDKIT_PROGRESS: "plain"

build:
  script:
    - docker build --cache-from $IMAGE:latest .
```

### Multi-stage Build Caching

```yaml
build:
  script:
    - docker pull $IMAGE:latest || true
    - |
      docker build \
        --cache-from $IMAGE:latest \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --tag $IMAGE:$TAG \
        --tag $IMAGE:latest \
        .
    - docker push $IMAGE:$TAG
    - docker push $IMAGE:latest  # Refresh the cache source for the next build
```
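
Pushing requires authentication first. A sketch for the GitLab container registry using the predefined `CI_REGISTRY*` variables; defining `IMAGE` and `TAG` this way is an assumption about how the variables used above are set:

```yaml
build:
  image: docker:24-cli
  services:
    - docker:24-dind
  variables:
    IMAGE: $CI_REGISTRY_IMAGE
    TAG: $CI_COMMIT_SHORT_SHA
  before_script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login "$CI_REGISTRY" -u "$CI_REGISTRY_USER" --password-stdin
  script:
    - docker build --tag "$IMAGE:$TAG" .
    - docker push "$IMAGE:$TAG"
```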

## Deployment Strategies

### Blue-Green Deployment

```yaml
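deploy_green:
  script:
    # The scripts below are placeholders for your own tooling
    - ./deploy.sh green           # Deploy to green environment
    - ./smoke_test.sh green       # Run smoke tests
    - ./switch_traffic.sh green   # Switch traffic to green, keep blue as backup

rollback:
  script:
    - ./switch_traffic.sh blue    # Switch traffic back to blue
  when: manual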
```

### Canary Deployment

```yaml
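deploy_canary:
  script:
    # The scripts below are placeholders for your own tooling
    - ./deploy.sh --percent 10    # Deploy to 10% of servers
    - ./monitor.sh --minutes 10   # Monitor metrics for 10 minutes
  when: manual

deploy_full:
  needs: ["deploy_canary"]
  script:
    - ./deploy.sh --percent 100   # Deploy to remaining 90%
  when: manual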
```

### Environment-Specific Variables

```yaml
deploy:
  script:
    - echo "Deploying to $CI_ENVIRONMENT_NAME"
  environment:
    name: $ENV_NAME
    url: https://$ENV_NAME.example.com
  rules:
    - if: '$ENV_NAME == "staging"'
      variables:
        REPLICAS: "2"
    - if: '$ENV_NAME == "production"'
      variables:
        REPLICAS: "5"
```

## Monitoring and Observability

### Coverage Tracking

```yaml
test:
  script:
    - pytest --cov
  coverage: '/TOTAL.*\s+(\d+%)$/'  # Extract coverage percentage
```

GitLab shows the extracted percentage on the job and in merge requests, and tracks the coverage trend over time in the project's repository analytics.

### Pipeline Duration Alerts

Monitor slow pipelines:

```yaml
check_duration:
  stage: .post  # Special stage that runs last
  script:
    - |
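      # GitLab has no predefined variable for pipeline duration, so derive it
      # from CI_PIPELINE_CREATED_AT (ISO 8601). Assumes GNU date is available.
      started=$(date -d "$CI_PIPELINE_CREATED_AT" +%s)
      duration=$(( $(date +%s) - started ))
      if [ "$duration" -gt 1800 ]; then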
        echo "Pipeline took longer than 30 minutes!"
        # Send alert
      fi
  when: always
```

### Job Logs Best Practices

```yaml
build:
  script:
    - echo "Starting build at $(date)"
    - |
      set -x  # Echo commands for debugging
      npm run build
    - echo "Build completed at $(date)"
```

## Common Pitfalls

### ❌ Pitfall 1: No Cache Invalidation

```yaml
# Problem: Cache is never invalidated, so stale dependencies persist
cache:
  key: "static-key"
  paths:
    - node_modules/

# Solution: Use file-based keys
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
```

### ❌ Pitfall 2: Overusing Artifacts

```yaml
# Problem: 5GB of artifacts per pipeline
artifacts:
  paths:
    - ./  # Everything in the project directory!

# Solution: Be specific
artifacts:
  paths:
    - dist/
    - build/output.jar
  expire_in: 1 week
```

### ❌ Pitfall 3: Long-Running Jobs

```yaml
# Problem: 2-hour test job
test:
  script:
    - run_all_tests

# Solution: Split into parallel jobs
test:unit:
  script: run_unit_tests
test:integration:
  script: run_integration_tests
test:e2e:
  script: run_e2e_tests
```

### ❌ Pitfall 4: Not Using `needs`

```yaml
# Problem: deploy waits for all test jobs
stages: [test, deploy]

test:unit: ...
test:integration: ...
test:performance: ...  # Takes 30 mins

deploy:
  stage: deploy  # Waits for performance tests

# Solution: Use needs
deploy:
  needs: ["test:unit", "test:integration"]
  # Doesn't wait for performance tests
```

### ❌ Pitfall 5: Ignoring Failed Jobs

```yaml
# Problem: Security scan fails but pipeline succeeds
security:
  script: security_scan
  allow_failure: true  # Ignored!

# Solution: Review failures, set policies
security:
  script: security_scan
  allow_failure: false  # Or create separate dashboard
```

### ❌ Pitfall 6: Hardcoded Values

```yaml
# Problem: Different values per environment
script:
  - deploy --url https://staging.example.com

# Solution: Use variables
script:
  - deploy --url $DEPLOY_URL
```
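
The variable can come from the CI/CD settings or from the job itself. A sketch using `extends`, with the same hypothetical `deploy` command as in the example above and illustrative URLs:

```yaml
.deploy:
  script:
    - deploy --url "$DEPLOY_URL"

deploy_staging:
  extends: .deploy
  variables:
    DEPLOY_URL: https://staging.example.com

deploy_production:
  extends: .deploy
  variables:
    DEPLOY_URL: https://example.com
  when: manual
```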

### ❌ Pitfall 7: Missing `only/rules`

```yaml
# Problem: Deploy runs on every commit to any branch
deploy:
  script: deploy.sh

# Solution: Restrict to specific branches
deploy:
  script: deploy.sh
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```

## Performance Benchmarks

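Rough targets based on 2025 industry practice; treat them as starting points and adjust them to your project's size and risk profile: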

| Metric | Good | Needs Improvement |
|--------|------|-------------------|
| Pipeline Duration | < 10 min | > 20 min |
| Cache Hit Rate | > 80% | < 50% |
| Artifact Size | < 100 MB | > 500 MB |
| Test Coverage | > 80% | < 60% |
| Failed Pipeline Rate | < 5% | > 15% |

## References

- [GitLab CI/CD Pipeline Efficiency](https://docs.gitlab.com/ee/ci/pipelines/pipeline_efficiency.html)
- [GitLab Security Best Practices](https://docs.gitlab.com/ci/pipeline_security/)
- [GitLab Caching Documentation](https://docs.gitlab.com/ee/ci/caching/)
- [Docker BuildKit Documentation](https://docs.docker.com/build/buildkit/)
- [OWASP CI/CD Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/CI_CD_Security_Cheat_Sheet.html)

---

**Last Updated**: January 2025
**GitLab Version**: 16.8+
