Understanding the SageMaker Studio SSH Config Entry

This SSH configuration enables you to SSH directly into your SageMaker Studio space from your local machine, treating it like a remote server. Let me break down each component and explain what’s actually happening.

What This Enables

With this config, you can:

  • SSH into your SageMaker Studio environment: ssh space-name
  • Use VS Code Remote-SSH to develop in Studio
  • Run remote commands: ssh space-name 'python train.py'
  • Forward ports: ssh -L 8888:localhost:8888 space-name
  • Use git with your local SSH keys (via ForwardAgent)

This bypasses the web browser interface entirely.

Line-by-Line Breakdown

Host space-name

Host space-name

What it means: This is the alias you’ll use to connect. You can name it anything.

Example usage:

ssh space-name
# Instead of typing the full connection string

Common practice: Name it descriptively:

Host sagemaker-dev
Host sagemaker-prod
Host ml-workspace

HostName 'arn:PARTITION:sagemaker:...'

HostName 'arn:PARTITION:sagemaker:us-east-1:111122223333:space/domain-id/space-name'

What it means: This is NOT a traditional hostname (like ec2-54-123-45-67.compute.amazonaws.com). It’s an ARN (Amazon Resource Name) identifying your SageMaker Studio space.

ARN structure:

arn:PARTITION:sagemaker:REGION:ACCOUNT_ID:space/DOMAIN_ID/SPACE_NAME

Breaking it down:

  • PARTITION: Usually aws (or aws-cn for China, aws-us-gov for GovCloud)
  • sagemaker: The AWS service
  • us-east-1: AWS region where your Studio domain lives
  • 111122223333: Your AWS account ID (12 digits)
  • space/domain-id/space-name: The specific Studio space resource path

Why an ARN instead of an IP/hostname? SageMaker Studio spaces don’t have public IPs or DNS names. They’re accessed through AWS APIs. The ARN tells the proxy command (next section) exactly which Studio space to connect to.

Real example:

HostName 'arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/my-ml-workspace'
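
To make the structure concrete, here is a small sketch that splits a space ARN into the parts listed above. The function name parse_space_arn is made up for this example and is not part of any AWS tooling; it only does string handling and never contacts AWS.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): split a space ARN of the form
# arn:PARTITION:sagemaker:REGION:ACCOUNT_ID:space/DOMAIN_ID/SPACE_NAME
parse_space_arn() (
  arn=$1
  case $arn in
    arn:*:sagemaker:*:*:space/*/*) ;;
    *) echo "not a SageMaker space ARN: $arn" >&2; exit 1 ;;
  esac
  # Colon-separated fields: 1=arn 2=partition 3=service 4=region 5=account 6=resource
  partition=$(printf '%s\n' "$arn" | cut -d: -f2)
  region=$(printf '%s\n' "$arn" | cut -d: -f4)
  account=$(printf '%s\n' "$arn" | cut -d: -f5)
  resource=$(printf '%s\n' "$arn" | cut -d: -f6-)
  # Slash-separated resource path: 1=space 2=domain id 3=space name
  domain_id=$(printf '%s\n' "$resource" | cut -d/ -f2)
  space_name=$(printf '%s\n' "$resource" | cut -d/ -f3)
  printf 'partition=%s region=%s account=%s domain_id=%s space_name=%s\n' \
    "$partition" "$region" "$account" "$domain_id" "$space_name"
)

parse_space_arn 'arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/my-ml-workspace'
```

Anything that does not look like a space ARN (for example a plain hostname) makes the function print an error and return a non-zero status.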

ProxyCommand '/home/user/sagemaker_connect.sh' '%h'

ProxyCommand '/home/user/sagemaker_connect.sh' '%h'

What it means: This is the magic that makes everything work. SSH doesn’t natively understand how to connect to SageMaker ARNs, so it delegates to a proxy script that handles the AWS-specific connection logic.

How it works:

  1. You run: ssh space-name
  2. SSH reads the config, sees it needs to use a ProxyCommand
  3. SSH executes: /home/user/sagemaker_connect.sh 'arn:aws:sagemaker:...'
  4. The script:
    • Authenticates to AWS (using your credentials/SSO)
    • Calls SageMaker APIs to establish a connection tunnel
    • Creates a bidirectional pipe between SSH and the Studio space
  5. SSH communicates through this pipe as if it were a normal SSH connection

The '%h' parameter:

  • %h is an SSH token that expands to the HostName value
  • So the script receives the ARN as its argument
  • Equivalent to: sagemaker_connect.sh 'arn:aws:sagemaker:us-east-1:111122223333:space/domain-id/space-name'
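
The contract between SSH and a ProxyCommand can be tried out without AWS at all. The sketch below uses a stand-in function, proxy_stub (a hypothetical name), that receives the expanded %h value as its first argument and simply echoes its input back. A real proxy would relay those bytes to the Studio space instead.

```shell
#!/bin/sh
# Stand-in for a ProxyCommand script (hypothetical; it does not contact AWS).
# SSH would run the real script as: sagemaker_connect.sh '<expanded %h>'
proxy_stub() {
  target=$1
  # Diagnostics go to stderr: stdout is reserved for the SSH byte stream.
  echo "proxy_stub: asked to connect to $target" >&2
  # A real proxy relays bytes to and from the remote end; this stub echoes them back.
  cat
}

printf 'SSH-2.0-example\n' | proxy_stub 'arn:aws:sagemaker:us-east-1:111122223333:space/domain-id/space-name'
```

The point of the stub is the separation of streams: whatever SSH writes arrives on the script's stdin, whatever the script writes to stdout goes back to SSH, and log messages have to use stderr so they don't corrupt the connection.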

What’s in the proxy script? The script (sagemaker_connect.sh) typically:

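#!/bin/bash
# Simplified conceptual version; the script AWS ships may differ

ARN="$1"

# Extract components from the ARN:
# arn:PARTITION:sagemaker:REGION:ACCOUNT_ID:space/DOMAIN_ID/SPACE_NAME
REGION=$(echo "$ARN" | cut -d: -f4)
DOMAIN_ID=$(echo "$ARN" | cut -d/ -f2)
SPACE_NAME=$(echo "$ARN" | cut -d/ -f3)

# Assumption: the user profile comes from the environment (hypothetical variable name)
USER_PROFILE="${SAGEMAKER_USER_PROFILE:?set SAGEMAKER_USER_PROFILE first}"

# Use AWS CLI to create presigned URL for connection
aws sagemaker create-presigned-domain-url \
  --domain-id "$DOMAIN_ID" \
  --user-profile-name "$USER_PROFILE" \
  --region "$REGION"

# Establish tunnel using AWS Systems Manager Session Manager
# (This is what actually creates the bidirectional connection)
# Placeholder target: the identifier the real script passes is not shown here
aws ssm start-session \
  --target "$SPACE_NAME" \
  --region "$REGION" \
  --document-name AWS-StartSSHSession \
  --parameters portNumber=22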

Real AWS script location: You normally don’t write this script yourself. Depending on your setup, it may come from:

  • The SageMaker Studio console (under “Set up local connection”)
  • Templates in the AWS documentation
  • Your local IDE integration, which often generates it automatically during setup

ForwardAgent yes

ForwardAgent yes

What it means: Your local SSH agent is made available to the remote Studio space. Git operations and SSH connections started from within Studio can then authenticate with your local keys, and the private keys themselves never leave your machine.

Use case:

# You SSH into SageMaker Studio
ssh space-name
 
# Inside Studio, you can now:
git clone git@github.com:yourcompany/private-repo.git
# This works using YOUR local SSH key, even though you're on remote system
 
# Or SSH to another server
ssh your-other-server.com
# Again, using your local key

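Security consideration: This is convenient but potentially risky. If the remote system (SageMaker Studio) is compromised, an attacker who can reach the forwarded agent socket can authenticate as you for as long as your session is open, even though they cannot copy the keys themselves. Only enable this for trusted environments.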

How it works: SSH agent forwarding creates a secure channel back to your local SSH agent. When Studio needs to authenticate (e.g., to GitHub), the request is tunneled back to your laptop, your laptop’s SSH agent performs the authentication, and the result is sent back.
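
A quick way to check whether forwarding is in effect is to look at the SSH_AUTH_SOCK environment variable inside the Studio shell, since that is where SSH publishes the path of the agent socket. The helper name agent_socket_state is made up for this sketch.

```shell
#!/bin/sh
# Report whether this shell can see an SSH agent socket.
agent_socket_state() (
  sock=${SSH_AUTH_SOCK:-}
  if [ -z "$sock" ]; then
    echo "unset"      # no agent, or forwarding is not enabled
  elif [ -S "$sock" ]; then
    echo "socket"     # a socket exists; `ssh-add -l` would list the keys it offers
  else
    echo "missing"    # the variable points at a path that is not a socket
  fi
)

agent_socket_state
```

Run it once on your laptop and once inside the space: "socket" in both places suggests the agent is reachable from Studio, while "unset" on the remote side suggests ForwardAgent did not take effect.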


AddKeysToAgent yes

AddKeysToAgent yes

What it means: Automatically add your SSH private key to the SSH agent when you connect, so you don’t have to enter your key passphrase multiple times.

How it works:

  1. First connection asks for your SSH key passphrase (if key is encrypted)
  2. SSH agent stores the unlocked key in memory
  3. Subsequent connections use the cached key

Note: This affects your LOCAL SSH agent, not the remote system. It’s primarily useful if your SSH key has a passphrase.


StrictHostKeyChecking accept-new

StrictHostKeyChecking accept-new

What it means: Controls how SSH handles unknown host keys (the “fingerprint” of the remote system).

The behavior:

  • First connection to new host: Automatically accept and save the host key (no prompt)
  • Subsequent connections: Verify host key matches the saved one
  • If key changes: Reject connection with warning (prevents man-in-the-middle attacks)

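Why this is here: It skips the interactive prompt the first time you connect to a space, which matters for tools that connect non-interactively. It does not help when a key changes: SageMaker Studio spaces are ephemeral, and if a stopped and restarted space presents a new host key, accept-new rejects the connection until you remove the stale entry from ~/.ssh/known_hosts (for example with ssh-keygen -R). Without this setting, the first connection would stop and ask: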

The authenticity of host 'space-name' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)?

Alternative values:

  • StrictHostKeyChecking yes: Always verify, reject unknown hosts (most secure, most annoying for ephemeral systems)
  • StrictHostKeyChecking no: Never verify, auto-accept everything (insecure, discouraged)
  • StrictHostKeyChecking accept-new: Accept new, verify subsequent (balanced for SageMaker)
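
The three outcomes of accept-new can be modelled in a few lines. This is a toy model only: check_host_key is a hypothetical function, and the file it writes uses a simplified "host key" line format rather than OpenSSH's real known_hosts format.

```shell
#!/bin/sh
# Toy model of the accept-new policy (not OpenSSH's implementation).
check_host_key() (
  known_file=$1 host=$2 offered_key=$3
  # Look up the key previously saved for this host, if any.
  saved=$(awk -v h="$host" '$1 == h { print $2; exit }' "$known_file")
  if [ -z "$saved" ]; then
    printf '%s %s\n' "$host" "$offered_key" >> "$known_file"
    echo "accepted-new"
  elif [ "$saved" = "$offered_key" ]; then
    echo "verified"
  else
    echo "rejected-changed"
    exit 1
  fi
)

known=$(mktemp)
check_host_key "$known" space-name key-A            # first connection
check_host_key "$known" space-name key-A            # same key again
check_host_key "$known" space-name key-B || true    # key changed after a restart
rm -f "$known"
```

The three calls print accepted-new, verified, and rejected-changed in that order. The last case is the one to watch for with restarted spaces: the saved entry has to be removed before a new key can be accepted.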

Full Context: How It All Works Together

When you run ssh space-name, here’s what happens:

  1. SSH reads ~/.ssh/config
  2. Finds the Host space-name entry
  3. Sees HostName is an ARN (not a normal hostname)
  4. Executes ProxyCommand: /home/user/sagemaker_connect.sh 'arn:aws:...'
  5. The script:
    • Authenticates to AWS using your credentials
    • Calls SageMaker API: “I want to connect to this space”
    • SageMaker API returns connection details
    • Script uses AWS Systems Manager Session Manager to create encrypted tunnel
    • Tunnel acts as stdin/stdout pipe
  6. SSH treats the pipe as a normal SSH connection
  7. SSH authenticates to the Studio space (using SageMaker-managed keys)
  8. Because ForwardAgent yes, your local SSH agent (and the keys it holds) can be used from within Studio
  9. Because accept-new, you’re not prompted about host key on first connect
  10. You’re now in a shell inside your SageMaker Studio space

Practical Example Setup

Step 1: Get your Space ARN

# List your SageMaker domains
aws sagemaker list-domains --region eu-west-2
 
# List spaces in a domain
aws sagemaker list-spaces \
  --domain-id d-abc123def456 \
  --region eu-west-2
 
# Your ARN format:
# arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/default-space
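
If you prefer not to assemble the ARN by hand, a small helper can do it from the values the commands above return. build_space_arn and its argument order are this sketch's own choices, not an AWS convention.

```shell
#!/bin/sh
# Assemble a space ARN from its parts; the partition defaults to "aws".
build_space_arn() (
  region=$1 account=$2 domain_id=$3 space_name=$4 partition=${5:-aws}
  printf 'arn:%s:sagemaker:%s:%s:space/%s/%s\n' \
    "$partition" "$region" "$account" "$domain_id" "$space_name"
)

# The account ID could come from:
#   aws sts get-caller-identity --query Account --output text
build_space_arn eu-west-2 123456789012 d-abc123def456 default-space
```

Pass a fifth argument such as aws-cn or aws-us-gov when the space lives in another partition.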

Step 2: Download/create the proxy script

SageMaker Studio console → “Set up local connection” → Download sagemaker_connect.sh

Or create it using AWS’s template, save it to ~/sagemaker_connect.sh, and make it executable:

chmod +x ~/sagemaker_connect.sh

Step 3: Add to ~/.ssh/config

Host sagemaker-studio
  HostName 'arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/my-space'
  ProxyCommand '/Users/yourname/sagemaker_connect.sh' '%h'
  ForwardAgent yes
  AddKeysToAgent yes
  StrictHostKeyChecking accept-new
  User sagemaker-user
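
If you manage several spaces, it can be easier to generate this block than to copy and edit it. The sketch below prints an entry with the same options as above; print_host_block is a hypothetical helper, and it only writes to stdout so you can review the result before adding it to ~/.ssh/config.

```shell
#!/bin/sh
# Print a Host block with the options used in Step 3.
# Arguments: alias, space ARN, path to the proxy script, optional remote user.
print_host_block() (
  host_alias=$1 arn=$2 script=$3 user=${4:-sagemaker-user}
  cat <<EOF
Host $host_alias
  HostName '$arn'
  ProxyCommand '$script' '%h'
  ForwardAgent yes
  AddKeysToAgent yes
  StrictHostKeyChecking accept-new
  User $user
EOF
)

print_host_block sagemaker-studio \
  'arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/my-space' \
  "$HOME/sagemaker_connect.sh"
```

Once the output looks right, append it with a redirect such as >> ~/.ssh/config.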

Step 4: Connect

# Make sure you're logged into AWS SSO first
aws sso login --profile your-profile
 
# Then SSH
ssh sagemaker-studio
 
# Or use with VS Code Remote-SSH extension
# Open VS Code → Remote-SSH: Connect to Host → sagemaker-studio
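
Before the first connection attempt, it can save time to confirm that the required tools are installed. The sketch below is one way to do that; missing_tools is a made-up helper, and the list of tools is an assumption based on the Session Manager tunnel described earlier.

```shell
#!/bin/sh
# Print each named command that is not on PATH; return non-zero if any is missing.
missing_tools() (
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  exit "$status"
)

# Assumption: session-manager-plugin is included because the AWS CLI needs it
# for Session Manager sessions.
missing_tools aws ssh session-manager-plugin \
  || echo "install the tools listed above before connecting" >&2
```

An empty result means everything was found; otherwise each missing command is printed on its own line.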

Common Issues and Solutions

“Permission denied (publickey)”

Problem: The tunnel could not be set up or SSH authentication failed. Common causes are a proxy script that isn’t executable and AWS credentials that are missing or expired.

Solution:

chmod +x ~/sagemaker_connect.sh
aws sso login --profile your-profile

“Could not resolve hostname”

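Problem: SSH attempted a DNS lookup, which means the ProxyCommand was not applied to this connection. When a ProxyCommand is in effect, SSH passes the HostName value to the script and does not resolve it itself.

Solution: Make sure the name you type matches the Host alias exactly and that the HostName and ProxyCommand lines sit under that Host entry. Keep the ARN quoted, as in the config above. To see the values SSH will actually use:

ssh -G space-name | grep -E '^(hostname|proxycommand) '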

“Space is not running”

Problem: SageMaker Studio space needs to be started before you can SSH to it.

Solution:

# Start the space via console or CLI
# (use the app type your space runs, e.g. JupyterLab or CodeEditor)
aws sagemaker create-app \
  --domain-id d-abc123def456 \
  --space-name my-space \
  --app-type JupyterLab \
  --app-name default \
  --region eu-west-2

Proxy script fails silently

Problem: Hard to debug since ProxyCommand errors aren’t verbose.

Solution: Test the script directly:

# Run the script manually to see errors
~/sagemaker_connect.sh 'arn:aws:sagemaker:eu-west-2:123456789012:space/d-abc123def456/my-space'
 
# Or enable SSH verbose mode
ssh -vvv space-name
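
Another option is to capture the script's error output in a file, since SSH itself shows little of it. The wrapper below is a minimal sketch (run_logged is a hypothetical name): it appends stderr to a log and leaves stdin and stdout alone, because SSH uses those two streams for the connection.

```shell
#!/bin/sh
# Run a command with its stderr appended to a log file.
# stdin and stdout pass through untouched.
run_logged() (
  log=$1
  shift
  "$@" 2>>"$log"
)

# Demonstration with a stand-in command instead of the real proxy script:
log=$(mktemp)
run_logged "$log" sh -c 'echo data; echo "something went wrong" >&2'
echo "captured in log: $(cat "$log")"
rm -f "$log"
```

To use the same idea with the real script, you could save the wrapper as its own executable file and point ProxyCommand at it, passing the log path, the path to sagemaker_connect.sh, and '%h' as arguments.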

Security Implications

What this grants:

  • Shell access to your Studio environment
  • Ability to read/write all files in the space
  • Network access from your local machine through Studio (if port forwarding)
  • Your SSH keys available in Studio (via ForwardAgent)

Security best practices:

  1. Protect the proxy script: It runs with your AWS credentials, so keep it writable only by you
  2. Use SSO: Avoid long-lived access keys; use aws sso login
  3. Disable ForwardAgent if you don’t need git/SSH from within Studio
  4. Monitor CloudTrail: Track who’s creating SSH connections to your spaces
  5. Use IAM policies: Restrict which users can call sagemaker:CreatePresignedDomainUrl
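
For the first point, a simple file-permission check is a reasonable starting place. The sketch below reports whether a file can be modified by anyone other than its owner; is_private_to_owner is a hypothetical helper, and it only inspects permission bits, not ownership or ACLs.

```shell
#!/bin/sh
# Succeed only if the file is not writable by group or others.
is_private_to_owner() (
  file=$1
  [ -f "$file" ] || { echo "no such file: $file" >&2; exit 2; }
  # find prints the path only if the group-write or other-write bit is set.
  loose=$(find "$file" -prune \( -perm -020 -o -perm -002 \) -print)
  [ -z "$loose" ]
)

script="$HOME/sagemaker_connect.sh"
if is_private_to_owner "$script" 2>/dev/null; then
  echo "$script: only the owner can modify it"
else
  echo "$script: missing or writable by others; consider chmod 700" >&2
fi
```

chmod 700 (or 755 if others need to read it) keeps the script executable for you while preventing other local users from changing what it does with your credentials.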

Why SageMaker Uses This Approach

Traditional SSH:

Your laptop → [Internet] → Public IP → EC2 instance

SageMaker Studio SSH:

Your laptop → [ProxyCommand] → AWS API → Private VPC → Studio Space

SageMaker Studio spaces don’t have public IPs or publicly reachable SSH ports. Instead:

  • Spaces are in private VPCs
  • Access is mediated by AWS APIs with IAM authentication
  • The proxy command carries the SSH protocol over a connection set up through AWS API calls
  • Systems Manager Session Manager provides the encrypted tunnel

This provides:

  • Better security: No public SSH endpoints to attack
  • IAM integration: Access controlled by AWS IAM, not SSH keys alone
  • Audit logging: All connections logged in CloudTrail
  • No firewall changes: Works through corporate firewalls (uses HTTPS to AWS APIs)

Alternative: Direct VS Code Integration

Instead of manually setting up SSH config, you can use AWS’s VS Code extension:

  1. Install “AWS Toolkit” extension in VS Code
  2. Configure AWS credentials/SSO
  3. Extension automatically generates SSH config and proxy script
  4. Right-click on Studio space → “Connect with VS Code”

This handles all the configuration for you.


In short, the ARN and proxy script let SSH traffic travel over a connection established through AWS API calls, so you can treat a SageMaker Studio space like a traditional SSH server even though it’s actually a managed container in AWS’s private infrastructure.