Summit Health Data

ML Training API Documentation

Complete API Documentation for External Integration

Version 1.0.0 | Last Updated: November 2025

Quick Start

Get started with the Summit Health ML Training API in minutes. This guide shows how to trigger training jobs, monitor their progress, and manage billing for third-party users.

Base URL: https://your-backend-server.com
API Version: v1
Content-Type: application/json

Prerequisites

  • API access credentials (API key or OAuth token)
  • Valid billing account for cost allocation
  • Training dataset access (MIMIC-III, MIMIC-IV, PubMed, etc.)

Your First Training Request

curl -X POST "https://your-backend-server.com/api/training/start?instance_type=classical&datasets=MIMICIII,MIMIC4" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "user_id": "external_user_123",
    "billing_account": "account_abc"
  }'

Authentication

All API requests require authentication. Summit Health supports two authentication methods:

1. API Key Authentication

Include your API key in the request header:

Authorization: Bearer YOUR_API_KEY

2. OAuth 2.0

For more secure access, use OAuth 2.0:

Authorization: Bearer YOUR_OAUTH_TOKEN

Security Note: Never expose your API keys in client-side code or public repositories. Use environment variables or secure key management systems.
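
As a minimal sketch of the environment-variable approach, the helper below builds the request headers from a key stored outside the source code. The variable name SUMMIT_API_KEY and the function name build_auth_headers are assumptions made for illustration; they are not part of the API.

```python
import os


def build_auth_headers(env=None, var_name="SUMMIT_API_KEY"):
    """Build request headers from an API key kept in the environment.

    SUMMIT_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        # Fail early rather than sending an empty bearer token
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The optional env argument accepts any mapping, which keeps the helper easy to exercise without touching the real process environment.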

Training API Endpoints

POST /api/training/start

Start a new training job on the specified instance type.

Query Parameters

Parameter       | Type   | Required | Description
----------------|--------|----------|------------
instance_type   | string | Optional | Instance type: classical, 48vcpu, or trainium. Default: classical
datasets        | string | Optional | Comma-separated datasets: MIMICIII, MIMIC4, PubMed. Default: MIMICIII
user_id         | string | Required | External user ID for billing allocation
billing_account | string | Required | Billing account ID for cost tracking
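
The parameter rules above can be checked on the client before a request is sent. The sketch below is one hedged way to do that; the function name build_start_params is hypothetical, and treating the three dataset names listed in the table as the complete set is an assumption, since other datasets may be available.

```python
# Values taken from the parameter table; the dataset set is assumed complete
ALLOWED_INSTANCE_TYPES = {"classical", "48vcpu", "trainium"}
ALLOWED_DATASETS = {"MIMICIII", "MIMIC4", "PubMed"}


def build_start_params(user_id, billing_account,
                       instance_type="classical", datasets="MIMICIII"):
    """Validate start parameters and return them as a normalized dict."""
    if not user_id or not billing_account:
        raise ValueError("user_id and billing_account are required")
    if instance_type not in ALLOWED_INSTANCE_TYPES:
        raise ValueError(f"Unknown instance_type: {instance_type!r}")

    # Split the comma-separated list and drop stray whitespace
    names = [name.strip() for name in datasets.split(",") if name.strip()]
    unknown = [name for name in names if name not in ALLOWED_DATASETS]
    if not names or unknown:
        raise ValueError(f"Invalid datasets value: {datasets!r}")

    return {
        "instance_type": instance_type,
        "datasets": ",".join(names),
        "user_id": user_id,
        "billing_account": billing_account,
    }
```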

Request Example

POST /api/training/start?instance_type=48vcpu&datasets=MIMICIII,MIMIC4
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "user_id": "external_user_123",
  "billing_account": "account_abc",
  "metadata": {
    "project_name": "Medical NER Model",
    "description": "Training for production deployment"
  }
}

Response Example

{
  "success": true,
  "message": "Training started successfully on 48 vCPU Instance (3x Faster)",
  "job_id": "training_48vcpu_20251124_152300_12345",
  "model_id": "tinyllama-1b-medical-48vcpu-20251124",
  "process_id": "12345",
  "instance_type": "48vcpu",
  "instance_name": "48 vCPU Instance (3x Faster)",
  "host": "ec2-204-236-243-64.compute-1.amazonaws.com",
  "model_path": "/home/ec2-user/Training_Data/models/tinyllama-1b-medical-phase1",
  "resource_type": "48 vCPU",
  "datasets": ["MIMICIII", "MIMIC4"],
  "status": {
    "current_step": 0,
    "total_steps": 5000,
    "status": "running"
  },
  "progress_detected": true,
  "billing": {
    "estimated_cost": 250.00,
    "estimated_hours": 50,
    "billing_account": "account_abc",
    "user_id": "external_user_123"
  }
}

Training Steps Configuration:

Important: Training is configured with 3 epochs and a max_steps limit of 5,000. In Hugging Face Transformers, when both max_steps and num_train_epochs are set, max_steps takes precedence, so training stops at 5,000 steps even if 3 full epochs would need more.

Why 5,000 steps instead of 30,000?

  • Current Configuration: With an effective batch size of 32, 5,000 steps process 160,000 examples. The combined MIMIC-III + MIMIC-IV dataset holds about 391,360 documents, so if each document is one training example, 5,000 steps cover roughly 0.4 of a single epoch. The limit is a conservative choice for cost efficiency.
  • Typical Training: For multi-epoch training on large medical datasets, 30,000 steps is more appropriate. Under the same one-example-per-document assumption, 30,000 steps cover about 2.45 epochs, and a full 3 epochs takes about 36,690 steps.
  • Current Limitation: Because training stops well before 3 epochs are complete, the model may not converge fully.

Recommendation: For production medical model training, consider raising max_steps to 30,000, or removing the max_steps limit so that all 3 epochs run. Either change can be made in the training script or passed as a custom training parameter. Contact support for help with extended training configurations.
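
The relationship between step count and epoch coverage can be worked out directly. The sketch below uses the figures quoted above (effective batch size 32, about 391,360 documents) and assumes each document yields exactly one training example, which may not hold if documents are chunked or filtered during preprocessing. The function names are illustrative.

```python
import math


def steps_per_epoch(num_examples, effective_batch_size):
    """Optimizer steps needed to see every example once."""
    return math.ceil(num_examples / effective_batch_size)


def epochs_covered(max_steps, num_examples, effective_batch_size):
    """Fraction of epochs completed when training stops at max_steps."""
    return max_steps / steps_per_epoch(num_examples, effective_batch_size)


NUM_DOCUMENTS = 391_360  # combined MIMIC-III + MIMIC-IV, from the text above
BATCH_SIZE = 32          # effective batch size, from the text above

print(steps_per_epoch(NUM_DOCUMENTS, BATCH_SIZE))                   # 12230
print(round(epochs_covered(5_000, NUM_DOCUMENTS, BATCH_SIZE), 2))   # 0.41
print(round(epochs_covered(30_000, NUM_DOCUMENTS, BATCH_SIZE), 2))  # 2.45
print(3 * steps_per_epoch(NUM_DOCUMENTS, BATCH_SIZE))               # 36690
```

Under that assumption, the 5,000-step limit stops training before the first pass over the data finishes, and three complete epochs need 36,690 steps.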

GET /api/training/process-status

Get current status of a training job.

Query Parameters

Parameter     | Type   | Required | Description
--------------|--------|----------|------------
job_id        | string | Required | Training job ID returned from start endpoint
instance_type | string | Optional | Instance type filter

Response Example

{
  "job_id": "training_48vcpu_20251124_152300_12345",
  "status": "running",
  "current_step": 2500,
  "total_steps": 5000,
  "progress_percent": 50.0,
  "estimated_time_remaining": "22:30:00",
  "metrics": {
    "train_loss": 0.85,
    "learning_rate": 0.0001
  },
  "billing": {
    "cost_so_far": 125.00,
    "hours_used": 25
  }
}
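
A client will usually want to turn this response into something easier to act on. The following is a sketch only: it assumes estimated_time_remaining is always formatted as HH:MM:SS, as in the example, and it treats completed and stopped (the non-running values listed for the models list status filter) as final states. The helper names are hypothetical.

```python
from datetime import timedelta

# Assumed final states; a job in any other state is polled again
TERMINAL_STATUSES = {"completed", "stopped"}


def parse_time_remaining(text):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize_status(status):
    """Reduce a process-status response to the fields a poller needs."""
    total = status["total_steps"]
    percent = 100.0 * status["current_step"] / total if total else 0.0
    return {
        "percent": round(percent, 1),
        "remaining": parse_time_remaining(status["estimated_time_remaining"]),
        "finished": status["status"] in TERMINAL_STATUSES,
    }
```

Checking for any final state, not just completed, keeps a polling loop from running forever when a job is stopped early.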

POST /api/training/stop

Stop a running training job.

Query Parameters

Parameter     | Type   | Required | Description
--------------|--------|----------|------------
instance_type | string | Optional | Instance type: classical, 48vcpu, or trainium

GET /api/training/models/list

List all stored training models with metadata.

Query Parameters

Parameter     | Type    | Required | Description
--------------|---------|----------|------------
instance_type | string  | Optional | Filter by instance type
status        | string  | Optional | Filter by status: running, stopped, completed
limit         | integer | Optional | Maximum number of records (default: 100)

POST /api/training/register-completed

Register a completed training model in the database.

Request Body

{
  "job_id": "training_48vcpu_20251124_152300_12345",
  "model_path": "/home/ec2-user/Training_Data/models/tinyllama-1b-medical-phase1",
  "instance_type": "48vcpu",
  "final_metrics": {
    "train_loss": 0.7547,
    "train_runtime": 161045.76,
    "epoch": 2.98
  }
}
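
The request body can be assembled from the values a training run reports when it finishes. This sketch assumes train_runtime is in seconds, which matches the Hugging Face Trainer metric of the same name but is not stated in this document. The function names are hypothetical.

```python
def build_registration_body(job_id, model_path, instance_type, final_metrics):
    """Assemble the register-completed request body from run results."""
    required = ("train_loss", "train_runtime", "epoch")
    missing = [key for key in required if key not in final_metrics]
    if missing:
        raise ValueError(f"final_metrics is missing: {', '.join(missing)}")
    return {
        "job_id": job_id,
        "model_path": model_path,
        "instance_type": instance_type,
        # Send only the three metrics shown in the example body
        "final_metrics": {key: final_metrics[key] for key in required},
    }


def runtime_hours(final_metrics):
    """Convert train_runtime (assumed to be seconds) into hours."""
    return round(final_metrics["train_runtime"] / 3600, 2)
```

For the example body, runtime_hours gives 44.73 hours of training time.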

Billing & Cost Allocation

Summit Health provides transparent billing with automatic cost allocation to third-party users and billing accounts.

Pricing Structure

Resource Type | Pricing        | Description
--------------|----------------|------------
Classical CPU | $5.00/hour     | Standard CPU training instances
48 vCPU       | $7.50/hour     | High-performance 48-core instances (3x faster)
Trainium      | $15.00/hour    | AWS Trainium instances for accelerated training
Base Cost     | $10.00/job     | One-time setup cost per training job
Per Epoch     | $0.25/epoch    | Additional cost per training epoch
Storage       | $0.10/GB/month | Model storage cost

Cost Calculation Example

Training Job Details:
  - Instance Type: 48 vCPU
  - Training Duration: 50 hours
  - Epochs: 3
  - Model Size: 4.2 GB

Cost Breakdown:
  - Base Cost: $10.00
  - Compute (50 hours × $7.50): $375.00
  - Epochs (3 × $0.25): $0.75
  - Storage (4.2 GB × $0.10): $0.42/month

Total: $386.17 including the first month of storage ($385.75 one-time, then $0.42/month for storage)
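
The same calculation can be expressed as a small estimator. This is a sketch built from the pricing table, not the billing system's actual logic; the mapping of the classical instance type to the Classical CPU rate and the function name estimate_job_cost are assumptions.

```python
from decimal import Decimal

# Rates copied from the pricing table
HOURLY_RATES = {
    "classical": Decimal("5.00"),
    "48vcpu": Decimal("7.50"),
    "trainium": Decimal("15.00"),
}
BASE_COST = Decimal("10.00")
PER_EPOCH = Decimal("0.25")
STORAGE_PER_GB_MONTH = Decimal("0.10")
CENT = Decimal("0.01")


def estimate_job_cost(instance_type, hours, epochs, model_size_gb):
    """Estimate one-time and recurring costs for a training job."""
    compute = HOURLY_RATES[instance_type] * Decimal(str(hours))
    one_time = BASE_COST + compute + PER_EPOCH * epochs
    storage = STORAGE_PER_GB_MONTH * Decimal(str(model_size_gb))
    return {
        "compute": compute.quantize(CENT),
        "one_time": one_time.quantize(CENT),
        "storage_per_month": storage.quantize(CENT),
    }


estimate = estimate_job_cost("48vcpu", hours=50, epochs=3, model_size_gb=4.2)
print(estimate["one_time"])           # 385.75
print(estimate["storage_per_month"])  # 0.42
```

Decimal is used instead of float so that currency amounts do not pick up binary rounding error.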

Billing Endpoints

GET /api/training/cost-tracking/user-costs

Get cost breakdown for a specific user or billing account.

Query Parameters

Parameter       | Type   | Required | Description
----------------|--------|----------|------------
user_id         | string | Optional | Filter by user ID
billing_account | string | Optional | Filter by billing account
start_date      | string | Optional | Start date (YYYY-MM-DD)
end_date        | string | Optional | End date (YYYY-MM-DD)

Response Example

{
  "user_id": "external_user_123",
  "billing_account": "account_abc",
  "period": {
    "start": "2025-11-01",
    "end": "2025-11-30"
  },
  "costs": {
    "total": 1250.50,
    "training": {
      "compute": 1000.00,
      "base_costs": 50.00,
      "epochs": 15.00
    },
    "storage": 185.50
  },
  "jobs": [
    {
      "job_id": "training_48vcpu_20251124_152300_12345",
      "cost": 386.17,
      "duration_hours": 50,
      "status": "completed"
    }
  ]
}
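
In the example response, the training components and storage add up to the reported total (1000.00 + 50.00 + 15.00 + 185.50 = 1250.50). A client can verify this when reconciling invoices. The sketch below assumes the costs object always has the shape shown above; the function name is hypothetical.

```python
from decimal import Decimal


def check_cost_totals(report):
    """Recompute the total from its parts and compare with the reported total."""
    costs = report["costs"]
    # Convert through str so float values from JSON compare exactly
    training = sum(Decimal(str(value)) for value in costs["training"].values())
    computed = training + Decimal(str(costs["storage"]))
    return computed, computed == Decimal(str(costs["total"]))
```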

POST /api/training/cost-tracking/record-cost

Record a cost for billing allocation (automatically called by system).

Request Body

{
  "job_id": "training_48vcpu_20251124_152300_12345",
  "user_id": "external_user_123",
  "billing_account": "account_abc",
  "cost_type": "compute",
  "amount": 375.00,
  "duration_hours": 50,
  "metadata": {
    "instance_type": "48vcpu",
    "resource_type": "48 vCPU"
  }
}

Automatic Billing: When you start a training job with user_id and billing_account, costs are automatically tracked and allocated. You can query costs at any time using the billing endpoints.

Monitoring & Logs

GET /api/training/logs

Get training logs for a specific job.

Query Parameters

Parameter | Type    | Required | Description
----------|---------|----------|------------
job_id    | string  | Required | Training job ID
lines     | integer | Optional | Number of log lines to retrieve (default: 500)

GET /api/training/metrics/history

Get training metrics history for visualization.

Deployment Status

GitHub Repository: https://github.com/your-org/summit-health
Vercel Deployment: Auto-deploy on push to main branch
Backend API: https://your-backend-server.com

Deployment Process

  1. Code Push: Push changes to GitHub main branch
  2. Auto-Deploy: Vercel automatically deploys frontend
  3. Backend Update: Backend API updates require manual deployment or CI/CD pipeline
  4. Verification: Test API endpoints after deployment

Note: API endpoint changes require backend server restart. Frontend changes (HTML/JS) are automatically deployed via Vercel.

Code Examples

Python Example

import time

import requests

API_BASE_URL = "https://your-backend-server.com"
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def start_training(user_id, billing_account, datasets="MIMICIII"):
    """Start a training job on a 48 vCPU instance."""
    response = requests.post(
        f"{API_BASE_URL}/api/training/start",
        params={"instance_type": "48vcpu", "datasets": datasets},
        headers=HEADERS,
        # json= serializes the body and sets Content-Type: application/json
        json={"user_id": user_id, "billing_account": billing_account},
        timeout=30,
    )
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
    return response.json()


def check_status(job_id):
    """Check training job status."""
    response = requests.get(
        f"{API_BASE_URL}/api/training/process-status",
        params={"job_id": job_id},
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def get_costs(user_id, billing_account):
    """Get the cost breakdown for a user and billing account."""
    response = requests.get(
        f"{API_BASE_URL}/api/training/cost-tracking/user-costs",
        params={"user_id": user_id, "billing_account": billing_account},
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Example usage
if __name__ == "__main__":
    # Start training
    result = start_training(
        user_id="external_user_123",
        billing_account="account_abc",
        datasets="MIMICIII,MIMIC4",
    )
    job_id = result["job_id"]
    print(f"Training started: {job_id}")

    # Monitor progress until the job reaches a final state
    while True:
        status = check_status(job_id)
        print(f"Progress: {status['progress_percent']}%")
        if status["status"] == "completed":
            print("Training completed!")
            break
        if status["status"] == "stopped":
            # Without this check the loop would poll a stopped job forever
            print("Training was stopped before completion.")
            break
        time.sleep(60)  # Check every minute

    # Get final costs
    costs = get_costs("external_user_123", "account_abc")
    print(f"Total cost: ${costs['costs']['total']}")

JavaScript/Node.js Example

const axios = require('axios');

const API_BASE_URL = 'https://your-backend-server.com';
const API_KEY = 'YOUR_API_KEY';

// Start a training job on a 48 vCPU instance
async function startTraining(userId, billingAccount, datasets = 'MIMICIII') {
  const response = await axios.post(
    `${API_BASE_URL}/api/training/start`,
    { user_id: userId, billing_account: billingAccount },
    {
      params: { instance_type: '48vcpu', datasets: datasets },
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  return response.data;
}

// Fetch the current status of a training job
async function checkStatus(jobId) {
  const response = await axios.get(
    `${API_BASE_URL}/api/training/process-status`,
    {
      params: { job_id: jobId },
      headers: { 'Authorization': `Bearer ${API_KEY}` }
    }
  );
  return response.data;
}

// Example usage
(async () => {
  const result = await startTraining(
    'external_user_123',
    'account_abc',
    'MIMICIII,MIMIC4'
  );
  console.log('Training started:', result.job_id);

  // Poll once a minute until the job reaches a final state
  const interval = setInterval(async () => {
    try {
      const status = await checkStatus(result.job_id);
      console.log(`Progress: ${status.progress_percent}%`);
      if (status.status === 'completed' || status.status === 'stopped') {
        clearInterval(interval);
        console.log(`Training ${status.status}.`);
      }
    } catch (err) {
      // Stop polling so a failed request does not become an unhandled rejection
      clearInterval(interval);
      console.error('Status check failed:', err.message);
    }
  }, 60000); // Check every minute
})().catch((err) => console.error('Failed to start training:', err.message));

cURL Examples

# Start training
curl -X POST "https://your-backend-server.com/api/training/start?instance_type=48vcpu&datasets=MIMICIII,MIMIC4" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "external_user_123",
    "billing_account": "account_abc"
  }'

# Check status
curl -X GET "https://your-backend-server.com/api/training/process-status?job_id=training_48vcpu_20251124_152300_12345" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Get costs
curl -X GET "https://your-backend-server.com/api/training/cost-tracking/user-costs?user_id=external_user_123&billing_account=account_abc" \
  -H "Authorization: Bearer YOUR_API_KEY"

# List models
curl -X GET "https://your-backend-server.com/api/training/models/list?status=completed&limit=10" \
  -H "Authorization: Bearer YOUR_API_KEY"

Support & Contact

For API access, billing questions, or technical support: