Instances Management Guide v1.0.0

Learn advanced techniques and best practices for managing your compute instances on the SLYD platform.

About This Guide

This guide expands on the Compute Instances documentation to provide deeper insights, advanced management techniques, and real-world usage scenarios. It's designed for users who are already familiar with the basics of SLYD compute instances.

Instance Lifecycle Management

Effectively managing the lifecycle of your compute instances helps optimize costs and ensure resources are available when needed.

Scheduling

Set up automatic start and stop schedules for predictable workloads to reduce costs during idle periods.

Schedule Creation via CLI
slyd instance schedule create \
    --instance-id inst-abc123 \
    --name "workday-hours" \
    --start "0 8 * * 1-5" \
    --stop "0 18 * * 1-5" \
    --timezone "America/New_York"
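
To get a feel for how much running time a schedule like this removes, the arithmetic can be sketched in Python. This is an illustrative calculation only: the function names are hypothetical, and it assumes the start and stop times fall on the same day at whole hours, as in the cron expressions above. Actual savings depend on how stopped instances are billed, which this guide does not specify.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def weekly_running_hours(start_hour: int, stop_hour: int, days_per_week: int) -> int:
    """Hours per week an instance runs under a same-day start/stop schedule."""
    if not 0 <= start_hour < stop_hour <= 24:
        raise ValueError("expected 0 <= start_hour < stop_hour <= 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return (stop_hour - start_hour) * days_per_week


def stopped_fraction(running_hours: int) -> float:
    """Fraction of the week during which the instance is stopped."""
    return 1 - running_hours / HOURS_PER_WEEK


# Start at 08:00 and stop at 18:00, Monday to Friday
running = weekly_running_hours(8, 18, 5)
print(f"{running} h/week running, {stopped_fraction(running):.0%} of the week stopped")
```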

You can also create schedules through the web interface:

1. Navigate to your instance details page
2. Select the "Scheduling" tab
3. Click "Create Schedule" and configure your desired time parameters
4. Choose recurrence pattern (daily, weekdays only, custom)

Time-to-Live (TTL)

Set expiration times for temporary instances to ensure they don't run longer than needed.

Common Use Case:

Development and testing instances that are only needed for a specific duration, such as a two-week sprint or a three-day hackathon.

Setting TTL via CLI
slyd instance update \
    --instance-id inst-abc123 \
    --ttl "2022-12-31T23:59:59Z"

When the TTL is reached, the instance will automatically be stopped (not terminated). You'll receive a notification, allowing you to either extend the TTL or terminate the instance.
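
The --ttl value in the example is a UTC timestamp in ISO 8601 form. A minimal Python sketch for producing such a value from a start time and a lifetime follows; the helper name is hypothetical, and it assumes the CLI accepts the same format as the example above.

```python
from datetime import datetime, timedelta, timezone


def ttl_timestamp(start: datetime, lifetime: timedelta) -> str:
    """Return start + lifetime as a UTC timestamp such as 2022-12-31T23:59:59Z."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    expires = (start + lifetime).astimezone(timezone.utc)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")


# TTL for a two-week sprint that starts now
print(ttl_timestamp(datetime.now(timezone.utc), timedelta(weeks=2)))
```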

Auto-scaling

Configure your instances to automatically scale based on metrics like CPU usage, memory utilization, or custom application metrics.

Horizontal Scaling: Automatically add or remove identical instances as demand changes
Vertical Scaling: Automatically increase or decrease the resources (CPU, RAM) assigned to an instance

Example Configuration:
  • Scale Out: Add instance when CPU > 75% for 5 minutes
  • Scale In: Remove instance when CPU < 25% for 15 minutes
  • Min Instances: 2
  • Max Instances: 10

Auto-scaling is particularly effective for web applications, API servers, and batch processing workloads with variable demand.
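
The example configuration can be read as a small decision rule. The sketch below is one possible interpretation in Python, not SLYD's actual scaling algorithm: it assumes one CPU sample per minute and requires every sample in the window to cross the threshold. The function and constant names are hypothetical.

```python
from typing import Sequence

SCALE_OUT_CPU = 75.0      # percent
SCALE_OUT_MINUTES = 5
SCALE_IN_CPU = 25.0       # percent
SCALE_IN_MINUTES = 15
MIN_INSTANCES = 2
MAX_INSTANCES = 10


def desired_instances(current: int, cpu_per_minute: Sequence[float]) -> int:
    """Return the target instance count given per-minute CPU averages (oldest first)."""
    scale_out_window = cpu_per_minute[-SCALE_OUT_MINUTES:]
    scale_in_window = cpu_per_minute[-SCALE_IN_MINUTES:]
    target = current
    if len(scale_out_window) == SCALE_OUT_MINUTES and all(c > SCALE_OUT_CPU for c in scale_out_window):
        target = current + 1  # sustained high load: add one instance
    elif len(scale_in_window) == SCALE_IN_MINUTES and all(c < SCALE_IN_CPU for c in scale_in_window):
        target = current - 1  # sustained low load: remove one instance
    # Never go below the minimum or above the maximum
    return max(MIN_INSTANCES, min(MAX_INSTANCES, target))
```

The longer scale-in window makes the rule slower to remove capacity than to add it, which helps avoid flapping when load hovers near a threshold.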

Data Management

Proper data management ensures your information remains persistent, secure, and optimized.

Storage Volumes

Separate your data from your compute instances for better persistence, flexibility, and performance.

SSD Volumes

High-performance solid-state storage ideal for databases, applications requiring low latency, and random I/O operations.

  • Performance: Up to 10,000 IOPS
  • Size Range: 10GB - 4TB
  • Use Cases: Databases, high-traffic web applications

HDD Volumes

Cost-effective storage for large datasets and sequential access patterns.

  • Performance: Up to 500 IOPS
  • Size Range: 100GB - 16TB
  • Use Cases: Log storage, media files, backups

1. Create a Volume: From your dashboard, go to Storage > Volumes > Create Volume
2. Specify Volume Details: Choose type, size, and whether to encrypt the volume
3. Attach to Instance: Select which instance to attach the volume to and specify a mount point
4. Format and Mount (first-time only): If it's a new volume, format it and configure your system to mount it automatically

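Format and Mount Example
# Format the volume with ext4 (this erases any existing data on the device)
sudo mkfs -t ext4 /dev/xvdf

# Create a mount point (-p avoids an error if it already exists)
sudo mkdir -p /data

# Mount the volume
sudo mount /dev/xvdf /data

# Add to fstab for auto-mount on reboot
# (device names can change between boots; the UUID shown by `sudo blkid /dev/xvdf` is more robust)
echo '/dev/xvdf /data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab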

Backup Strategies

Implement a comprehensive backup strategy to protect your data from loss, corruption, or accidental deletion.

Snapshots

Point-in-time copies of your volumes that can be used to create new volumes or restore existing ones.

  • Incremental after first snapshot
  • Fast restoration
  • Can be scheduled

Backup Service

Full-featured backup service with compression, encryption, and long-term retention options.

  • Application-consistent backups
  • Differential and incremental options
  • Cross-region replication

Backup Best Practices

  • Use the 3-2-1 rule: 3 copies of data, 2 different media types, 1 off-site backup
  • Test your backup restoration process regularly
  • Automate backup verification to ensure backups are valid
  • Implement appropriate retention policies for different data types
  • Document your backup and recovery procedures
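
The 3-2-1 rule is easy to check mechanically. The sketch below shows one way to do so in Python; the BackupCopy type and its fields are hypothetical and not part of any SLYD API, and the primary copy of the data is counted as one of the three copies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BackupCopy:
    location: str   # free-form label, e.g. "primary volume"
    media: str      # e.g. "ssd", "hdd", "object-storage"
    offsite: bool   # True if stored away from the primary site


def satisfies_3_2_1(copies: Sequence[BackupCopy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    BackupCopy("primary volume", "ssd", offsite=False),
    BackupCopy("local snapshot", "hdd", offsite=False),
    BackupCopy("second-region backup", "object-storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```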

Instance Imaging

Create custom images of your instances to capture their configuration, installed software, and data for reuse and replication.

1. Prepare Your Instance: Clean up temporary files, remove sensitive information, and ensure the instance is in a consistent state
2. Create the Image: From the instance dashboard, select "Create Image" and provide image details
3. Wait for Completion: The imaging process can take several minutes depending on the instance size
4. Launch New Instances: Use your custom image to launch new instances with identical configurations

Common Use Cases

  • Environment replication for development, testing, and production
  • Standardized images for team onboarding
  • Template instances for horizontal scaling
  • Golden images for disaster recovery

Advanced Networking

Configure sophisticated networking capabilities to enhance security, performance, and connectivity.

Security Groups

Virtual firewalls that control inbound and outbound traffic to your instances, providing an additional layer of security.

Example Security Group: Web Server

Type      Protocol  Port Range  Source/Destination  Description
Inbound   TCP       80, 443     0.0.0.0/0           HTTP/HTTPS traffic
Inbound   TCP       22          192.168.1.0/24      SSH from internal network
Outbound  All       All         0.0.0.0/0           Allow all outbound traffic
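
To make the effect of the inbound rules concrete, the sketch below evaluates a connection attempt against them using Python's standard ipaddress module. It is a simplified model, not SLYD's implementation: it assumes that any inbound traffic not matched by a rule is denied, and the InboundRule type is hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class InboundRule:
    protocol: str          # "tcp" or "udp"
    ports: FrozenSet[int]  # allowed destination ports
    source: str            # allowed source range in CIDR notation


# The two inbound rules from the web server example
WEB_SERVER_RULES = [
    InboundRule("tcp", frozenset({80, 443}), "0.0.0.0/0"),
    InboundRule("tcp", frozenset({22}), "192.168.1.0/24"),
]


def is_inbound_allowed(rules: Sequence[InboundRule], protocol: str, port: int, source_ip: str) -> bool:
    """True if at least one rule matches; unmatched traffic is denied (assumption)."""
    address = ip_address(source_ip)
    return any(
        rule.protocol == protocol.lower()
        and port in rule.ports
        and address in ip_network(rule.source)
        for rule in rules
    )


print(is_inbound_allowed(WEB_SERVER_RULES, "tcp", 22, "203.0.113.7"))
```

With these rules, SSH from a public address is rejected while HTTPS from the same address is accepted, which is the behavior the table is meant to describe.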

1. Create a Security Group: Navigate to Networking > Security Groups > Create Security Group
2. Define Rules: Add inbound and outbound rules with protocols, ports, and source/destination
3. Assign to Instances: Attach the security group to existing instances or apply during instance creation

Security Best Practices

  • Follow the principle of least privilege when creating rules
  • Use specific IP ranges rather than 0.0.0.0/0 when possible
  • Create purpose-specific security groups (e.g., web, database) and apply them as needed
  • Regularly audit and update security group rules

Virtual Private Networks (VPNs)

Connect your on-premises network to your SLYD environment securely over the internet.

Site-to-Site VPN

Connect your entire corporate network to your SLYD environment, allowing seamless access between resources in both networks.

  • Compatible VPN device or software at your location
  • Static public IP address
  • BGP routing (for dynamic routing)

Client VPN

Allow individual users to connect securely to your SLYD environment from anywhere.

  • OpenVPN-compatible client software
  • Authentication credentials
  • Client configuration file

Setting Up a VPN Connection

1. Navigate to Networking > VPN Connections > Create VPN Connection
2. Select VPN type and configure connection parameters
3. Download the configuration file for your local VPN device or software
4. Apply the configuration and establish the connection

Private Networking

Create isolated network environments for your instances to enhance security and control communication.

Private Network Architecture

Key Components

Virtual Private Cloud (VPC): Isolated network environment within SLYD
Subnets: Network segments within a VPC with their own IP address ranges
Route Tables: Define how network traffic is directed within and outside the VPC
Network ACLs: Stateless network-level firewall controls for subnets

Common Network Architectures

Public-Private Web Application
  • Public subnet: Load balancers and web servers
  • Private subnet: Application servers and databases
  • No direct internet access to the private subnet

Multi-Tier Network Segmentation
  • Web tier subnet: Public-facing web servers
  • Application tier subnet: Internal application logic
  • Database tier subnet: Data storage and processing
  • Management subnet: Administrative access
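
When planning a multi-tier layout, each tier needs its own non-overlapping address range inside the VPC. The Python sketch below carves equal-sized subnets out of a VPC range with the standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size, and the function name are illustrative assumptions rather than SLYD defaults.

```python
from ipaddress import ip_network
from itertools import islice
from typing import Dict, Sequence


def plan_tier_subnets(vpc_cidr: str, tiers: Sequence[str], new_prefix: int = 24) -> Dict[str, str]:
    """Assign each tier the next free subnet of size /new_prefix within the VPC range."""
    vpc = ip_network(vpc_cidr)
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), len(tiers)))
    if len(subnets) < len(tiers):
        raise ValueError("VPC range is too small for the requested tiers")
    return {tier: str(subnet) for tier, subnet in zip(tiers, subnets)}


plan = plan_tier_subnets("10.0.0.0/16", ["web", "application", "database", "management"])
for tier, cidr in plan.items():
    print(f"{tier:12s} {cidr}")
```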

Monitoring & Optimization

Implement comprehensive monitoring and use data-driven approaches to optimize your instance performance and cost.

Advanced Metrics & Alerts

Go beyond basic monitoring to gain deeper insights into your instances' performance and health.

Key Metrics to Monitor

System Metrics
  • CPU steal time (virtualization overhead)
  • Memory swap rate
  • Disk I/O wait times
  • Network packet loss
  • System load averages (1, 5, 15 min)

Application Metrics
  • Request latency (p50, p95, p99)
  • Error rates by type
  • Queue depths and processing times
  • Cache hit/miss ratios
  • Active sessions/connections

Business Metrics
  • Cost per transaction
  • Resource utilization per user
  • Revenue-generating vs. support operations
  • Peak vs. off-peak efficiency
  • Seasonality patterns
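
The latency percentiles listed under Application Metrics (p50, p95, p99) can be computed from raw samples in several ways. The sketch below uses the nearest-rank method in Python. Monitoring tools often interpolate instead, so their values may differ slightly; the function name is hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 900, 17]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

A p50 that stays flat while p99 climbs usually points to a small share of slow requests, which an average would hide.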

Effective Alerting Strategies

Multi-threshold Alerting: Set different thresholds for warning, critical, and emergency conditions
Composite Alerts: Alert on combinations of metrics rather than single metrics in isolation
Time-of-Day Sensitivity: Adjust thresholds based on expected load patterns for different times
Anomaly Detection: Use machine learning to detect deviations from normal patterns
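
The first two strategies can be sketched in a few lines of Python. The threshold values, the 500 ms latency limit, and the function names below are illustrative assumptions, not SLYD defaults.

```python
# Severity levels checked from most to least severe (assumed values, in percent)
THRESHOLDS = [("emergency", 95.0), ("critical", 85.0), ("warning", 70.0)]


def severity(value: float) -> str:
    """Multi-threshold alerting: map a metric value to the highest level it reaches."""
    for name, limit in THRESHOLDS:
        if value >= limit:
            return name
    return "ok"


def composite_alert(cpu_percent: float, p95_latency_ms: float, latency_limit_ms: float = 500.0) -> bool:
    """Composite alerting: fire only when CPU is critical or worse AND latency is degraded."""
    cpu_is_high = severity(cpu_percent) in {"critical", "emergency"}
    return cpu_is_high and p95_latency_ms > latency_limit_ms


print(severity(88.0), composite_alert(88.0, 620.0))
```

Requiring both conditions reduces noise: high CPU alone may simply mean the instance is well utilized, while high CPU together with slow responses indicates user-visible impact.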

Available Tools

SLYD provides several tools for advanced monitoring and alerting:

  • SLYD Metrics: Built-in monitoring with basic alerting capabilities
  • Grafana Integration: Rich visualization and advanced alerting
  • Prometheus Support: Time-series data collection and querying
  • Benson AI Analysis: Intelligent monitoring with predictive insights

Infrastructure as Code

Manage your instances and infrastructure through code for consistency, reproducibility, and automation.

SLYD CLI & API

Use SLYD's command-line interface and RESTful API to script and automate infrastructure management.

API Example - Create Instance
curl -X POST https://api.slyd.cloud/v1/instances \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "web-server-prod",
      "type": "standard-4",
      "region": "us-east",
      "os": "ubuntu-20.04",
      "storage": 100,
      "tags": {
        "environment": "production",
        "app": "web-server",
        "team": "backend"
      }
    }'
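
The same request can be prepared from a script. The Python sketch below builds it with the standard library only and does not send it. The endpoint, headers, and payload fields are copied from the curl example above, and the helper name is hypothetical. The response format is not described in this guide, so no response handling is shown.

```python
import json
import urllib.request

API_URL = "https://api.slyd.cloud/v1/instances"  # endpoint from the curl example


def build_create_instance_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an instance."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "name": "web-server-prod",
    "type": "standard-4",
    "region": "us-east",
    "os": "ubuntu-20.04",
    "storage": 100,
    "tags": {"environment": "production", "app": "web-server", "team": "backend"},
}
request = build_create_instance_request("YOUR_API_KEY", payload)
# To send it: urllib.request.urlopen(request, timeout=30)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any instance is created.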

Terraform Provider

Use HashiCorp Terraform to define and manage your SLYD infrastructure in a declarative format.

Terraform Example
provider "slyd" {
  api_key = var.slyd_api_key
  region  = "us-east"
}

resource "slyd_instance" "web_server" {
  name        = "web-server-prod"
  type        = "standard-4"
  region      = "us-east"
  os          = "ubuntu-20.04"
  storage     = 100
  
  security_groups = [slyd_security_group.web.id]
  
  tags = {
    environment = "production"
    app         = "web-server"
    team        = "backend"
  }
}

Benefits of Infrastructure as Code

  • Version-controlled infrastructure changes
  • Consistent environment replication
  • Easy rollback of infrastructure changes
  • Automated provisioning and scaling
  • Self-documenting infrastructure
  • Compliance and audit support

Performance Tuning

Optimize your instances for specific workloads to improve performance, reduce costs, and enhance efficiency.

CPU Optimization
CPU Pinning: Assign specific workloads to dedicated CPU cores to reduce context switching
Process Priority: Use nice and ionice to prioritize critical processes
CPU Governor: Configure the CPU scaling governor based on workload needs (performance vs. powersave)

CPU Pinning Example
# Assign a process to specific CPU cores
taskset -c 0,1 your_application

# Permanently assign in systemd service
# In your_service.service:
# CPUAffinity=0 1

Memory Optimization
Swap Configuration: Adjust the swappiness parameter based on workload memory access patterns
Huge Pages: Enable huge pages for database workloads to reduce TLB misses
Memory Limits: Use cgroups to limit memory usage for specific processes

Memory Tuning Example
# Adjust swappiness (0-100, lower = less swapping)
# Requires root; the setting is lost on reboot unless also added to /etc/sysctl.conf
sudo sysctl -w vm.swappiness=10

# Reserve 1024 huge pages (typically 2 MiB each on x86-64)
sudo sysctl -w vm.nr_hugepages=1024

Disk I/O Optimization
I/O Scheduler: Select an appropriate I/O scheduler based on workload type
Noatime Mount Option: Reduce disk writes by disabling access time updates
RAID Configuration: Use RAID 0 for performance or RAID 10 for performance with redundancy

Disk I/O Tuning Example
# Check current I/O scheduler (the active one is shown in brackets)
cat /sys/block/sda/queue/scheduler

# Change I/O scheduler to mq-deadline (named "deadline" on older kernels without blk-mq)
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Mount with noatime in /etc/fstab
# /dev/sda1 /data ext4 defaults,noatime 0 2

Performance Testing

Always benchmark your application before and after performance tuning to measure the impact of your changes. Use tools like:

  • stress-ng: For general system stress testing
  • sysbench: For CPU, memory, and I/O benchmarking
  • fio: For detailed storage performance testing
  • iperf3: For network throughput testing

Automation & Orchestration

Implement automation to streamline operations, reduce manual tasks, and ensure consistency.

Instance Orchestration

Manage groups of instances as a cohesive unit with coordinated operations.

Instance Groups

Logical collections of instances that can be managed together:

  • Perform bulk operations (start, stop, update)
  • Share configuration and scaling policies
  • Monitor as a unified entity
  • Apply consistent tags and metadata

Rolling Updates

Update instances in sequence to maintain availability:

  • Specify maximum unavailable percentage
  • Automatic health checking between updates
  • Rollback capability if issues are detected
  • Configurable pause between instance updates

CLI Rolling Update Example
slyd instance-group update \
    --group-id ig-web-cluster \
    --image-id img-ubuntu2204-v2 \
    --max-unavailable 25% \
    --health-check-path "/health" \
    --health-check-port 8080 \
    --pause-between-instances 60s \
    --auto-rollback-on-failure
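
To see what a setting such as --max-unavailable 25% means for a group, the Python sketch below splits a list of instances into update batches. It is only an illustration of the idea: it assumes the percentage is rounded down with a minimum of one instance per batch, which may differ from what the SLYD CLI actually does, and the instance IDs are made up.

```python
import math
from typing import List, Sequence


def rolling_batches(instance_ids: Sequence[str], max_unavailable_percent: float) -> List[List[str]]:
    """Split instances into batches so at most the given percentage is updated at once."""
    if not 0 < max_unavailable_percent <= 100:
        raise ValueError("max_unavailable_percent must be in the range (0, 100]")
    # Round down, but always update at least one instance per batch (assumption)
    batch_size = max(1, math.floor(len(instance_ids) * max_unavailable_percent / 100))
    return [
        list(instance_ids[i:i + batch_size])
        for i in range(0, len(instance_ids), batch_size)
    ]


ids = [f"inst-{n:02d}" for n in range(1, 9)]  # eight hypothetical instances
for number, batch in enumerate(rolling_batches(ids, 25), start=1):
    print(f"batch {number}: {batch}")
```

In a real rollout, each batch would be followed by the health check and pause described above before the next batch starts.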

Event-Driven Automation

Respond automatically to specific events, metrics, or conditions in your environment.

Event-Driven Automation Flow

Triggering Events

  • Instance lifecycle changes (create, start, stop, terminate)
  • Metric threshold crossings (CPU, memory, disk, network)
  • Error or warning conditions (application errors, system logs)
  • Scheduled events (cron-style time-based triggers)
  • Code repository events (commits, pull requests, releases)
  • User-initiated actions (API calls, dashboard operations)
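
Conceptually, event-driven automation maps event types to handlers. The Python sketch below shows that pattern with a small in-process dispatcher. The EventBus class and the event names are hypothetical and do not correspond to a SLYD API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Any]


class EventBus:
    """Minimal dispatcher: handlers subscribe to event types and run when one is emitted."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Dict[str, Any]) -> List[Any]:
        # Run every handler registered for this event type, in subscription order
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


bus = EventBus()
bus.on("metric.threshold_crossed", lambda e: f"scale out {e['group']}")
bus.on("metric.threshold_crossed", lambda e: f"notify on-call about {e['metric']}")
print(bus.emit("metric.threshold_crossed", {"group": "ig-web-cluster", "metric": "cpu"}))
```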

Example Automation Workflows

Auto-Scaling Response

1. CPU utilization exceeds 80% for 5 minutes
2. Event triggers scaling policy
3. New instance is provisioned from template
4. Load balancer configuration updated
5. Notification sent to system managers

Deployment Automation

1. Code pushed to main branch
2. CI/CD pipeline builds and tests application
3. New instance image created with updated code
4. Rolling update initiated for production instances
5. Health checks confirm successful deployment

Advanced Best Practices

Follow these recommendations to maximize the efficiency, reliability, and security of your SLYD instances.

Security Hardening

Minimal Attack Surface

Install only necessary packages and disable unused services

Secure SSH Configuration

Disable root login, use key-based authentication, implement fail2ban

Regular Updates

Enable automatic security updates or establish a regular patching schedule

Host-Based Firewall

Configure iptables or ufw on each instance as an additional security layer

File Integrity Monitoring

Implement tools like AIDE or Tripwire to detect unauthorized file modifications

Performance Optimization

Right-Sizing

Regularly analyze utilization patterns and adjust instance sizes accordingly

Storage Optimization

Use different volume types based on workload I/O patterns

Instance Placement

Deploy instances in regions closest to your users to reduce latency

Load Testing

Regularly conduct load tests to identify performance bottlenecks

Caching Strategy

Implement appropriate caching at multiple levels (application, database, HTTP)

Reliability Engineering

Redundancy

Deploy critical workloads across multiple instances and availability zones

Failure Testing

Regularly conduct chaos engineering exercises to validate resilience

Self-Healing Systems

Implement health checks and automatic recovery for failed components

Circuit Breakers

Prevent cascading failures by implementing circuit breakers in distributed systems

Disaster Recovery

Maintain up-to-date DR plans and regularly test recovery procedures
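
Of the practices above, the circuit breaker is the one most often implemented in application code. The Python sketch below is a deliberately small version for illustration, with hypothetical class names and default values; production systems usually rely on a maintained library instead.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open state, let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a downstream service in breaker.call(...) means that, once the service has failed repeatedly, callers fail fast instead of piling up requests against a dependency that is already struggling.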
