Instances Management Guide v1.0.0

Learn advanced techniques and best practices for managing your compute instances on the SLYD platform.

About This Guide

This guide expands on the Compute Instances documentation to provide deeper insights, advanced management techniques, and real-world usage scenarios. It's designed for users who are already familiar with the basics of SLYD compute instances.

Instance Lifecycle Management

Effectively managing the lifecycle of your compute instances helps optimize costs and ensure resources are available when needed.

Scheduling

Set up automatic start and stop schedules for predictable workloads to reduce costs during idle periods.

Schedule Creation via CLI

slyd instance schedule create \
    --instance-id inst-abc123 \
    --name "workday-hours" \
    --start "0 8 * * 1-5" \
    --stop "0 18 * * 1-5" \
    --timezone "America/New_York"

You can also create schedules through the web interface:

Navigate to your instance details page

Select the "Scheduling" tab

Click "Create Schedule" and configure your desired time parameters

Choose recurrence pattern (daily, weekdays only, custom)
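As a rough illustration of what the two cron expressions in the CLI example encode, the Python sketch below checks whether an instance on the "workday-hours" schedule should be running at a given moment. The function name and the fixed UTC-5 offset are assumptions made for this example and are not part of the SLYD tooling. In practice you would pass zoneinfo.ZoneInfo("America/New_York") so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def should_be_running(now: datetime, tz: tzinfo,
                      start_hour: int = 8, stop_hour: int = 18) -> bool:
    """Mirror "0 8 * * 1-5" (start) and "0 18 * * 1-5" (stop)."""
    local = now.astimezone(tz)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start_hour <= local.hour < stop_hour


# Fixed offset used only to keep the example self-contained (assumption).
eastern_standard = timezone(timedelta(hours=-5))
wednesday_morning = datetime(2024, 1, 10, 14, 0, tzinfo=timezone.utc)  # 09:00 local
print(should_be_running(wednesday_morning, eastern_standard))
```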

Time-to-Live (TTL)

Set expiration times for temporary instances to ensure they don't run longer than needed.

Common Use Case:

Development and testing instances that are only needed for a specific duration, such as a two-week sprint or a three-day hackathon.

Setting TTL via CLI

slyd instance update \
    --instance-id inst-abc123 \
    --ttl "2022-12-31T23:59:59Z"

When the TTL is reached, the instance will automatically be stopped (not terminated). You'll receive a notification, allowing you to either extend the TTL or terminate the instance.
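The --ttl value in the example above is an ISO 8601 timestamp in UTC. A minimal Python sketch for deriving such a timestamp from a start time and a duration, for instance a two-week sprint, might look like this. The helper name ttl_timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ttl_timestamp(start: datetime, days: int) -> str:
    """Return start + days as a UTC timestamp like 2022-12-31T23:59:59Z.

    `start` must be timezone-aware so the conversion to UTC is unambiguous.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    expires = start.astimezone(timezone.utc) + timedelta(days=days)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")


sprint_start = datetime(2022, 12, 17, 23, 59, 59, tzinfo=timezone.utc)
print(ttl_timestamp(sprint_start, days=14))
```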

Auto-scaling

Configure your instances to automatically scale based on metrics like CPU usage, memory utilization, or custom application metrics.

Horizontal Scaling

Automatically add or remove identical instances as demand changes

Vertical Scaling

Automatically increase or decrease the resources (CPU, RAM) assigned to an instance

Example Configuration:

Scale Out: Add instance when CPU > 75% for 5 minutes
Scale In: Remove instance when CPU < 25% for 15 minutes
Min Instances: 2
Max Instances: 10

Auto-scaling is particularly effective for web applications, API servers, and batch processing workloads with variable demand.
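To make the example configuration concrete, here is a small Python sketch of the decision it describes. It assumes one CPU sample per minute and a step of one instance per decision. Both are assumptions for illustration and may not match how the SLYD auto-scaler evaluates policies.

```python
def desired_instances(cpu_samples, current, min_instances=2, max_instances=10):
    """Return the target instance count for per-minute CPU samples (percent).

    Scale out when the last 5 samples are all above 75.
    Scale in when the last 15 samples are all below 25.
    """
    if len(cpu_samples) >= 5 and all(s > 75 for s in cpu_samples[-5:]):
        return min(current + 1, max_instances)
    if len(cpu_samples) >= 15 and all(s < 25 for s in cpu_samples[-15:]):
        return max(current - 1, min_instances)
    return current


print(desired_instances([82, 79, 91, 88, 77], current=3))
```

The longer window for scaling in is deliberate: removing capacity too eagerly causes instances to be added and removed repeatedly when load hovers near a threshold.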

Data Management

Proper data management ensures your information remains persistent, secure, and optimized.

Storage Volumes

Separate your data from your compute instances for better persistence, flexibility, and performance.

SSD Volumes

High-performance solid-state storage ideal for databases, applications requiring low latency, and random I/O operations.

Performance: Up to 10,000 IOPS
Size Range: 10 GB to 4 TB
Use Cases: Databases, high-traffic web applications

HDD Volumes

Cost-effective storage for large datasets and sequential access patterns.

Performance: Up to 500 IOPS
Size Range: 100 GB to 16 TB
Use Cases: Log storage, media files, backups

Create a Volume

From your dashboard, go to Storage > Volumes > Create Volume

Specify Volume Details

Choose type, size, and whether to encrypt the volume

Attach to Instance

Select which instance to attach the volume to and specify a mount point

Format and Mount (first-time only)

If it's a new volume, format it and configure your system to mount it automatically

Format and Mount Example

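# Confirm the device name first; /dev/xvdf is an example and may differ on your instance
lsblk

# Format the volume with ext4 (this erases any existing data on the device)
sudo mkfs -t ext4 /dev/xvdf

# Create a mount point
sudo mkdir -p /data

# Mount the volume
sudo mount /dev/xvdf /data

# Add to fstab by UUID for auto-mount on reboot (device names can change between boots)
echo "UUID=$(sudo blkid -s UUID -o value /dev/xvdf) /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab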

Backup Strategies

Implement a comprehensive backup strategy to protect your data from loss, corruption, or accidental deletion.

Snapshots

Point-in-time copies of your volumes that can be used to create new volumes or restore existing ones.

Incremental after first snapshot

Fast restoration

Can be scheduled

Backup Service

Full-featured backup service with compression, encryption, and long-term retention options.

Application-consistent backups

Differential and incremental options

Cross-region replication

Backup Best Practices

Use the 3-2-1 rule: 3 copies of data, 2 different media types, 1 off-site backup
Test your backup restoration process regularly
Automate backup verification to ensure backups are valid
Implement appropriate retention policies for different data types
Document your backup and recovery procedures
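The 3-2-1 rule above is easy to check mechanically. The following Python sketch models each backup copy with a hypothetical BackupCopy record and verifies the three conditions. The field names and the sample inventory are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # free-form description of where the copy lives
    media: str      # e.g. "ssd", "hdd", "object-storage"
    offsite: bool   # True if stored outside the primary site


def satisfies_3_2_1(copies) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


inventory = [
    BackupCopy("primary volume", "ssd", offsite=False),
    BackupCopy("local snapshot", "ssd", offsite=False),
    BackupCopy("cross-region backup", "object-storage", offsite=True),
]
print(satisfies_3_2_1(inventory))
```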

Instance Imaging

Create custom images of your instances to capture their configuration, installed software, and data for reuse and replication.

Prepare Your Instance

Clean up temporary files, remove sensitive information, and ensure the instance is in a consistent state

Create the Image

From the instance dashboard, select "Create Image" and provide image details

Wait for Completion

The imaging process can take several minutes depending on the instance size

Launch New Instances

Use your custom image to launch new instances with identical configurations

Common Use Cases

Environment replication for development, testing, and production

Standardized images for team onboarding

Template instances for horizontal scaling

Golden images for disaster recovery

Advanced Networking

Configure sophisticated networking capabilities to enhance security, performance, and connectivity.

Security Groups

Virtual firewalls that control inbound and outbound traffic to your instances, providing an additional layer of security.

Example Security Group: Web Server

Type	Protocol	Port Range	Source/Destination	Description
Inbound	TCP	80, 443	0.0.0.0/0	HTTP/HTTPS traffic
Inbound	TCP	22	192.168.1.0/24	SSH from internal network
Outbound	All	All	0.0.0.0/0	Allow all outbound traffic
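To show how the inbound rules in this table are evaluated, the Python sketch below models them with the standard ipaddress module. It assumes the common default-deny behavior, where traffic is rejected unless some rule matches. The InboundRule class and function name are hypothetical and are not a SLYD API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    protocol: str
    ports: frozenset
    source: str  # CIDR block


# Inbound rows of the web server example table.
WEB_SERVER_RULES = [
    InboundRule("tcp", frozenset({80, 443}), "0.0.0.0/0"),
    InboundRule("tcp", frozenset({22}), "192.168.1.0/24"),
]


def is_inbound_allowed(rules, protocol, port, source_ip) -> bool:
    """Allow traffic only if at least one rule matches (default deny, assumed)."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and port in rule.ports
        and addr in ipaddress.ip_network(rule.source)
        for rule in rules
    )


print(is_inbound_allowed(WEB_SERVER_RULES, "tcp", 22, "203.0.113.7"))
```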

Create a Security Group

Navigate to Networking > Security Groups > Create Security Group

Define Rules

Add inbound and outbound rules with protocols, ports, and source/destination

Assign to Instances

Attach the security group to existing instances or apply during instance creation

Security Best Practices

Follow the principle of least privilege when creating rules
Use specific IP ranges rather than 0.0.0.0/0 when possible
Create purpose-specific security groups (e.g., web, database) and apply them as needed
Regularly audit and update security group rules
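Part of a regular audit can be automated. As a sketch, the Python function below flags rules that are open to every address on ports other than a small allow-list. The allow-list of ports 80 and 443 and the (port, CIDR) tuple format are assumptions chosen for this example.

```python
import ipaddress

PUBLIC_PORTS = {80, 443}  # ports assumed acceptable to expose to everyone


def overly_permissive(rules):
    """Return the (port, cidr) rules open to all addresses on non-public ports."""
    findings = []
    for port, cidr in rules:
        network = ipaddress.ip_network(cidr)
        # A prefix length of 0 means "every address" (0.0.0.0/0 or ::/0).
        if network.prefixlen == 0 and port not in PUBLIC_PORTS:
            findings.append((port, cidr))
    return findings


rules = [(443, "0.0.0.0/0"), (22, "0.0.0.0/0"), (22, "192.168.1.0/24")]
print(overly_permissive(rules))
```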

Virtual Private Networks (VPNs)

Connect your on-premises network to your SLYD environment securely over the internet.

Site-to-Site VPN

Connect your entire corporate network to your SLYD environment, allowing seamless access between resources in both networks.

Compatible VPN device or software at your location

Static public IP address

BGP routing (for dynamic routing)

Client VPN

Allow individual users to connect securely to your SLYD environment from anywhere.

OpenVPN-compatible client software

Authentication credentials

Client configuration file

Setting Up a VPN Connection

Navigate to Networking > VPN Connections > Create VPN Connection

Select VPN type and configure connection parameters

Download the configuration file for your local VPN device or software

Apply the configuration and establish the connection

Private Networking

Create isolated network environments for your instances to enhance security and control communication.

Key Components

Virtual Private Cloud (VPC)

Isolated network environment within SLYD

Subnets

Network segments within a VPC with their own IP address ranges

Route Tables

Define how network traffic is directed within and outside the VPC

Network ACLs

Stateless network-level firewall controls for subnets

Common Network Architectures

Public-Private Web Application

Public subnet: Load balancers and web servers
Private subnet: Application servers and databases
No direct internet access to the private subnet

Multi-Tier Network Segmentation

Web tier subnet: Public-facing web servers
Application tier subnet: Internal application logic
Database tier subnet: Data storage and processing
Management subnet: Administrative access
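When planning a multi-tier layout like the one above, it helps to carve the VPC address range into non-overlapping subnets up front. The Python sketch below does this with the standard ipaddress module. The 10.0.0.0/16 VPC range and the /24 subnet size are assumptions for illustration.

```python
import ipaddress

TIERS = ["web", "application", "database", "management"]


def plan_subnets(vpc_cidr, tiers, new_prefix=24):
    """Assign each tier the next free subnet of the given prefix length."""
    vpc = ipaddress.ip_network(vpc_cidr)
    plan = {
        tier: str(subnet)
        for tier, subnet in zip(tiers, vpc.subnets(new_prefix=new_prefix))
    }
    if len(plan) < len(tiers):
        raise ValueError("VPC range is too small for the requested tiers")
    return plan


for tier, cidr in plan_subnets("10.0.0.0/16", TIERS).items():
    print(f"{tier}: {cidr}")
```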

Monitoring & Optimization

Implement comprehensive monitoring and use data-driven approaches to optimize your instance performance and cost.

Advanced Metrics & Alerts

Go beyond basic monitoring to gain deeper insights into your instances' performance and health.

Key Metrics to Monitor

System Metrics

CPU steal time (time the hypervisor spent serving other guests instead of your instance)
Memory swap rate
Disk I/O wait times
Network packet loss
System load averages (1, 5, 15 min)

Application Metrics

Request latency (p50, p95, p99)
Error rates by type
Queue depths and processing times
Cache hit/miss ratios
Active sessions/connections

Business Metrics

Cost per transaction
Resource utilization per user
Revenue-generating vs. support operations
Peak vs. off-peak efficiency
Seasonality patterns
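The request latency percentiles listed under Application Metrics (p50, p95, p99) can be computed from raw samples in a few lines. This Python sketch uses the nearest-rank method, which is one of several common percentile definitions. Monitoring tools may interpolate differently, so treat the choice as an assumption.

```python
def percentile(values, p: int):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Tail percentiles such as p99 matter because an average hides the slow requests that a small share of users actually experience.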

Effective Alerting Strategies

Multi-threshold Alerting

Set different thresholds for warning, critical, and emergency conditions

Composite Alerts

Alert on combinations of metrics rather than single metrics in isolation

Time-of-Day Sensitivity

Adjust thresholds based on expected load patterns for different times

Anomaly Detection

Use machine learning to detect deviations from normal patterns
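Two of the strategies above, multi-threshold and composite alerting, can be sketched in Python as follows. All threshold values here are illustrative assumptions and are not SLYD defaults.

```python
def severity(value, warning=70.0, critical=85.0, emergency=95.0):
    """Map a metric value to a severity level using three thresholds."""
    if value >= emergency:
        return "emergency"
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def composite_alert(cpu_pct, p99_latency_ms,
                    cpu_limit=80.0, latency_limit_ms=500.0):
    """Fire only when high CPU coincides with degraded latency."""
    return cpu_pct > cpu_limit and p99_latency_ms > latency_limit_ms


print(severity(88.0))
print(composite_alert(cpu_pct=92.0, p99_latency_ms=120.0))
```

The composite check stays quiet when CPU is high but users are unaffected, which reduces alert noise from busy but healthy instances.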

Available Tools

SLYD provides several tools for advanced monitoring and alerting:

SLYD Metrics: Built-in monitoring with basic alerting capabilities

Grafana Integration: Rich visualization and advanced alerting

Prometheus Support: Time-series data collection and querying

Benson AI Analysis: Intelligent monitoring with predictive insights

Infrastructure as Code

Manage your instances and infrastructure through code for consistency, reproducibility, and automation.

SLYD CLI & API

Use SLYD's command-line interface and RESTful API to script and automate infrastructure management.

API Example - Create Instance

curl -X POST https://api.slyd.cloud/v1/instances \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "web-server-prod",
      "type": "standard-4",
      "region": "us-east",
      "os": "ubuntu-20.04",
      "storage": 100,
      "tags": {
        "environment": "production",
        "app": "web-server",
        "team": "backend"
      }
    }'
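The same request can be assembled from a script. The Python sketch below builds the request shown in the curl example using only the standard library, without sending it. The helper name build_create_request is hypothetical, and the response format is not covered here because this guide does not describe it.

```python
import json
import urllib.request

API_URL = "https://api.slyd.cloud/v1/instances"


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "name": "web-server-prod",
    "type": "standard-4",
    "region": "us-east",
    "os": "ubuntu-20.04",
    "storage": 100,
    "tags": {"environment": "production", "app": "web-server", "team": "backend"},
}
request = build_create_request("YOUR_API_KEY", payload)
# To send it, pass the request to urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```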

Terraform Provider

Use HashiCorp Terraform to define and manage your SLYD infrastructure in a declarative format.

Terraform Example

provider "slyd" {
  api_key = var.slyd_api_key
  region  = "us-east"
}

resource "slyd_instance" "web_server" {
  name        = "web-server-prod"
  type        = "standard-4"
  region      = "us-east"
  os          = "ubuntu-20.04"
  storage     = 100
  
  # Assumes a slyd_security_group.web resource is defined elsewhere in this configuration
  security_groups = [slyd_security_group.web.id]
  
  tags = {
    environment = "production"
    app         = "web-server"
    team        = "backend"
  }
}

Benefits of Infrastructure as Code

Version-controlled infrastructure changes

Consistent environment replication

Easy rollback of infrastructure changes

Automated provisioning and scaling

Self-documenting infrastructure

Compliance and audit support

Performance Tuning

Optimize your instances for specific workloads to improve performance, reduce costs, and enhance efficiency.

CPU Optimization

CPU Pinning

Assign specific workloads to dedicated CPU cores to reduce context switching

Process Priority

Use nice and ionice to prioritize critical processes

CPU Governor

Configure the CPU scaling governor based on workload needs (performance vs. powersave)

CPU Pinning Example

# Assign a process to specific CPU cores
taskset -c 0,1 your_application

# Permanently assign in a systemd service
# In the [Service] section of your_service.service:
# CPUAffinity=0 1
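A process can also pin itself at startup. On Linux, Python exposes the same mechanism that taskset uses through os.sched_setaffinity. The sketch below restricts the current process to the first two CPUs it is allowed to use. The helper name is hypothetical, and the calls are not available on macOS or Windows.

```python
import os


def pin_to_first_cores(count: int = 2) -> set:
    """Restrict the current process to the first `count` CPUs it may use.

    Linux only. Choosing from the currently allowed set avoids errors in
    containers where CPU 0 is not available to the process.
    """
    allowed = sorted(os.sched_getaffinity(0))  # 0 = the calling process
    target = set(allowed[:count])
    os.sched_setaffinity(0, target)
    return target


pinned = pin_to_first_cores(2)
print(f"Pinned to CPUs: {sorted(pinned)}")
```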

Memory Optimization

Swap Configuration

Adjust swappiness parameter based on workload memory access patterns

Huge Pages

Enable huge pages for database workloads to reduce TLB misses

Memory Limits

Use cgroups to limit memory usage for specific processes

Memory Tuning Example

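# Adjust swappiness (lower = less swapping; range is 0-100 on older kernels, 0-200 since Linux 5.8)
sudo sysctl -w vm.swappiness=10

# Reserve 1024 huge pages
sudo sysctl -w vm.nr_hugepages=1024

# Both settings last until reboot; add them to a file under /etc/sysctl.d/ to persist them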
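The right value for nr_hugepages depends on how much memory the workload should back with huge pages. As a sketch, the Python helper below converts a memory size into a page count. It assumes a 2 MiB huge page size, which is the usual default on x86-64. Check the Hugepagesize line in /proc/meminfo for the actual value on your instance.

```python
def hugepages_needed(memory_bytes: int,
                     page_size_bytes: int = 2 * 1024 * 1024) -> int:
    """Number of huge pages required to cover memory_bytes (rounded up)."""
    if memory_bytes < 0:
        raise ValueError("memory_bytes must not be negative")
    return -(-memory_bytes // page_size_bytes)  # ceiling division


two_gib = 2 * 1024 ** 3
print(hugepages_needed(two_gib))
```

With these assumptions, the 1024 pages reserved in the tuning example correspond to 2 GiB of memory.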

Disk I/O Optimization

I/O Scheduler

Select appropriate I/O scheduler based on workload type

Noatime Mount Option

Reduce disk writes by disabling access time updates

RAID Configuration

Use RAID 0 for performance (no redundancy: a single disk failure loses the whole array) or RAID 10 for performance with redundancy

Disk I/O Tuning Example

# Check available I/O schedulers (the active one is shown in brackets)
cat /sys/block/sda/queue/scheduler

# Change I/O scheduler to mq-deadline (named "deadline" on older kernels without blk-mq)
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Mount with noatime in /etc/fstab
# /dev/sda1 /data ext4 defaults,noatime 0 2

Performance Testing

Always benchmark your application before and after performance tuning to measure the impact of your changes. Use tools like:

stress-ng: For general system stress testing
sysbench: For CPU, memory, and I/O benchmarking
fio: For detailed storage performance testing
iperf3: For network throughput testing

Automation & Orchestration

Implement automation to streamline operations, reduce manual tasks, and ensure consistency.

Instance Orchestration

Manage groups of instances as a cohesive unit with coordinated operations.

Instance Groups

Logical collections of instances that can be managed together:

Perform bulk operations (start, stop, update)
Share configuration and scaling policies
Monitor as a unified entity
Apply consistent tags and metadata

Rolling Updates

Update instances in sequence to maintain availability:

Specify maximum unavailable percentage
Automatic health checking between updates
Rollback capability if issues are detected
Configurable pause between instance updates

CLI Rolling Update Example

slyd instance-group update \
    --group-id ig-web-cluster \
    --image-id img-ubuntu2204-v2 \
    --max-unavailable 25% \
    --health-check-path "/health" \
    --health-check-port 8080 \
    --pause-between-instances 60s \
    --auto-rollback-on-failure

Event-Driven Automation

Respond automatically to specific events, metrics, or conditions in your environment.

Triggering Events

Instance lifecycle changes (create, start, stop, terminate)

Metric threshold crossings (CPU, memory, disk, network)

Error or warning conditions (application errors, system logs)

Scheduled events (cron-style time-based triggers)

Code repository events (commits, pull requests, releases)

User-initiated actions (API calls, dashboard operations)

Example Automation Workflows

Auto-Scaling Response

CPU utilization exceeds 80% for 5 minutes

Event triggers scaling policy

New instance is provisioned from template

Load balancer configuration updated

Notification sent to system administrators

Deployment Automation

Code pushed to main branch

CI/CD pipeline builds and tests application

New instance image created with updated code

Rolling update initiated for production instances

Health checks confirm successful deployment

Advanced Best Practices

Follow these recommendations to maximize the efficiency, reliability, and security of your SLYD instances.

Security Hardening

Minimal Attack Surface

Install only necessary packages and disable unused services

Secure SSH Configuration

Disable root login, use key-based authentication, implement fail2ban

Regular Updates

Enable automatic security updates or establish a regular patching schedule

Host-Based Firewall

Configure iptables or ufw on each instance as an additional security layer

File Integrity Monitoring

Implement tools like AIDE or Tripwire to detect unauthorized file modifications

Performance Optimization

Right-Sizing

Regularly analyze utilization patterns and adjust instance sizes accordingly

Storage Optimization

Use different volume types based on workload I/O patterns

Instance Placement

Deploy instances in regions closest to your users to reduce latency

Load Testing

Regularly conduct load tests to identify performance bottlenecks

Caching Strategy

Implement appropriate caching at multiple levels (application, database, HTTP)

Reliability Engineering

Redundancy

Deploy critical workloads across multiple instances and availability zones

Failure Testing

Regularly conduct chaos engineering exercises to validate resilience

Self-Healing Systems

Implement health checks and automatic recovery for failed components

Circuit Breakers

Prevent cascading failures by implementing circuit breakers in distributed systems

Disaster Recovery

Maintain up-to-date DR plans and regularly test recovery procedures
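The circuit breaker item above is the least self-explanatory of these practices, so here is a minimal Python sketch of the pattern. After a configurable number of consecutive failures the breaker rejects calls immediately, then lets a trial call through once a timeout has passed. The class, its parameters, and the default values are illustrative assumptions. Production systems typically rely on an established library instead.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
print(breaker.call(lambda: "healthy response"))
```

Rejecting calls while the circuit is open gives the failing dependency time to recover and keeps callers from piling up behind slow timeouts.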
