Providing Out-of-Band Connectivity to Mission-Critical IT Resources

Improving Your Zero Trust Security Posture

Zero Trust for the Edge

The current cyber threat landscape is daunting, with attacks occurring so frequently that security experts recommend operating under the assumption that your network is already breached. Major cyber attacks – and the disruptions they cause – frequently make news headlines. The MGM hack, LendingTree breach, and CDK Global attack are just a few examples that affected thousands of people per incident and now have many organizations rethinking their resilience strategies.

The zero trust security methodology outlines the best practices for limiting the blast radius of a successful breach by preventing malicious actors from moving laterally through the network and accessing the most valuable or sensitive resources. Many organizations have already begun their zero trust journey by implementing role-based access controls (RBAC), multi-factor authentication (MFA), and other security solutions, but still struggle with coverage gaps that result in ransomware attacks and other disruptive breaches. This blog provides advice for improving your zero trust security posture with a multi-layered strategy that mitigates weaknesses for complete coverage.

How to improve your zero trust security posture

  • Gain a full understanding of your protect surface: Use automated discovery tools to identify all the data, assets, applications, and services that an attacker could potentially target.
  • Micro-segment your network with micro-perimeters: Implement specific policies, controls, and trust verification mechanisms to mitigate protect surface vulnerabilities.
  • Isolate and defend your management infrastructure: Use OOB management and hardware security to prevent attackers from compromising the control plane.
  • Defend your cloud resources: Understand the shared responsibility model and use cloud-specific tools like a CASB to prevent shadow IT and enforce zero trust.
  • Extend zero trust to the edge: Use edge-centric solutions like SASE to extend zero trust policies and controls to remote network traffic, devices, and users.

Gain a full understanding of your protect surface

Many security strategies focus on defending the network’s “attack surface,” or all the potential vulnerabilities an attacker could exploit to breach the network. However, zero trust is all about defending the “protect surface,” or all the data, assets, applications, and services that an attacker could potentially try to access. The key difference is that zero trust doesn’t ask you to cover every possible weakness in a network, which is essentially impossible. Instead, it directs you to look at the resources themselves, determine what has the most value to an attacker, and then implement security controls that are tailored accordingly.

Gaining a full understanding of all the resources on your network can be extraordinarily challenging, especially with the proliferation of SaaS apps, mobile devices, and remote workforces. There are automated tools that can help IT teams discover all the data, apps, and devices on the network. Application discovery and dependency mapping (ADDM) tools help identify all on-premises software and third-party dependencies; cloud application discovery tools do the same for cloud-hosted apps by monitoring network traffic to cloud domains. Sensitive data discovery tools scan all known on-premises or cloud-based resources for personally identifiable information (PII) and other confidential data, and there are various device management solutions to detect network-connected hardware, including IoT devices.

  • Tip: This step can’t be completed one time and then forgotten – teams should execute discovery processes on a regular, scheduled basis to limit gaps in protection. 
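To make the value of recurring discovery concrete, here is a minimal Python sketch that diffs one discovery run against the approved asset inventory. The hostnames and the helper are hypothetical; real ADDM or device management tools export comparable asset lists.

```python
# Sketch: reconcile automated discovery results against the known inventory.
# Hostnames below are made up for illustration.

def find_coverage_gaps(inventory: set[str], discovered: set[str]) -> dict:
    """Compare the approved asset inventory with what discovery tools found."""
    return {
        # On the network but never onboarded -- potential shadow IT
        "unmanaged": sorted(discovered - inventory),
        # In the inventory but not seen -- stale records or offline assets
        "missing": sorted(inventory - discovered),
    }

inventory = {"erp-db-01", "hr-app", "fileserver-02"}
discovered = {"erp-db-01", "hr-app", "iot-camera-17"}

gaps = find_coverage_gaps(inventory, discovered)
print(gaps["unmanaged"])  # ['iot-camera-17']
print(gaps["missing"])    # ['fileserver-02']
```

Running this on every scheduled discovery pass keeps the protect surface definition current as assets come and go.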

Micro-segment your network with micro-perimeters

Micro-segmentation is a cornerstone of zero-trust networks. It involves logically separating all the data, applications, assets, and services according to attack value, access needs, and interdependencies. Then, teams implement granular security policies and controls tailored to the needs of each segment, establishing what are known as micro-perimeters. Rather than trying to account for every potential vulnerability with one large security perimeter, teams can just focus on the tools and policies needed to cover the specific vulnerabilities of a particular micro-segment.

Network micro-perimeters help improve your zero trust security posture with:

  • Granular access policies granting the least amount of privileges needed for any given workflow. Limiting the number of accounts with access to any given resource, and limiting the number of privileges granted to any given account, significantly reduces the amount of damage a compromised account (or malicious actor) is capable of inflicting.
  • Targeted security controls addressing the specific risks and vulnerabilities of the resources in a micro-segment. For example, financial systems need stronger encryption, strict data governance monitoring, and multiple methods of trust verification, whereas an IoT lighting system requires simple monitoring and patch management, so the security controls for these micro-segments should be different.
  • Trust verification using context-aware policies to catch accounts exhibiting suspicious behavior and prevent them from accessing sensitive resources. If a malicious outsider compromises an authorized user account and MFA device – or a disgruntled employee uses their network privileges to harm the company – it can be nearly impossible to prevent data exposure. Context-aware policies can stop a user from accessing confidential resources outside of typical operating hours, or from unfamiliar IP addresses, for example. Additionally, user entity and behavior analytics (UEBA) solutions use machine learning to detect other abnormal and risky behaviors that could indicate malicious intent.
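The context-aware checks described above can be sketched as a simple policy function. This is an illustrative Python example, not a real product API; the hours, network prefixes, and sensitivity labels are assumptions.

```python
from datetime import time

# Hypothetical context-aware policy values
ALLOWED_HOURS = (time(7, 0), time(19, 0))   # typical operating hours
TRUSTED_NETWORKS = ("10.20.", "10.30.")     # familiar corporate ranges

def access_decision(resource_sensitivity: str, source_ip: str, login_time: time) -> str:
    """Decide based on request context, not just credentials."""
    in_hours = ALLOWED_HOURS[0] <= login_time <= ALLOWED_HOURS[1]
    trusted_ip = source_ip.startswith(TRUSTED_NETWORKS)
    if resource_sensitivity == "confidential" and not (in_hours and trusted_ip):
        return "deny"          # confidential data: block anomalous context outright
    if not trusted_ip:
        return "step-up-mfa"   # unfamiliar network: require re-verification
    return "allow"

print(access_decision("confidential", "203.0.113.9", time(23, 30)))  # deny
```

A compromised account logging in at 11:30 PM from an unfamiliar address is denied even though its credentials and MFA token are valid.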

Isolate and defend your management infrastructure

For zero trust to be effective, organizations must apply consistently strict security policies and controls to every component of their network architecture, including the management interfaces used to control infrastructure. Otherwise, a malicious actor could use a compromised sysadmin account to hijack the control plane and bring down the entire network.

According to a recent CISA directive, the best practice is to isolate the network’s control plane so that management interfaces are inaccessible from the production network. Many new cybersecurity regulations, including PCI DSS 4.0, DORA, NIS2, and the CER Directive, also either strongly recommend or require management infrastructure isolation.

Isolated management infrastructure (IMI) prevents compromised accounts, ransomware, and other threats from moving laterally to or from the production LAN. It gives teams a safe environment to recover from ransomware or other cyberattacks without risking reinfection, which is known as an isolated recovery environment (IRE). Management interfaces and the IRE should also be protected by granular, role-based access policies, multi-factor authentication, and strong hardware roots of trust to further mitigate risk.

Image: A diagram showing how to use Nodegrid Gen 3 OOB to enable IMI.

The easiest and most secure way to implement IMI is with Gen 3 out-of-band (OOB) serial console servers, like the Nodegrid solution from ZPE Systems. These devices use alternative network interfaces like 5G/4G LTE cellular to ensure complete isolation and 24/7 management access even during outages. They’re protected by hardware security features like TPM 2.0 and GPS geofencing, and they integrate with zero trust solutions like identity and access management (IAM) and UEBA to enable consistent policy enforcement.

Defend your cloud resources

The vast majority of companies host some or all of their workflows in the cloud, which significantly expands and complicates the attack surface while making it more challenging to identify and defend the protect surface. Some organizations also lack a complete understanding of the shared responsibility model for varying cloud services, increasing the chances of coverage gaps. Additionally, many orgs struggle with “shadow IT,” which occurs when individual business units implement cloud applications without going through onboarding, preventing security teams from applying zero trust controls.

The first step toward improving your zero trust security posture in the cloud is to ensure you understand where your cloud service provider’s responsibilities end and yours begin. For instance, most SaaS providers handle all aspects of security except IAM and data protection, whereas IaaS (Infrastructure-as-a-Service) providers are only responsible for protecting their physical and virtual infrastructure.

It’s also vital that security teams have a complete picture of all the cloud services in use by the organization and a way to deploy and enforce zero trust policies in the cloud. For example, a cloud access security broker (CASB) is a solution that discovers all the cloud services in use by an organization and allows teams to monitor and manage security for the entire cloud architecture. A CASB provides capabilities like data governance, malware detection, and adaptive access controls, so organizations can protect their cloud resources with the same techniques used in the on-premises environment.
Example Cloud Access Security Broker Capabilities

  • Visibility: cloud service discovery; monitoring and reporting
  • Compliance: user authentication and authorization; data governance and loss prevention
  • Threat protection: malware (e.g., virus, ransomware) detection; user and entity behavior analytics (UEBA)
  • Data security: data encryption and tokenization; data leak prevention
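A CASB’s cloud service discovery works by matching observed traffic against a catalog of known cloud applications. The Python sketch below illustrates the idea; the domain catalog, sanction list, and log entries are all hypothetical.

```python
# Sketch: discover cloud services in use by scanning DNS query logs,
# the same signal a CASB uses for cloud application discovery.
# The domain-to-app mapping is illustrative, not a real catalog.

KNOWN_CLOUD_APPS = {
    "sharepoint.com": "Microsoft 365",
    "salesforce.com": "Salesforce",
    "dropbox.com": "Dropbox",
}
SANCTIONED = {"Microsoft 365", "Salesforce"}

def discover_cloud_usage(dns_queries: list[str]) -> dict[str, str]:
    """Map observed DNS lookups to cloud apps and flag unsanctioned ones."""
    findings = {}
    for fqdn in dns_queries:
        for domain, app in KNOWN_CLOUD_APPS.items():
            if fqdn.endswith(domain):
                findings[app] = "sanctioned" if app in SANCTIONED else "shadow IT"
    return findings

logs = ["teams.sharepoint.com", "www.dropbox.com", "intranet.local"]
print(discover_cloud_usage(logs))
```

Anything flagged as shadow IT can then be onboarded (or blocked) so zero trust controls apply to it.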

Extend zero trust to the edge

Modern enterprise networks are highly decentralized, with many business operations taking place at remote branches, Internet of Things (IoT) deployment sites, and end-users’ homes. Extending security controls to the edge with on-premises zero trust solutions is very difficult without backhauling all remote traffic through a centralized firewall, which creates bottlenecks that affect performance and reliability. Luckily, the market for edge security solutions is rapidly growing and evolving to help organizations overcome these challenges. 

Secure Access Service Edge (SASE) is a type of security platform that delivers core capabilities as a managed, typically cloud-based service for the edge. SASE uses software-defined wide area networking (SD-WAN) to intelligently and securely route edge traffic through the SASE tech stack, allowing the application and enforcement of zero trust controls. In addition to CASB and next-generation firewall (NGFW) features, SASE usually includes zero trust network access (ZTNA), which offers VPN-like functionality to connect remote users to enterprise resources from outside the network. ZTNA is more secure than a VPN because it only grants access to one app at a time, requiring separate authorization requests and trust verification attempts to move to different resources.
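The per-application nature of ZTNA can be sketched as follows. This is a toy Python model of the access-decision logic, with a hypothetical role-to-app policy table, not any vendor’s implementation.

```python
# Sketch contrasting VPN-style network access with ZTNA's per-app grants.
# The policy table is hypothetical.

APP_POLICY = {
    "payroll": {"finance"},
    "wiki": {"finance", "engineering"},
}

def ztna_authorize(user_roles: set[str], app: str) -> bool:
    """Each application requires its own authorization decision;
    access to one app grants nothing about any other."""
    return bool(user_roles & APP_POLICY.get(app, set()))

roles = {"engineering"}
print(ztna_authorize(roles, "wiki"))     # True
print(ztna_authorize(roles, "payroll"))  # False -- separate request, separate check
```

With a traditional VPN, the equivalent check happens once at tunnel setup and then the user can reach the whole network segment; here every app lookup is an independent decision.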

Accelerating the zero trust journey

Zero trust is not a single security solution that you can implement once and forget about – it requires constant analysis of your security posture to identify and defend weaknesses as they arise. The best way to ensure adaptability is by using vendor-agnostic platforms to host and orchestrate zero trust security. This will allow you to add and change security services as needed without worrying about interoperability issues.

For example, the Nodegrid platform from ZPE Systems includes vendor-neutral serial consoles and integrated branch services routers that can host third-party software such as SASE and NGFWs. These devices also provide Gen 3 out-of-band management for infrastructure isolation and network resilience. Nodegrid protects management interfaces with strong hardware roots-of-trust, embedded firewalls, SAML 2.0 integrations, and other zero trust security features. Plus, with Nodegrid’s cloud-based or on-premises management platform, teams can orchestrate networking, infrastructure, and security workflows across the entire enterprise architecture.

 

Improve your zero trust security posture with Nodegrid

Using Nodegrid as the foundation for your zero trust network infrastructure ensures maximum agility while reducing management complexity. Watch a Nodegrid demo to learn more.

Schedule a Demo

DORA Act: 5 Takeaways For The Financial Sector


The Digital Operational Resilience Act (DORA) is a regulatory initiative within the European Union that aims to enhance the operational resilience of the financial sector. Its main goal is to prevent and mitigate cyber threats and operational disruptions. The DORA Act outlines regulatory requirements for the security of network and information systems “whereby all firms need to make sure they can withstand, respond to and recover from all types of ICT-related disruptions and threats” (DORA Act website).

Who and What Are Covered Under the DORA Act?

The DORA Act is a regulation that covers all financial entities within the European Union (EU). It recognizes the critical role of information and communication technology (ICT) systems in financial services. DORA applies to financial services including payments, securities, credit rating, algorithmic trading, lending, insurance, and back-office operations. It establishes a framework for ICT risk management through technical standards, which are being released in two phases, the first of which was published on January 17, 2024. The DORA Act will go into effect in its entirety on January 17, 2025.

With cyberattacks constantly in the news cycle, it’s no surprise that governing bodies are putting forth standards for operational resilience. But without combing through this lengthy piece of legislation, what should IT teams start thinking about from a practical standpoint? Here are 5 takeaways on what the DORA Act means for the financial sector.

DORA Act: 5 Takeaways for the Financial Sector

1. Shore up your cybersecurity measures

The DORA Act emphasizes strengthening cybersecurity measures within the financial sector. It requires financial institutions, such as banks, stock exchanges, and financial infrastructure providers, to implement robust cybersecurity controls and protocols. These include adopting advanced authentication mechanisms, encryption standards, and network segmentation to protect sensitive financial data and critical infrastructure from cyber threats. Part of this will also require organizations to apply system patches and updates in a timely manner, which means automated patching will become necessary to every organization’s security posture.
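Timely patching starts with knowing which hosts are behind. Here is a minimal Python sketch of the compliance check an automated patching pipeline might run before scheduling updates; the hostnames and version numbers are invented.

```python
# Sketch: flag hosts running outdated package versions, the kind of check
# an automated patching pipeline runs before scheduling updates.

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def hosts_needing_patch(fleet: dict[str, str], latest: str) -> list[str]:
    """Return hosts whose installed version is older than the latest release."""
    target = parse_version(latest)
    return sorted(h for h, v in fleet.items() if parse_version(v) < target)

fleet = {"web-01": "2.4.1", "web-02": "2.5.0", "db-01": "2.3.9"}
print(hosts_needing_patch(fleet, "2.5.0"))  # ['db-01', 'web-01']
```

The output feeds the actual patch jobs, so updates land on the out-of-date systems first.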

2. Implement resilience systems

Operational resilience is a key focus area of the DORA Act, aiming to ensure the continuity of essential financial services in the face of cyber threats, natural disasters, and other operational disruptions. Financial institutions are required to develop comprehensive business continuity plans, establish redundant systems and backup facilities, and conduct regular stress tests to assess their ability to withstand and recover from various scenarios. Implementing a resilience system helps with this, as it provides all the infrastructure, tools, and services necessary to continue operating during major incidents.

3. Conduct regular scans for vulnerabilities

The DORA Act mandates financial institutions to implement robust risk management practices to identify, assess, and mitigate cyber risks and operational vulnerabilities. This includes conducting regular assessments, vulnerability scans, and penetration tests, and developing incident response procedures to quickly address threats. This is all part of taking a proactive approach to identify and mitigate cyber incidents, and reduce the impact that adverse events have on financial stability and consumer confidence.
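One concrete piece of a vulnerability-scanning workflow is comparing scan output against a per-host policy. The Python sketch below evaluates canned scan results against an allowlist of expected open ports; hostnames, ports, and data are illustrative (a real scan would come from a tool such as nmap).

```python
# Sketch: evaluate port-scan output against a per-host allowlist to surface
# unexpected open ports. Hosts and ports below are made up.

ALLOWED_PORTS = {
    "app-server": {443},
    "jump-host": {22},
}

def unexpected_ports(scan_results: dict[str, set[int]]) -> dict[str, set[int]]:
    """Ports open on a host but absent from its allowlist need investigation."""
    findings = {}
    for host, open_ports in scan_results.items():
        extra = open_ports - ALLOWED_PORTS.get(host, set())
        if extra:
            findings[host] = extra
    return findings

scan = {"app-server": {443, 3389}, "jump-host": {22}}
print(unexpected_ports(scan))  # {'app-server': {3389}}
```

An unexpected RDP port on an application server is exactly the kind of finding that feeds the incident response procedures DORA requires.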

4. Collaborate and share information with industry peers

The DORA Act encourages financial institutions to share cybersecurity threat intelligence, incident data, and best practices with industry peers, regulators, and law enforcement agencies. The ability to monitor systems and collect data will be crucial to this approach, and will require systems that can rapidly (and securely) deploy apps/services during ongoing incidents. This will help financial institutions to better understand emerging threats, coordinate responses to cyber incidents, and strengthen collective defenses against threats and operational disruptions.

5. Segment physical and logical systems to pass regular audits

Through the DORA Act, regulators are empowered to conduct regular assessments, audits, and inspections of systems. This will ensure that financial institutions are implementing adequate controls and safeguards to protect against cyber threats and operational disruptions. A crucial part to this will involve physical and logical separation of systems, such as through Isolated Management Infrastructure, as well as implementing zero trust architecture across the organization. These will help bolster resilience by eliminating control dependencies between management and production networks, which will also help to streamline audits.

Get the blueprint to help you comply with the DORA Act

DORA’s requirements are meant to help IT teams better protect sensitive data and the integrity of financial systems as a whole. But without a proper network management infrastructure, their production networks are too sensitive to errors and vulnerable to attacks. ZPE has created the blueprint that covers these 5 crucial takeaways outlined in the DORA Act. The architecture outlined in this blueprint has been trusted by Big Tech for more than a decade, as it allows them to deploy modern cybersecurity measures, physically and logically separated systems, and rapid recovery processes. Download the blueprint now.

What to do if You’re Ransomware’d: A Healthcare Example


This article was written by James Cabe, CISSP, a 30-year cybersecurity expert who’s helped major companies including Microsoft and Fortinet.

Ransomware gangs target the innocent and vulnerable. They hit a Chicago hospital in December 2023, a London hospital in October the same year, and schools and hospitals in New Jersey as recently as January 2024. This is one of the biggest reasons I’m committed to stopping these criminals by educating organizations on how to re-think and re-architect their approach to cybersecurity.

In previous articles, I discussed IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environments), and how they could have quickly altered outcomes for MGM, Ragnar Locker victims, and organizations affected by the MOVEit vulnerability. Using IMI and IRE, organizations find that the key to not only speedy recovery, but also to limiting the blast radius and attack persistence, is isolation.

Why is isolation (not segmentation) key to ransomware recovery?

The NIST Cybersecurity Framework defines five functions: Identify, Protect, Detect, Respond, and Recover. It’s missing a crucial step, however: Isolate. Stay tuned for a full breakdown of this in my next article. The reason this step is so critical is that attacks move at machine speed, and are very pervasive and persistent. If your management network is not fully isolated from production assets, the infection spreads to everything. Suddenly, you’re locked out completely and looking at months of tedious recovery. For healthcare providers, this jeopardizes everything from patient care to regulatory compliance.

Isolation is integral to building a resilience system, or in other words, a system that gives you more than basic serial console/out-of-band access and instead provides an entire infrastructure dedicated to keeping you in control of your systems — be it during a ransomware attack, ISP outage, natural disaster, etc. Because this infrastructure is physically and virtually isolated from production (no dependencies on production switches/routers, no open management ports, etc.), it’s nearly impossible for attackers to lock you out.

So, what really should you do if you’re ransomware’d? Let’s walk through an example attack on a healthcare system, and compare the traditional DR (Disaster Recovery) response to the IMI/IRE approach.

Ransomware in Healthcare: Disaster Recovery vs Isolated Recovery

Suppose you’re in charge of a hospital’s network. MDIoT, patient databases, and DICOM storage are the crown jewels of your infrastructure. Suddenly, you discover ransomware has encrypted patient records and is likely spreading quickly to other crown jewel assets. The risks and potential fallout can’t be overstated. Millions of people are depending on you to protect their sensitive info, while the hospital is depending on you to help them avoid regulatory/legal penalties and ensure they can continue operating.

The problem with Disaster Recovery

Though the word ‘recovery’ is in the name, the DR approach is limited in its capacity to recover systems during an attack. Disaster Recovery typically employs a couple of things:

  • Backups, which are copies of data, configurations, and code that are used to restore a production system when it fails.
  • Redundancy, which involves duplicating critical systems, services, and applications as a failsafe in the event that primaries go down (think cellular failover devices, secondary firewalls, etc.).

What happens when you activate your DR processes? It’s highly likely that you won’t be able to, and that’s because the typical DR setup relies on the production network. There’s no isolation.

Think about it this way: your backup servers need direct access to the data they’re backing up. If your file servers get pwned, your backup servers will, too. If your primary firewall gets hacked, your secondary will, too. The problem with backup and redundancy systems — and any system, for that matter — is that when they depend on the underlying infrastructure to remain operational, they’re just as susceptible to outages and attacks. It’s like having a reserve parachute that depends on the main parachute.
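The “reserve parachute” problem is really a dependency-graph question: does anything in your recovery path transitively depend on production infrastructure? Here’s a toy Python sketch of that check; the topology and node names are invented.

```python
# Sketch: walk a dependency graph to check whether recovery systems depend on
# production infrastructure. The topology below is a toy example.

DEPENDS_ON = {
    "backup-server": ["prod-core-switch"],  # backup rides the production LAN
    "oob-console": ["cellular-uplink"],     # isolated management path
    "prod-core-switch": [],
    "cellular-uplink": [],
}

def reaches(node: str, target: str) -> bool:
    """True if `node` transitively depends on `target`."""
    stack, seen = [node], set()
    while stack:
        cur = stack.pop()
        if cur == target:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(DEPENDS_ON.get(cur, []))
    return False

print(reaches("backup-server", "prod-core-switch"))  # True -- not isolated
print(reaches("oob-console", "prod-core-switch"))    # False -- survives an attack
```

Any recovery asset for which this check returns True shares the fate of the production network it is supposed to rescue.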

And what about the rest of your systems? You just discovered the attack has encrypted your servers and is quickly bringing operations to a crawl. How are you going to get in and fight back? What if you try to log into your management network, only to find that you’re locked out? All of your tools, configurations, and capabilities have been compromised.

This is why CISA, the FBI, US Navy, and other agencies recommend implementing Isolated Management Infrastructure.

IMI and IRE guarantee you can fight back against ransomware

You discover that the ransomware has spread. Not only has it encrypted data and stopped operations, but it has also locked you out of your own management network and is affecting the software configurations throughout the hospital. This is where IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environment) come in.

Because IMI is physically separate from affected systems, it guarantees management access so teams can set up communication and a temporary ‘war room’ for incident response. The IRE can then be created using a combination of cellular, compute, connectivity, and power control (see diagram for design and steps). Docker containers should be used to bring up each step.

Diagram showing a chart containing the systems and open-source tools that can be deployed for an Isolated Recovery Environment

Image: The infrastructure and incident response protocol involved in the Isolated Recovery Environment. These products were chosen from free or open-source projects that have proven very useful in each stage of recovery. They can be automated in pieces for each phase, then brought down via Docker container to eliminate the risk of leakage during each phase.

Without diving too far into the technicalities, the IRE enables you to recover survivable data, restore software configurations, and prevent reinfection. Here are some things you can do (and should do) in this scenario, courtesy of the IRE:

Establish your war room

You can’t fight ransomware if you can’t securely communicate with your team. Use the IRE to create offline, break-the-glass accounts that are not attached to email. This allows you to communicate and set up ticketing for forensics purposes.

Isolate affected systems

There’s no use running antivirus if reinfection can occur. Use the IRE to take the switch that connects the backup and file servers offline. Isolate these servers from each other and shut down direct backup ports. Then, you can remote-in (KVM, iKVM, iDRAC) to run antivirus and EDR (Endpoint Detection and Response).

Restore data and device images

The key is to have backup data at its most current, both for patient data and device/software configurations. Because the IRE provides an isolated environment, and you’ve already pulled your backups offline, you can gradually restore data, re-image devices, and restore configurations without risking reinfection. The IRE ensures devices “keep away” from each other until they can be cleansed and recovered.

Things You’ll Need To Build The IMI and IRE

Network Automation Blueprint

We’ve created a comprehensive blueprint that shows how to implement the architecture for IMI and IRE. Don’t let the name fool you. The Network Automation Blueprint covers everything from establishing a dedicated management network, to automating deployment of services for ransomware recovery. Get your PDF copy now at the link below.

Gen 3 Console Servers To Replace End-of-Life Gear

It’s nearly impossible to build the IMI or deploy the IRE using older console servers, because these only give you basic remote access and a hint of automation capabilities. You’ll still need the ability to run VMs and containers. Gen 3 console servers provide everything IMI and IRE require, like full control plane/data plane separation, hosting apps, and deploying VMs/containers on-demand. They’ve also been validated by Synopsys and have built-in security features I’ve been talking about for years. Check out the link below for resources about Gen 3 and how we’ll help you upgrade.

Get in touch with me!

I’d love to talk with you about IMI, IRE, and resilience systems. These are becoming more crucial to operational resilience and ransomware recovery, and countries are passing new regulations that will require these approaches. Get in touch with me via social media to talk about this!

Best Network Performance Monitoring Tools

Network performance monitoring tools provide visibility into the health and efficiency of networks and their underlying infrastructure of devices and software. Some platforms focus entirely on collecting and analyzing logs from various sources on the network, while others provide additional management capabilities that let you control, change, and troubleshoot network infrastructure. Choosing the right solution requires a thoughtful consideration of factors such as the cost, scalability, and interoperability of the software, as well as your team’s experience and abilities. This guide compares three of the best network performance monitoring tools by analyzing these critical factors before providing advice on the most scalable and cost-effective way to deploy your solutions.
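At their core, these tools reduce streams of measurements to statistics and alerts. As a rough illustration, the Python sketch below computes a nearest-rank p95 latency over a window of round-trip times and flags an SLO breach; the RTT samples and budget are synthetic.

```python
import math

# Sketch: the core computation behind a monitoring tool's latency alerting --
# nearest-rank percentiles over a window of samples. The RTT values are
# synthetic; a real monitor feeds in probe or flow measurements.

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples, in milliseconds."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def breaches_slo(samples: list[float], p95_budget_ms: float) -> bool:
    """Alert when the 95th-percentile latency exceeds its budget."""
    return percentile(samples, 95) > p95_budget_ms

rtts = [12.0, 14.5, 13.2, 15.1, 80.3, 13.9, 14.0, 12.8, 13.5, 14.2]
print(percentile(rtts, 95))      # 80.3
print(breaches_slo(rtts, 50.0))  # True -- flag the spike for investigation
```

Percentiles are preferred over averages here because a single 80 ms spike barely moves the mean but is exactly what users notice.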

Comparing best network performance monitoring tools


SolarWinds Network Performance Monitor (NPM)

  • Network device, performance, and fault monitoring

  • Deep packet inspection and analysis

  • LAN and WAN monitoring

  • Automatic network discovery, mapping, and monitoring

  • Network availability monitoring

  • Network diagnostics

  • Network path analysis

  • Network performance testing

  • SNMP monitoring

  • Wi-Fi analysis

Kentik

  • Network telemetry dashboards

  • Multi-vendor network monitoring

  • Cloud, edge, and hybrid cloud monitoring

  • SaaS application performance & uptime monitoring

  • Intelligent automated alerts

  • SNMP, traffic flow, VPC, host agent, and synthetic monitoring

  • Multi-cloud performance monitoring

  • Kubernetes workload monitoring

  • SD-WAN monitoring

  • Network security monitoring

  • Network map visualizations

  • QoE monitoring

ThousandEyes

  • Network availability and performance testing

  • WAN performance monitoring

  • Cisco SD-WAN monitoring and optimization

  • Browser session monitoring

  • Network path visibility

  • User Wi-Fi connectivity monitoring

  • VPN mapping and monitoring

  • Cross-layer data visualizations

Disclaimer: This comparison was written by a 3rd party in collaboration with ZPE Systems using data gathered from publicly available data sheets and admin guides, as of 10/20/2023. Please email us if you have corrections or edits, or want to review additional attributes: Matrix@zpesystems.com

SolarWinds Network Performance Monitor (NPM)

The Network Performance Monitor (NPM) is part of the SolarWinds Orion platform of integrated products. This mature and richly featured monitoring software is delivered as a cloud-based service and can observe SaaS (software as a service), cloud, hybrid cloud, and on-premises infrastructure. With advanced features like deep packet inspection (DPI), WAN optimization monitoring, automatic network mapping, and automated diagnostic tools, SolarWinds NPM is meant to be a complete, enterprise-grade observability solution. As part of the Orion platform, it’s also extensible with other products from the SolarWinds ecosystem, such as a Network Configuration Manager. As an enterprise solution, SolarWinds NPM comes with a high price tag that grows even larger as additional monitoring agents are added, limiting the scalability. Another important factor to consider is that SolarWinds recently suffered a high-profile hack that compromised thousands of customers, so there are security risks involved in trusting the Orion supply chain. Additionally, despite a large library of integrations, SolarWinds is a closed ecosystem that doesn’t work well with 3rd-party tools or custom scripts.​

Pros:

  • Supports SaaS, cloud, and on-premises networks
  • Includes advanced monitoring features like DPI
  • Part of a large ecosystem of observability and management solutions

Cons:

  • Pricing is expensive and limits scalability
  • Recently suffered a high-profile breach that impacted thousands of customers
  • Closed ecosystem may not support your 3rd-party tools

Kentik

Kentik is an end-to-end network observability platform for cloud, multi-cloud, hybrid cloud, SaaS, and data center infrastructure. In addition to network performance monitoring, the platform includes monitoring solutions for SaaS application performance and SD-WAN performance. Other observability features include SaaS uptime monitoring, AI-driven insights and alerts, network security monitoring, and QoE (Quality of Experience) monitoring. Kentik also recently launched a Kubernetes network monitoring solution called Kentik Kube that provides end-to-end cluster visibility. Overall, Kentik is a powerful network observability platform that includes many of its most innovative features in its “Essentials” and “Pro” pricing packages, providing a lot of bang for your buck. The downside is that you can’t subscribe to features individually and must purchase a whole package, meaning you could end up paying for features you don’t need. Because Kentik is not a large vendor, its customer service may be slow to respond in some cases. Additionally, although Kentik does have a large library of integrations, it is not a vendor-neutral platform.

Pros:

  • Supports cloud, multi-cloud, hybrid cloud, SaaS, and data center infrastructure
  • Includes many advanced features and solutions at no additional cost
  • Provides AI-driven network insights and intelligent alerts

Cons:

  • Products aren’t available a la carte
  • Customer service and technical support can be slow to respond
  • Isn’t entirely vendor-neutral

ThousandEyes

ThousandEyes is a digital experience monitoring platform primarily focused on network and application synthetic testing, end-user performance monitoring, and ISP Internet monitoring for SaaS, cloud, and on-premises networks. Additionally, ThousandEyes is part of the Cisco family and can be used to monitor and optimize Cisco SD-WAN architectures. Across its family of observability products, ThousandEyes includes features like wireless network visibility, SaaS performance visualizations, cloud application outage detection, and SD-WAN performance forecasting. The major advantage of the ThousandEyes platform is that it provides true end-to-end visibility of the entire service delivery chain, including end-user device performance and third-party provider availability. One downside is that the endpoint agent-based monitoring solution requires on-premises VMs, which can be cumbersome to maintain and limits scalability. Pricing is expensive compared to similar solutions, and you may have to combine products to get all the features you need. Additionally, ThousandEyes is not a vendor-neutral platform and has a relatively small library of integrations.

Pros:

  • Supports SaaS, cloud, and on-premises networks
  • Works with Cisco DNA software for SD-WAN monitoring
  • Provides end-to-end visibility of the entire service delivery chain

Cons:

  • Agent-based monitoring requires on-premises VMs, limiting scalability
  • Pricing is expensive compared to similar solutions
  • Limited integrations, preventing interoperability

Conclusion

Each of the solutions on this list has advantages that make it well-suited to certain environments, as well as limitations to consider. SolarWinds NPM is part of a large ecosystem of observability and management solutions that includes advanced features like DPI, but it recently suffered a major security incident and has a closed ecosystem. Kentik packs a lot of innovative, AI-driven monitoring capabilities into its platform offerings, but its pricing tiers are inflexible, and it doesn’t have the large, enterprise-grade support team of its bigger competitors. ThousandEyes provides end-to-end visibility of the entire service delivery chain and works seamlessly with Cisco DNA software, but its agent-based monitoring limits scalability and its library of integrations is limited.

How to run the best network performance monitoring tools

Most network performance monitoring tools, even cloud-based SaaS offerings, communicate with endpoint agents using software deployed on VMs (virtual machines) running on-premises in each business location. Running these VMs on fully provisioned servers or PCs is expensive, but deploying them on NUCs is highly insecure, especially as organizations scale out with distributed branches and edge computing sites. What’s needed is a consolidated hardware solution that combines critical branch, edge, and data center networking functionality with vendor-neutral VM and application hosting, such as the Nodegrid platform from ZPE Systems. Nodegrid’s serial switches and network edge routers run the open, Linux-based Nodegrid OS, which can host your choice of third-party software (including Docker containers) for network performance monitoring, SD-WAN, security, automation, and more. Nodegrid’s versatile, modular hardware solutions also provide out-of-band (OOB) management access to critical remote infrastructure and monitoring solutions, giving teams a lifeline to recover from outages and ransomware attacks. Nodegrid uses enterprise-grade security features like Secure Boot, self-encrypting disks, and two-factor authentication (2FA), and its onboard software is frequently patched for vulnerabilities to defend against a breach. Deploying Nodegrid at each business site consolidates your network to reduce hardware overhead, streamlining management and enabling easy scalability.

Deploy the best network performance monitoring tools with Nodegrid

Reach out to ZPE Systems to see a demo of how the best network performance monitoring tools run on the Nodegrid platform.
Contact Us

Breaking Down The 2023 Ragnar Locker Cyberattacks


This article was written by James Cabe, CISSP, a 30-year cybersecurity expert who’s helped major companies including Microsoft and Fortinet.

Throughout 2023, several organizations were successfully hit by Ragnar Locker cyberattacks. The affected victims spanned the globe and were forced to shut down much of their critical operations, while the attackers demanded tens of millions of dollars in ransom payments. Despite the group being taken down by law enforcement in October, organizations are re-evaluating their defensive measures — and more importantly, their recovery strategies — to combat these attacks.

If you read my previous articles about the ongoing MOVEit breach and the ransomware that hit MGM, you probably know that isolation is key. It helps you fight through attacks by cutting the kill chain, so that you can restore services quickly without reinfection.

Who Carries Out Ragnar Locker Cyberattacks?

Recent Ragnar Locker cyberattacks were carried out by the Dark Angels Team cybercriminal group. Dark Angels Team’s modus operandi is to breach a company’s defenses, spread laterally, and steal data that can be used to extort the target company. The approach they take involves gaining access to the Windows domain controller, where they deploy ransomware. They encrypt devices using Windows and ESXi encryptors, which gives organizations little recourse aside from taking their critical systems offline in order to stop the spread.

Dark Angels banner

How Do Ragnar Locker Cyberattacks Start?

Ragnar Locker breaches, like all ransomware attacks, begin with a kill chain that must first be initiated. MITRE ATT&CK calls this stage Initial Access, and in these attacks, initial access comes from social engineering. Email stuffing is often the tactic of choice, whereby the attacker sends an email that appears to contain a trail of replies or forwards (see the example below). Email trails like this trick spam filters and land directly in the target’s inbox. When an employee clicks a malicious link inside the email, the attack kicks off.
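As a rough illustration of this pattern, the sketch below scores an email body by counting quoted-reply artifacts. The regex patterns and threshold here are assumptions for demonstration only, not a production spam filter:

```python
import re

def stuffed_reply_score(body: str) -> int:
    """Count markers suggesting a fabricated reply/forward trail.

    A high score means the body contains many quoted-reply artifacts
    (Re:/Fwd: markers, 'On ... wrote:' lines, embedded From: headers)
    relative to what a normal single reply would carry.
    """
    patterns = [
        r"^>+ ",              # quoted lines
        r"^(Re|RE|Fwd|FW):",  # reply/forward subject markers
        r"^On .+ wrote:$",    # reply attribution lines
        r"^From: .+@.+",      # embedded header blocks
    ]
    score = 0
    for line in body.splitlines():
        line = line.strip()
        for pat in patterns:
            if re.match(pat, line):
                score += 1
                break  # count each line at most once
    return score

def looks_stuffed(body: str, threshold: int = 6) -> bool:
    """Threshold is an illustrative assumption, not a tuned value."""
    return stuffed_reply_score(body) >= threshold
```

A real filter would weigh signals like these alongside sender reputation, SPF/DKIM/DMARC authentication results, and link analysis rather than relying on body heuristics alone.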

An email showing an example of email stuffing.

Image: Email stuffing is used by marketers and threat actors alike to bypass spam filters.

How Do Companies Discover Ragnar Locker Cyberattacks?

After the Ragnar Locker cyberattack kicks off, the malicious link uses Java to load the locker ransomware, and a series of batch scripts then installs a payload containing VirtualBox emulation software. This emulation software takes over the host, encrypts it, and displays the ransomware message (see image below).

A Ragnar Locker ransomware message shown in a notes file.

Image: A Ragnar Locker ransomware message showing on encrypted devices.

How Do Ragnar Locker Cyberattacks Spread?

The attack spreads by gaining access to Windows domain controllers and then attacking the management interfaces of the VMware ESXi machines. Most organizations don’t properly segment or isolate these management interfaces, which leaves them vulnerable even to older, leaked Babuk ransomware source code that functions as an ESXi encryptor. Essentially, the attackers only need to gain access to the management network, and from there they can attack the production network.

From Intel471: “VMware’s ESXi is called a ‘bare metal’ hypervisor because the underlying hardware on which it is installed doesn’t need an operating system. ESXi allows the hardware to be utilized for multiple virtual machines (VMs), which saves on hardware costs. ESXi is a fruitful target for attackers since it may be connected to several VMs and the storage for them. Security experts warn ransomware actors have built specific binaries to target these systems. Groups joining this trend include HelloKitty, Black Basta, Cheerscrypt and GwisinLocker.”

They continue, “Over the last few years, several vulnerabilities have been identified in ESXi, including CVE-2021-21974. The vulnerability is a heap overflow vulnerability within Open Service Location Protocol (OpenSLP), which is a network discovery tool. The vulnerability is remotely exploitable over port 427, and has a Common Vulnerability Scoring System Version 3.0 (CVSSv3) base score of 8.8. It’s suspected that it may be the vulnerability exploited in this attack. VMware said that “significantly out-of-date products” were targeted with vulnerabilities that had been addressed. It affects ESXi versions 7.0 before ESXi70U1c-17325551, 6.7 before ESXi670-202102401-SG and 6.5 before ESXi650-202102101-SG. Due to other vulnerabilities in OpenSLP, VMware disabled OpenSLP starting in 2021 in ESXi versions 7.0 U2c and ESXi 8.0, which is the current version.”
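One practical check for the exposure described above is a simple reachability probe that flags hosts answering on the OpenSLP port. This minimal sketch tests only TCP connectivity; it does not confirm the CVE, just that port 427 is reachable from wherever the script runs:

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, etc.
        return False

def audit_slp_exposure(hosts):
    """Return the subset of hosts that answer on the OpenSLP port (427).

    Any host in this list has its service location port reachable from
    the scanning network, which suggests the management plane is not
    properly isolated.
    """
    return [h for h in hosts if tcp_port_open(h, 427)]
```

Hosts that show up in the audit list warrant follow-up: isolating the management interface, patching, or disabling the SLP service per VMware's published guidance for hosts that don't need it.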

Ultimately, these attacks exploit a combination of weaknesses: VMware management interfaces that lack management plane isolation (specifically on port 427, OpenSLP) and a lack of patching and updating. Organizations also typically lack a backup authentication mechanism for the control plane, as well as Privileged Access Management (PAM); both are good fallback options.

How Can Companies Stop Ragnar Locker Cyberattacks?

Ragnar Locker ransomware and other attacks are successful because companies don’t employ proper management plane isolation. Attackers can gain access to VMware management interfaces, and then they essentially have the keys to the kingdom. That’s it. No amount of defense can save you.

If you recall CISA’s binding operational directive, they call for an isolated management infrastructure. This is what we refer to as IMI. Rather than serving as a defense, like we think of traditional cybersecurity products, the IMI is an architecture that allows you to fight back. It’s your quick-reaction force, your cavalry, your secret weapon that ensures you always have a counterattack ready to deploy.

IMI is infrastructure that is dedicated — and most importantly, fully isolated from production assets — to ensuring operations can recover quickly from breaches and outages. Here’s a graphical breakdown:

Isolated Management Infrastructure diagram

The IMI includes all of the tools you need for rerouting traffic, decommissioning affected gear, wiping/re-imaging devices, and restoring infrastructure. You can also incorporate automation to speed the process along and make recovery something that happens in minutes or hours at the most. Aside from being completely isolated from production assets, the IMI itself is also segmented and employs zero trust practices. This means that you and only you have access to your secret weapon for cutting the ransomware kill chain.
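The automation mentioned above can be modeled as an ordered playbook where each step must succeed before the next runs. The step names and structure below are illustrative assumptions; real steps would drive out-of-band consoles, PDUs, and imaging tooling rather than placeholder functions:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class RecoveryPlaybook:
    """Ordered recovery steps executed over the isolated management path.

    Each step is a (name, action) pair; actions return True on success.
    Execution halts on the first failure, since later steps assume
    earlier ones held (e.g., don't re-image before isolating).
    """
    steps: List[Tuple[str, Callable[[], bool]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add_step(self, name: str, action: Callable[[], bool]) -> None:
        self.steps.append((name, action))

    def run(self) -> bool:
        for name, action in self.steps:
            ok = action()
            self.log.append(f"{name}: {'ok' if ok else 'FAILED'}")
            if not ok:
                return False
        return True

# Example: a three-step drill with stubbed actions
pb = RecoveryPlaybook()
pb.add_step("isolate production uplinks", lambda: True)
pb.add_step("reroute critical traffic", lambda: True)
pb.add_step("re-image affected hosts", lambda: True)
assert pb.run()
```

Halting on the first failure is the key design choice: a partially executed recovery that silently continues past a failed isolation step can reinfect freshly restored systems.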

How Do You Use Isolated Management Infrastructure?

An IMI can host an IRE (Isolated Recovery Environment), which is used to cut off all user data and remote access (except for OOB) to an entire infected site. A properly implemented recovery environment should automate most of these activities to speed up the recovery. One of the first considerations is the requirement for a secondary organization in your IAM that is not attached to normal operations: a set of “Break the Glass” accounts. These originated in military circles but have made it into formal practice as part of a strong ransomware playbook. With these accounts in place, you can instantiate selected Zero Trust remote access to the site using credentials that are outside the scope of the attack, and then bring up a communications channel for a virtual war room using software like Rocket Chat, Jitsi, Slack, or other standalone communications tools that can be installed in the IRE.
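Because break-glass accounts must sit dormant outside declared incidents, any other use of them is itself a high-fidelity alert. The sketch below flags such use from simplified auth events; the event shape is a hypothetical example for illustration, not any vendor's log format:

```python
def break_glass_violations(auth_events, break_glass_accounts):
    """Flag any use of break-glass accounts outside a declared incident.

    auth_events: iterable of dicts like
        {"user": "bg-admin-1", "during_incident": False}
    (an assumed, simplified log shape). Break-glass credentials should
    never authenticate during normal operations, so any such event is
    an indicator of compromise worth investigating immediately.
    """
    bg = set(break_glass_accounts)
    return [e for e in auth_events
            if e["user"] in bg and not e.get("during_incident", False)]
```

In practice this check would run against your IdP or SIEM event stream, with the incident flag derived from a formally declared incident window.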

Avoiding normal authentication methods, normal IAM, and normal communication channels is required for the integrity of the recovery and strengthens the recovery playbook. During this time, no email associated directly with the organization may be used; ideally, recovery communications should never touch an organizational account at all.

The next step is to create a new set of clean-side networks that do not connect directly to the main backbone, or to put the backbone behind another firewall for good/bad triage. Using sniffer software running in the IRE, the recovery team can then run a passive or active scan against all machines that keep trying to send email to Exchange/M365. You can grant access to hosts deemed good (not sending traffic) while using an EDR to lock off the ability to open Outlook for a while, keeping those users on web email. From there, continue working through the sending hosts to see if each has a good backup; if not, back up the infected drive for offline data retrieval later. Then re-image each machine while scanning the UEFI BIOS during boot (if needed, run an IPMI scan). If the site has a list of assets that are considered crown jewels, prioritize these.
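The good/bad triage step above can be sketched as a simple flow classifier: hosts still originating SMTP traffic toward mail endpoints are quarantined, while quiet hosts are provisionally cleared. The flow-tuple shape is an assumption for illustration; real input would come from a sniffer or flow collector:

```python
def triage_hosts(flows, mail_servers):
    """Split observed hosts into (clean, suspect) lists.

    flows: iterable of (src_host, dst_host, dst_port) tuples captured
    passively on the triage network (a simplified flow shape assumed
    for illustration). Hosts still pushing SMTP traffic at the mail
    endpoints get quarantined for backup and re-imaging; hosts with no
    such traffic get limited clean-side access.
    """
    smtp_ports = (25, 465, 587)
    mail = set(mail_servers)
    suspect = {src for src, dst, port in flows
               if dst in mail and port in smtp_ports}
    all_hosts = {src for src, _, _ in flows}
    return sorted(all_hosts - suspect), sorted(suspect)
```

A passive classifier like this only sees hosts that generate traffic during the capture window, so it complements, rather than replaces, the active scans and EDR checks described above.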

Once you have a segmented “clean side” established with all the network services required to operate the site (DNS, IAM, DHCP), Internet access can be restored to the site on a limited basis, meaning outbound communications only, nothing inbound. Restorative operations can then continue apace, making sure that infected-side assets are captured in backups for later forensics, following chain-of-custody procedures in case damages are found to exceed insurance limits. This is decided in the war room.

Download the Isolated Management Infrastructure Blueprint

Now is the time to lay the groundwork for your IMI so you can fight back against ransomware. Download the Network Automation Blueprint, which gives you a step-by-step guide to building your Isolated Management Infrastructure.

Get in touch with me!

True security can only be achieved through resilience, and that’s my mission. If you want help shoring up your defenses, building an IMI, and implementing a Resilience System, get in touch with me. Here are links to my social media accounts:

Dissecting the MGM Cyberattack: Lions, Tigers, & Bears, Oh My!


This article was written by James Cabe, CISSP, whose cybersecurity expertise has helped major companies including Microsoft and Fortinet.

The recent MGM cyberattack reportedly caused the company to lose millions in revenue per day. The successful kill chain attack — originally a military tactic used to accomplish a particular objective — granted inside access to the attackers, who encrypted and held for ransom some of MGM’s most prized assets. These ‘crown jewel’ assets, as they’re called in the cybersecurity realm, are most critical to the accomplishment of an organization’s mission. Because ransomware attacks persist in corporate networks until fully cleared, organizations must be ready to “fight through” an attack using resilient systems and effective procedures. This should involve identifying these crown jewels and designing them in a way that ensures they can operate through attacks.

When these types of large-profile attacks occur, many cast their eyes at cybersecurity leaders for failing to fend off the bad guys. The reality is these leaders struggle to get budget, corporate buy-in, and digital assets that are required to build a strong defense for business continuity. For MGM, it’s likely they also faced difficulty operationalizing current assets across a gigantic digital estate, and ultimately lacked a plan to recover from a total outage of crown jewel assets.

From the attacker’s perspective, an exceptional level of intelligence and preparation is required to understand a target’s internal operations and architecture and execute a successful kill chain. Successfully attacking a sophisticated organization like MGM requires rapid information stealing to capture and leverage cloud credentials, lock up those resources, and lock out the most important support staff in an organization. This is the crux of the issue: infostealers and ransomware automate the mass grabbing of resources and quickly set up a denial of service for the stakeholders responsible for fixing these systems.

How did the MGM cyberattack start? After MGM discovered the breach, how did the attacker stay one step ahead? What approach should organizations take to ensure they can recover if they’re targeted?

Who Started The MGM Cyberattack, and How?

The MGM cyberattack began after an adversary group named “Scattered Spider” used phishing over the phone, an approach called ‘vishing,’ to convince an MGM customer support rep to grant them access with elevated privileges. Scattered Spider is the same group responsible for the SIM-swapping campaign a few months earlier, in which they successfully subverted multi-factor authentication. Their primary tactic is social engineering, which they use to steal personal information from employees.

MGM and many other casinos currently use advanced Zero Trust identity security from Okta. However, the attacker was able to trick the service desk into resetting a password to gain access to the network. Even with newer Zero Trust identity solutions, most organizations unravel once attackers get to the real “chewy center” of the network: the humans operating them.

Spider Bug Insect graphic

Okta is quoted saying, “In recent weeks, multiple US-based Okta customers have reported a consistent pattern of social engineering attacks against their IT service desk personnel, in which the caller’s strategy was to convince service desk personnel to reset all multi-factor authentication (MFA) factors enrolled by highly privileged users.” Okta further warned, “The attackers then leveraged their compromise of highly privileged Okta Super Administrator accounts to abuse legitimate identity federation features that enabled them to impersonate users within the compromised organization.” 
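The pattern Okta describes, an MFA factor reset on a privileged account followed shortly by enrollment from a new device, can be flagged from identity logs. The sketch below uses a hypothetical, simplified event shape for illustration; real IdP system logs carry far more context:

```python
def suspicious_mfa_resets(events, privileged_users, window_minutes=60):
    """Flag privileged accounts whose MFA factors were reset and then
    re-enrolled from a new device within a short window.

    events: list of dicts like
        {"user": "admin1", "type": "mfa_reset", "minute": 0}
        {"user": "admin1", "type": "factor_enroll", "minute": 12,
         "new_device": True}
    (an assumed, simplified log shape). The reset-then-new-device
    sequence on a highly privileged user matches the social engineering
    pattern described in the text.
    """
    flagged = set()
    resets = [(e["user"], e["minute"])
              for e in events if e["type"] == "mfa_reset"]
    for e in events:
        if e["type"] != "factor_enroll" or not e.get("new_device"):
            continue
        for user, t in resets:
            if (e["user"] == user and user in privileged_users
                    and 0 <= e["minute"] - t <= window_minutes):
                flagged.add(user)
    return sorted(flagged)
```

A detection like this is most useful as a paging alert tied to a callback procedure: a human verifies the reset through an out-of-band channel before the new factor is trusted.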

The MGM cyberattack and those like it are more about processes than technology. Let’s explore how the attack progressed, and how the criminals were successful at staying persistent and ultimately hitting their goal. 

How Did A Simple Authentication Attack Morph Into a Complex Attack?

The Scattered Spider threat actors use a platform written by UNC3944, also known as AlphaV (the group goes by several names). AlphaV is a middleware developer for attack platforms that let criminals follow a specific set of instructions (a kill chain) to gain access and ultimately encrypt and exfiltrate data from a targeted company. AlphaV’s platform is called BlackCat, which attackers use to establish a foothold, set up Command and Control (C2) for the malware, and exfiltrate data in order to get paid.

With elevated Okta privileges at MGM, Scattered Spider deployed a file containing a Java-based remote access trojan, which became a “vending machine” for other remote access trojans (RATs) that sought out other nearby machines to spread quickly. The AlphaV RAT would ‘pwn‘ MGM’s Azure virtual servers to gain access, then sniff for more user passwords and create dummy accounts.  

These RATs leveraged a tool called “POORTRY,” a malicious driver masquerading as the Microsoft Serial Console driver, to terminate selected processes on Windows systems (e.g., Endpoint Detection and Response (EDR) agents on endpoints). AlphaV, the platform maintainer, signed the POORTRY driver with a Microsoft Windows Hardware Compatibility Authenticode signature, which helped the malware evade most endpoint detection software.

This tool was used to gain elevated, persistent access to the Okta proxy servers that were in the scope of the attack and remotely accessible to the attacker, an approach that evades many detection tools. That access allowed them to capture IAM accounts granting even greater reach into the organization. The theft of credentials from the Okta proxy servers was confirmed both by Okta responders and by the threat actor on their blog. This is called a “living off the land” attack.

Alphv statement on MGM

How Did MGM Discover the Cyberattack?

The first notification of the hack was dropped on the VXUnderground forums. The staff there verified it through chat contact with the threat group UNC3944/AlphaV, which works in conjunction with the Scattered Spider threat actor. The attacker also confirmed the breach on their darknet blog.

On September 11, 2023, anyone attempting to visit MGM’s website was greeted by a message stating that the website was currently unavailable. The attack also stopped hotel card readers, gaming machines, and other equipment critical to MGM’s day-to-day operations and revenue-generating activities.

Screenshot showing MGM casino's website down.

How Did the Attacker Maintain Control?

The initial attack allowed AlphaV, who runs the C2 (Command and Control) networks for the RattyRat trojan, to gain remote access to the VMware server farm that services the guest systems, the gaming control platforms, and possibly the payment processing systems. They maintained control despite all of MGM’s mitigation attempts because they established elevated access in places the organization could not easily remove them from without cutting off access for the whole organization. In other words, they established “persistence.”

From the attacker’s blog on the darknet, “MGM made the hasty decision to shut down every one of their Okta Sync servers after learning that we had been lurking on their Okta Agent servers sniffing passwords of people whose passwords couldn’t be cracked from their domain controller hash dumps. At this point MGM being completely locked out of their local environment. Meanwhile the attacker continued having super administrator privileges to their Okta, along with Global Administrator privileges to their Azure tenant. They made an attempt to evict us after discovering that we had access to their Okta environment, but things did not go according to plan. On Sunday night, MGM implemented conditional restrictions that barred all access to their Okta (MGMResorts.okta.com) environment due to inadequate administrative capabilities and weak incident response playbooks. Their network has been infiltrated since Friday. Due to their network engineers’ lack of understanding of how the network functions, network access was problematic on Saturday. They then made the decision to ‘take offline’ seemingly important components of their infrastructure on Sunday. After waiting a day, we successfully launched ransomware attacks against more than 100 ESXi hypervisors in their environment on September 11th after trying to get in touch but failing.“

MGM tried many things to remove the attackers’ access to their network. However, because the advanced attack installed a shadow identity provider inside MGM’s own identity solution, the attackers were able to maintain access long enough to redeploy access to most of the assets they found to be the backbone of the company. AlphaV was then able to encrypt most of the crown jewels of MGM’s operations network.

Is There a Way to Stop These Types of Attacks? 

The MGM cyberattack required physical reconnaissance, patience, and a lot of planning to set up the kill chain. Playbooks that protect against this kind of attack are hard to create, because executing them can mean taking all guest services offline for a period, which requires very high authority in the organization. One comment from the attacker was that the organization did not act fast enough to take all remote access to its management framework (the Okta proxy servers) offline. When MGM finally did, the adversary locked them out by submitting a multi-factor authentication reset. To stall the attacker, MGM would have had to induce a full outage of its crown jewels while a formal assessment of all assets was performed. Taking assets offline requires buy-in at the board and executive level, which is difficult to come by even in an organization that emphasizes operational excellence, detection, and defense.

Organizations should have a plan to quickly recover from the total loss of a site that goes beyond backups (which can be lost) and disaster recovery sites. Networks need to be properly hard-segmented into a full IMI (Isolated Management Infrastructure). Keeping crown jewels safe from attackers who target the chewiest part of an organization should be at the top of any list moving from the 2023 budget into 2024 planning.

The following is a light version of what can be done in a fully automated response that can reduce an outage to mere hours instead of days (a full operations blueprint will be out in the near future).

Isolated Management Infrastructure diagram

An IMI can host an IRE (Isolated Recovery Environment), which is used to cut off all user data and remote access (except for OOB) to an entire infected site. A properly implemented recovery environment should automate most of these activities to speed up the recovery. One of the first considerations is the requirement for a secondary organization in your IAM that is not attached to normal operations: a set of “Break the Glass” accounts. These originated in military circles but have made it into formal practice as part of a strong ransomware playbook. With these accounts in place, you can instantiate selected Zero Trust remote access to the site using credentials that are outside the scope of the attack, and then bring up a communications channel for a virtual war room using software like Rocket Chat, Jitsi, Slack, or other standalone communications tools that can be installed in the IRE.

Avoiding normal authentication methods, normal IAM, and normal communication channels is required for the integrity of the recovery and strengthens the recovery playbook. During this time, no email associated directly with the organization may be used; ideally, recovery communications should never touch an organizational account at all.

The next step is to create a new set of clean-side networks that do not connect directly to the main backbone, or to put the backbone behind another firewall for good/bad triage. Using sniffer software running in the IRE, the recovery team can then run a passive or active scan against all machines that keep trying to send email to Exchange/M365. You can grant access to hosts deemed good (not sending traffic) while using an EDR to lock off the ability to open Outlook for a while, keeping those users on web email. From there, continue working through the sending hosts to see if each has a good backup; if not, back up the infected drive for offline data retrieval later. Then re-image each machine while scanning the UEFI BIOS during boot (if needed, run an IPMI scan). If the site has a list of assets that are considered crown jewels, prioritize these.

Once you have a segmented “clean side” established with all the network services required to operate the site (DNS, IAM, DHCP), Internet access can be restored to the site on a limited basis, meaning outbound communications only, nothing inbound. Restorative operations can then continue apace, making sure that infected-side assets are captured in backups for later forensics, following chain-of-custody procedures in case damages are found to exceed insurance limits. This is decided in the war room.

Get the Blueprint for Isolated Management Infrastructure

Maintaining control of critical systems is something security practitioners already deal with on the Operational Technology (Industrial Control Systems) side of an organization. For them, the critical and most impactful part of the problem is the loss of control rather than the loss of data, a problem highlighted by the MGM cyberattack. Operational Technology safety and security teams set up and maintain safety systems as a fallback measure in case of disaster; this automation allows services to fall back safely, from which point operations can be recovered. In 2023, most business is done on computers and networks, and that is where business continuity must be planned. Now is the time for IT to start following this safety system blueprint as well.

Download the Network Automation Blueprint now, which helps you lay the groundwork for your IMI so you can recover from any attack.

Get in touch with me!

True security can only be achieved through resilience, and that’s my mission. If you want help shoring up your defenses, building an IMI, and implementing a Resilience System, get in touch with me. Here are links to my social media accounts: