Over 14,000 Ollama server instances are publicly accessible on the internet right now. A recent Cisco analysis found that 20% of these actively host models susceptible to unauthorized access. BankInfoSecurity separately reported discovering more than 10,000 Ollama servers with no authentication layer, the result of hurried AI deployments by developers under pressure to deliver.
This is shadow IT reborn for the AI era. Developers are spinning up local LLM servers for productivity gains, often unaware they have exposed sensitive infrastructure to the internet. And Ollama is just one of dozens of AI serving platforms proliferating across enterprise networks.
For CISOs, the security question has fundamentally shifted. It is no longer “are we running AI?” The urgent question is: “Where is AI running that we don’t know about?”
The Visibility Gap
Shadow AI presents a more insidious challenge than traditional shadow IT. When shadow IT first emerged as a concern, the threat was relatively straightforward: employees spinning up unauthorized SaaS applications or cloud instances. Security teams developed tooling and processes to detect and govern these deployments.
Shadow AI is different. Developers are not just subscribing to third-party AI services. They are deploying inference servers locally, often on workstations or internal servers that never appear in cloud asset inventories. A developer experimenting with Ollama on their laptop might inadvertently bind it to all network interfaces. A team testing LiteLLM as a unified gateway might deploy it without authentication. A data science group might spin up vLLM instances on GPU servers to accelerate research.
None of these scenarios involve malicious intent. All of them create security blind spots that traditional vulnerability scanners will not flag.
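The "bound to all network interfaces" case is easy to test for once you know a service's listen address. Ollama, for example, takes its listen address from the OLLAMA_HOST environment variable and listens on loopback unless told otherwise. The sketch below is illustrative only: `is_exposed_bind` is a hypothetical helper, and it assumes you have already split the host from any port suffix.

```python
import ipaddress


def is_exposed_bind(host: str) -> bool:
    """Return True if a listen host is reachable from beyond the local machine.

    Hypothetical helper for illustration; expects a bare host with no port.
    """
    host = host.strip().strip("[]")  # tolerate bracketed IPv6 such as "[::1]"
    if host == "":
        return True  # an empty host usually means "all interfaces"
    if host.lower() == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; assume exposed rather than guess at name resolution.
        return True
    # 0.0.0.0 and :: are wildcard binds; any other non-loopback address is routable.
    return not addr.is_loopback
```

A check like this can run against listen addresses collected from endpoint telemetry or configuration management, flagging workstations where an inference server is listening on something other than loopback.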
Why This Matters To Your Organization
Unsecured LLM endpoints create multiple attack vectors. First, there is the data exposure risk. LLM servers often retain conversation history, system prompts, and fine-tuning data. An attacker who discovers an unauthenticated endpoint can enumerate the deployed models, query them freely, and potentially coax out sensitive data they were fine-tuned on.
Second, these endpoints can serve as pivot points for lateral movement. An internal LLM server with network access to other systems creates opportunities that traditional threat models do not anticipate.
Third, there is the compliance dimension. Unauthorized AI deployments may implicate data residency requirements, model governance obligations, or audit trail expectations. The fact that a deployment was “unofficial” does not shield the organization from regulatory consequences.
Practical Steps For Security Teams
Addressing shadow AI requires extending existing capabilities while developing new detection methods.
First, update your threat model. If your organization employs developers, and especially data scientists or ML engineers, assume local LLM deployments exist. The convenience of tools like Ollama makes experimentation trivially easy.
Second, extend your discovery capabilities to include AI-specific services. Ollama defaults to port 11434. vLLM typically runs on 8000. LM Studio uses 1234. Gradio interfaces appear on 7860. These are not ports that traditional scanners prioritize.
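As a rough illustration of AI-aware discovery, the sketch below checks a single host for the default ports listed above using only the Python standard library. The names `AI_SERVICE_PORTS`, `port_is_open`, and `candidate_services` are hypothetical. An open port is only a hint, since any service can listen on these ports and any of these platforms can be moved to another port.

```python
import socket

# Default ports named in the text above; a match is a hint, not proof.
AI_SERVICE_PORTS = {
    11434: "Ollama",
    8000: "vLLM",
    1234: "LM Studio",
    7860: "Gradio",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def candidate_services(host: str, timeout: float = 1.0) -> dict:
    """Map each open AI-related port on the host to the service that defaults to it."""
    return {
        port: name
        for port, name in AI_SERVICE_PORTS.items()
        if port_is_open(host, port, timeout)
    }
```

Calling `candidate_services("10.0.0.5")` (a placeholder address) returns a dictionary of open ports and the platforms that default to them, which can then be handed to a fingerprinting step.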
Third, develop fingerprinting capabilities for AI services. Detecting that a port is open is not enough. You need to identify what is actually running. Different LLM platforms have different API signatures, response patterns, and identifying characteristics.
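To make the idea of an API signature concrete, here is a minimal sketch that classifies a single HTTP response by path, status code, and body. It relies on two well-known behaviors: an Ollama server answers GET / with the text "Ollama is running" and GET /api/tags with a JSON object containing a "models" list, while OpenAI-compatible servers answer GET /v1/models with a JSON object whose "object" field is "list". The `fingerprint` function is a hypothetical helper and does not represent how any particular tool implements its probes.

```python
import json
from typing import Optional


def _load_json(body: str):
    """Parse a response body as JSON, returning None if it is not valid JSON."""
    try:
        return json.loads(body)
    except ValueError:
        return None


def fingerprint(path: str, status: int, body: str) -> Optional[str]:
    """Guess the LLM platform from one HTTP response, or return None if unknown."""
    if status != 200:
        return None
    # Ollama's root endpoint returns a plain-text banner.
    if path == "/" and "Ollama is running" in body:
        return "Ollama"
    data = _load_json(body)
    if not isinstance(data, dict):
        return None
    # Ollama lists local models under a top-level "models" array.
    if path == "/api/tags" and isinstance(data.get("models"), list):
        return "Ollama"
    # OpenAI-style model listings use {"object": "list", "data": [...]}.
    if (
        path == "/v1/models"
        and data.get("object") == "list"
        and isinstance(data.get("data"), list)
    ):
        return "OpenAI-compatible"
    return None
```

Keeping the classifier a pure function of the response makes it easy to test against captured responses without touching the network.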
Introducing Julius: Open-Source LLM Service Fingerprinting
To help the security community tackle this challenge, Praetorian has released Julius as open-source tooling under the Apache 2.0 license. Julius is a lightweight LLM service fingerprinting tool that detects 17+ AI platforms through active HTTP probing.
Julius answers the question “is this HTTP service an LLM?” during penetration tests and attack surface assessments. Built in Go, it compiles to a single binary with no external dependencies. It uses a probe-and-match architecture with specificity scoring to reduce false positives. If Julius detects both “OpenAI-compatible” and “LiteLLM” on the same endpoint, it reports the more specific match first.
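The ordering behavior described above can be sketched in a few lines. This illustrates the general idea of specificity ranking and is not Julius's actual implementation (Julius is written in Go). The `Match` type, the `rank_matches` function, and the numeric scores are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    service: str
    specificity: int  # hypothetical scale: higher means a narrower signature


def rank_matches(matches: List[Match]) -> List[Match]:
    """Order matches so the most specific identification comes first."""
    return sorted(matches, key=lambda m: m.specificity, reverse=True)


# Example with made-up scores: a generic API shape ranks below a named product.
hits = [Match("OpenAI-compatible", 10), Match("LiteLLM", 80)]
ranked = rank_matches(hits)
```

With these example scores, `ranked[0].service` is "LiteLLM", so a report built from the ranked list leads with the named product and keeps the generic API match as supporting detail.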
The tool detects self-hosted servers like Ollama, vLLM, LocalAI, and Hugging Face TGI. It identifies proxy services like LiteLLM and Kong. It finds chat frontends like Open WebUI, LibreChat, and SillyTavern. And it is extensible: adding support for a new service requires approximately 20 lines of YAML configuration.
Julius is the first release in Praetorian’s “12 Caesars” initiative, a commitment to releasing one open-source security tool per week for twelve weeks.
Julius is available now at https://github.com/praetorian-inc/julius
The Path Forward
Shadow AI is not a future concern. It is a present reality. Every week that passes without visibility into your AI infrastructure is a week where unknown LLM endpoints might be exposing sensitive data or accepting unauthenticated requests.
The security community solved shadow IT through a combination of technology, policy, and cultural change. We need the same approach for shadow AI, but we need to move faster. The deployment velocity of AI infrastructure far exceeds what we saw with cloud services a decade ago.
The question is not whether your organization has shadow AI. The question is whether you have found it yet.
About the Author
Evan Leleux is a Software Engineer at Praetorian focused on building scalable, distributed systems for enterprise security operations. He is a Georgia Tech alumnus with expertise in offensive security tooling and AI infrastructure assessment.
Evan can be reached online at https://github.com/praetorian-inc and at the company website https://www.praetorian.com/
