10 min read
AI Agents · LLM · Architecture · Enterprise

Multi-agent systems: when they make sense and when they don't

Not everything needs multi-agents. Here's when they work, when they're overkill, and the real latency and cost numbers nobody tells you.

Petru Arakiss, AI Consultant & Architect

A fintech I worked with a while back had spent over €150k on a customer service chatbot. They'd been iterating for months. The bot worked great in demos, but in production it invented return policies that didn't exist and gave wrong prices. Not always, maybe 25-30% of the time, but enough that the support team hated the thing.

The model wasn't the problem. They were using one of the leading models at the time, well-configured. The problem was that a single agent was trying to do too many things: query the catalog, check stock, calculate prices with discounts, handle returns, and answer general questions. Too much context, too many conflicting instructions.

The solution I proposed was to split the work. Instead of one bot that knew everything, five smaller agents that knew a lot about a little. One for catalog, one for pricing, one for logistics, one for returns, and an orchestrator that decided who to ask. Errors dropped significantly, though I don't have an exact number because the metric changed midway through the project.

What a multi-agent system actually is (no buzzwords)

Think about a hospital. There's no single doctor who does everything. There are specialists: cardiologists, radiologists, surgeons. And there's triage, which decides which specialist to send each patient to.

A multi-agent system works similarly. Instead of a giant LLM with a massive prompt, you have specialized agents that master specific tasks. And something that coordinates who does what.

The typical components:

Orchestrator: Receives the request, decides which agent or agents need to act, and combines the results. It doesn't do the real work, just directs traffic.

Specialized agents: Each has its own prompt, its own tools, access to specific data. The inventory agent knows how to query the stock database. The pricing agent has access to the pricing API. Each is an expert in its domain.

Shared memory: A place where agents leave information for others. The catalog agent finds the product, pricing adds the cost, shipping calculates delivery. They don't talk directly, but they share context.

Tools: Functions that agents can execute. Call APIs, query databases, send emails. Without tools, an agent is just a text generator.
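Stripped to the bone, those four pieces fit in a few lines. A minimal sketch, assuming a call_llm() helper that wraps whatever provider SDK you use; every name here is illustrative:

```python
from dataclasses import dataclass
from typing import Callable

def call_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder for whatever provider SDK you use."""
    raise NotImplementedError("wire up your provider here")

@dataclass
class Agent:
    name: str
    system_prompt: str                    # narrow, domain-specific instructions
    tools: dict[str, Callable[..., str]]  # e.g. {"check_stock": query_stock_db}

    def run(self, message: str, memory: dict) -> str:
        # Each agent sees only its own prompt plus the shared context it needs.
        # (The tool-calling loop is omitted to keep the sketch short.)
        context = f"Shared context: {memory}\n\nUser: {message}"
        return call_llm(self.system_prompt, context)

# Shared memory: just a dict the orchestrator passes between calls.
memory: dict = {}
```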

When NOT to use multi-agents

Here's what nobody tells you: recent research suggests that in many cases a single well-configured agent outperforms multi-agent systems.

A 2025 study found that in environments with more than 10 tools, multi-agent systems suffer an efficiency penalty of 2x to 6x compared to individual agents. That's significant.

The folks at Cognition, who created Devin, say it clearly: in 2025, running multiple agents in collaboration results in fragile systems. Their recommendation is to start with a linear agent where context is continuous.

Don't use multi-agents when:

The task can be resolved in a single logical pass. Summarizing documents, classifying tickets, extracting data from invoices. A single well-configured agent with good RAG is enough (see the sketch after this list).

You have many tools. Counterintuitive, but true. With more than 10 tools, coordination between agents adds overhead that doesn't pay off.

Your individual agent already works reasonably well. Research suggests that if your single agent exceeds 45% accuracy on the task, adding more agents probably won't improve things. Sometimes it makes them worse.

Tasks are primarily write operations. Read operations parallelize well. Write operations create coordination problems.

You're a small team. Multi-agent systems require monitoring, distributed debugging, and integration tests. If there are two of you, start simple.
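For contrast, the single-agent baseline from the first point is barely more than a retrieval step and one call. A sketch where retrieve() and call_llm() are placeholders for your vector store and provider SDK, not any particular library:

```python
def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k relevant chunks from your document store."""
    raise NotImplementedError

def call_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder for your provider SDK."""
    raise NotImplementedError

def answer(question: str) -> str:
    # One logical pass: retrieve, then generate. No orchestration needed.
    context = "\n---\n".join(retrieve(question))
    system = (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n" + context
    )
    return call_llm(system, question)
```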

When it DOES make sense

It makes sense when:

The problem crosses multiple domains with different rules. An e-commerce assistant that handles catalog, payments, shipping, and returns. Each area has its own logic, its own data, its own exceptions.

Tasks have clear dependencies and multiple passes. First search for the product, then verify stock, then apply discounts, then calculate shipping. When order matters and each step needs information from the previous one (there's a sketch of this after the list).

You need strict auditing. In banking, insurance, or any regulated environment, knowing exactly which decision was made by whom is mandatory. With multi-agents you can trace each step.

Different teams maintain different parts. If the pricing team changes their rules every week and logistics every month, having each own their agent makes it easier to iterate without breaking each other.
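When the steps genuinely depend on each other, the multi-agent version is essentially a pipeline. A sketch reusing the Agent class and call_llm() placeholder from earlier; the agent names and flow are hypothetical:

```python
def handle_order_question(question: str, agents: dict[str, Agent]) -> str:
    memory: dict = {"question": question}
    # Each pass needs the previous one's output: order matters.
    memory["product"] = agents["catalog"].run(question, memory)
    memory["stock"] = agents["inventory"].run(memory["product"], memory)
    memory["price"] = agents["pricing"].run(memory["product"], memory)
    memory["shipping"] = agents["logistics"].run(memory["product"], memory)
    # One final call combines everything into a single answer.
    return call_llm("Combine these partial results into one response.",
                    str(memory))
```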

The real latency numbers (and the complexity nobody tells you)

This is more complicated than it seems. Latency comes from many places, not just model generation.

LLM Infrastructure:

Where you call the model from matters. If your server is in Europe and you're using a US endpoint, you add 80-150ms of network latency alone, on each call. And in multi-agent you make many calls. I've seen systems where 30% of total latency was just transatlantic round-trips.

The provider's own infrastructure adds variability. During peak hours you can go from 200ms time-to-first-token to 800ms or more. This multiplies in multi-agent.

Context Management:

Each agent needs context. How you compress it, how much you keep, and how you pass it between agents all add up. I've seen systems where state serialization and deserialization between agents added 50-100ms per hop.

If you use shared memory with persistence, add database latency. If you use a cache, you need to manage invalidation. If you compress conversations to stay within token limits, that compression has a cost.

Inter-agent Communication:

If agents pass messages to each other, each communication step has overhead. Response parsing, format validation, error handling, retries. In a 5-agent system with an orchestrator, you easily have 8-10 calls between components for each user request.

The numbers I see in production:

Latency breakdown by component:

Network to LLM provider: 30-150ms (depends on geography)
Time to first token: 200-800ms (depends on provider load)
Full generation: 2000-20000ms (the bulk of the time)
Orchestrator (routing): 50-200ms
Vector search: 5-300ms (5ms with a hot cache)
State serialization: 20-100ms (per agent)
Validation and parsing: 10-50ms (per response)
Total overhead (excluding LLM): 285-1450ms

Adding it up: a simple request through 3 agents can easily take 8-15 seconds. A complex request with 5 agents and multiple passes, 30 seconds or more.
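You can sanity-check those totals with the ranges from the table. A back-of-envelope calculation, assuming four sequential LLM calls (orchestrator plus three agents) and 2-5 seconds of generation per call:

```python
llm_calls = 4                 # orchestrator + 3 agents, sequential
network = (30, 150)           # ms per call
ttft = (200, 800)             # ms per call, time to first token
generation = (2000, 5000)     # ms per call, assuming short responses
per_hop = (30, 150)           # ms, serialization + validation per hop
routing = (50, 200)           # ms, one-time routing decision

low = llm_calls * (network[0] + ttft[0] + generation[0] + per_hop[0]) + routing[0]
high = llm_calls * (network[1] + ttft[1] + generation[1] + per_hop[1]) + routing[1]
print(f"{low / 1000:.1f}s to {high / 1000:.1f}s")  # 9.1s to 24.6s
```

The high end assumes everything is slow at once, which is rare. But even the optimistic end lands near ten seconds.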

If you need responses in under 2 seconds, multi-agent architecture is probably not for you. And if you think you can optimize it later, think twice: coordination complexity compounds with every agent you add.

How I implement it

I don't use frameworks. Anthropic says it well in their documentation: the most successful implementations don't use complex frameworks or specialized libraries. They build with simple, composable patterns.

Frameworks add abstraction. Abstraction hides what's happening. In production you need to see exactly what's hitting the API. More code to write, yes, but much easier to debug.

The basic pattern is an orchestrator with routing:

User → Orchestrator (classify intent) → Agent A / Agent B / Agent C → Combine → Response

8-10 calls between components per request

The orchestrator is another LLM call with a specific prompt to classify and route. Each agent is its own call with its own system prompt and tools. Shared memory is usually a dictionary or store you pass between calls.
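Sticking with the call_llm() placeholder and Agent class from earlier, the whole routing pattern fits in a dozen lines. The routing prompt and agent names are illustrative:

```python
import json

ROUTER_PROMPT = (
    "Classify the user request into one or more of: "
    "catalog, pricing, logistics, returns. "
    'Reply with JSON only, e.g. {"agents": ["catalog", "pricing"]}.'
)

def handle(question: str, agents: dict[str, Agent]) -> str:
    memory: dict = {"question": question}
    # Call 1: the orchestrator is just another LLM call that classifies.
    # In production you validate this output and retry on malformed JSON.
    routing = json.loads(call_llm(ROUTER_PROMPT, question))
    # Calls 2..n: each agent runs with its own prompt and tools,
    # leaving its result in shared memory for the next one.
    for name in routing["agents"]:
        memory[name] = agents[name].run(question, memory)
    # Final call: combine the partial answers into one response.
    return call_llm("Combine these partial answers into one response.",
                    json.dumps(memory))
```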

It's not magic. It's basic software engineering applied to API calls. The real work is in designing good prompts, defining clear tools, and above all in error handling and observability.

Evaluations: this is not traditional testing

This is where most people get lost. You think you can test a multi-agent system like you test traditional software. Unit tests, integration tests, end-to-end. It doesn't work that way.

With LLMs you don't have determinism. The same input can give different outputs. A test that passes today can fail tomorrow without you changing anything. And when you have multiple agents, variability multiplies.

What you need are evals, not tests.

Evals are continuous evaluations against representative datasets. They don't verify that output is exactly X, they verify that output is "good enough" according to defined criteria. Accuracy, relevance, absence of hallucinations, correct format, appropriate tone.

Why it's more complex than traditional testing:

In classic software, a test fails or passes. With LLMs you have gradients. A response can be 80% correct. Or correct but poorly formatted. Or correct but with inappropriate tone. Defining what's "good enough" is a problem in itself.

In multi-agent it gets more complicated. If the final result is bad, which agent failed? Did the orchestrator route incorrectly? Did an intermediate agent corrupt context? Did combining responses lose information? You need evals at the individual agent level and at system level.

What I do in production:

Evaluation datasets per agent. Minimum 50-100 representative cases per agent, with expected outputs or evaluation criteria.

Automated evals in CI. Every prompt change triggers evaluation against the dataset. If accuracy drops below a threshold, it doesn't deploy. (A minimal harness sketch follows this list.)

LLM-as-judge for complex cases. Using another model to evaluate whether a response is correct when there's no "exact" answer. It has its problems, but scales better than human review.

Continuous monitoring in production. Evals don't end at deploy. Sampling real requests, offline evaluation, alerts when metrics degrade.
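A minimal harness sketch tying those four practices together. EvalCase, judge(), and the threshold are all illustrative; judge() stands in for an LLM-as-judge call:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    input: str
    criterion: str   # what "good enough" means for this specific case

def judge(output: str, criterion: str) -> bool:
    """Placeholder: ask another model whether output meets the criterion."""
    raise NotImplementedError

def run_evals(agent, cases: list[EvalCase], threshold: float = 0.9) -> bool:
    passed = sum(judge(agent.run(c.input, {}), c.criterion) for c in cases)
    accuracy = passed / len(cases)
    print(f"{passed}/{len(cases)} passed ({accuracy:.0%})")
    # The CI gate: below the threshold, the prompt change does not ship.
    return accuracy >= threshold
```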

The cost nobody mentions:

Building a good eval system can take more time than building the multi-agent system itself. I've seen projects where 40% of effort went into evaluation and observability. But without that, you don't know if your system works. You just hope it works.

Mistakes I see repeated

Starting with too many agents. The temptation is to model the entire organization from day one. Start with two, three max. Add when you have a real problem to solve, not before.

Not defining contracts. Each agent needs a clear specification: what it receives, what it returns, when it fails (a sketch follows this list). Without this, when something breaks, debugging is impossible.

Ignoring observability. A multi-agent system without structured logs and tracing is a black box. You need to be able to reconstruct what happened when something fails.

Underestimating costs. Each agent is another LLM call. I've seen bills double or triple after migrating to multi-agent. Budget from the start.

Depending on frameworks. When the framework updates, or stops being maintained, or has a production bug, you're trapped. With your own code on the API, you have full control.

Over-engineering. Models improve fast. What needs three agents today might be done by one in six months. Don't build for problems you don't have yet.
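On contracts specifically, it doesn't take much. A sketch of what the contract for a hypothetical pricing agent could look like; every name here is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class PricingRequest:
    sku: str
    quantity: int
    customer_tier: str       # e.g. "standard" or "premium"

@dataclass
class PricingResponse:
    unit_price: float
    discount_applied: float  # as a fraction, e.g. 0.15
    currency: str

class PricingError(Exception):
    """The agent could not produce a valid price (unknown SKU, API down...)."""

# The pricing agent accepts a PricingRequest and must return a PricingResponse
# or raise PricingError. When something breaks, you know exactly what crossed
# the boundary and which side violated the contract.
```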

Next step

If you have processes that today depend on humans doing repetitive work across multiple systems, multi-agent systems can help. They're not magic. They're software engineering applied to LLMs, with their tradeoffs.

My recommendation: start with a single well-made agent. Measure where it fails. Only when you have clear data that the simple architecture doesn't scale, consider splitting into multiple agents.

If you want me to review your case, get in touch. As part of my consulting services, I can give you an idea of whether it makes sense for your situation and where to start. Learn more about my experience designing these systems for enterprises.


If this article was helpful and you want to explore how to apply these ideas in your company, schedule a call.
