
The Overlooked Side of Enterprise AI: Keeping Systems Working Over Time

Most enterprise AI conversations focus on getting to the first working version. The demo matters. The pilot matters. The moment when the system produces a good answer matters. But in practice, that’s the easy part. The real challenge shows up later, usually quietly, when the system is still running but no longer doing quite what people think it’s doing.

Enterprise AI rarely breaks all at once. It drifts.

“It Works” Is a Moment, Not a State

An AI system working today doesn’t mean it will work the same way next quarter. Unlike traditional software, many AI systems don’t have a fixed definition of correctness. They operate probabilistically, depend on context, and inherit assumptions from their environment.
At launch, those assumptions are fresh. Prompts reflect current language. Documents match how teams work. Policies align with organizational reality. Over time, all of that changes, but the system often doesn’t. The result is a system that still responds, still generates output, still looks alive but is slowly becoming less useful.

Prompt Drift Is Real

Prompts are not static assets, even when the text doesn’t move. Their meaning depends on surrounding context: the model version, the input distribution, and the task expectations of users. As teams adopt the system, they start using it in ways the original prompts didn’t anticipate. Edge cases become common cases. The language people use shifts. New shortcuts appear. What once guided the model cleanly now produces uneven results.
Because nothing is technically broken, prompt drift is easy to ignore. Output quality degrades just enough to cause friction, not enough to trigger alarms.

Organizations Change Faster Than Systems

Enterprise AI systems are built around an implicit snapshot of the organization. Team structures, approval flows, ownership boundaries, and responsibilities are all baked into prompts, tools, and retrieval logic.
Then the org changes. A team is renamed. A responsibility moves. A workflow gets split. A policy owner changes. Humans adapt immediately. AI systems don’t. They keep reflecting a version of the company that no longer exists.
This is where trust erosion begins. The system isn’t wrong in an obvious way; it’s outdated in subtle ones. And subtle wrongness is hard to diagnose.

When Policies Change, Assumptions Break

Policy updates are especially dangerous for AI systems, because they often invalidate things that were previously safe shortcuts. What used to be implied now needs to be explicit. What used to be allowed now needs exceptions.
In RAG-based systems, old documents don’t announce that they’re obsolete. They just sit there, quietly retrievable. Unless there’s active pruning, versioning, or weighting, the system will happily surface outdated guidance alongside current rules. From the outside, everything looks normal. Inside, the system is blending past and present into answers that feel confident and complete and are occasionally wrong in exactly the ways that matter most.
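To make that concrete, here is a rough sketch of the kind of pruning and versioning described above, assuming each document carries an effective date and a superseded_by marker (both hypothetical metadata fields). A real pipeline would apply a filter like this before or during retrieval rather than as a standalone script.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PolicyDoc:
    doc_id: str
    text: str
    effective_date: date
    superseded_by: Optional[str] = None  # hypothetical metadata field

def retrievable(docs: list[PolicyDoc], today: date) -> list[PolicyDoc]:
    """Keep only documents that are in effect and not superseded."""
    active = [d for d in docs if d.effective_date <= today and d.superseded_by is None]
    # Prefer newer guidance when several versions survive the filter.
    return sorted(active, key=lambda d: d.effective_date, reverse=True)

docs = [
    PolicyDoc("expense-v1", "Meals up to $25 are reimbursable.",
              date(2021, 1, 1), superseded_by="expense-v2"),
    PolicyDoc("expense-v2", "Meals up to $40 are reimbursable.",
              date(2024, 6, 1)),
]

for d in retrievable(docs, date.today()):
    print(d.doc_id, "->", d.text)
```

The mechanism is trivial; the hard part is the organizational habit of keeping that metadata current.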

Silent Degradation Is the Default Failure Mode

Traditional software fails loudly. AI systems tend to fail politely. They still answer. They still generate fluent text. They still pass casual testing. What changes is alignment with reality. Small inaccuracies compound. Edge cases multiply. Users start double-checking “just in case,” which is often the first sign that trust is slipping.
By the time someone says “this doesn’t work anymore,” the system may have been decaying for months.

Why RAG Makes This Easier to Miss

RAG systems are particularly prone to quiet decay. Knowledge bases grow, but rarely shrink. Documents get updated, but old versions linger. Retrieval quality changes as embeddings evolve and distributions shift. Unless teams are regularly testing retrieval outcomes, not just generation quality, they won’t notice when the system starts pulling the wrong context. And once the wrong context is in play, even a strong model will produce misleading answers. The system doesn’t need new bugs to get worse. It just needs time.
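As a rough illustration of what testing retrieval outcomes can look like, here is a minimal regression check: a fixed set of queries paired with the document IDs expected in the top results. The keyword-overlap retriever is only a stand-in so the sketch runs on its own; in a real system it would be the actual vector search.

```python
# A minimal retrieval regression check: fixed queries with the document IDs
# we expect in the top-k results. `retrieve` is a stand-in for the real
# vector search; here it is a trivial keyword scorer so the sketch runs as-is.

EXPECTED = {
    "how do I submit an expense report": {"expense-v2"},
    "what is the parental leave policy": {"leave-2024"},
}

CORPUS = {
    "expense-v2": "How to submit an expense report under the 2024 policy.",
    "expense-v1": "Old 2021 expense report guidance.",
    "leave-2024": "Parental leave policy, updated 2024.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = {
        doc_id: sum(word in text.lower() for word in query.lower().split())
        for doc_id, text in CORPUS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def run_regression() -> None:
    for query, expected_ids in EXPECTED.items():
        top_k = set(retrieve(query))
        missing = expected_ids - top_k
        status = "OK " if not missing else "FAIL"
        print(f"{status} {query!r} -> {sorted(top_k)} (missing: {sorted(missing)})")

run_regression()
```

Run on a schedule, a check like this is often the first thing to catch drift that casual spot-testing misses.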

Maintenance Is a Product Decision, Not a Cleanup Task

The biggest misconception is treating AI maintenance as an engineering afterthought. In reality, it’s a product commitment. Someone needs to own freshness, relevance, and correctness as first-class concerns.
That means reviewing prompts as living artifacts, pruning and versioning knowledge sources, re-evaluating assumptions when orgs or policies change, and measuring output quality continuously, not just at launch.
The teams that plan for this early don’t avoid decay entirely, but they notice it sooner and correct it faster.

The Uncomfortable Truth

Enterprise AI systems don’t age like software. They age like organizations. Slowly, unevenly, and in ways that are hard to quantify. The hard part isn’t making AI intelligent. It’s keeping it aligned with a moving target. Teams that recognize this early build systems that last longer, fail more gracefully, and retain trust even as everything around them changes.
Maintenance may not be the most exciting part of enterprise AI, but it’s the part that determines whether a system remains useful or simply continues to answer.

The Role of Workflow Integration in Enterprise AI Adoption

When an enterprise AI system fails, the postmortem usually starts in the wrong place. The model wasn’t accurate enough. The prompts weren’t good. The hallucinations were unacceptable. Sometimes those things are true, but more often they’re a distraction. In practice, most enterprise AI systems don’t fail because the intelligence is weak. They fail because the system lives outside the way people already work.
Enterprises are not short on capable models. They are short on AI systems that feel inevitable to use.

The Silent Killer: “Going Somewhere Else”

The fastest way to kill adoption is to make people leave their workflow.
If using AI means opening a separate tool, switching tabs, pasting context, asking a question, then returning to the original system, usage drops off sharply after the novelty wears off. This is true even if the AI is objectively helpful. The friction doesn’t feel dramatic in a demo, but it compounds in real work.
People don’t think in terms of “I need AI now.” They think in terms of “I need to answer this email,” “close this ticket,” or “finish this pull request.” Any tool that doesn’t meet them inside that moment is already at a disadvantage.
This is why many internal AI tools quietly die. Not because they’re bad, but because they require intent. And intent is expensive.

Adoption Is About Placement, Not Capability

A recurring pattern in successful enterprise AI systems is that users don’t experience them as destinations. They experience them as steps. The AI shows up: while writing an email, not before it. Inside a ticket, not in a separate chat. Next to code, not above it.
In these systems, the AI doesn’t ask for attention. It offers momentum. That distinction matters more than raw intelligence.
This is also why extremely simple AI features often outperform sophisticated standalone tools. An average suggestion in the right place beats a brilliant answer in the wrong one.

Assistants vs Copilots (And Why the Difference Matters)

The word “assistant” has done a lot of damage in enterprise AI. Assistants imply delegation: you ask, they answer, you decide what to do next. That interaction model works for exploration and learning, but it doesn’t map well to operational work. Most enterprise tasks aren’t about asking questions, they’re about moving something forward.
Copilots behave differently. They assume the user is already doing the work. The AI’s job is not to replace intent, but to reduce effort. It drafts, suggests, fills, flags, or summarizes and always in service of the current task.
This is why copilots embedded in IDEs, email clients, or CRMs tend to see sustained usage, while generic internal chatbots plateau. One is part of the workflow. The other is an optional side conversation.

Why “Internal GenAI” Products Stall

Many enterprises start their AI journey by building an internal GenAI-style interface trained on company data. It looks impressive. It demos well. Early usage spikes. Then it flattens.
The reason is simple: chat interfaces are destinations. They require users to translate work into questions. That translation cost never goes away. Over time, users learn when the tool is helpful and when it’s not. They stop opening it reflexively. It becomes a tool of last resort instead of a default behavior.
This doesn’t mean these systems are useless. It means they are mispositioned. Chat is a poor primary interface for most enterprise work. It’s better suited as a fallback, not a foundation.

Enterprise Work Is a Chain, Not a Conversation

Most enterprise workflows look less like conversations and more like pipelines. Information enters, gets transformed, reviewed, approved, and passed along. Each step has constraints, context, and consequences.
AI systems that succeed respect that structure. They don’t ask users to explain the entire world every time. They inherit context from the step they’re embedded in and operate within clear boundaries. When AI is designed as a conversational endpoint, it floats above the system. When it’s designed as a workflow component, it becomes part of the machinery.
That difference determines whether the system feels optional or unavoidable.
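A small sketch of that difference, assuming a hypothetical ticket-triage step: the AI component inherits the ticket’s context from the workflow itself, so the user never leaves the tool or re-explains the situation. The model call is stubbed; the shape of the step is the point.

```python
# A sketch of AI as a workflow component rather than a destination.
# The suggestion step inherits the ticket's context automatically; the user
# never leaves the ticket or re-explains it. `draft_with_llm` stands in for
# whatever model call the real system would make.

from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    customer: str
    subject: str
    history: list[str]

def draft_with_llm(prompt: str) -> str:
    # Placeholder for a real model call; the point is the shape of the step.
    return f"[drafted reply based on: {prompt[:60]}...]"

def suggest_reply(ticket: Ticket) -> str:
    """Embedded step: context comes from the ticket, not from the user."""
    prompt = (
        f"Customer: {ticket.customer}\n"
        f"Subject: {ticket.subject}\n"
        f"History: {' | '.join(ticket.history)}\n"
        "Draft a reply that moves this ticket forward."
    )
    return draft_with_llm(prompt)

ticket = Ticket("T-1042", "Acme Corp", "Invoice mismatch",
                ["Customer reports a $120 discrepancy.", "Finance confirmed the error."])
print(suggest_reply(ticket))
```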

Reframing the Real Problem

When an AI feature isn’t adopted, the instinct is to make it smarter. More context. Better prompts. A larger model. But intelligence rarely fixes placement. The better question is not “Why isn’t the model good enough?” but “At what exact moment should this appear?” If the answer is vague, adoption will be too.
Enterprise AI succeeds when it reduces steps, not when it adds insight. The most valuable systems don’t feel like AI products at all. They feel like the software finally learned how the work actually happens.
That’s why most enterprise AI failures are workflow failures. And why the teams that understand this spend less time tuning models, and more time deciding where the intelligence belongs.

Helping AI Systems Talk to the Outside World

As AI systems move beyond answering questions and start doing things (booking flights, checking inventory, querying databases), they run into a practical limitation. Large language models are text-based, and most real-world systems are not. Something needs to sit in between and translate intent into action.
There isn’t a single best answer here, and the landscape is still evolving. What exists today reflects different assumptions about how AI agents should interact with software, and it’s likely that tomorrow will bring more options or refinements.

Why Tools Exist at All

Even with large context windows, models can’t hold everything. Entire databases, live systems, and constantly changing data simply don’t fit. Rather than forcing everything into context, most production setups give models the ability to ask for what they need, when they need it.
In that setup, the model becomes more of an orchestrator. It reasons about a task, decides what information or action is required, and then reaches out to an external system to get it. The question isn’t whether this is needed, it’s how that connection should work.

An AI-Native Way to Describe Capabilities

One approach that’s emerged recently is to describe tools in a way that language models can understand directly. Instead of exposing only technical method signatures, the system also provides natural language descriptions of what a tool does and when it might be useful.
This makes it easier for an AI agent to discover capabilities at runtime. The agent doesn’t need to be hard-coded with knowledge of every service in advance. It can ask what’s available, read the descriptions, and adapt. That flexibility can be helpful in environments where tools change frequently or where agents are expected to operate more autonomously.
The trade-off is that this approach tends to favor readability and adaptability over raw performance. It’s designed first for understanding, not throughput.
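A rough sketch of what such a description-first interface can look like. The schema shape here is illustrative rather than tied to any specific protocol: each tool carries a name, a natural-language description of when it is useful, and its parameters, and the agent can list them at runtime.

```python
# A sketch of describing tools in language a model can read, then letting an
# agent discover them at runtime. The schema is illustrative; real protocols
# differ in detail, but the idea is the same: name, description, and
# parameters expressed so the model can decide when a tool is useful.

TOOLS = [
    {
        "name": "check_inventory",
        "description": "Look up current stock for a product SKU. Useful when a "
                       "customer asks whether an item is available.",
        "parameters": {"sku": "string, the product identifier"},
    },
    {
        "name": "create_return",
        "description": "Open a return request for an existing order. Useful when "
                       "a customer wants to send an item back.",
        "parameters": {"order_id": "string", "reason": "string"},
    },
]

def describe_tools() -> str:
    """What the agent sees when it asks 'what can I do here?'"""
    return "\n".join(f"- {t['name']}: {t['description']}" for t in TOOLS)

print(describe_tools())
```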

A More Traditional, High-Performance Path

Long before AI agents entered the picture, distributed systems already had a solution for fast, reliable service communication. Remote procedure call frameworks were built to move structured data efficiently between services, often at very high scale.
These systems excel at speed and reliability. They’re well understood, widely deployed, and optimized for predictable interactions. What they don’t provide out of the box is semantic guidance. An AI agent can see what methods exist, but not why or when it should call them. Bridging that gap usually requires an additional translation layer that maps natural language intent to specific calls. This isn’t a flaw so much as a reflection of different design goals.
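A minimal sketch of that translation layer, assuming a hypothetical InventoryClient standing in for a generated service stub: the fast, typed call does the work, while a small registry pairs it with a description of intent the agent can choose from.

```python
# A sketch of the translation layer described above: existing RPC-style calls
# are fast and typed but carry no semantics, so a thin registry pairs each
# call with a description the agent can reason over. `InventoryClient` is a
# hypothetical stand-in for a generated service stub.

class InventoryClient:
    def get_stock(self, sku: str) -> int:
        # In a real system this would be a remote call over a binary protocol.
        return {"SKU-123": 7}.get(sku, 0)

client = InventoryClient()

# The registry is what the agent sees; the callable is what actually runs.
INTENT_REGISTRY = {
    "check how many units of a product are in stock": lambda sku: client.get_stock(sku),
}

def route(intent: str, **kwargs):
    """Map a described intent to the underlying high-performance call."""
    handler = INTENT_REGISTRY.get(intent)
    if handler is None:
        raise ValueError(f"No tool registered for intent: {intent!r}")
    return handler(**kwargs)

print(route("check how many units of a product are in stock", sku="SKU-123"))
```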

Different Assumptions, Different Strengths

The contrast between these approaches is less about which is better and more about what each assumes. Some protocols assume the caller already knows exactly what it wants to do. Others assume the caller needs help figuring that out.
In practice, many systems already mix these ideas. A language-friendly interface might help an agent discover what’s possible, while a more traditional protocol handles the heavy lifting once the decision is made. As workloads grow, performance characteristics start to matter more. As systems become more dynamic, discoverability starts to matter more.

A Moving Target

It’s worth being cautious about drawing hard conclusions here. Agent architectures are still young, model capabilities are changing quickly, and infrastructure patterns are adjusting in response. What feels like the right abstraction today may look different a year from now as models get better at reasoning, context windows grow, or new standards emerge.
For now, the most useful mindset seems to be pragmatic rather than prescriptive. These protocols are tools, not ideologies. Each solves a slightly different problem, and many real systems will likely end up using more than one.
The important part isn’t picking the “winning” approach. It’s understanding what assumptions you’re making, and being ready to revisit them as both the technology and the needs around it continue to change.

Two Practical Ways of Giving AI Access to Knowledge

Large language models are capable, but they’re not aware of information outside their training data. In enterprise settings, that limitation shows up quickly. Internal documents, recent updates, and proprietary data are exactly the things models don’t have by default. Most production systems end up solving this not by changing the model, but by deciding how and when to provide it with additional context.
There are a couple of common patterns teams use today. Neither is universally better, and both come with trade-offs that only really show up once systems are in use.

Retrieval as a Way to Stay Current

One approach is to retrieve small pieces of relevant information at the moment a question is asked. Instead of loading everything the organization knows into the model, the system tries to identify what might matter for this specific request and passes only that context along.
This tends to work well when knowledge is large or frequently updated. Documents can change without retraining models, and the system stays relatively flexible. The trade-off is that the model’s answer is only as good as the information that was retrieved. If something important is missed, the model has no way to compensate.
In practice, retrieval systems don’t usually fail dramatically. They drift. Answers remain fluent, but occasionally feel incomplete or slightly off. Those are often retrieval issues, not model issues, but they can be hard to spot without deliberate testing.
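For readers who want the mechanics, here is a minimal retrieve-then-answer sketch. Keyword overlap stands in for the embedding search a production system would use, and the final model call is a stub; the point is the shape of the flow: pick a few relevant chunks, pass only those along.

```python
# A minimal retrieve-then-answer sketch. Keyword overlap stands in for the
# embedding search a real system would use, and `answer_with_llm` is a stub.

CHUNKS = {
    "onboarding": "New hires get laptop access on day one via IT ticket.",
    "expenses": "Expense reports are submitted monthly through the finance portal.",
    "travel": "International travel requires director approval two weeks ahead.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    scores = {
        key: sum(w in text.lower() for w in question.lower().split())
        for key, text in CHUNKS.items()
    }
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [CHUNKS[key] for key in top]

def answer_with_llm(question: str, context: list[str]) -> str:
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return f"[model answers using {len(context)} retrieved chunks]\n{prompt}"

question = "How do I submit an expense report?"
print(answer_with_llm(question, retrieve(question)))
```

Notice that if retrieval brings back the wrong chunks, nothing in this flow complains; the answer just quietly rests on the wrong context.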

Loading Knowledge Up Front

Another approach teams experiment with is loading a fixed body of knowledge directly into the model’s context. From the model’s perspective, everything it needs has already been read. There’s no search step at question time, which simplifies the request path and can reduce latency.
This works best when the knowledge set is relatively small and stable. Manuals, reference guides, or internal playbooks often fall into this category. The downside is that updates require reprocessing, and scale is limited by how much the model can reasonably hold in context.
When this approach struggles, it’s usually because information changes more often than expected, or because the amount of material slowly grows beyond what was originally planned.
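A small sketch of the up-front pattern, assuming a tiny, stable playbook: the knowledge is assembled into context once and reused for every question, and a rough size check marks the operational limit mentioned above. The model call is again a stub.

```python
# A sketch of the up-front approach: a small, stable knowledge set is loaded
# into the context once and reused for every question. The rough size check
# is where this pattern hits its limit; `ask_llm` is a stub for the model call.

PLAYBOOK_FILES = {
    "escalation.md": "Sev-1 incidents page the on-call engineer immediately.",
    "refunds.md": "Refunds over $500 require a manager's approval.",
}

MAX_CONTEXT_CHARS = 20_000  # stand-in for the model's real context budget

def build_context() -> str:
    combined = "\n\n".join(f"# {name}\n{text}" for name, text in PLAYBOOK_FILES.items())
    if len(combined) > MAX_CONTEXT_CHARS:
        raise ValueError("Knowledge set has outgrown the context budget; "
                         "time to revisit the approach.")
    return combined

def ask_llm(context: str, question: str) -> str:
    return f"[model answers {question!r} from {len(context)} chars of preloaded context]"

CONTEXT = build_context()  # no retrieval step at question time
print(ask_llm(CONTEXT, "Who approves a $700 refund?"))
```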

Different Trade-Offs, Same Goal

Both patterns are trying to solve the same problem: giving a model access to information it wasn’t trained on. The difference is where the system places responsibility.
Retrieval systems decide what’s relevant before the model sees it. Context-heavy systems ask the model to decide relevance on its own. Neither approach removes uncertainty entirely; they just move it to different parts of the system.
For many teams, the right answer ends up being a mix of both. A system might retrieve a focused set of documents, then treat that set as temporary working memory for follow-up questions. This keeps the search space manageable without forcing the model to reprocess everything repeatedly.
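As a sketch of that mix, with stubbed retrieve and ask_llm helpers standing in for the real search and model calls: a session retrieves a focused set once, then treats it as working memory for follow-up questions instead of searching again.

```python
# A sketch of the mixed pattern: retrieve a focused set once, then treat it as
# working memory for follow-up questions in the same session.

class KnowledgeSession:
    def __init__(self, topic: str):
        # One retrieval up front, scoped to the topic of the conversation.
        self.working_memory = retrieve(topic)

    def ask(self, question: str) -> str:
        # Follow-ups reuse the same context instead of searching again.
        return ask_llm(self.working_memory, question)

def retrieve(topic: str) -> list[str]:
    return [f"[documents retrieved for topic: {topic}]"]

def ask_llm(context: list[str], question: str) -> str:
    return f"[answer to {question!r} grounded in {len(context)} cached documents]"

session = KnowledgeSession("2024 travel policy")
print(session.ask("Do I need approval for a trip to Berlin?"))
print(session.ask("What about booking class?"))  # no second retrieval
```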

Choosing What Fits the Situation

The decision between these approaches is less about technical correctness and more about operational fit. How often does the knowledge change? How large is it? How sensitive is the output to missing or outdated information? How much complexity can the team realistically maintain?
These questions tend to matter more than the specific technique. Both retrieval-based and context-heavy systems can work well when they’re aligned with the shape of the problem they’re trying to solve.
As models improve and context windows grow, it’s likely teams will continue experimenting with both. The important part isn’t picking the “right” pattern up front, but understanding how each one behaves over time and adjusting as the system and the organization around it evolve.

When to Use RAG in Enterprise AI Systems

RAG (Retrieval-Augmented Generation) has become one of those terms that shows up in almost every enterprise AI conversation. Need access to internal documents? Add RAG. Need the model to be “grounded”? Add RAG. Over time, it’s started to sound less like an architectural choice and more like a default checkbox.
In some cases that might be a mistake.
RAG is a useful pattern, but it’s not neutral. It introduces trade-offs in reliability, cost, latency, and system complexity. Used in the right place, it quietly does its job. Used in the wrong place, it creates systems that are hard to debug, hard to trust, and expensive to operate. Understanding where that line is matters more than knowing how to set up a vector database.

What RAG Actually Solves

At its core, RAG exists to solve one specific problem: large language models don’t have access to your private or up-to-date data. Retrieval is simply a way to inject that data into the model at the moment it needs to answer a question.
This works well when the knowledge you care about lives in documents, changes over time, and needs to be referenced rather than memorized. Policies, internal wikis, product documentation, and legal text fall neatly into this category. In these cases, RAG acts like a just-in-time reading mechanism. The model doesn’t need to “know” your documentation; it only needs to read the relevant parts before responding.
That distinction is important, because it also defines the limits of the approach.

Where RAG Fits Naturally

RAG performs best when the system’s main job is to surface and explain existing information. If the expected output looks like a well-written summary, explanation, or answer grounded in source material, retrieval adds real value. It allows the system to stay current without retraining, makes updates operational rather than technical, and enables traceability, something enterprises care deeply about.
Another signal that RAG is a good fit is when answers are allowed to vary slightly in phrasing but not in substance. The model is synthesizing, not deciding. When users want to see why something is true and where it came from, RAG aligns well with that expectation.

Where RAG Starts to Break Down

Problems appear when RAG is used to support logic-heavy or decision-critical systems. Retrieval is probabilistic. Chunking, embedding quality, ranking, and context limits all introduce uncertainty. That uncertainty is manageable when the model is summarizing a policy, but it becomes dangerous when the model is deciding eligibility, pricing, or risk.
In those cases, the system isn’t failing loudly, it’s failing subtly. The model may retrieve most of the right context, miss one key clause, and still produce a confident answer. From the outside, it looks grounded. Under the hood, it’s inconsistent.
RAG also struggles when the knowledge base is small, stable, and central to the product. If the entire business logic fits on a few pages, introducing embeddings and retrieval layers often creates more surface area for errors than value. A well-structured prompt or deterministic logic will usually outperform a retrieval pipeline in both reliability and maintainability.
Latency and cost are another quiet tax. Every RAG call adds retrieval time, additional tokens, and more infrastructure. In high-volume or real-time systems, these costs compound quickly and are hard to claw back later.

The Hallucination Myth

One of the most common arguments for RAG is that it “prevents hallucinations.” In practice, it doesn’t prevent them, it changes their shape.
Good retrieval reduces the chance of the model inventing facts, but poor retrieval produces answers that sound authoritative and cite the wrong context. That can be worse than a visible hallucination, because it gives users false confidence. RAG systems don’t eliminate the need for evaluation; they raise the bar for it.

What to Use Instead (or Alongside)

Many enterprise use cases don’t need retrieval at all. Carefully designed prompts, examples, and constraints can go surprisingly far, especially for workflow assistance, content generation, and structured reasoning tasks. This approach is easier to debug, cheaper to run, and often more predictable.
Fine-tuning is another option, but it solves a different problem. Fine-tuning is about shaping behavior (tone, style, reasoning patterns), not about injecting fresh knowledge. It works best when the information is stable and the desired output needs to be consistent. When facts change frequently, fine-tuning becomes operationally expensive and brittle.
The most robust systems tend to be hybrid. Hard rules and structured logic handle decisions. Retrieval provides reference material. The language model focuses on explanation, synthesis, and interaction. This separation keeps critical logic deterministic while still benefiting from the flexibility of natural language generation.
For structured or numerical data, traditional databases and APIs remain the right tool. Vector search is not a replacement for SQL. Let the model call tools instead of guessing over text when precision matters.
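A compact sketch of that separation, with an illustrative table and threshold: a deterministic rule makes the decision, a SQL query supplies the numbers, and the language model is only asked to draft the explanation.

```python
# A sketch of the hybrid split described above: a deterministic rule makes the
# eligibility decision, a SQL query (via sqlite3 here) supplies the numbers,
# and the language model only explains the outcome. `explain_with_llm` is a
# stub; the table and threshold are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('acme', 1250.0)")

def is_eligible_for_credit(balance: float) -> bool:
    # Hard rule: deterministic, testable, never left to the model.
    return balance >= 1000.0

def explain_with_llm(decision: bool, balance: float) -> str:
    return (f"[model drafts a customer-facing explanation: eligible={decision}, "
            f"based on a balance of ${balance:,.2f}]")

balance = conn.execute(
    "SELECT balance FROM accounts WHERE customer = ?", ("acme",)
).fetchone()[0]

decision = is_eligible_for_credit(balance)   # logic decides
print(explain_with_llm(decision, balance))   # model explains
```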

A More Useful Way to Think About RAG

Instead of asking “Should we use RAG?”, a better question is: What kind of mistakes can this system afford to make?
If the worst-case error is a slightly imperfect explanation, RAG is often fine. If the worst-case error is a wrong decision that looks justified, RAG should not be the core of your system.
RAG is infrastructure, not a feature. Users don’t care how the answer was assembled—they care whether it’s correct, consistent, and fast. The strongest enterprise AI systems are usually the ones that resist architectural fashion and choose the simplest setup that can reliably meet those expectations.
When RAG fits, it feels invisible. When it doesn’t, no amount of prompt tuning will save it.

Empathy in the Age of Algorithms

AI can read a thousand conversations and still completely miss what it was like to actually be in one. It can pick up on tone, flag emotions, even write an apology that sounds convincing. But it has no idea what it means to actually care about someone.
Empathy isn't just spotting patterns in what people say. It's being there. It's the pause before you respond because you're trying to really hear what someone means. It's caring more about understanding them than being understood yourself.
The danger isn't that AI lacks empathy, it's that we'll start outsourcing ours. We'll let it draft the condolence message, handle the customer complaint, respond to the hard conversation. And the more we do that, the less practice we get at the real thing.

When Everyone’s a Creator

The idea that everyone can create sounds amazing. Until you realize that when everyone is creating, most of it might just become noise.
The problem now isn't that people can't make things. It's that there's too much to sort through. We're drowning in content. What we actually need isn't more stuff, it's better judgment about what's worth paying attention to.
The people who matter creatively won't be the ones churning out the most work. They'll be the ones who can tell the difference between what's good and what's just there. The skill isn't generation anymore. It's curation. It's knowing what deserves to last and what should just scroll by.

The Disappearing Line Between Thinking and Doing

We used to have to think things through before we could do them. Now we can just start. AI lets us build while we're still figuring it out. Every half-formed idea can become a prototype in minutes.
That's exciting, but it's also risky. When execution is that easy, thinking starts to feel optional. Why sit with a problem when you can just generate five solutions and pick one?
But the whole point of technology isn't to let us think less. It's supposed to give us more time to think better. The people who get that will still slow down sometimes, not because they can't keep up, but because good judgment still takes longer to develop than quick execution.

Beyond Efficiency

Every time technology changes, we tell two stories. One about what we're losing. One about what we might gain.
AI is already taking over a lot of routine work: the templated emails, the data entry, the meeting notes. And the optimistic take is that this frees us up for deeper work. More time for creativity, strategy, human connection. The stuff that actually matters.
I want to believe that. But I don't think it happens automatically.
Because here's what we've seen before: productivity tools were supposed to give us time back, but we usually just filled that time with more work. Email made communication faster, so we sent more emails. Spreadsheets made analysis easier, so we analyzed more things. Every efficiency gain tends to get absorbed into doing more, not thinking deeper.
So yes, AI could give us back human attention. But it depends on what we do with it. Whether organizations see automation as a chance to do better work, or just a way to do more work with fewer people. Whether we use the freed-up time to actually focus, or just pack in another meeting.
AI isn't the end of human contribution. It's an opening. What happens next is still up to us.