Research Paper — 2025

Building an AI Governance System for Enterprise Websites

How network topology predicts a page's business function with 95% accuracy, and why simple models outperform deep learning for structural auditing.

Author: Imre Lóránt Dévai


The research question

Can we predict what a page does in an organization by analyzing where it sits in the network, without reading a single word of content?

The Hypothesis

Enterprise websites are complex networks, not document collections. A page's position in the link graph should predict its business function better than its content does.

Research finding:

Network position predicts business function with 95% accuracy

The Challenge

Traditional approaches rely on content analysis: keywords, topics, metadata. But text features require extensive NLP processing and often miss structural relationships that define page purpose.

Research finding:

Network features outperform text by 34.3 percentage points

The Cold-Start Problem

New pages have no network position yet. Zero links, no topology data. How do you classify content that exists outside the graph? This requires a different approach entirely.

Research finding:

Text-based classification achieves 92% accuracy for new pages

How we tested it

We analyzed a large enterprise website, extracting the complete link graph and computing network features for every page.

Extract the graph

Crawl every page and map all internal links to build the complete network topology.
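
As a minimal sketch of this step, assume the crawler has already produced a mapping from each page URL to the raw href values found on it. The function name `build_link_graph`, the `pages` structure, and the example.com URLs are illustrative assumptions, not the study's actual tooling. The sketch resolves relative links, drops fragments, and keeps only internal links.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def build_link_graph(pages, site_host):
    """Build a directed internal-link graph.

    pages: dict mapping page URL -> list of raw href strings (hypothetical crawl output).
    Returns a dict mapping every internal URL -> set of internal URLs it links to.
    """
    graph = {url: set() for url in pages}
    for source, hrefs in pages.items():
        for href in hrefs:
            # Resolve relative hrefs against the source page and drop any #fragment.
            target, _fragment = urldefrag(urljoin(source, href))
            if urlparse(target).netloc != site_host:
                continue  # external link, mailto:, etc.
            if target == source:
                continue  # ignore self-links and same-page anchors
            graph[source].add(target)
            graph.setdefault(target, set())  # linked-to pages become nodes too
    return graph


# Hypothetical crawl output for a two-page site.
pages = {
    "https://example.com/": ["/products", "https://other.org/x", "#top"],
    "https://example.com/products": ["/", "/products/widget#specs"],
}
graph = build_link_graph(pages, "example.com")
```

Every link target is registered as a node, so later feature computations can assume that each target also appears as a key in the graph.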

Compute features

Calculate topological metrics: degree, PageRank, betweenness centrality, clustering coefficient, community membership.
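
Three of these metrics can be sketched in plain Python, assuming the graph is stored as a dict of node to set of link targets, with every target also present as a key and no self-links. The damping factor of 0.85 and the fixed iteration count are conventional defaults chosen for illustration, not values reported by the study. Betweenness centrality and community membership are left out for brevity; a graph library would normally supply all of them.

```python
from itertools import combinations


def degrees(graph):
    """Return (in_degree, out_degree) dicts for a directed graph."""
    in_deg = {node: 0 for node in graph}
    out_deg = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    return in_deg, out_deg


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; ranks always sum to 1."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[node] for node, targets in graph.items() if not targets)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in graph}
        for node, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


def to_undirected(graph):
    """Forget link direction: each link connects both endpoints."""
    undirected = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            undirected[source].add(target)
            undirected[target].add(source)
    return undirected


def local_clustering(undirected, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = undirected[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in undirected[u])
    return 2.0 * linked / (k * (k - 1))


# Hypothetical four-page site.
graph = {
    "home": {"products", "about"},
    "products": {"home", "widget"},
    "about": {"home", "products"},
    "widget": {"products"},
}
in_deg, out_deg = degrees(graph)
ranks = pagerank(graph)
undirected = to_undirected(graph)
```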

Detect communities

Apply the Louvain algorithm to find natural clusters in the network structure.
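
Louvain works by greedily moving nodes between communities to increase a score called modularity. The sketch below computes only that score for a given partition, not the Louvain search itself. It assumes an undirected graph stored as a dict of node to neighbor set; the toy graph and the names used are illustrative.

```python
from collections import defaultdict


def modularity(undirected, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / (2 * m)) ** 2,
    where m is the total number of edges.
    """
    m = sum(len(neighbors) for neighbors in undirected.values()) / 2
    internal_edges = defaultdict(float)
    degree_sum = defaultdict(float)
    for node, neighbors in undirected.items():
        community = community_of[node]
        degree_sum[community] += len(neighbors)
        for neighbor in neighbors:
            if community_of[neighbor] == community:
                internal_edges[community] += 0.5  # each internal edge is seen from both ends
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (c-d).
undirected = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in undirected}
q_split = modularity(undirected, split)
q_merged = modularity(undirected, merged)
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one community, which is the kind of partition a modularity-maximizing method is designed to find.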

Predict function

Train classifiers to predict business function from network position alone.

Key insight: Pages that link together tend to serve similar business purposes. The network structure encodes organizational logic.


Key findings

Quantified results from analyzing enterprise website architecture using network science methods.

95%

Prediction Accuracy

Network topology as primary signal

Network position alone predicts page business function with 95% accuracy. A page's degree, centrality, and cluster membership reveal its organizational role.

Structure is the dominant signal, not content

34.3 pp

Performance Gap

Network vs. text features

Network features outperformed text-based NLP features by 34.3 percentage points. Content analysis captures what a page says; topology reveals what it does.

Position in the graph matters more than keywords

94.2%

Community Alignment

Math matches the org chart

Communities detected by the Louvain algorithm had 94.2% homogeneity with actual business units. The mathematical structure mirrors organizational reality.
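
The page does not define how homogeneity was measured. One common definition, assumed here, is the entropy-based homogeneity score (the same quantity scikit-learn exposes as `homogeneity_score`): 1 minus the conditional entropy of business units given communities, divided by the entropy of business units. A score of 1.0 means every community contains pages from a single business unit. The labels below are invented for illustration.

```python
from collections import Counter
from math import log


def homogeneity(business_units, communities):
    """Entropy-based homogeneity: 1 - H(units | communities) / H(units)."""
    n = len(business_units)
    unit_counts = Counter(business_units)
    community_counts = Counter(communities)
    joint_counts = Counter(zip(business_units, communities))

    # Entropy of the true business-unit labels.
    h_units = -sum(c / n * log(c / n) for c in unit_counts.values())
    if h_units == 0:
        return 1.0  # only one business unit, so any clustering is homogeneous

    # Conditional entropy of business unit given detected community.
    h_conditional = -sum(
        count / n * log(count / community_counts[community])
        for (_unit, community), count in joint_counts.items()
    )
    return 1.0 - h_conditional / h_units


units = ["sales", "sales", "hr", "hr"]
perfect = homogeneity(units, [0, 0, 1, 1])  # each community maps to one unit
mixed = homogeneity(units, [0, 0, 0, 0])    # one community mixes both units
```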

Network communities reflect real business divisions

The efficiency principle

We tested simple interpretable models against state-of-the-art deep learning. The results challenged conventional AI assumptions.

Simple & Interpretable

Random Forest

Traditional machine learning

95.3%

accuracy

A decision tree ensemble that can explain every prediction. Each classification traces back to specific features (degree, centrality, cluster membership) that humans can verify.

Every decision is traceable and auditable
Fast training and inference (<100ms)
No GPU required, runs on standard hardware
Model can be inspected and validated

Complex & Opaque

GraphSAGE (GNN)

Deep learning approach

94.9%

accuracy

A Graph Neural Network that learns node embeddings through neighbor aggregation. State-of-the-art for many graph tasks, but predictions emerge from millions of parameters.

Predictions are difficult to explain
Requires specialized GPU infrastructure
Longer training cycles and higher costs
Model internals resist human inspection

The counterintuitive result

The simple model slightly outperformed the deep learning approach (95.3% vs. 94.9% accuracy) while remaining fully interpretable. More complexity doesn't always mean better performance.

+0.4 pp

RF advantage

What this means

Practical implications of the research for enterprise website governance.

Structure Over Content

Traditional SEO and content analysis miss the bigger picture. The network itself (how pages connect, cluster, and flow) is the primary signal for understanding organizational structure.

Interpretability Matters

For governance decisions, understanding why matters as much as what. Transparent models enable audit trails, compliance documentation, and stakeholder trust that black-box AI cannot provide.

Complexity Has Costs

Deep learning offered no accuracy advantage in this study while adding infrastructure costs, training time, and explainability debt. The right tool is the simplest one that works.

Cold-Start is Solvable

New content without network context can still be classified effectively using text features, which reached 92% accuracy for new pages. The dual-track approach handles both established pages and fresh content.
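
A dual-track setup can be sketched as a simple router: pages that already have links go to the network-based classifier, and pages with zero links fall back to the text-based one. The function name, the `link_counts` mapping, and the two stand-in models below are hypothetical placeholders, not the study's implementation.

```python
def classify_page(url, link_counts, network_model, text_model):
    """Route a page to the network or text classifier.

    link_counts: dict mapping URL -> total internal links touching the page
                 (in-links plus out-links); missing URLs count as zero.
    network_model, text_model: callables taking a URL and returning a label
                 (hypothetical stand-ins for trained classifiers).
    Returns (track_used, predicted_label).
    """
    if link_counts.get(url, 0) > 0:
        return "network", network_model(url)
    # Cold start: no position in the graph yet, so use content instead.
    return "text", text_model(url)


# Stand-in models that always return a fixed label, for illustration only.
def network_model(url):
    return "product"


def text_model(url):
    return "news"


link_counts = {"/products/widget": 12, "/news/new-post": 0}
established = classify_page("/products/widget", link_counts, network_model, text_model)
fresh = classify_page("/news/new-post", link_counts, network_model, text_model)
```

Returning the track alongside the label keeps a record of which signal produced each decision, which fits the audit-trail goal described above.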

The metadata paradox

One unexpected finding: adding metadata features (readability scores, layout counts, word statistics) to a strong network model actually reduced accuracy from 92.5% to 91.9%, a drop of 0.6 percentage points. The extra signals introduced noise rather than clarity. This reinforces the efficiency principle: focus on what matters, ignore the rest.
