The Hypothesis
Enterprise websites are complex networks, not document collections. A page's position in the link graph should predict its business function better than its content does.
How network topology predicts business logic with 95% accuracy, and why simple models outperform deep learning for structural auditing.
Author: Imre Lóránt Dévai
Can we predict what a page does in an organization by analyzing where it sits in the network, without reading a single word of content?
Traditional approaches rely on content analysis: keywords, topics, metadata. But text features require extensive NLP processing and often miss the structural relationships that define a page's purpose.
New pages have no network position yet. Zero links, no topology data. How do you classify content that exists outside the graph? This requires a different approach entirely.
We analyzed a large enterprise website, extracting the complete link graph and computing network features for every page.
Crawl every page and map all internal links to build the complete network topology.
Calculate topological metrics: degree, PageRank, betweenness centrality, clustering coefficient, community membership.
Apply the Louvain algorithm to find natural clusters in the network structure.
Train classifiers to predict business function from network position alone.
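The crawl and metric steps can be sketched in plain Python. This is a minimal illustration, not the study's pipeline. It assumes the crawler already yields absolute `(source_url, target_url)` pairs and treats a link as internal when its host matches the site's host. It covers three of the listed metrics: degree, local clustering coefficient and PageRank, with damping 0.85, the common default. Betweenness centrality and the Louvain step are left out; libraries such as NetworkX implement all of them. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def build_link_graph(links, site_host):
    """Directed graph {page: set of pages it links to} from crawled URL pairs.

    Assumes absolute URLs; links that leave site_host and self-links are dropped.
    """
    graph = {}
    for source, target in links:
        if urlparse(source).netloc != site_host or urlparse(target).netloc != site_host:
            continue  # external link: not part of the internal topology
        if source == target:
            continue  # a self-link says nothing about the page's position
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # pages that only receive links are nodes too
    return graph


def undirected(graph):
    """Symmetric adjacency used for degree and clustering."""
    und = {page: set() for page in graph}
    for page, targets in graph.items():
        for target in targets:
            und[page].add(target)
            und[target].add(page)
    return und


def clustering(und, page):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    nbrs = list(und[page])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in und[nbrs[i]])
    return 2 * linked / (k * (k - 1))


def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration; dangling pages spread their rank uniformly."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new[target] += damping * rank[v] / len(graph[v])
        rank = new
    return rank


links = [
    ("https://example.com/", "https://example.com/products"),
    ("https://example.com/", "https://example.com/about"),
    ("https://example.com/products", "https://example.com/products/a"),
    ("https://example.com/products", "https://example.com/products/b"),
    ("https://example.com/products/a", "https://example.com/products/b"),
    ("https://example.com/products/a", "https://partner.example.org/x"),  # external
]
graph = build_link_graph(links, "example.com")
und = undirected(graph)
rank = pagerank(graph)
for page in sorted(graph):
    print(page, len(und[page]), round(clustering(und, page), 2), round(rank[page], 3))
```

Each page ends up with a small feature vector (degree, clustering, PageRank) that a classifier can consume without reading any page text.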
Key insight: Pages that link together tend to serve similar business purposes. The network structure encodes organizational logic.
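One simple way to check this insight on a labelled crawl is edge homophily: the fraction of links whose two endpoints share a business function, compared against what random labelling would give. The sketch below is an illustration, not a measure the study reports. The page paths, labels and function names are hypothetical, and the chance baseline is approximate because it ignores that a link always joins two distinct pages.

```python
from collections import Counter


def edge_homophily(graph, function_of):
    """Fraction of links whose source and target share a business function.

    graph: {page: set of linked pages}; function_of: {page: label}.
    """
    edges = [(s, t) for s, targets in graph.items() for t in targets]
    if not edges:
        return 0.0
    same = sum(function_of[s] == function_of[t] for s, t in edges)
    return same / len(edges)


def chance_homophily(function_of):
    """Approximate homophily expected if labels were shuffled at random."""
    n = len(function_of)
    return sum((c / n) ** 2 for c in Counter(function_of.values()).values())


graph = {
    "/": {"/products", "/careers"},
    "/products": {"/products/widget"},
    "/products/widget": {"/products"},
    "/careers": {"/careers/jobs"},
    "/careers/jobs": set(),
}
function_of = {
    "/": "home",
    "/products": "sales",
    "/products/widget": "sales",
    "/careers": "hr",
    "/careers/jobs": "hr",
}
print(edge_homophily(graph, function_of), chance_homophily(function_of))
```

A homophily well above the chance baseline is what "the network structure encodes organizational logic" looks like in numbers.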
Quantified results from analyzing enterprise website architecture using network science methods.
Network topology as primary signal
Network position alone predicts page business function with 95% accuracy. A page's degree, centrality, and cluster membership reveal its organizational role.
Structure is the dominant signal, not content
Network vs. text features
Network features outperformed text-based NLP features by 34.3 percentage points. Content analysis captures what a page says; topology reveals what it does.
Position in the graph matters more than keywords
Math matches the org chart
Communities detected by the Louvain algorithm had 94.2% homogeneity with actual business units. The mathematical structure mirrors organizational reality.
Network communities reflect real business divisions
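The two quantities behind this result can be sketched in plain Python. Louvain greedily merges nodes into communities to maximize modularity Q; the `modularity` function below only scores a given partition and does not run Louvain itself (NetworkX ships an implementation as `louvain_communities`). The `homogeneity` function uses the entropy-based definition that scikit-learn's `homogeneity_score` implements. The study does not say which definition produced its 94.2% figure, so treat that choice as an assumption, and the example graph and labels as hypothetical.

```python
import math
from collections import Counter


def modularity(und, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of L_c / m - (d_c / 2m)^2, where m is the number of
    edges, L_c the edges inside c and d_c the total degree of c's pages.
    """
    two_m = sum(len(nbrs) for nbrs in und.values())  # each edge counted twice
    inside = Counter()      # 2 * L_c: each internal edge is seen from both ends
    degree_sum = Counter()  # d_c
    for page, nbrs in und.items():
        c = community_of[page]
        degree_sum[c] += len(nbrs)
        inside[c] += sum(1 for other in nbrs if community_of[other] == c)
    return sum(inside[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)


def homogeneity(units, communities):
    """1 - H(unit | community) / H(unit); 1.0 means no community mixes business units."""
    n = len(units)
    h_unit = -sum(k / n * math.log(k / n) for k in Counter(units).values())
    if h_unit == 0:
        return 1.0  # a single business unit is trivially homogeneous
    size = Counter(communities)
    joint = Counter(zip(units, communities))
    h_cond = -sum(k / n * math.log(k / size[comm]) for (_, comm), k in joint.items())
    return 1 - h_cond / h_unit


# Two triangles joined by a single link (2-3): a textbook two-community graph.
und = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
units = ["sales", "sales", "sales", "hr", "hr", "hr"]
print(modularity(und, split))                  # about 0.357 (= 5/14)
print(modularity(und, {p: "A" for p in und}))  # 0.0: one big community
print(homogeneity(units, [split[p] for p in range(6)]))
```

Homogeneity on its own rewards splitting the graph into tiny communities (singletons score 1.0), which is why it is usually read alongside completeness or modularity.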
We tested simple interpretable models against state-of-the-art deep learning. The results challenged conventional AI assumptions.
Traditional machine learning
A decision tree ensemble that can explain every prediction. Each classification traces back to specific features (degree, centrality, cluster membership) that humans can verify.
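The study's model is a decision tree ensemble. The sketch below shrinks that idea to a single depth-1 tree (a decision stump), the building block such ensembles combine, to show how a prediction reduces to a rule a reviewer can check. It is an illustration only: the feature values, the two labels and the function names are hypothetical, and a real ensemble would combine many deeper trees.

```python
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors.

    rows: list of {feature: value}; each side of the split predicts its majority label.
    """
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must separate something
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no feature varies across the rows")
    return best


def predict_with_reason(stump, row):
    """Return the predicted label plus the human-readable rule that produced it."""
    _, f, threshold, left_label, right_label = stump
    if row[f] <= threshold:
        return left_label, f"{f} = {row[f]} <= {threshold}"
    return right_label, f"{f} = {row[f]} > {threshold}"


rows = [
    {"degree": 40, "clustering": 0.10},
    {"degree": 35, "clustering": 0.15},
    {"degree": 3, "clustering": 0.90},
    {"degree": 2, "clustering": 0.80},
]
labels = ["navigation hub", "navigation hub", "support article", "support article"]
stump = fit_stump(rows, labels, ["degree", "clustering"])
print(predict_with_reason(stump, {"degree": 50, "clustering": 0.05}))
```

The returned reason string is the kind of audit-trail entry a governance reviewer can verify directly against the crawl data.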
Deep learning approach
A Graph Neural Network that learns node embeddings through neighbor aggregation. State-of-the-art for many graph tasks, but predictions emerge from millions of parameters.
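The source does not name the GNN architecture, so the sketch below assumes a GraphSAGE-style mean aggregator. Each page's new embedding combines its own vector with the average of its neighbours' vectors through two weight matrices and a ReLU. In a trained GNN those weights are learned by gradient descent and several such layers are stacked; here they are fixed by hand to show only the aggregation step. All names and values are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_aggregation_layer(und, h, W_self, W_neigh):
    """One layer: h_v' = ReLU(W_self h_v + W_neigh * mean of the neighbours' h_u)."""
    dim = len(next(iter(h.values())))
    out = {}
    for v, nbrs in und.items():
        if nbrs:
            mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim  # isolated page: nothing to aggregate
        combined = [a + b for a, b in zip(matvec(W_self, h[v]), matvec(W_neigh, mean))]
        out[v] = [max(0.0, z) for z in combined]  # ReLU
    return out


und = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # toy input features
identity = [[1.0, 0.0], [0.0, 1.0]]  # fixed here; learned in a real GNN
print(mean_aggregation_layer(und, h, identity, identity))
```

Even in this toy, a page's output depends on its neighbours, but the reason for any single prediction is spread across every weight, which is the interpretability cost described above.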
The simple model achieved higher accuracy than the deep learning approach while remaining fully interpretable. More complexity doesn't always mean better performance.
Practical implications of the research for enterprise website governance.
Traditional SEO and content analysis miss the bigger picture. The network itself (how pages connect, cluster, and flow) is the primary signal for understanding organizational structure.
For governance decisions, understanding why a page was classified matters as much as the classification itself. Transparent models enable audit trails, compliance documentation, and stakeholder trust that black-box AI cannot provide.
Deep learning models offer no advantage for this domain while adding infrastructure costs, training time, and explainability debt. The right tool is the simplest one that works.
New content without network context can still be classified effectively using text features. The dual-track approach classifies established pages by their network position and fresh content by its text, so both are handled with enterprise-grade accuracy.
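The dual-track idea amounts to a routing rule. In the minimal sketch below, a page with at least one link goes to the topology classifier and a page with none falls back to the text classifier. The two models are stand-in functions, and `classify_page` and the `min_degree` cut-off are hypothetical; the source does not say how it decides when a page has "enough" network context.

```python
def classify_page(page, und, network_model, text_model, min_degree=1):
    """Route linked pages to the network model and unlinked pages to the text model.

    und: undirected adjacency {page: set of neighbours}; min_degree is a hypothetical cut-off.
    """
    degree = len(und.get(page, ()))
    if degree >= min_degree:
        return "network", network_model(page)
    return "text", text_model(page)


def network_model(page):
    return "sales"  # stand-in for a trained topology classifier


def text_model(page):
    return "news"  # stand-in for a trained text classifier


und = {"/": {"/products"}, "/products": {"/"}}
print(classify_page("/products", und, network_model, text_model))
print(classify_page("/press/new-release", und, network_model, text_model))
```

Returning the track alongside the label keeps the audit trail honest about which signal produced each classification.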
One unexpected finding: adding metadata features (readability scores, layout counts, word statistics) to a strong network model actually reduced accuracy from 92.5% to 91.9%. More signals introduced noise rather than clarity. This reinforces the efficiency principle: focus on what matters, ignore the rest.
Let's discuss how network science can improve your content governance.