Case Study: Nuestros Tiempos — Decoding Chile's News Complexity with AI
Nuestros Tiempos is an AI-powered news aggregation platform designed to analyze Chilean news narratives. It explores how unsupervised learning and vector embeddings can automatically uncover patterns across thousands of news articles, with no labeled data.
www.nuestrostiempos.cl
- Client: Codisans
- Year:
- Service: Automation, Artificial Intelligence

Tech Stack Highlights
- Laravel
- React
- LibSQL + SQLite
- Model2Vec
- UMAP + HDBSCAN
Overview
Nuestros Tiempos is an AI-powered news aggregation platform designed to analyze Chilean news narratives. Built by Codisans, it explores how unsupervised learning and vector embeddings can uncover patterns across thousands of news articles — automatically, with no labeled data, and on a server that costs just $6/month.
The Problem: Navigating Information & Misinformation
In today's information landscape, people are overwhelmed by volume and bias.
Chileans, like audiences everywhere, face a largely invisible problem: news fragmentation.
Each outlet presents a partial narrative, making it hard to assemble the complete picture.
Nuestros Tiempos tackles this by automatically identifying different narratives around the same news event, allowing users to explore multiple perspectives in one place — a key step toward reducing misinformation and polarization.
Objectives
- Develop a fully automated news clustering system to group semantically similar articles.
- Achieve high-quality clustering with zero supervision and minimal resources.
- Run the entire system on a tiny VPS for less than $6/month.
- Build a modern, responsive UX that presents AI insights clearly to users.
Vision
"Decoding Chile's News Complexity with AI"
Nuestros Tiempos represents Codisans' vision for applied AI: accessible, ethical, and efficient.
Rather than relying on large, expensive LLM pipelines, this project proves how traditional yet powerful ML techniques — when combined creatively — can deliver meaningful impact.
Machine Learning Models & Approach
| Step | Technique | Purpose |
|---|---|---|
| 1. Embeddings | Model2Vec (static embeddings) | Generate semantic representations of news articles. |
| 2. Dimensionality Reduction | UMAP | Compress embeddings while preserving structure. |
| 3. Clustering | HDBSCAN | Discover dense clusters representing thematic narratives. |
| 4. Summarization | Transformers.js | Generate human-readable summaries of each cluster, directly on the user's browser. |
All models run on CPU — no GPU dependency — showing that creative ML engineering can outweigh raw hardware power.
Why These Technologies?
- Model2Vec: lightweight, deterministic, and fast for static embeddings.
- UMAP: maintains semantic proximity in lower dimensions, critical for clarity.
- HDBSCAN: automatically detects cluster density — perfect for uneven, real-world data like news.
- Transformers.js: integrates client-side summarization for explainable AI.
Together, these form an elegant, resource-efficient unsupervised NLP pipeline.
Pipeline
- Data Cleaning & Normalization – Text extracted from 16 major Chilean outlets (Emol, CNN Chile, Cooperativa, T13, etc.)
- Vectorization – Each article is converted into a 768-dimensional vector using a static model distilled from Google DeepMind's EmbeddingGemma with Model2Vec.
- Dimensionality Reduction – Reduced using UMAP to reveal meaningful topological relationships.
- Clustering – Applied HDBSCAN to detect natural groupings without presetting the number of clusters.
- Filtering & Validation – Outliers removed, clusters refined.
- Collection Generation – Articles grouped into readable "collections" of narratives.
All data is stored in SQLite and LibSQL (for vectors), using native SQLite FTS5 for full-text search — no external search services needed.
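A minimal sketch of the search layer, using only Python's stdlib `sqlite3` and a hypothetical `articles` schema (the production app runs on Laravel, but FTS5 behaves identically from any binding):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table: a tokenized full-text index, no external search service.
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Sequía en el norte", "La escasez de agua afecta la minería del litio."),
        ("Elecciones municipales", "Los candidatos presentaron sus programas."),
        ("Mercado del litio", "El precio del litio cae por menor demanda global."),
    ],
)

# MATCH runs the full-text query; the implicit `rank` column orders by BM25 relevance.
hits = conn.execute(
    "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank",
    ("litio",),
).fetchall()
print([title for (title,) in hits])  # the two articles mentioning "litio"
```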
Results
- 100% automated pipeline
- No manual labeling or supervision
- Human-readable, coherent clusters
- Native full-text search without extra services
- Deployed on a small VPS with a cost of $6/month
Challenges & Solutions
| Challenge | Codisans' Solution |
|---|---|
| Resource constraints | Smart use of static embeddings + UMAP + HDBSCAN on CPU |
| Cluster accuracy | Fine-tuned HDBSCAN parameters (min_samples, min_cluster_size) |
| Search relevance | Native SQLite FTS5 implementation for scalable local search |
| UX complexity | Minimalist UI with dynamic clustering visualization |
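LibSQL ships native vector search; purely as an illustration of the underlying idea, here is a plain-SQLite fallback that packs embeddings into BLOBs and ranks by cosine similarity in Python (the `article_vectors` schema and helper names are made up for this sketch):

```python
import math
import sqlite3
import struct

def pack(vec):
    """Serialize a float vector into a compact BLOB for storage."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article_vectors (id INTEGER PRIMARY KEY, emb BLOB)")
conn.executemany(
    "INSERT INTO article_vectors (id, emb) VALUES (?, ?)",
    [(1, pack([1.0, 0.0, 0.0])), (2, pack([0.9, 0.1, 0.0])), (3, pack([0.0, 1.0, 0.0]))],
)

# Brute-force nearest neighbour: fine at this corpus size, no extra services needed.
query = [1.0, 0.05, 0.0]
rows = conn.execute("SELECT id, emb FROM article_vectors").fetchall()
ranked = sorted(rows, key=lambda r: cosine(query, unpack(r[1])), reverse=True)
print([rid for rid, _ in ranked])  # → [1, 2, 3], most similar article ids first
```

A linear scan like this is O(n) per query, which stays comfortably within budget on a small VPS until the corpus grows far beyond tens of thousands of articles.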
Conclusions & Learnings
- Even static embedding models can produce surprisingly strong results when combined thoughtfully.
- UMAP and HDBSCAN parameters significantly affect cluster quality — hyperparameter tuning is key.
- Clean input data dramatically improves unsupervised outcomes.
- Minimalist architectures can achieve maximum insight per dollar.
Nuestros Tiempos is a testament to Codisans' philosophy:
AI should be useful, understandable, and efficient — not just impressive.
What could we achieve with more resources?
- Sentence-transformers for contextual embeddings
- Transformer-based summaries per cluster
- Larger-scale reclustering
- Quality assurance via transformer-based QA models
This could elevate Nuestros Tiempos from a research project to a full-fledged media intelligence platform.