Reducing Android Logs for LLMs: Semantic Compression

Introduction

If you're doing Android development with Claude Code or Cursor, you'll hit this wall fast:

Raw logcat dumps can be thousands of lines
A UIAutomator XML for a single screen can be hundreds of KB
adb dumpsys output can run to tens of thousands of lines

If all you wanted was to shrink the byte count, regular gzip would be enough. But an LLM reads tokens, not bytes, and gzip output is unreadable to it. Data sent to LLMs needs "semantic compression", and the goals are fundamentally different.

	Regular compression	Semantic compression for LLMs
Goal	Reduce bytes	Reduce reasoning noise
Criterion	Reproducibility	AI comprehensibility
Output	Equivalent to original	Semantically equivalent

Multi-Stage Pipeline Architecture

The basic architecture for semantic compression looks like this:

input
  ↓ parser           (structuring)
  ↓ normalizer       (normalize formatting variations)
  ↓ dedupe           (remove duplicates)
  ↓ semantic reduction (remove irrelevant information)
  ↓ structuring      (organize)
  ↓ encoder          (TOON/DSL encoding)
  ↓ LLM

Let's walk through each step.

1. Parser — Structuring First

The first job is structuring the raw input.

Logcat example:

05-28 10:00:00 E MyApp: NullPointerException

↓

{
  "time": "05-28 10:00:00",
  "level": "E",
  "tag": "MyApp",
  "message": "NullPointerException"
}

Structuring enables filtering in downstream stages.
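
As a minimal sketch of this step, the parser below turns the simplified line shown above into the same record. The pattern is an assumption that matches only this "MM-DD HH:MM:SS LEVEL TAG: message" layout; real `adb logcat` output formats carry extra fields (PID, TID, milliseconds) and would need a different pattern. The names `LOG_LINE` and `parse_line` are illustrative.

```python
import re
from typing import Optional

# Assumed layout: "MM-DD HH:MM:SS LEVEL TAG: message", as in the example above.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[VDIWEF])\s+"
    r"(?P<tag>[^:]+?):\s*"
    r"(?P<message>.*)$"
)

def parse_line(line: str) -> Optional[dict]:
    """Return the structured record, or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None

record = parse_line("05-28 10:00:00 E MyApp: NullPointerException")
```

Returning None for unparseable lines lets later stages decide whether to drop them or pass them through unchanged.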

2. Normalizer — Eliminating Formatting Variations

Unify information that conveys the same meaning in different formats.

Shortening fully qualified class names:

com.example.myapp.feature.login.LoginViewModel

↓

LoginViewModel

Normalizing exception class names:

java.lang.NullPointerException

↓

NPE
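
Both rewrites above can be sketched with two regular-expression passes. This is an illustration under assumptions, not a complete normalizer: `EXCEPTION_ALIASES` is a hypothetical table you would extend yourself, and the `FQCN` pattern assumes the usual convention of lower-case package segments followed by a capitalised class name.

```python
import re

# Hypothetical abbreviation table; extend it for the exceptions your app throws.
EXCEPTION_ALIASES = {
    "NullPointerException": "NPE",
}

# Lower-case dotted package prefix followed by a capitalised class name.
FQCN = re.compile(r"\b(?:[a-z_][a-z0-9_]*\.)+([A-Z]\w*)")

def normalize(message: str) -> str:
    # Drop the package prefix, keeping only the simple class name.
    message = FQCN.sub(r"\1", message)
    # Replace known exception names with their short alias.
    for name, alias in EXCEPTION_ALIASES.items():
        message = re.sub(rf"\b{name}\b", alias, message)
    return message
```

Shortening is lossy when two packages contain classes with the same simple name, so it may be worth keeping the last package segment in larger codebases.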

3. Dedupe — Removing Duplicates

Repetition wastes context and almost never gives an LLM anything new to reason about.

Input:

Loading...
Loading...
Loading...
Loading...

After compression:

Loading... x4

A more significant example — the same stack trace repeated 100 times:

EXCEPTION repeated x100 { type=NPE source=LoginViewModel.kt line=42 }

This alone can save a large share of the tokens.
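
The "Loading... x4" case can be sketched as a run-length collapse over consecutive identical lines. The function name `collapse_repeats` is illustrative, and the sketch only merges adjacent duplicates; repeated stack traces interleaved with other output would need grouping by a signature (exception type, file, line) instead.

```python
from itertools import groupby

def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into 'line xN'."""
    out = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        out.append(f"{line} x{count}" if count > 1 else line)
    return out

compressed = collapse_repeats(["Loading..."] * 4)
```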

4. Semantic Reduction — Dropping What AI Doesn't Need

This is the core of LLM compression.

Android-Specific Noise Logs to Exclude

These rarely contribute to app-level debugging with an LLM:

Tag	Content
`BufferQueue`	Graphics system internals
`OpenGLRenderer`	Rendering engine
`libEGL`	OpenGL ES
`AudioTrack`	Audio internals
`TrafficStats`	Network statistics
`chatty`	Log suppression messages
`GC_`	Garbage collection

Filter example:

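import re

# Tags and prefixes from the table above; a match anywhere in the line drops it.
NOISE_PATTERNS = [
    r"BufferQueue",
    r"OpenGLRenderer",
    r"libEGL",
    r"AudioTrack",
    r"TrafficStats",
    r"chatty",
    r"GC_",
]

def filter_noise(line: str) -> bool:
    """Return True if the line should be kept (it matches no noise pattern)."""
    return not any(re.search(p, line) for p in NOISE_PATTERNS)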

Important Logs to Keep

Exception / ANR / Activity lifecycle / Network error / Firebase / WorkManager

5. Structuring — Organizing for LLM Comprehension

LLMs handle organized information better.

Before:

MainActivity created
Fragment onResume called
API request started

After:

ACTIVITY{ name=MainActivity state=created }
FRAGMENT{ state=onResume }
API{ state=start }

Explicit labels spare the model from inferring what kind of event each line describes.
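
One way to produce the labeled records above is a small rule table that pairs a pattern with a formatter. The three rules here are hypothetical and cover only the example messages; a real app would need rules written against its own log wording.

```python
import re

# Hypothetical rules: each pairs a pattern with a formatter for the labeled record.
RULES = [
    (re.compile(r"^(\w+Activity) (\w+)$"),
     lambda m: f"ACTIVITY{{ name={m.group(1)} state={m.group(2)} }}"),
    (re.compile(r"^Fragment (on\w+) called$"),
     lambda m: f"FRAGMENT{{ state={m.group(1)} }}"),
    (re.compile(r"^API request started$"),
     lambda m: "API{ state=start }"),
]

def label(message: str) -> str:
    """Rewrite a message as a labeled record, or return it unchanged."""
    for pattern, fmt in RULES:
        m = pattern.match(message)
        if m:
            return fmt(m)
    return message  # leave unrecognised messages untouched
```

Falling back to the original message keeps the stage safe to run on logs that no rule covers.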

6. Encoder — TOON/DSL Encoding

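Finally, convert to a compact tabular format modeled on TOON. The header declares the row count and field names once, and each row lists only values (separated by | in this example):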

events[3]{type,target,state}:
  ACT|MainActivity|created
  API|LoginApi|start
  ERR|Auth|401

Each line carries more information, so more content fits within the context window.
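
A minimal encoder for the block above might look like the following. It reproduces the layout shown in this article (header with row count and field names, then pipe-separated rows) and makes no claim to implement the full TOON format; `encode_table` and its parameters are illustrative names.

```python
def encode_table(name, fields, rows, sep="|"):
    """Emit a header with the row count and field names, then one row per line."""
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + sep.join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

events = [
    {"type": "ACT", "target": "MainActivity", "state": "created"},
    {"type": "API", "target": "LoginApi", "state": "start"},
    {"type": "ERR", "target": "Auth", "state": 401},
]
encoded = encode_table("events", ["type", "target", "state"], events)
```

The saving comes from stating the field names once in the header instead of repeating them as keys on every record. Values that contain the separator would need escaping, which this sketch does not handle.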

UIAutomator and UI Tree Compression

When you drive a device through Android MCP, UIAutomator XML is even more verbose than logs.

A single node from the UI tree (excerpt, shown as JSON for readability):

{
  "x": 120,
  "y": 400,
  "width": 200,
  "height": 48,
  "padding": 0,
  "alpha": 1.0,
  "focusable": false,
  "clickable": true,
  "text": "Login"
}

After compression:

BTN[text=Login]

Padding and alpha are essentially useless to an LLM. Keeping only elements where clickable=true and text is present reduces a node like this one to under 1/20 of its original token count.
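
Applied to actual `uiautomator dump` XML, where each element is a `<node>` with attributes such as `class`, `text`, `clickable`, and `bounds`, the same filter can be sketched with the standard library. The `SHORT_NAMES` table is a hypothetical abbreviation map, and the sample XML is trimmed to the attributes the sketch reads.

```python
import xml.etree.ElementTree as ET

# Hypothetical abbreviations for common widget classes.
SHORT_NAMES = {"Button": "BTN"}

def compress_ui(xml_text: str) -> list:
    """Keep only clickable nodes that carry visible text."""
    out = []
    for node in ET.fromstring(xml_text).iter("node"):
        text = node.get("text", "")
        if node.get("clickable") == "true" and text:
            # Reduce e.g. android.widget.Button to its simple class name.
            kind = node.get("class", "node").rsplit(".", 1)[-1]
            out.append(f"{SHORT_NAMES.get(kind, kind)}[text={text}]")
    return out

sample = (
    '<hierarchy rotation="0">'
    '<node class="android.widget.FrameLayout" text="" clickable="false" '
    'bounds="[0,0][1080,1920]">'
    '<node class="android.widget.Button" text="Login" clickable="true" '
    'bounds="[120,400][320,448]"/>'
    '</node></hierarchy>'
)
compact = compress_ui(sample)
```

This drops `bounds` entirely, so it suits cases where the LLM refers to elements by text. If it needs to tap by coordinates, keep the bounds in the output.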

Domain Specialization Matters Most

Android-specific compression logic can drop far more than a generic compressor, because it knows which tags, fields, and events matter.

Automatic Log Category Classification

CATEGORIES = {
    "lifecycle":   ["onCreate", "onResume", "onPause", "onDestroy"],
    "crash":       ["Exception", "ANR", "Fatal"],
    "network":     ["Retrofit", "OkHttp", "HttpException"],
    "database":    ["Room", "SQLite"],
    "compose":     ["Recomposition", "Composition"],
    "firebase":    ["Firebase", "Firestore", "FCM"],
    "workmanager": ["WorkManager", "Worker"],
}

Organizing logs by category before passing them to the AI groups related information together, which can improve debugging accuracy.
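
A keyword table like this can be applied with plain substring matching, as in the sketch below. It repeats a subset of the table so that it runs on its own; `classify` and `group_by_category` are illustrative names. One assumption to note: the first matching category wins, so "network" is listed before "crash" here, otherwise "HttpException" would be caught by the generic "Exception" keyword.

```python
# Subset of the category table; order matters because the first match wins.
CATEGORIES = {
    "network":   ["Retrofit", "OkHttp", "HttpException"],
    "crash":     ["Exception", "ANR", "Fatal"],
    "lifecycle": ["onCreate", "onResume", "onPause", "onDestroy"],
}

def classify(message: str) -> str:
    """Return the first category with a keyword in the message, else 'other'."""
    for category, keywords in CATEGORIES.items():
        if any(k in message for k in keywords):
            return category
    return "other"

def group_by_category(messages):
    """Bucket messages by category, preserving their order within each bucket."""
    groups = {}
    for msg in messages:
        groups.setdefault(classify(msg), []).append(msg)
    return groups
```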

Priority Scoring for Context Control

priority=10  EXCEPTION / ANR
priority=8   Network error
priority=5   Lifecycle event
priority=3   Debug log
priority=1   Verbose

When context is limited, fill from highest priority first.
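
Filling from the highest priority first can be sketched as a greedy selection under a size budget. The function name and the `(priority, text)` input shape are assumptions, and `len` (character count) stands in for a real token counter. Entries that do not fit are skipped rather than ending the loop, and the survivors are returned in their original order so the timeline stays readable.

```python
def select_by_priority(entries, budget, cost=len):
    """Pick entries from highest priority down until the budget is spent.

    entries: list of (priority, text) pairs in chronological order.
    cost: size estimate for a text; len is a stand-in for a token counter.
    """
    chosen = []
    used = 0
    # sorted() is stable, so equal priorities keep chronological order.
    ranked = sorted(enumerate(entries), key=lambda item: -item[1][0])
    for index, (_priority, text) in ranked:
        if used + cost(text) <= budget:
            chosen.append(index)
            used += cost(text)
    # Restore chronological order for the entries that made the cut.
    return [entries[i][1] for i in sorted(chosen)]

logs = [
    (3, "debug msg"),
    (10, "EXCEPTION NPE"),
    (1, "verbose"),
    (8, "network 401"),
]
kept = select_by_priority(logs, budget=25)
```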

Command-Based Interface by Use Case

Slash commands create a natural interface for AI integration:

/logcat     → logcat-specific compression
/uiauto     → UIAutomator XML-specific compression
/json       → large JSON-specific compression
/stacktrace → extract stacktrace only

Example:

/logcat
05-28 10:00:00 D MyApp: Loading
05-28 10:00:01 D MyApp: Loading
05-28 10:00:02 E MyApp: NullPointerException at LoginViewModel.kt:42

↓

LOG{
  repeated[ "Loading" x2 ]
  error{ type=NPE source=LoginViewModel.kt line=42 }
}
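
Routing a request like this one comes down to splitting the command on the first line from the payload that follows. The sketch below assumes that layout; `dispatch` and the handler table are hypothetical, and the handler shown is a stand-in for the real compression pipeline.

```python
def dispatch(request: str, handlers: dict) -> str:
    """Split '/command\\npayload' and pass the payload to the matching handler."""
    command, _, payload = request.partition("\n")
    handler = handlers.get(command.strip())
    if handler is None:
        raise ValueError(f"unknown command: {command!r}")
    return handler(payload)

# Stand-in handler that only counts lines; a real one would run the pipeline.
handlers = {"/logcat": lambda text: f"{len(text.splitlines())} lines"}
result = dispatch("/logcat\nline one\nline two", handlers)
```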

Going Further: MCP Server Implementation

This pipeline can be implemented as an MCP server:

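from mcp.server.fastmcp import FastMCP  # MCP Python SDK (pip install mcp)

mcp = FastMCP("android-log-compressor")  # server name is arbitrary

def run_pipeline(text: str, mode: str) -> str:
    # Placeholder: run the six stages above for the given mode.
    raise NotImplementedError(mode)

@mcp.tool()
def compress_logcat(text: str) -> str:
    return run_pipeline(text, mode="logcat")

@mcp.tool()
def compress_uiauto(xml: str) -> str:
    return run_pipeline(xml, mode="uiauto")

@mcp.tool()
def compress_json(data: str) -> str:
    return run_pipeline(data, mode="json")

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default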

Once the server is registered, Claude Desktop and Claude Code can call these tools automatically. The ideal is compression that happens transparently, without the user needing to think about it.

Summary

Step	Effect
Parser	Enables downstream filtering
Normalizer	Eliminates duplicate representations of the same information
Dedupe	Collapses repeated log entries
Semantic Reduction	Removes noise irrelevant to AI
Structuring	Organizes data for better AI comprehension
Encoder (TOON)	Increases information density

The key insight is knowing what AI doesn't need. Regular compression reduces bytes. Semantic compression for LLMs reduces reasoning noise. Keeping that distinction in mind makes the Android MCP + AI combination far more practical within a limited context window.