Text Cleaner
Remove extra whitespace, line breaks, HTML tags, and invisible Unicode — clean text instantly
You paste text from a PDF and every line has a hard line break in the middle of sentences. Or you copy from a website and get invisible Unicode characters — zero-width spaces, soft hyphens, non-breaking spaces — that break string comparisons. Or a CSV export has tabs mixed with spaces. This tool strips all of that in one click.
Why This Tool
Text from different sources carries different invisible baggage. PDFs insert line breaks at column edges. Word processors add smart quotes and em dashes. Websites embed zero-width joiners and non-breaking spaces. Each of these causes subtle bugs in code, data processing, and content management. This tool gives you toggleable cleaning operations so you can strip exactly what you need.
Cleaning Operations
Remove Extra Whitespace
Collapses multiple consecutive spaces into a single space and trims leading/trailing whitespace from each line. Turns "Hello    world" (four spaces) into "Hello world" (one space).
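As a rough illustration of this operation, here is a minimal Python sketch. The function name `remove_extra_whitespace` is hypothetical, and treating tabs the same as spaces is an assumption; the tool's actual implementation may differ.

```python
import re


def remove_extra_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs to one space and trim every line."""
    cleaned_lines = []
    for line in text.split("\n"):
        # Assumption: tabs count as whitespace to collapse, like spaces.
        collapsed = re.sub(r"[ \t]+", " ", line)
        # Drop whatever whitespace is left at the start or end of the line.
        cleaned_lines.append(collapsed.strip())
    return "\n".join(cleaned_lines)


print(remove_extra_whitespace("  Hello    world  "))  # Hello world
```

Working line by line keeps the line breaks themselves untouched, so this step can be combined with or without line-break removal.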
Remove Line Breaks
Joins lines that were artificially split (common in PDF copy-paste). Preserves paragraph breaks (double newlines) while removing single line breaks within paragraphs.
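One way to sketch this behavior in Python is to split the text on blank lines first and then join the lines inside each paragraph. The name `remove_line_breaks` is hypothetical, and joining wrapped lines with a single space is an assumption about how the tool fills the gap.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Join wrapped lines inside paragraphs; keep blank-line paragraph breaks."""
    # A paragraph separator is two newlines, possibly with whitespace between.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = []
    for paragraph in paragraphs:
        # Replace each single line break (and spaces around it) with one space.
        flat = re.sub(r"\s*\n\s*", " ", paragraph.strip())
        if flat:
            joined.append(flat)
    # Re-insert exactly one blank line between paragraphs.
    return "\n\n".join(joined)


pdf_text = "This sentence was\nsplit by a PDF.\n\nSecond paragraph\ncontinues here."
print(remove_line_breaks(pdf_text))
```

Note that this sketch also reduces three or more consecutive newlines to a single paragraph break, and it does not rejoin words hyphenated across lines.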
Remove HTML Tags
Strips all HTML tags, leaving only the text content. <p>Hello <strong>world</strong></p> becomes Hello world.
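Tag stripping can be approximated with Python's standard `html.parser` module, which is usually more robust than a regular expression. This is only a sketch under assumptions: the names `_TextExtractor` and `remove_html_tags` are hypothetical, and the tool itself may work differently (for example, this version keeps the text inside `<script>` and `<style>` elements).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def remove_html_tags(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(remove_html_tags("<p>Hello <strong>world</strong></p>"))  # Hello world
```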
Remove Special Characters
Strips non-alphanumeric characters (punctuation, symbols) while preserving spaces and basic structure.
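A minimal Python sketch of this kind of filter follows. The function name `remove_special_characters` is hypothetical, and two choices here are assumptions rather than documented behavior of the tool: accented and non-Latin letters count as alphanumeric, and underscores are removed.

```python
import re


def remove_special_characters(text: str) -> str:
    """Drop punctuation and symbols; keep letters, digits, and whitespace."""
    # [^\w\s] matches anything that is not a word character or whitespace.
    # \w also matches "_", so the underscore is removed explicitly.
    return re.sub(r"[^\w\s]|_", "", text)


print(remove_special_characters("Hello, world! (v2.0) #1"))  # Hello world v20 1
```

The example output shows why this option is risky for version numbers, decimals, and code: the dot in "2.0" disappears along with the rest of the punctuation.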
Normalize Unicode
Replaces fancy Unicode characters with ASCII equivalents:
- Smart quotes (“ ” ‘ ’) → straight quotes (" ')
- Em dash (—) → double hyphen (--)
- Ellipsis (…) → three dots (...)
- Non-breaking space → regular space
- Zero-width characters → removed entirely
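The replacements listed above map naturally onto a translation table. The sketch below uses Python's built-in `str.translate`; the names `REPLACEMENTS` and `normalize_unicode` are hypothetical, and the exact set of zero-width characters (U+200B, U+200C, U+200D, U+FEFF) is an assumption about what the tool covers.

```python
# Code point -> replacement string; None deletes the character.
# Escapes are used so the invisible characters are visible in the source.
REPLACEMENTS = {
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # non-breaking space
    0x200B: None,   # zero-width space
    0x200C: None,   # zero-width non-joiner (assumed to be included)
    0x200D: None,   # zero-width joiner
    0xFEFF: None,   # zero-width no-break space / BOM (assumed to be included)
}


def normalize_unicode(text: str) -> str:
    return text.translate(REPLACEMENTS)


print(normalize_unicode("\u201cHi\u201d \u2014 it\u2019s\u2026"))  # "Hi" -- it's...
```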
Trim Lines
Removes leading and trailing whitespace from every line independently.
Common Use Cases
- PDF to clean text: Remove artificial line breaks and extra spaces
- Web scraping cleanup: Strip HTML tags and normalize whitespace
- Data normalization: Clean CSV/TSV fields before importing
- Code comments: Remove fancy Unicode from copy-pasted text
- Email formatting: Fix text copied from rich-text email clients
Invisible Unicode Characters
These characters are invisible but cause real problems:
| Character | Unicode | Problem |
|---|---|---|
| Zero-width space | U+200B | Breaks string equality checks |
| Zero-width joiner | U+200D | Comes along in copy-paste from the web and breaks string matching |
| Non-breaking space | U+00A0 | Looks like a space but isn’t |
| Soft hyphen | U+00AD | Invisible except at line breaks |
| BOM (Byte Order Mark) | U+FEFF | Causes “unexpected token” errors in parsers |
This tool detects all of these. Non-breaking spaces are replaced with regular spaces; the others are removed.
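To see why these characters cause trouble, and how they might be located before cleaning, here is a small Python sketch. The names `INVISIBLE` and `find_invisible` are hypothetical; the set contains only the five characters from the table above.

```python
import unicodedata

# The five characters from the table, written as escapes so they are visible.
INVISIBLE = {"\u200b", "\u200d", "\u00a0", "\u00ad", "\ufeff"}


def find_invisible(text):
    """Return (index, 'U+XXXX', Unicode name) for each invisible character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in INVISIBLE
    ]


# Two strings that look identical on screen but do not compare equal.
print("price" == "pri\u200bce")  # False
print(find_invisible("pri\u200bce"))
```

Reporting the position and name of each hit makes it easier to decide whether a character should be deleted or replaced with a regular space.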
Frequently Asked Questions
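Will this tool change my text content? That depends on which operations you enable. The whitespace, line-break, and invisible-character options leave the words and numbers in your text unchanged. Remove HTML Tags and Remove Special Characters delete visible markup and punctuation, and Normalize Unicode swaps typographic characters for ASCII equivalents. You can toggle each cleaning operation independently to control exactly what gets removed.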
Can I clean code with this tool? Be careful: removing special characters will strip operators and syntax. Use only the whitespace normalization and Unicode cleaning options for code. The HTML tag removal option is suited to stripping markup wrapped around a code snippet, but check the result if the code itself contains angle brackets.
How does it handle different line ending formats? The tool normalizes all line endings (Windows CRLF, classic Mac OS CR, Unix LF) to Unix LF format. This prevents line ending mismatches when moving text between operating systems.
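The normalization described in this answer can be sketched in two replacements. The function name `normalize_line_endings` is hypothetical, and this is one common way to do it rather than a confirmed description of the tool's code.

```python
def normalize_line_endings(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Handle CRLF first so one Windows line ending does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_endings("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```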
Does this tool preserve paragraph breaks? Yes, when using “Remove Line Breaks” mode. Single newlines (mid-paragraph breaks) are removed, but double newlines (paragraph separators) are preserved.