File formats and limits

This page is reference material. Use it when you need to know whether a file type is parsed, previewed, blocked, or subject to a practical product limit.

Parsing means Colabra extracts text, tables, or structured content for downstream analysis. Preview means the file can be viewed directly in the browser.

Supported file formats

Parsed and previewed in the app

| Format | Notes |
| --- | --- |
| Email threads (.eml) | Parsed as communication threads and previewed in the Communication thread tab |
| PDF (.pdf) | Parsed and previewed in the PDF viewer |
| Word documents (.doc, .docx) | Parsed and previewed in the Word document viewer |
| Plain text and Markdown (.txt, .text, .md) | Parsed and previewed as text |
| Delimited spreadsheets (.csv, .tsv) | Parsed and previewed in the spreadsheet viewer |
| Excel workbooks (.xls, .xlsx) | Parsed and previewed in the workbook viewer |
| Previewable code and config files | .c, .cpp, .css, .env, .f, .f90, .f95, .go, .htaccess, .html, .htm, .java, .json, .log, .m, .ovpn, .php, .py, .r, .rmd, .rs, .sh, .ts, .tsx, .xml, .yml, .yaml, .conf |
| Parsed image files | .apng, .avif, .bmp, .cur, .dcx, .ftex, .gif, .heic, .heif, .jpeg, .jpg, .pcx, .pix, .png, .ppm, .psd, .tif, .tiff |

Parsed but not previewed in-browser

| Format | Notes |
| --- | --- |
| Word templates (.dotx) | Parsed, but no dedicated in-browser preview block |
| Rich text (.rtf) | Parsed, but no dedicated in-browser preview block |
| WordPerfect (.wpd) | Parsed, but no dedicated in-browser preview block |
| Excel macro and template files (.xlsm, .xltx, .xltm) | Parsed after workbook preprocessing, but not previewed in-browser |
| Quattro Pro (.qpw) | Parsed, but not previewed in-browser |
| PowerPoint (.ppt, .pptx) | Parsed, but not previewed in-browser |
| MIME-driven text formats | Also parsed when uploaded with supported text MIME types, including C#, Kotlin, Scala, Swift, SQL, Perl, NDJSON, JSON-LD, CycloneDX JSON, SPDX JSON, RSS/Atom XML, and SubRip |

Previewed but not parsed for extraction

| Format | Notes |
| --- | --- |
| Display-only images | .svg, .webp, .oif preview in-browser but do not currently run through the extraction pipeline |
| Video | .avi, .flv, .m4v, .mkv, .mov, .mp4, .webm, .wmv preview in-browser only |
| Audio | .aac, .flac, .m4a, .mp3, .ogg, .opus, .wav preview in-browser only |

Recognized during file-type detection, but without first-class parsing or preview support

  • .pages, .ods, .numbers, and .pptm
  • archive files such as .zip, .rar, .7zip, .tar, and .tar.gz
  • sequence, molecule, and analysis families such as .fasta, .fastq, .bam, .sam, .nd2, .czi, .pdb, .sdf, .mol, .pzfx, .opj, and related specialist formats
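
The four groups above can be read as a lookup keyed on file extension. The Python sketch below is illustrative only: the category names and the `classify` helper are hypothetical and are not part of Colabra. The extension sets are transcribed from this page, so specialist formats covered only by "related specialist formats" are not enumerated. The sketch also ignores MIME-driven parsing, which depends on the MIME type sent at upload rather than on the extension.

```python
# Extension groups transcribed from this page. Category names are hypothetical.
CATEGORIES = {
    "parsed_and_previewed": """
        .eml .pdf .doc .docx .txt .text .md .csv .tsv .xls .xlsx
        .c .cpp .css .env .f .f90 .f95 .go .htaccess .html .htm .java .json
        .log .m .ovpn .php .py .r .rmd .rs .sh .ts .tsx .xml .yml .yaml .conf
        .apng .avif .bmp .cur .dcx .ftex .gif .heic .heif .jpeg .jpg .pcx
        .pix .png .ppm .psd .tif .tiff
    """.split(),
    "parsed_only": ".dotx .rtf .wpd .xlsm .xltx .xltm .qpw .ppt .pptx".split(),
    "preview_only": """
        .svg .webp .oif
        .avi .flv .m4v .mkv .mov .mp4 .webm .wmv
        .aac .flac .m4a .mp3 .ogg .opus .wav
    """.split(),
    "recognized_only": """
        .pages .ods .numbers .pptm
        .zip .rar .7zip .tar .tar.gz
        .fasta .fastq .bam .sam .nd2 .czi .pdb .sdf .mol .pzfx .opj
    """.split(),
}


def classify(filename: str) -> str:
    """Return the documented support group for a file name, or "unknown"."""
    name = filename.lower()
    for category, extensions in CATEGORIES.items():
        # Suffix matching (rather than pathlib's .suffix) also handles
        # dotfiles such as ".env" and compound extensions such as ".tar.gz".
        if name.endswith(tuple(extensions)):
            return category
    return "unknown"
```

For example, `classify("deck.pptx")` falls in the parsed-only group, while `classify("clip.MP4")` falls in the preview-only group because matching is case-insensitive.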

Practical limits

| Limit | What happens |
| --- | --- |
| Upload size | Standard upload validation uses a 100 MB per-file limit by default. Files above the bucket limit are rejected before upload. |
| Parsing file size | Backend parsing also defaults to a 100 MB maximum file size. Larger files can upload in some contexts but will not enter the extraction pipeline. |
| Parsing page count | Document parsing defaults to a maximum of 250 pages per file. |
| Parsing row count | Spreadsheet and delimited-file parsing defaults to a maximum of 250,000 rows per file. |
| General browser preview size | Preview is disabled once a file is above 50 MB in the browser preview flow. |
| Text preview | Large text previews are truncated. Fullscreen text preview is capped at roughly 500,000 characters; inline text preview uses a much smaller window. |
| CSV / TSV preview | Delimited previews show up to 2,000 rows in fullscreen and 5 rows inline. |
| Excel workbook safety limit | Workbook preview is disabled above 120 sheets, 150,000 rows, or 3,000,000 cells total. |
| Excel archive safety limit | Workbook preview is also blocked when the .xlsx archive expands beyond 20 MB uncompressed, 1,024 ZIP entries, 8 MB largest worksheet XML, or 5 MB shared strings XML. |
| Excel worksheet render window | Even when workbook preview is allowed, the worksheet renderer caps the visible grid at 5,000 rows and 200 columns at a time. |
| Inline Word preview | Non-fullscreen Word preview is intentionally shallow and shows only the first page before the user opens the full preview. |
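
As a rough pre-flight check, the default limits listed above can be encoded as constants and tested locally before a file is uploaded. The Python sketch below is not Colabra's implementation: the `Limits`, `check_file`, and `check_xlsx_archive` names are hypothetical, MB is assumed to mean 2^20 bytes (this page does not say), and the archive check assumes the conventional .xlsx part paths `xl/worksheets/*.xml` and `xl/sharedStrings.xml`. The numbers are the documented defaults, which may be configured differently in a given workspace.

```python
import zipfile
from dataclasses import dataclass
from typing import BinaryIO, List, Optional, Union

MB = 1024 * 1024  # assumption: the page does not define MB precisely


@dataclass(frozen=True)
class Limits:
    """Default limits transcribed from this page; field names are hypothetical."""
    upload_bytes: int = 100 * MB
    parse_bytes: int = 100 * MB
    parse_pages: int = 250
    parse_rows: int = 250_000
    preview_bytes: int = 50 * MB
    xlsx_total_bytes: int = 20 * MB
    xlsx_entries: int = 1024
    xlsx_sheet_xml_bytes: int = 8 * MB
    xlsx_shared_strings_bytes: int = 5 * MB


DEFAULTS = Limits()


def check_file(size_bytes: int, pages: Optional[int] = None,
               rows: Optional[int] = None, limits: Limits = DEFAULTS) -> List[str]:
    """Return the names of the documented limits that a file exceeds."""
    exceeded = []
    if size_bytes > limits.upload_bytes:
        exceeded.append("upload_size")
    if size_bytes > limits.parse_bytes:
        exceeded.append("parse_size")
    if pages is not None and pages > limits.parse_pages:
        exceeded.append("parse_pages")
    if rows is not None and rows > limits.parse_rows:
        exceeded.append("parse_rows")
    if size_bytes > limits.preview_bytes:
        exceeded.append("preview_size")
    return exceeded


def check_xlsx_archive(source: Union[str, BinaryIO],
                       limits: Limits = DEFAULTS) -> List[str]:
    """Compare an .xlsx ZIP archive against the archive safety limits."""
    with zipfile.ZipFile(source) as archive:
        entries = archive.infolist()
    # ZipInfo.file_size is the uncompressed size recorded for each entry.
    sheet_sizes = [e.file_size for e in entries
                   if e.filename.startswith("xl/worksheets/")
                   and e.filename.endswith(".xml")]
    shared_strings = sum(e.file_size for e in entries
                         if e.filename == "xl/sharedStrings.xml")
    exceeded = []
    if sum(e.file_size for e in entries) > limits.xlsx_total_bytes:
        exceeded.append("xlsx_total_size")
    if len(entries) > limits.xlsx_entries:
        exceeded.append("xlsx_entries")
    if max(sheet_sizes, default=0) > limits.xlsx_sheet_xml_bytes:
        exceeded.append("xlsx_sheet_xml")
    if shared_strings > limits.xlsx_shared_strings_bytes:
        exceeded.append("xlsx_shared_strings")
    return exceeded
```

An empty list from either function means none of the modelled limits were exceeded. It does not guarantee that the product will parse or preview the file, since sheet, row, and cell counts for workbooks are not checked here.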

If a file crosses one of these limits, it is usually better to split it into reviewable parts than to force a single oversized upload or preview through the product.
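
For delimited files, splitting can be scripted. The Python sketch below shows one possible approach and is not a Colabra tool: the `split_rows` and `split_csv_file` helpers and the `.partN` naming scheme are hypothetical. It repeats the header row in every part and, as a conservative assumption, counts the header toward the 250,000-row parsing default, because this page does not say whether header rows are counted. It also assumes UTF-8 input.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, List

Row = List[str]


def split_rows(rows: Iterable[Row], max_rows: int = 250_000) -> Iterator[List[Row]]:
    """Yield chunks of at most max_rows rows, each starting with the header."""
    if max_rows < 2:
        raise ValueError("max_rows must allow the header plus one data row")
    iterator = iter(rows)
    header = next(iterator, None)
    if header is None:
        return  # empty input: nothing to emit
    chunk = [header]
    emitted = False
    for row in iterator:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            emitted = True
            chunk = [header]
    # Emit the remainder, or a header-only part if there were no data rows.
    if len(chunk) > 1 or not emitted:
        yield chunk


def split_csv_file(source: Path, max_rows: int = 250_000) -> List[Path]:
    """Write name.part1.csv, name.part2.csv, ... next to the source file."""
    outputs = []
    with source.open(newline="", encoding="utf-8") as handle:
        chunks = split_rows(csv.reader(handle), max_rows)
        for index, chunk in enumerate(chunks, start=1):
            target = source.with_name(f"{source.stem}.part{index}{source.suffix}")
            with target.open("w", newline="", encoding="utf-8") as out:
                csv.writer(out).writerows(chunk)
            outputs.append(target)
    return outputs
```

For a .tsv file, passing `delimiter="\t"` to both `csv.reader` and `csv.writer` would be the corresponding change.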