Lecture: Document and Content Analysis

Content of the Lecture

  • document formats and standards (TIFF, JPEG, PDF, PostScript, SVG)
  • document image compression (G4, MRC, token-based compression, JPEG2000)
  • logical markup (HTML, XML, word processing formats, DocBook)
  • writing systems of the world
  • character sets and character encodings (ASCII, Unicode, special coding systems)
  • text rendering, layout, ligatures, and hyphenation (Pango)
  • typesetting and page layout systems (text flow, Word, LaTeX, etc.)
  • OCR (character recognition, page segmentation)
  • spelling and orthographic variation, statistical language modeling
  • document capture (page image dewarping and handheld document capture)
  • named entity recognition, information extraction, table recognition
  • document search and retrieval, text mining, document databases
  • reading, psychophysics, and human-document interaction
  • document security and forensics

For more information, see the page in the KIS.