VobSubs are the subtitles found on DVDs. They are stored as pictures that are overlaid on the video image, not as text.

Extract VobSubs using mencoder. This creates dvd.idx and dvd.sub.

mencoder dvd://1 -ovc copy -oac copy -vobsubout dvd -vobsuboutindex 0 -sid 0 -o /dev/null
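A DVD title often carries more than one subtitle track. The following is a minimal sketch that prints one mencoder command per track, reusing the flags from the command above and giving each track an output index equal to its subtitle ID. The title number and the list of subtitle IDs are placeholders, not values read from a real disc. It is a dry run: nothing is executed.

```shell
#!/bin/sh
# Sketch: print one mencoder command per subtitle track (dry run).
# TITLE and SIDS are placeholders, not values from a real disc.
TITLE=1
SIDS="0 1 2"

# Build the extraction command for one subtitle ID.
# The output index is set to the same number as the subtitle ID.
vobsub_cmd() {
    echo "mencoder dvd://$TITLE -ovc copy -oac copy -vobsubout dvd -vobsuboutindex $1 -sid $1 -o /dev/null"
}

for sid in $SIDS; do
    vobsub_cmd "$sid"
done
```

Pipe the output to sh to actually run the commands. As far as I know, mencoder appends to existing VobSub files, so it is probably worth removing a stale dvd.idx and dvd.sub first.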

Merge the video (here dvd.mp4) and the subtitle files into a Matroska file:

mkvmerge -o dvd.mkv dvd.mp4 dvd.idx dvd.sub
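Before running the merge above, it can help to confirm that the extraction really produced both halves of the VobSub pair, since a wrong subtitle ID may leave empty files behind. This is a small sketch of such a check. The function name vobsub_pair_ok is made up for illustration.

```shell
#!/bin/sh
# Sketch: succeed only if both halves of a VobSub pair exist and are non-empty.
# The argument is the base name given to -vobsubout (dvd in the example above).
vobsub_pair_ok() {
    base=$1
    [ -s "$base.idx" ] && [ -s "$base.sub" ]
}

if vobsub_pair_ok dvd; then
    echo "dvd.idx and dvd.sub look usable"
else
    echo "dvd.idx or dvd.sub is missing or empty"
fi
```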

Detecting VobSubs

Note that ffprobe and avprobe from ffmpeg and libav 0.8.* detect VobSub streams, but later versions of avprobe (libav) do not.

$ ffprobe dvd_track_02.vob
ffprobe version 3.3.3 Copyright (c) 2007-2017 the FFmpeg developers
  built with gcc 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4)
  configuration: --prefix=/usr/local/ffmpeg
  libavutil      55. 58.100 / 55. 58.100
  libavcodec     57. 89.100 / 57. 89.100
  libavformat    57. 71.100 / 57. 71.100
  libavdevice    57.  6.100 / 57.  6.100
  libavfilter     6. 82.100 /  6. 82.100
  libswscale      4.  6.100 /  4.  6.100
  libswresample   2.  7.100 /  2.  7.100
Input #0, mpeg, from 'dvd_track_02.vob':
  Duration: 00:06:29.73, start: 441.272633, bitrate: 5630 kb/s
    Stream #0:0[0x1bf]: Data: dvd_nav_packet
    Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, smpte170m, bottom first), 720x480 [SAR 8:9 DAR 4:3], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
    Stream #0:2[0x80]: Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
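Output like the above can be checked for VobSub streams with a simple filter. This is a minimal sketch, assuming that ffprobe labels such streams as "Subtitle: dvd_subtitle". The function name vobsub_streams is made up, and the subtitle line in the example is invented for illustration, not taken from a real disc.

```shell
#!/bin/sh
# Sketch: list VobSub streams in ffprobe/avprobe output.
# Reads probe text on stdin and prints matching "Stream" lines, if any.
vobsub_streams() {
    grep 'Subtitle: dvd_subtitle' || true
}

# Example input: one real-looking audio line and one invented subtitle line.
printf '%s\n' \
    "    Stream #0:2[0x80]: Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s" \
    "    Stream #0:3[0x20]: Subtitle: dvd_subtitle" |
    vobsub_streams
```

With a real file this would be used as ffprobe dvd_track_02.vob 2>&1 | vobsub_streams, since ffprobe writes the stream list to stderr. ffprobe only lists streams it meets while probing, so a subtitle stream that starts late in the file may only show up with larger -probesize and -analyzeduration values.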

archives: VobSub notes

Converting VobSubs to text is really hard, because they are images and have to go through OCR.

First, extract them using transcode and subtitle2pgm (subtitleripper package):

tcextract -x ps1 -t vob -a 0x20 -i ../DC_Reader.vob | subtitle2pgm -o english -c 

You can find the right color codes for the -c option by playing with the values until the text stands out clearly, which makes OCR easier.
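Trying several color settings by hand gets tedious, so here is a sketch that prints one extraction command per candidate setting, each writing to its own output name. The grey-level sets are illustrative guesses, not known-good values, and the comma-separated four-value form for -c is an assumption to check against subtitle2pgm's help. It is a dry run: nothing is executed.

```shell
#!/bin/sh
# Sketch: print one extraction command per candidate grey-level set (dry run).
# The grey-level sets below are guesses for illustration, and the four-value
# comma-separated form for -c is assumed; check subtitle2pgm's help output.
VOB=../DC_Reader.vob

# $1 = grey levels, $2 = output base name
grey_cmd() {
    echo "tcextract -x ps1 -t vob -a 0x20 -i $VOB | subtitle2pgm -o $2 -c $1"
}

n=0
for levels in 255,255,0,255 255,0,255,255 0,255,255,255; do
    n=$((n + 1))
    grey_cmd "$levels" "try$n"
done
```

Run the printed commands, look at the resulting images, and keep the setting where the text is cleanest.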

I *vaguely* recall having issues with newer (>0.45) versions of gocr, but it could just have been that it didn't fare any better.

Use pgm2txt to run OCR on the image files:

pgm2txt english

If you got the color conversion right, it should recognize most of the characters by itself.