VobSubs
VobSubs are the subtitles found on DVDs. They are stored as pictures that are overlaid on the video image, not as text.
Extract the VobSubs using mencoder. This creates dvd.idx and dvd.sub.
mencoder dvd://1 -ovc copy -oac copy -vobsubout dvd -vobsuboutindex 0 -sid 0 -o /dev/null
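A DVD often carries more than one subtitle track. As a rough sketch, the same command can be repeated with a different -sid and -vobsuboutindex for each track, so that every track ends up in the same dvd.idx and dvd.sub. The helper name vobsub_cmds and the assumption that the subtitle IDs run from 0 upwards on title 1 are illustrative only; check the real IDs on your disc first. The function only prints the commands, it does not run them.

```shell
#!/bin/sh
# Print (not run) one mencoder command per subtitle stream ID.
# Assumption: subtitle IDs 0..count-1 exist on title 1 of the DVD.
vobsub_cmds() {
    count=$1   # number of subtitle tracks to extract (hypothetical parameter)
    sid=0
    while [ "$sid" -lt "$count" ]; do
        # Same command as above, with the index and stream ID advanced together
        echo "mencoder dvd://1 -ovc copy -oac copy -vobsubout dvd -vobsuboutindex $sid -sid $sid -o /dev/null"
        sid=$((sid + 1))
    done
}

# Example: show the commands for the first two subtitle tracks
vobsub_cmds 2
```

Once the printed commands look right, they can be piped into sh to run them.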
Merge them with the video (dvd.mp4 in this example) into a Matroska file:
mkvmerge -o dvd.mkv dvd.mp4 dvd.idx dvd.sub
Detecting VobSubs
Note that ffprobe (ffmpeg) and avprobe from libav 0.8.* will report VobSub and closed captioning streams, but later versions of avprobe (libav) will not.
$ ffprobe dvd_track_02.vob
ffprobe version 3.3.3 Copyright (c) 2007-2017 the FFmpeg developers
built with gcc 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4)
configuration: --prefix=/usr/local/ffmpeg
libavutil 55. 58.100 / 55. 58.100
libavcodec 57. 89.100 / 57. 89.100
libavformat 57. 71.100 / 57. 71.100
libavdevice 57. 6.100 / 57. 6.100
libavfilter 6. 82.100 / 6. 82.100
libswscale 4. 6.100 / 4. 6.100
libswresample 2. 7.100 / 2. 7.100
Input #0, mpeg, from 'dvd_track_02.vob':
Duration: 00:06:29.73, start: 441.272633, bitrate: 5630 kb/s
Stream #0:0[0x1bf]: Data: dvd_nav_packet
Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, smpte170m, bottom first), 720x480 [SAR 8:9 DAR 4:3], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
Stream #0:2[0x80]: Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
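To avoid reading the whole listing by eye, the stream lines can be filtered. This is a minimal sketch: the function name filter_sub_streams is made up for this page, and the pattern assumes that closed captions show up as "Closed Captions" on the video stream line (as in the output above) and that VobSub streams are listed as "Subtitle: dvd_subtitle". ffprobe prints this listing on stderr, hence the 2>&1 in the usage comment.

```shell
#!/bin/sh
# Keep only the stream lines that mention closed captions or DVD subtitles.
# Reads ffprobe-style output on stdin.
filter_sub_streams() {
    grep -E 'Stream #.*(Closed Captions|Subtitle: dvd_subtitle)'
}

# Real usage (needs ffprobe and the VOB file):
#   ffprobe dvd_track_02.vob 2>&1 | filter_sub_streams

# Demo on an abbreviated version of the video stream line shown above
printf '%s\n' 'Stream #0:1[0x1e0]: Video: mpeg2video (Main), 720x480, Closed Captions, 29.97 fps' | filter_sub_streams
```

If nothing is printed, this version of the tool did not report any caption or subtitle stream for the file.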
archives: VobSub notes
Converting VobSubs to text is really hard, because the subtitle pictures have to be run through OCR.
First, extract them using transcode and subtitle2pgm (subtitleripper package):
tcextract -x ps1 -t vob -a 0x20 -i ../DC_Reader.vob | subtitle2pgm -o english -c 255,0,255,255
You can find the right color codes by experimenting with the -c values; a good choice makes OCR easier. See http://www.bunkus.org/dvdripping4linux/en/separate/subtitles.html#subtitles for more details.
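One way to experiment is to generate the same extraction with several -c values and a different output name for each, then compare the resulting images. The sketch below only prints the commands. The function name color_trials, the try1, try2, ... output names and the candidate values in the example call are arbitrary illustrations, not recommended settings.

```shell
#!/bin/sh
# Print one extraction command per candidate -c value.
# Each trial writes to its own output base name (try1, try2, ...).
color_trials() {
    n=0
    for colors in "$@"; do
        n=$((n + 1))
        echo "tcextract -x ps1 -t vob -a 0x20 -i ../DC_Reader.vob | subtitle2pgm -o try$n -c $colors"
    done
}

# Example: three arbitrary candidates, the first being the one used above
color_trials 255,0,255,255 255,255,0,255 0,255,255,255
```

Look at the images from each trial and keep the values that give the cleanest text on a plain background.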
I *vaguely* recall having issues with newer (>0.45) versions of gocr, but it could just have been that it didn't fare any better.
Run OCR on the image files with pgm2txt:
pgm2txt english
If you did the color conversion right, it should recognize most of the text by itself.