====== VobSubs ======
VobSubs are the subtitles found on DVDs. They are stored as pictures that are overlaid on the video image, not as text.
Extract the VobSubs using ''mencoder''. This creates ''dvd.idx'' and ''dvd.sub''.
mencoder dvd://1 -ovc copy -oac copy -vobsubout dvd -vobsuboutindex 0 -sid 0 -o /dev/null
Merge them all into a Matroska file:
mkvmerge -o dvd.mkv dvd.mp4 dvd.idx dvd.sub
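The ''mencoder'' command above extracts a single subtitle track. As a rough sketch, several tracks can be collected into the same ''dvd.idx'' / ''dvd.sub'' pair by running one pass per track, with ''-sid'' and ''-vobsuboutindex'' set to the same number. The helper name ''vobsub_cmd'' and the track numbers 0 and 1 are assumptions for illustration. The function only prints the commands so they can be checked before anything is run.

```shell
#!/bin/sh
# vobsub_cmd TITLE SID BASENAME
# Print (not run) the mencoder command that extracts subtitle track SID
# of DVD title TITLE into BASENAME.idx / BASENAME.sub.
vobsub_cmd() {
    printf 'mencoder dvd://%s -ovc copy -oac copy -vobsubout %s -vobsuboutindex %s -sid %s -o /dev/null\n' \
        "$1" "$3" "$2" "$2"
}

# One pass per subtitle track; tracks 0 and 1 are assumed to exist on the disc.
for sid in 0 1; do
    vobsub_cmd 1 "$sid" dvd
done
```

Pipe the output to ''sh'' once the printed commands look right.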
=== Detecting VobSubs ===
Note that ''ffprobe'' (ffmpeg) and ''avprobe'' from libav 0.8.* report both the VobSubs (Stream #0.0) and the closed captions (carried inside the MPEG-2 video, Stream #0.1). Later versions of ''avprobe'' (libav) do not.
$ ffprobe dvd_track_02.vob
ffprobe version 3.3.3 Copyright (c) 2007-2017 the FFmpeg developers
built with gcc 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4)
configuration: --prefix=/usr/local/ffmpeg
libavutil 55. 58.100 / 55. 58.100
libavcodec 57. 89.100 / 57. 89.100
libavformat 57. 71.100 / 57. 71.100
libavdevice 57. 6.100 / 57. 6.100
libavfilter 6. 82.100 / 6. 82.100
libswscale 4. 6.100 / 4. 6.100
libswresample 2. 7.100 / 2. 7.100
Input #0, mpeg, from 'dvd_track_02.vob':
Duration: 00:06:29.73, start: 441.272633, bitrate: 5630 kb/s
Stream #0:0[0x1bf]: Data: dvd_nav_packet
Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, smpte170m, bottom first), 720x480 [SAR 8:9 DAR 4:3], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
Stream #0:2[0x80]: Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
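Checking a listing like the one above by eye gets tedious with many files. The following is a minimal sketch of two filters that read the stream listing on standard input. The function names are hypothetical, and the second filter assumes that subtitle streams are labelled ''Subtitle:'' in the listing.

```shell
#!/bin/sh
# Both filters read an ffprobe stream listing on stdin.

# Succeed if a video stream is flagged as carrying closed captions.
has_closed_captions() {
    grep -q 'Stream #.*Video:.*Closed Captions'
}

# Print the number of streams reported as subtitles.
# grep -c exits non-zero when the count is 0, so mask that status.
count_subtitle_streams() {
    grep -c 'Stream #.*Subtitle:' || true
}
```

Because ''ffprobe'' writes the listing to standard error, redirect it when piping, for example ''ffprobe dvd_track_02.vob 2>&1 | has_closed_captions && echo "closed captions found"''.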
=== archives: VobSub notes ===
Converting VobSubs to text is really hard, because they are images and have to go through OCR.
First, extract them using transcode and subtitle2pgm (subtitleripper package):
tcextract -x ps1 -t vob -a 0x20 -i ../DC_Reader.vob | subtitle2pgm -o english -c 255,0,255,255
Experiment with the ''-c'' values to find the color codes that make OCR easiest.
See http://www.bunkus.org/dvdripping4linux/en/separate/subtitles.html#subtitles for more
details.
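In the ''tcextract'' command above, ''-a 0x20'' selects the first subtitle track. DVD subtitle streams are numbered from 0x20 upward, so track N (counting from 0) is 0x20 + N. The helper below is a small sketch of that arithmetic; the name ''subtitle_stream_id'' is made up for this example.

```shell
#!/bin/sh
# subtitle_stream_id N
# Print the stream ID of subtitle track N (0-based) in hex,
# in the form expected by tcextract -a.
subtitle_stream_id() {
    printf '0x%02x\n' $((0x20 + $1))
}
```

For example, ''tcextract -x ps1 -t vob -a "$(subtitle_stream_id 1)" -i ../DC_Reader.vob'' would read the second subtitle track, assuming the disc has one.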
I *vaguely* recall having issues with newer (>0.45) versions of gocr, but it could just have
been that it didn't fare any better.
Run ''pgm2txt'' to OCR the image files:
pgm2txt english
If you did the color conversion right, it should recognize most of the text on its own.