====== VobSubs ======

VobSubs are the subtitles on DVDs. They are stored as pictures that are overlaid on the video image.

Extract VobSubs using ''mencoder''. This creates ''dvd.idx'' and ''dvd.sub'':

  mencoder dvd://1 -ovc copy -oac copy -vobsubout dvd -vobsuboutindex 0 -sid 0 -o /dev/null

Merge them all into a Matroska file:

  mkvmerge -o dvd.mkv dvd.mp4 dvd.idx dvd.sub

=== Detecting VobSubs ===

Note that ''ffmpeg'' and ''libav'' v0.8.* will see the VobSubs (Stream #0.0) and closed captioning (part of the MPEG2 video in Stream #0.1) with ''ffprobe'' or ''avprobe'', but any later version of ''avprobe'' (libav) will not.

  $ ffprobe dvd_track_02.vob
  ffprobe version 3.3.3 Copyright (c) 2007-2017 the FFmpeg developers
    built with gcc 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4)
    configuration: --prefix=/usr/local/ffmpeg
    libavutil      55. 58.100 / 55. 58.100
    libavcodec     57. 89.100 / 57. 89.100
    libavformat    57. 71.100 / 57. 71.100
    libavdevice    57.  6.100 / 57.  6.100
    libavfilter     6. 82.100 /  6. 82.100
    libswscale      4.  6.100 /  4.  6.100
    libswresample   2.  7.100 /  2.  7.100
  Input #0, mpeg, from 'dvd_track_02.vob':
    Duration: 00:06:29.73, start: 441.272633, bitrate: 5630 kb/s
      Stream #0:0[0x1bf]: Data: dvd_nav_packet
      Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, smpte170m, bottom first), 720x480 [SAR 8:9 DAR 4:3], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
      Stream #0:2[0x80]: Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s

=== archives: VobSub notes ===

Converting VobSubs is really hard. First, extract them using ''transcode'' and ''subtitle2pgm'' (from the subtitleripper package):

  tcextract -x ps1 -t vob -a 0x20 -i ../DC_Reader.vob | subtitle2pgm -o english -c 255,0,255,255

You can find the color codes that make OCR easiest by experimenting with the options. See http://www.bunkus.org/dvdripping4linux/en/separate/subtitles.html#subtitles for more details.

I *vaguely* recall having issues with newer (>0.45) versions of ''gocr'', but it could just have been that they didn't fare any better.

Run OCR on the image files with ''pgm2txt'':

  pgm2txt english

If you did the color conversion right, it should recognize most of them itself.
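
Returning to the ''mencoder'' extraction at the top of this page: a DVD title often carries more than one subtitle track. The sketch below repeats that command once per track. It is an illustration under stated assumptions, not a tested recipe: the title number (1), the subtitle IDs (0 to 2), the output base name (''dvd'') and the helper name ''vobsub_cmd'' are all made up for the example, and reusing the subtitle ID as the ''-vobsuboutindex'' value is just one convenient choice. As far as I know, repeated passes with the same ''-vobsubout'' name append to the same ''.idx''/''.sub'' pair, but check that on your own disc. The script only prints the commands unless ''RUN=1'' is set.

```shell
#!/bin/sh
# Sketch: one mencoder pass per subtitle track (dry run by default).
# Assumptions: title 1, subtitle IDs 0..2, output base name "dvd".

# Build the mencoder command line for one subtitle track.
# $1 = DVD title, $2 = subtitle ID (also used as the index), $3 = output base name
vobsub_cmd() {
    printf 'mencoder dvd://%s -ovc copy -oac copy -vobsubout %s -vobsuboutindex %s -sid %s -o /dev/null\n' \
        "$1" "$3" "$2" "$2"
}

# Print each command; set RUN=1 in the environment to execute it instead.
for sid in 0 1 2; do
    cmd=$(vobsub_cmd 1 "$sid" dvd)
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$cmd"
    else
        echo "$cmd"
    fi
done
```

Run it once without ''RUN=1'' to check the generated commands, then adjust the ID list to match the tracks that actually exist on the disc.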