[linux-dvb] how to extract subtitles in text format?

Eric Fernandez egf05 at doc.ic.ac.uk
Tue Aug 1 16:07:06 CEST 2006


Nick Rolfe wrote:
>> I have bought a DVB-T card and am able to use it with Kaffeine with no
>> problem. I can record the video, with the subtitles. However, I would
>> like to know if it is possible to:
>> - extract these subtitles as text (subtitleripper?)
>> - get a timestamp in seconds for each line of subtitle, as text too.
>
> Have a look at son2srt <http://www.cs.helsinki.fi/u/mikkila/son2srt/>,
> there's a better explanation than I could give there.
>
> (Incidentally, the author talks about taking input from a subtitle
> file created by ProjectX. ProjectX is the only tool I've found to
> reliably result in synchronised audio/video when used in the
> transcoding process - it copes much better with stream errors).
>
> A friend of mine has modified it and built up a custom symbol database
> for the font they use in UK DVB-T broadcasts, but it's still a WIP
> (and he's on holiday right now). It works pretty reliably for most
> text but often gets caught out by punctuation.
>
> So yes, it is possible, but atm it may require quite a bit of work on
> your part to get it working.
>
> -Nick
Thanks a lot for all these answers. Actually, gocr seems to work very 
well with BBC subtitles.
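
For the second part of the original question (a timestamp in seconds for
each subtitle line), here is a minimal sketch. It assumes the OCR step has
already left you with an SRT-style text file; that is an assumption, not
something confirmed in this thread. The names TIMING, to_seconds and
srt_to_seconds are made up for illustration and are not part of son2srt,
ProjectX or gocr.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def srt_to_seconds(text):
    """Return a list of (start_seconds, end_seconds, subtitle_text) tuples.

    Assumes SRT-style input: cues separated by blank lines, each with a
    timing line followed by one or more lines of text.
    """
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                # Join multi-line subtitle text into a single line.
                body = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((to_seconds(*fields[:4]), to_seconds(*fields[4:]), body))
                break
    return cues
```

Calling srt_to_seconds() on the contents of the subtitle file gives one
tuple per cue, which can then be printed as plain text, one line per
subtitle, with the start time in seconds in front.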

Eric


