SLEAP stores predictions in image (top-left) coordinates; the reader
reflects y so the returned aniframe is in the conventional
bottom_left origin. SLEAP's analysis h5 export does not include
the source video resolution, so pass video_height to get an
accurate flip. If it is omitted, max(y) is used as a fallback, which
shifts every y value by the gap between the true height and max(y).
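
As a rough illustration of the flip described above, here is a minimal
NumPy sketch. The helper name flip_y is hypothetical, and the formula
(height minus y) is an assumption about how the reflection is done; the
reader's actual implementation may differ, for example by an off-by-one
pixel convention.

```python
import numpy as np


def flip_y(y, video_height=None):
    """Reflect top-left-origin y values into a bottom-left origin.

    Hypothetical helper for illustration only, not the reader's own code.
    """
    y = np.asarray(y, dtype=float)
    # Fallback: without the true frame height, use the largest observed y.
    # NaN marks missing predictions and is ignored by nanmax.
    height = np.nanmax(y) if video_height is None else video_height
    return height - y


# Example y coordinates in pixels, measured from the top of a 480 px frame.
y = np.array([10.0, 200.0, np.nan, 470.0])

with_height = flip_y(y, video_height=480)  # reflected about the true height
fallback = flip_y(y)                       # reflected about max(y) = 470
```

In this sketch the two results differ by a constant 10 px, the gap
between the true height and the largest observed y. Relative positions
are preserved either way, but absolute y values are only correct when
video_height is supplied.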