1. Apple has supported hardware decoding since at least the iPhone 4, but the hardware decoding API was always private: it was not open to developers and could only be used on jailbroken devices, and apps that use private APIs are not accepted into the App Store.
2. Starting with iOS 8, Apple apparently changed its mind and opened up the hardware decoding and encoding APIs as VideoToolbox.framework. It requires iOS 8 or later; it is not available on iOS 7.x.
These hardware decoding APIs are a handful of plain C functions that can be called from any Objective-C or C++ code. First add VideoToolbox.framework to the project and include the following header:
#include <VideoToolbox/VideoToolbox.h>
Decoding mainly involves the following three functions:
VTDecompressionSessionCreate — create the decode session
VTDecompressionSessionDecodeFrame — decode one frame
VTDecompressionSessionInvalidate — destroy the decode session
First create the decode session:
OSStatus status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                               decoderFormatDescription,
                                               NULL, attrs,
                                               &callBackRecord,
                                               &decoderSession);
Here decoderFormatDescription is a CMVideoFormatDescriptionRef describing the video format. It must be created from the H.264 SPS and PPS data by calling the following function:
CMVideoFormatDescriptionCreateFromH264ParameterSets
Note that the SPS and PPS data used here must not include the "00 00 00 01" start code.
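A minimal sketch of this step, assuming sps/spsSize and pps/ppsSize hold the raw parameter sets without the start code (these variable names are not from the original post):

const uint8_t *parameterSets[2] = { sps, pps };
const size_t parameterSetSizes[2] = { spsSize, ppsSize };
CMVideoFormatDescriptionRef decoderFormatDescription = NULL;
// 4 = length of the NAL unit length field, matching the 4-byte big-endian
// lengths used in the MP4/AVCC-style frames fed to the decoder below
OSStatus fdStatus = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                        2, parameterSets, parameterSetSizes,
                                                                        4, &decoderFormatDescription);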
attrs is the attribute dictionary passed to the decode session:
// kCVPixelFormatType_420YpCbCr8Planar is planar YUV420 (I420)
// kCVPixelFormatType_420YpCbCr8BiPlanarFullRange is NV12
uint32_t v = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange;
const void *keys[] = { kCVPixelBufferPixelFormatTypeKey };
const void *values[] = { CFNumberCreate(NULL, kCFNumberSInt32Type, &v) };
CFDictionaryRef attrs = CFDictionaryCreate(NULL, keys, values, 1,
                                           &kCFTypeDictionaryKeyCallBacks,
                                           &kCFTypeDictionaryValueCallBacks);
The one important attribute is kCVPixelBufferPixelFormatTypeKey, which specifies the pixel format of the decoded image. It must be set to NV12; Apple's hardware decoder only supports NV12.
callBackRecord specifies the callback function. The decoder supports an asynchronous mode in which this callback is invoked for each decoded frame.
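A minimal sketch of wiring up the callback record (didDecompress is an assumed name, not from the original post); here sourceFrameRefCon is used to hand the decoded pixel buffer back to the caller, which matches the synchronous decode call shown below:

static void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon,
                          OSStatus status, VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef imageBuffer,
                          CMTime presentationTimeStamp, CMTime presentationDuration)
{
    // Hand the decoded frame back through the pointer passed as sourceFrameRefCon.
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    if (status == noErr && outputPixelBuffer != NULL && imageBuffer != NULL) {
        *outputPixelBuffer = CVPixelBufferRetain(imageBuffer);
    }
}

VTDecompressionOutputCallbackRecord callBackRecord;
callBackRecord.decompressionOutputCallback = didDecompress;
callBackRecord.decompressionOutputRefCon = NULL;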
Once decoderSession has been created successfully, decoding can begin:
VTDecodeFrameFlags flags = 0;
//kVTDecodeFrame_EnableTemporalProcessing | kVTDecodeFrame_EnableAsynchronousDecompression;
VTDecodeInfoFlags flagOut = 0;
CVPixelBufferRef outputPixelBuffer = NULL;
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(decoderSession,
                                                          sampleBuffer,
                                                          flags,
                                                          &outputPixelBuffer,
                                                          &flagOut);
Passing 0 for flags selects synchronous decoding, which is simpler.
sampleBuffer is the input H.264 video data; one frame is passed in per call.
First use CMBlockBufferCreateWithMemoryBlock to wrap the H.264 data in a CMBlockBufferRef.
Then use CMSampleBufferCreateReady to create the CMSampleBufferRef from it.
Note that the H.264 data passed in must be in MP4 style: the first four bytes are the length of the NAL unit (big-endian), not the "00 00 00 01" start code.
Data read from a raw video stream usually starts with "00 00 00 01", so you have to do this conversion yourself.
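A minimal sketch of this step, assuming frame/frameSize hold one Annex B NAL unit (these names are not from the original post); the 4-byte start code is overwritten in place with the big-endian length:

uint32_t nalLength = (uint32_t)(frameSize - 4);
uint32_t bigEndianLength = CFSwapInt32HostToBig(nalLength);
memcpy(frame, &bigEndianLength, sizeof(uint32_t)); // replace start code with length

CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                     frame, frameSize,
                                                     kCFAllocatorNull, // do not free 'frame'
                                                     NULL, 0, frameSize, 0,
                                                     &blockBuffer);
CMSampleBufferRef sampleBuffer = NULL;
if (status == kCMBlockBufferNoErr) {
    const size_t sampleSizes[] = { frameSize };
    status = CMSampleBufferCreateReady(kCFAllocatorDefault, blockBuffer,
                                       decoderFormatDescription,
                                       1, 0, NULL,   // one sample, no timing info
                                       1, sampleSizes,
                                       &sampleBuffer);
}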
After a successful decode, outputPixelBuffer holds one frame of NV12-format YUV image data.
If you want the raw YUV data, call
CVPixelBufferLockBaseAddress(outputPixelBuffer, 0);
void *baseAddress = CVPixelBufferGetBaseAddress(outputPixelBuffer);
to get a pointer to the image data. Note that baseAddress does not point directly at the YUV samples; it points to a CVPlanarPixelBufferInfo_YCbCrBiPlanar struct that records the offset and pitch of the two planes.
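In practice it is often simpler to query each plane directly instead of parsing that struct; a sketch under the same assumptions as above:

CVPixelBufferLockBaseAddress(outputPixelBuffer, kCVPixelBufferLock_ReadOnly);
void *yPlane = CVPixelBufferGetBaseAddressOfPlane(outputPixelBuffer, 0);    // Y plane
size_t yStride = CVPixelBufferGetBytesPerRowOfPlane(outputPixelBuffer, 0);
void *uvPlane = CVPixelBufferGetBaseAddressOfPlane(outputPixelBuffer, 1);   // interleaved CbCr plane
size_t uvStride = CVPixelBufferGetBytesPerRowOfPlane(outputPixelBuffer, 1);
size_t width = CVPixelBufferGetWidth(outputPixelBuffer);
size_t height = CVPixelBufferGetHeight(outputPixelBuffer);
// ... copy or process the planes here ...
CVPixelBufferUnlockBaseAddress(outputPixelBuffer, kCVPixelBufferLock_ReadOnly);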
If all you want is to display the video, though, there is no need to read the YUV data at all, because a CVPixelBufferRef can be converted directly into an OpenGL texture or a UIImage.
Calling CVOpenGLESTextureCacheCreateTextureFromImage creates an OpenGL texture directly.
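A minimal sketch for the luma plane, assuming videoTextureCache is a previously created CVOpenGLESTextureCacheRef (not shown in the original post); for NV12 the same call is repeated for plane 1 with GL_LUMINANCE_ALPHA to get the interleaved CbCr plane:

CVOpenGLESTextureRef lumaTexture = NULL;
CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                                            videoTextureCache,
                                                            outputPixelBuffer,
                                                            NULL,
                                                            GL_TEXTURE_2D,
                                                            GL_LUMINANCE,
                                                            (GLsizei)CVPixelBufferGetWidthOfPlane(outputPixelBuffer, 0),
                                                            (GLsizei)CVPixelBufferGetHeightOfPlane(outputPixelBuffer, 0),
                                                            GL_LUMINANCE,
                                                            GL_UNSIGNED_BYTE,
                                                            0,   // plane index 0 = Y plane
                                                            &lumaTexture);
glBindTexture(CVOpenGLESTextureGetTarget(lumaTexture), CVOpenGLESTextureGetName(lumaTexture));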
To create a UIImage from a CVPixelBufferRef:
CIImage *ciImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];
UIImage *uiImage = [UIImage imageWithCIImage:ciImage];
When decoding is finished, destroy the decoder session:
VTDecompressionSessionInvalidate(decoderSession);
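In a real player it is also common to drain any in-flight frames before invalidating and to release the Core Foundation objects afterwards; a sketch, assuming the variables used above:

VTDecompressionSessionWaitForAsynchronousFrames(decoderSession); // flush pending frames
VTDecompressionSessionInvalidate(decoderSession);
CFRelease(decoderSession);
CFRelease(decoderFormatDescription);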
That is the basic hardware decoding flow. To successfully decode and play video you also need some basic knowledge of the H.264 bitstream format, YUV image formats, OpenGL, and so on.
There are still many small details to handle that cannot all be covered here; feel free to discuss any questions in the comments.
From decoding to playback it is roughly 1000 lines of code, most of it OpenGL rendering code.
Apple's official sample code:
WWDC - Apple Developer
The download link for Apple's sample has gone dead and I could not find that sample anywhere, so I wrote one myself:
stevenyao/iOSHardwareDecoder · GitHub
3. For the concrete flow you can refer to an example on GitHub: https://github.com/adison/-VideoToolboxDemo. Note that the codecCtx and packet data used when creating the session and when decoding may differ slightly, so adjust them for your specific case.
However, across different embedded encoder boards, some encoders' H.264 streams decode fine while others cannot be decoded at all; it is unclear whether Apple imposes any restrictions in this area.
4. A VideoToolbox patch for VLC found via Google:
[PATCH 1/2] video chroma: add a Nv12 copy function which outputs I420; [PATCH 2/2] Add VideoToolbox based decoder. However, when I tried an MP4-wrapped video with VLCKit, CPU usage barely changed while the FPS dropped by half, which is noticeably worse than software decoding; I am not sure which step I got wrong.
Concepts:
- NALUs: NALUs are simply a chunk of data of varying length that has a NALU start code header.
- Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The two you need to set are the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS).
- H.264 Stream Format:
Procedure:
Other notes:
Code Example: So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).
The following array is only used so that you can print out what type of NALU frame you are receiving. If you know what all these types mean, good for you, you know more about H.264 than me :) My code only handles types 1, 5, 7 and 8.
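The original array is not reproduced here; a possible sketch of such a lookup table (the type meanings follow the H.264 spec; types 10-31 are omitted for brevity):

static const char *naluTypeStrings[] = {
    "0: Unspecified (non-VCL)",
    "1: Coded slice of a non-IDR picture (VCL)",
    "2: Coded slice data partition A (VCL)",
    "3: Coded slice data partition B (VCL)",
    "4: Coded slice data partition C (VCL)",
    "5: Coded slice of an IDR picture (VCL)",
    "6: Supplemental enhancement information (SEI) (non-VCL)",
    "7: Sequence parameter set (SPS) (non-VCL)",
    "8: Picture parameter set (PPS) (non-VCL)",
    "9: Access unit delimiter (non-VCL)",
    // ... remaining types omitted ...
};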
Now this is where all the magic happens: the format description and decompression session are created from the new parameters. (You don't have to recreate it every time.)
Now this method gets called every time VTD is done decompressing any frame you sent to it. This method gets called even if there's an error or if the frame is dropped.
This is where we actually send the sampleBuffer off to the VTD to be decoded.
(Stack Overflow answer by Livy Stork, Apr 8 '15.)
VideoToolbox restrictions: https://developer.apple.com/library/ios/documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/FrequentlyAskedQuestions/FrequentlyAskedQuestions.html
Frequently Asked Questions
- What kinds of encoders are supported?
The protocol specification does not limit the encoder selection. However, the current Apple implementation should interoperate with encoders that produce MPEG-2 Transport Streams containing H.264 video and AAC audio (HE-AAC or AAC-LC). Encoders that are capable of broadcasting the output stream over UDP should also be compatible with the current implementation of the Apple provided segmenter software.
- What are the specifics of the video and audio formats supported?
Although the protocol specification does not limit the video and audio formats, the current Apple implementation supports the following formats:
- Video:
  - H.264 Baseline Level 3.0, Baseline Level 3.1, Main Level 3.1, and High Profile Level 4.1.
- Audio:
  - HE-AAC or AAC-LC up to 48 kHz, stereo audio
  - MP3 (MPEG-1 Audio Layer 3) 8 kHz to 48 kHz, stereo audio
  - AC-3 (for Apple TV, in pass-through mode only)
- What duration should media files be?
The main point to consider is that shorter segments result in more frequent refreshes of the index file, which might create unnecessary network overhead for the client. Longer segments will extend the inherent latency of the broadcast and initial startup time. A duration of 10 seconds of media per file seems to strike a reasonable balance for most broadcast content.
- How many files should be listed in the index file during a continuous, ongoing session?
The normal recommendation is 3, but the optimum number may be larger.
The important point to consider when choosing the optimum number is that the number of files available during a live session constrains the client's behavior when doing play/pause and seeking operations. The more files in the list, the longer the client can be paused without losing its place in the broadcast, the further back in the broadcast a new client begins when joining the stream, and the wider the time range within which the client can seek. The trade-off is that a longer index file adds to network overhead: during live broadcasts, the clients are all refreshing the index file regularly, so it does add up, even though the index file is typically small.
- What data rates are supported?
The data rate that a content provider chooses for a stream is most influenced by the target client platform and the expected network topology. The streaming protocol itself places no limitations on the data rates that can be used. The current implementation has been tested using audio-video streams with data rates as low as 64 Kbps and as high as 3 Mbps to iPhone. Audio-only streams at 64 Kbps are recommended as alternates for delivery over slow cellular connections.
For recommended data rates, see Preparing Media for Delivery to iOS-Based Devices.
- What is a .ts file?
A .ts file contains an MPEG-2 Transport Stream. This is a file format that encapsulates a series of encoded media samples—typically audio and video. The file format supports a variety of compression formats, including MP3 audio, AAC audio, H.264 video, and so on. Not all compression formats are currently supported in the Apple HTTP Live Streaming implementation, however. (For a list of currently supported formats, see Media Encoder.)
MPEG-2 Transport Streams are containers, and should not be confused with MPEG-2 compression.
- What is an .M3U8 file?
An .M3U8 file is an extensible playlist file format. It is an m3u playlist containing UTF-8 encoded text. The m3u file format is a de facto standard playlist format suitable for carrying lists of media file URLs. This is the format used as the index file for HTTP Live Streaming. For details, see the IETF Internet-Draft of the HTTP Live Streaming specification.
- How does the client software determine when to switch streams?
The current implementation of the client observes the effective bandwidth while playing a stream. If a higher-quality stream is available and the bandwidth appears sufficient to support it, the client switches to a higher quality. If a lower-quality stream is available and the current bandwidth appears insufficient to support the current stream, the client switches to a lower quality.
- Where can I find a copy of the media stream segmenter from Apple?
The media stream segmenter, file stream segmenter, and other tools are frequently updated, so you should download the current version of the HTTP Live Streaming Tools from the Apple Developer website. See Download the Tools.
- What settings are recommended for a typical HTTP stream, with alternates, for use with the media segmenter from Apple?
See Preparing Media for Delivery to iOS-Based Devices.
These settings are the current recommendations. There are also certain requirements that apply to the current mediastreamsegmenter. The segmenter also has a number of user-configurable settings. You can obtain a list of the command line arguments and their meanings by typing man mediastreamsegmenter.
- How can I specify what codecs or H.264 profile are required to play back my stream?
Use the CODECS attribute of the EXT-X-STREAM-INF tag. The following codecs and profiles can be listed in the CODECS attribute:
- AAC-LC
- HE-AAC
- MP3
- H.264 Baseline Profile level 3.0
- H.264 Baseline Profile level 3.1
- H.264 Main Profile level 3.0
- H.264 Main Profile level 3.1
- H.264 Main Profile level 4.0
- H.264 High Profile level 3.1
- H.264 High Profile level 4.0
- H.264 High Profile level 4.1
The attribute value must be in quotes. If multiple values are specified, one set of quotes is used to contain all values, and the values are separated by commas. An example follows.
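A representative example (illustrative values, not Apple's exact sample; "avc1.42001e" is the RFC 6381 string for H.264 Baseline Profile level 3.0 and "mp4a.40.2" denotes AAC-LC):

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=500000,CODECS="avc1.42001e,mp4a.40.2"
mid_video_index.M3U8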
- How can I create an audio-only stream from audio/video input?
Add the -audio-only argument when invoking the stream or file segmenter.
- How can I add a still image to an audio-only stream?
Use the -meta-file argument when invoking the stream or file segmenter with -meta-type=picture. For example:
mediafilesegmenter -f /Dir/outputFile -a --meta-file=poster.jpg --meta-type=picture track01.mp3
Remember that the image is typically resent every ten seconds, so it's best to keep the file size small.
- How can I specify an audio-only alternate to an audio-video stream?
Use the CODECS and BANDWIDTH attributes of the EXT-X-STREAM-INF tag. The BANDWIDTH attribute specifies the bandwidth required for each alternate stream. If the CODECS attribute is included, it must list all codecs required to play the stream. If only an audio codec is specified, the stream is identified as audio-only. Currently, it is not required to specify that a stream is audio-only, so use of the CODECS attribute is optional.
The following is an example that specifies video streams at 500 Kbps for fast connections, 150 Kbps for slower connections, and an audio-only stream at 64 Kbps for very slow connections. All the streams should use the same 64 Kbps audio to allow transitions between streams without an audible disturbance.
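A sketch of what such a variant playlist could look like (illustrative URIs and CODECS values, not Apple's exact sample; "mp4a.40.5" denotes HE-AAC):

#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=500000,CODECS="avc1.42001e,mp4a.40.5"
wifi_video_index.M3U8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=150000,CODECS="avc1.42001e,mp4a.40.5"
cell_video_index.M3U8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=64000,CODECS="mp4a.40.5"
audio_only_index.M3U8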
- What are the hardware requirements or recommendations for servers?
See question #1 for encoder hardware recommendations.
The Apple stream segmenter is capable of running on any Intel-based Mac. We recommend using a Mac with two Ethernet network interfaces, such as a Mac Pro or an XServe. One network interface can be used to obtain the encoded stream from the local network, while the second network interface can provide access to a wider network.
- Does the Apple implementation of HTTP Live Streaming support DRM?
No. However, media can be encrypted, and key access can be limited by requiring authentication when the client retrieves the key from your HTTPS server.
- What client platforms are supported?
iPhone, iPad, and iPod touch (requires iOS version 3.0 or later), Apple TV (version 2 and later), and Mac OS X computers.
- Is the protocol specification available?
Yes. The protocol specification is an IETF Internet-Draft, at http://tools.ietf.org/html/draft-pantos-http-live-streaming.
- Does the client cache content?
The index file can contain an instruction to the client that content should not be cached. Otherwise, the client may cache data for performance optimization when seeking within the media.
- Is this a real-time delivery system?
No. It has inherent latency corresponding to the size and duration of the media files containing stream segments. At least one segment must fully download before it can be viewed by the client, and two may be required to ensure seamless transitions between segments. In addition, the encoder and segmenter must create a file from the input; the duration of this file is the minimum latency before media is available for download. Typical latency with recommended settings is in the neighborhood of 30 seconds.
- What is the latency?
Approximately 30 seconds, with recommended settings. See question #15.
- Do I need to use a hardware encoder?
No. Using the protocol specification, it is possible to implement a software encoder.
- What advantages does this approach have over RTP/RTSP?
HTTP is less likely to be disallowed by routers, NAT, or firewall settings. No ports need to be opened that are commonly closed by default. Content is therefore more likely to get through to the client in more locations and without special settings. HTTP is also supported by more content-distribution networks, which can affect cost in large distribution models. In general, more available hardware and software works unmodified and as intended with HTTP than with RTP/RTSP. Expertise in customizing HTTP content delivery using tools such as PHP is also more widespread.
Also, HTTP Live Streaming is supported in Safari and the media player framework on iOS. RTSP streaming is not supported.
- Why is my stream’s overall bit rate higher than the sum of the audio and video bitrates?
MPEG-2 transport streams can include substantial overhead. They utilize fixed packet sizes that are padded when the packet contents are smaller than the default packet size. Encoder and multiplexer implementations vary in their efficiency at packing media data into these fixed packet sizes. The amount of padding can vary with frame rate, sample rate, and resolution.
- How can I reduce the overhead and bring the bit rate down?
Using a more efficient encoder can reduce the amount of overhead, as can tuning the encoder settings.
- Do all media files have to be part of the same MPEG-2 Transport Stream?
No. You can mix media files from different transport streams, as long as they are separated by EXT-X-DISCONTINUITY tags.
- Where can I get help or advice on setting up an HTTP audio/video server?
You can visit the Apple Developer Forum at http://devforums.apple.com/.
Also, check out Best Practices for Creating and Deploying HTTP Live Streaming Media for the iPhone and iPad.