Sampling audio
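As of Android 5.0 (Lollipop), the audio resamplers are based entirely on FIR filters derived from a Kaiser windowed-sinc function. The Kaiser windowed-sinc offers the following properties:
- Its design parameters (stopband ripple, transition bandwidth, cutoff frequency, and filter length) are straightforward to calculate.
- It is nearly optimal for reducing stopband energy relative to overall energy.
See P.P. Vaidyanathan, Multirate Systems and Filter Banks, p. 50, for a discussion of the Kaiser window, its optimality, and its relationship to prolate spheroidal windows.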
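To illustrate why the design parameters are considered straightforward to calculate, the sketch below applies Kaiser's empirical formulas as they are commonly given in DSP textbooks: a window shape parameter (beta) derived from the stopband attenuation, and an estimated filter order derived from the attenuation and transition width. This is a textbook approximation only, not the code the Android resampler uses; the class and method names (KaiserDesign, beta, order) are hypothetical.

```java
/** Textbook Kaiser-window design estimates (hypothetical helper, not Android code). */
final class KaiserDesign {

    /** Kaiser's empirical shape parameter for a stopband attenuation given in dB. */
    static double beta(double attenuationDb) {
        if (attenuationDb > 50.0) {
            return 0.1102 * (attenuationDb - 8.7);
        } else if (attenuationDb >= 21.0) {
            double d = attenuationDb - 21.0;
            return 0.5842 * Math.pow(d, 0.4) + 0.07886 * d;
        }
        return 0.0; // below 21 dB the formula degenerates to a rectangular window
    }

    /** Estimated filter order for a transition width given in radians per sample. */
    static int order(double attenuationDb, double transitionWidthRad) {
        return (int) Math.ceil((attenuationDb - 8.0) / (2.285 * transitionWidthRad));
    }

    public static void main(String[] args) {
        double attenuationDb = 60.0;          // assumed target, for illustration
        double transition = 0.1 * Math.PI;    // assumed transition width
        System.out.println("beta  = " + beta(attenuationDb));
        System.out.println("order = " + order(attenuationDb, transition));
    }
}
```

A narrower transition band or a higher attenuation target increases the estimated order, which is the trade-off between quality and filter length that the rest of this page refers to.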
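The design parameters are computed automatically from an internal quality determination and the desired sampling ratio. The windowed-sinc filter is then generated from those design parameters. For music use, the resampler for 44.1 kHz to 48 kHz (and vice versa) is generated at a higher quality than the resampler for arbitrary frequency conversion.
The audio resamplers provide increased quality, as well as the speed needed to achieve that quality. However, resamplers can introduce small amounts of passband ripple and aliasing harmonic noise, and they can cause some high-frequency loss in the transition band, so avoid using them unnecessarily.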
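Best practices for sampling and resampling
This section describes some best practices to help you avoid problems with sampling rates.
Choose the sampling rate to fit the device
In general, it is best to choose the sampling rate to fit the device, typically 44.1 kHz or 48 kHz. Using a sample rate greater than 48 kHz typically results in decreased quality, because a resampler must be used to play back the file.
Use simple resampling ratios (fixed versus interpolated polyphases)
The resampler operates in one of the following modes:
- Fixed polyphase mode. The filter coefficients for each polyphase are precomputed.
- Interpolated polyphase mode. The filter coefficients for each polyphase must be interpolated from the nearest two precomputed polyphases.
The resampler is fastest in fixed polyphase mode, which applies when the ratio of input rate to output rate, L/M (with the greatest common divisor removed), has M less than 256. For example, for 44,100 to 48,000 conversion, L = 147 and M = 160.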
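The reduction described above can be sketched in a few lines. The helper below divides both rates by their greatest common divisor to obtain L and M, then applies the stated condition (M less than 256). The class and method names are hypothetical, and the check mirrors only the rule as this page states it, not the resampler's actual source.

```java
/** Reduces an input/output rate pair to L/M (hypothetical helper, not Android code). */
final class ResampleRatio {

    /** Euclid's algorithm for the greatest common divisor of two positive integers. */
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Returns {L, M}: the input and output rates with the common divisor removed. */
    static int[] reduce(int inputRate, int outputRate) {
        int g = gcd(inputRate, outputRate);
        return new int[] { inputRate / g, outputRate / g };
    }

    /** The condition stated on this page for fixed polyphase mode: M less than 256. */
    static boolean usesFixedPolyphase(int inputRate, int outputRate) {
        return reduce(inputRate, outputRate)[1] < 256;
    }

    public static void main(String[] args) {
        int[] lm = reduce(44100, 48000);
        System.out.println("L = " + lm[0] + ", M = " + lm[1]);
        System.out.println("fixed polyphase: " + usesFixedPolyphase(44100, 48000));
    }
}
```

A pair of rates that share no large common factor (for example, an output rate that is off by one hertz) reduces to a very large M and therefore falls outside the fixed polyphase condition.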
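In fixed polyphase mode, the sampling rate is locked and does not change. In interpolated polyphase mode, the sampling rate is approximate. When playing on a 48 kHz device, the sampling rate drift is generally one sample over a few hours. This is not usually a concern, because the approximation error is much smaller than the frequency error contributed by internal quartz oscillators, thermal drift, or jitter (typically tens of ppm).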
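To put the drift figure in perspective, the following sketch converts a slip of one sample into parts per million. The three-hour duration is an assumption chosen to stand in for "a few hours", and the class and method names are hypothetical.

```java
/** Converts a sample slip into a frequency error in ppm (hypothetical helper). */
final class DriftEstimate {

    /** Frequency error in parts per million for a slip in samples over a duration. */
    static double driftPpm(double samplesSlipped, double sampleRateHz, double seconds) {
        return samplesSlipped / (sampleRateHz * seconds) * 1e6;
    }

    public static void main(String[] args) {
        double hours = 3.0; // assumed value for "a few hours"
        double ppm = driftPpm(1.0, 48000.0, hours * 3600.0);
        System.out.println("approximation drift: " + ppm + " ppm");
    }
}
```

Under that assumption the result is on the order of thousandths of a ppm, several orders of magnitude below the tens of ppm attributed to oscillator error.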
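When playing back on a 48 kHz device, choose simple-ratio sampling rates such as 24 kHz (1:2) and 32 kHz (2:3), even though other sampling rates and ratios may be permitted through AudioTrack.
Use upsampling rather than downsampling to change sample rates
Sampling rates can be changed on the fly. The granularity of such a change is based on the internal buffering (typically a few hundred samples), not on a sample-by-sample basis. This can be used for effects.
Do not dynamically change sampling rates when downsampling. When the sample rate is changed after an audio track is created, a difference of around 5 to 10 percent from the original rate may trigger a filter recomputation when downsampling (to properly suppress aliasing). This can consume computing resources and may cause an audible click if the filter is replaced in real time.
Limit downsampling to no more than 6:1
Downsampling is typically triggered by hardware device requirements. When the sample rate converter is used for downsampling, try to limit the downsampling ratio to no more than 6:1 for good aliasing suppression (for example, no greater a downsample than 48,000 to 8,000). The filter length adjusts to match the downsampling ratio, but at higher downsampling ratios you sacrifice more transition bandwidth to avoid excessively increasing the filter length. There are no similar aliasing concerns for upsampling. Note that some parts of the audio pipeline may prevent downsampling greater than 2:1.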
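An app that selects source material for a known output rate could apply the 6:1 guideline with a check like the one below. This is a minimal sketch; the class, constant, and method names are hypothetical, and the limit is the recommendation from this page rather than a value enforced by the platform.

```java
/** Checks a rate pair against the 6:1 downsampling guideline (hypothetical helper). */
final class DownsampleCheck {

    /** Recommended maximum downsampling ratio from this page. */
    static final int MAX_RATIO = 6;

    /** Returns true for upsampling, equal rates, or downsampling of at most 6:1. */
    static boolean withinRecommendedLimit(int inputRate, int outputRate) {
        if (inputRate <= 0 || outputRate <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        // Integer comparison avoids rounding questions at exactly 6:1.
        return (long) inputRate <= (long) MAX_RATIO * outputRate;
    }

    public static void main(String[] args) {
        System.out.println(withinRecommendedLimit(48000, 8000));
        System.out.println(withinRecommendedLimit(96000, 8000));
    }
}
```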
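If you're concerned about latency, don't resample
Resampling prevents the track from being placed in the FastMixer path, which means that significantly higher latency occurs because of the additional, larger buffer in the ordinary Mixer path. Furthermore, the filter length of the resampler adds an implicit delay, though this is typically on the order of one millisecond or less, which is smaller than the additional buffering for the ordinary Mixer path (typically 20 milliseconds).
Use of floating-point audio
Using floating-point numbers to represent audio data can significantly enhance audio quality in high-performance audio applications. Floating point offers the following advantages:
- Wider dynamic range.
- Consistent accuracy across the dynamic range.
- More headroom to avoid clipping during intermediate calculations and transients.
While floating point can enhance audio quality, it does have certain disadvantages:
- Floating-point numbers use more memory.
- Floating-point operations have unexpected properties; for example, addition is not associative.
- Floating-point calculations can sometimes lose arithmetic precision because of rounding or numerically unstable algorithms.
- Using floating point effectively requires greater understanding to achieve accurate and reproducible results.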
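The non-associativity mentioned in the list above is easy to reproduce. In the sketch below (hypothetical class and method names), the same three single-precision values are added in two different groupings; the small value is absorbed entirely when it is first added to a very large one.

```java
/** Demonstrates that float addition is not associative (illustrative only). */
final class FloatAssociativity {

    /** Adds a and b first, then c. */
    static float leftFirst(float a, float b, float c) {
        return (a + b) + c;
    }

    /** Adds b and c first, then a. */
    static float rightFirst(float a, float b, float c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        float a = 1e20f;
        float b = -1e20f;
        float c = 1f;
        // The large values cancel first, so c survives.
        System.out.println(leftFirst(a, b, c));   // prints 1.0
        // c is lost when added to -1e20f, so the sum collapses to zero.
        System.out.println(rightFirst(a, b, c));  // prints 0.0
    }
}
```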
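Formerly, floating point was notorious for being unavailable or slow. This is still true for low-end and embedded processors. However, processors on modern mobile devices now have hardware floating point with performance that is similar to (or in some cases even faster than) integer. Modern CPUs also support SIMD (single instruction, multiple data), which can improve performance further.
Best practices for floating-point audio
The following best practices help you avoid problems with floating-point calculations:
- Use double-precision floating point for infrequent calculations, such as computing filter coefficients.
- Pay attention to the order of operations.
- Declare explicit variables for intermediate values.
- Use parentheses liberally.
- If you get a NaN or infinity result, use binary search to discover where it was introduced.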
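The last item in the list above can be sketched as a bisection over processing stages. The example assumes a pipeline modeled as an ordered list of functions over float buffers, a finite input, and that a NaN or infinity stays non-finite through all later stages once it appears; all names (NanBisect, firstBadStage, scale, divide) are hypothetical.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Bisects a processing chain to find where NaN or infinity first appears. */
final class NanBisect {

    static boolean allFinite(float[] buf) {
        for (float v : buf) {
            if (!Float.isFinite(v)) return false;
        }
        return true;
    }

    /** Runs the first count stages on a copy of the input. */
    static float[] runPrefix(List<UnaryOperator<float[]>> stages, int count, float[] input) {
        float[] buf = input.clone();
        for (int i = 0; i < count; i++) buf = stages.get(i).apply(buf);
        return buf;
    }

    /** Returns the index of the first stage producing non-finite output, or -1. */
    static int firstBadStage(List<UnaryOperator<float[]>> stages, float[] input) {
        if (!allFinite(input)) throw new IllegalArgumentException("input is not finite");
        if (allFinite(runPrefix(stages, stages.size(), input))) return -1;
        int lo = 0;              // running lo stages yields finite output
        int hi = stages.size();  // running hi stages yields non-finite output
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (allFinite(runPrefix(stages, mid, input))) lo = mid; else hi = mid;
        }
        return hi - 1;
    }

    // Two toy stages used only for the demonstration.
    static float[] scale(float[] in, float g) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i] * g;
        return out;
    }

    static float[] divide(float[] in, float d) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i] / d;
        return out;
    }

    public static void main(String[] args) {
        List<UnaryOperator<float[]>> stages = List.of(
                b -> scale(b, 0.5f),
                b -> divide(b, 0f),   // deliberate bug: division by zero
                b -> scale(b, 2f));
        System.out.println(firstBadStage(stages, new float[] {0f, 0.5f}));
    }
}
```

The same idea applies when the stages are sections of source code rather than objects: check an intermediate value halfway through, then narrow the search to whichever half contains the first non-finite result.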
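For floating-point audio, the audio format encoding AudioFormat.ENCODING_PCM_FLOAT is used in the same way as ENCODING_PCM_16_BIT or ENCODING_PCM_8_BIT to specify AudioTrack data formats. The corresponding overloaded method AudioTrack.write() takes a float array to deliver data.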
Kotlin
fun write(
    audioData: FloatArray,
    offsetInFloats: Int,
    sizeInFloats: Int,
    writeMode: Int
): Int
Java
public int write(float[] audioData,
                 int offsetInFloats,
                 int sizeInFloats,
                 int writeMode)
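As an illustration of the kind of data this overload accepts, the sketch below fills a float array with a sine tone whose samples stay within the nominal [-1.0, 1.0] range of float PCM. It is plain Java with no Android dependency; the class and method names are hypothetical, and the AudioTrack call shown in the comment assumes a track that was already created with ENCODING_PCM_FLOAT.

```java
/** Generates a mono sine tone as float PCM samples (hypothetical helper). */
final class FloatSine {

    /** Returns frames samples of a sine tone scaled by amplitude (0.0 to 1.0). */
    static float[] sine(double frequencyHz, int sampleRateHz, int frames, double amplitude) {
        float[] out = new float[frames];
        // The phase step is an infrequent calculation, so it is kept in double precision.
        double phaseStep = 2.0 * Math.PI * frequencyHz / sampleRateHz;
        for (int i = 0; i < frames; i++) {
            out[i] = (float) (amplitude * Math.sin(phaseStep * i));
        }
        return out;
    }

    public static void main(String[] args) {
        float[] buffer = sine(1000.0, 48000, 480, 0.5);
        System.out.println("generated " + buffer.length + " samples");
        // On Android, with an assumed existing AudioTrack named track:
        // track.write(buffer, 0, buffer.length, AudioTrack.WRITE_BLOCKING);
    }
}
```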
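For more information
This section lists some additional resources about sampling and floating point.
Sampling
Sample rates
- Sampling (signal processing) at Wikipedia: https://en.wikipedia.org/wiki/Sampling_%28signal_processing%29
Resampling
- Sample-rate conversion at Wikipedia: https://en.wikipedia.org/wiki/Sample_rate_conversion
- Sample Rate Conversion at source.android.com: https://source.android.com/devices/audio/src.html
The high bit-depth and high kHz controversy
- D/A and A/D | Digital Show and Tell, video by Christopher "Monty" Montgomery of Xiph.Org: https://www.youtube.com/watch?v=cIQ9IXSUzuM
- The Science of Sample Rates (When Higher Is Better - And When It Isn't): http://www.trustmeimascientist.com/2013/02/04/the-science-of-sample-rates-when-higher-is-better-and-when-it-isnt/
- Audio Myths & DAW Wars: http://www.image-line.com/support/FLHelp/html/app_audio.htm
- 192kHz/24bit vs. 96kHz/24bit "debate"- Interesting revelation: http://forums.stevehoffman.tv/threads/192khz-24bit-vs-96khz-24bit-debate-interesting-revelation.317660/
Floating point
The following Wikipedia pages are helpful in understanding floating-point audio:
- Audio bit depth: https://en.wikipedia.org/wiki/Audio_bit_depth
- Floating-point arithmetic: https://en.wikipedia.org/wiki/Floating_point
- IEEE 754 floating-point: https://en.wikipedia.org/wiki/IEEE_floating_point
- Loss of significance (catastrophic cancellation): https://en.wikipedia.org/wiki/Loss_of_significance
- Numerical stability: https://en.wikipedia.org/wiki/Numerical_stability
The following article covers the aspects of floating point that have a direct impact on designers of computer systems:
- What every computer scientist should know about floating-point arithmetic, by David Goldberg, Xerox PARC (edited reprint): http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html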
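Content and code samples on this page are subject to the licenses described in the Content License. Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.
Last updated 2025-07-26 UTC.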