Commit graph

82 commits

Author SHA1 Message Date
liushuyu a578df4c6b
video_core/codecs: more fixes for VAAPI detection ...
* skip impersonated VAAPI implementaions ("imposter detection")
* place VAAPI priority below CUDA/NVDEC/CUVID
2021-12-02 21:31:51 -07:00
liushuyu 20a46790d7 video_core/codec: address comments 2021-12-02 21:01:34 -07:00
liushuyu cd27f211c8 video_core/codecs: more robust ffmpeg hwdecoder selection logic 2021-12-02 21:01:34 -07:00
liushuyu 60928cf8cd
video_core/codec: address comments 2021-11-24 18:06:38 -07:00
liushuyu 72aa418b0b
video_core/codecs: fix multiple decoding issues on Linux ...
* when someone installed Intel video drivers on an AMD system, the
  decoder will select the Intel VA-API decoding driver and yuzu will
  crash due to incorrect driver selection; the fix will check if the
  currently about-to-use driver is loaded in the kernel
* when using NVIDIA driver on Linux with a ffmpeg that does not have
  CUDA capability enabled, the decoder will crash; the fix simply
  making the decoder prefers the VDPAU driver over CUDA on Linux
2021-11-24 17:23:57 -07:00
ameerj c50f170597 codes: Rename ComposeFrameHeader to ComposeFrame
These functions were composing the entire frame, not just the headers. Rename to more accurately describe them.
2021-11-12 23:52:19 -05:00
ameerj d35391b9f4 vp8: Implement header composition
Enables frame decoding with FFmpeg
2021-11-12 23:52:18 -05:00
ameerj b39b33b1fe codecs: Add VP8 codec class 2021-11-12 19:49:45 -05:00
Morph 894b483a0d
Merge pull request #7157 from ameerj/vic-surface-size
vic: Use the minimum of surface/frame dimensions when writing the final frame to the GPU
2021-10-13 20:41:17 -04:00
Ameer J 018cf3853e
Merge pull request #7109 from vonchenplus/fix_h264_max__reference_num_error
h264: Use max allowed max_num_ref_frames when using CPU decoding
2021-10-12 14:08:37 -04:00
ameerj f346b04d12 vic: Use the minimum of surface/frame dimensions when writing the final frame to the GPU
Addresses possible buffer overflow behavior.
2021-10-10 18:44:16 -04:00
Feng Chen ba8be75037 h264: Use max allowed max_num_ref_frames when using CPU decoding 2021-10-10 20:07:19 +08:00
Valeri 0394e4bb8e
vic: Allow surface to be higher than frame
Touhou Genso Wanderer Lotus Labyrinth R decodes 1920x1080 videos into 1920x1088 surface.
Only allow mismatch for height, since larger width would result in increasingly offset rows and somewhat defeat entire purpose of this check.
2021-10-09 20:22:09 +03:00
ameerj 403fc86c11 vic: Avoid memory corruption when multiple streams with different dimensions are decoded
This is a work around to avoid buffer overflow errors until multi channel/multi stream decoding is supported.
2021-10-08 01:22:38 -04:00
ameerj 5aae61775f vic: Refactor frame writing methods 2021-10-07 14:56:44 -04:00
ameerj 899fdb9c44 vic: Implement RGBX frame format 2021-10-07 11:06:57 -04:00
Morph ae028ddf22 codec: Add missing <string_view> include 2021-09-11 17:19:14 -04:00
Fernando S be4e192903
Merge pull request #6846 from ameerj/nvdec-gpu-decode
nvdec: Add GPU video decoding for all capable drivers and platforms
2021-09-11 23:11:32 +02:00
ameerj eb2624ed65 vp9_types: Minor refactor of VP9 info structs. 2021-08-25 21:42:43 -04:00
ameerj 3de38c9a70 vp9_types: Remove unused Vp9PictureInfo members 2021-08-25 21:29:22 -04:00
ameerj b384129c63 h264: Lower max_num_ref_frames
GPU decoding seems to be more picky when it comes to the maximum number of reference frames.
2021-08-16 14:40:53 -04:00
ameerj cd016d3cb5 configure_graphics: Add GPU nvdec decoding as an option
Some system configurations may see visual regressions or lower performance using GPU decoding compared to CPU decoding. This setting provides the option for users to specify their decoding preference.

Co-Authored-By: yzct12345 <87620833+yzct12345@users.noreply.github.com>
2021-08-16 14:40:53 -04:00
ameerj a832aa699f codec: Improve libav memory alloc and cleanup 2021-08-16 14:40:53 -04:00
ameerj bc3efb79cc codec: Fallback to CPU decoding if no compatible GPU format is found 2021-08-16 14:40:53 -04:00
bunnei 0509fe3377
Merge pull request #6838 from ameerj/sws-align
vic: Specify sws_scale height stride.
2021-08-12 11:28:33 -07:00
ameerj 356e10898f codec: Replace deprecated av_init_packet usage 2021-08-12 01:28:01 -04:00
ameerj 659039ca6d nvdec: Implement GPU accelerated decoding for all platforms
Supplements the VAAPI intel gpu decoder by implementing the D3D11VA decoder for Windows, and CUVID/VDPAU for Nvidia and AMD on drivers linux respectively.
2021-08-12 01:28:01 -04:00
ameerj a779cede7c vic: Specify sws_scale height stride.
Silences a sws_scale runtime warning about unaligned strides.
2021-08-09 23:24:16 -04:00
ameerj fa22695705 vp9: Ensure the first frame is complete
Silences a runtime error due to the first frame missing the frame data, and being set to hidden despite being a key-frame.
2021-08-08 13:49:00 -04:00
ameerj 928b64d2ce nvdec: Better logging for unimplemented codecs 2021-08-07 01:08:33 -04:00
bunnei f183668a87
Merge pull request #6799 from ameerj/vp9-fixes
nvdec: Fix VP9 reference frame refreshes
2021-08-06 17:46:46 -07:00
ameerj e3688f0627 vp9: Cleanup unused variables
With reference frames refreshes fix, we no longer need to buffer two frames in advance.
We can also remove other unused or otherwise unneeded variables.
2021-08-06 20:08:11 -04:00
ameerj a3f80a97a3 vp9: Fix reference frame refreshes
This resolves the artifacting when decoding VP9 streams.
2021-08-06 20:08:08 -04:00
yzct12345 2868d4ba84
nvdec: Implement VA-API hardware video acceleration (#6713)
* nvdec: VA-API

* Verify formatting

* Forgot a semicolon for Windows

* Clarify comment about AV_PIX_FMT_NV12

* Fix assert log spam from missing negation

* vic: Remove forgotten debug code

* Address lioncash's review

* Mention VA-API is Intel/AMD

* Address v1993's review

* Hopefully fix CMakeLists style this time

* vic: Improve cache locality

* vic: Fix off-by-one error

* codec: Async

* codec: Forgot the GetValue()

* nvdec: Address ameerj's review

* codec: Fallback to CPU without VA-API support

* cmake: Address lat9nq's review

* cmake: Make VA-API optional

* vaapi: Multiple GPU

* Apply suggestions from code review

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>

* nvdec: Address ameerj's review

* codec: Use anonymous instead of static

* nvdec: Remove enum and fix memory leak

* nvdec: Address ameerj's review

* codec: Remove preparation for threading

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
2021-08-03 23:43:11 -04:00
Fernando S da4ca4f2f9
Merge pull request #6525 from ameerj/nvdec-fixes
nvdec: Fix Submit Ioctl data source, vic frame dimension computations
2021-07-15 15:17:50 +02:00
ameerj b7fa264749 vic: Fix dimension compuation of YUV frames
Fixes out of bound memory crashes in Mario Golf
2021-07-15 00:51:50 -04:00
bunnei bf50345d4c
Merge pull request #6537 from Morph1984/warnings
general: Enforce multiple warnings in MSVC
2021-07-05 17:09:23 -07:00
Kelebek1 208a04dcff Slightly refactor NVDEC and codecs for readability and safety 2021-07-01 06:22:05 +01:00
Morph 22d7b89c15 video_core: Remove #pragma warning directives for external headers 2021-06-28 14:21:40 -04:00
ReinUsesLisp 05bd50a1cf codec,vic: Disable warnings in ffmpeg headers 2021-06-26 03:29:31 -03:00
lat9nq a60653dcd3 vp9: Avoid memcpy with null pointers
Avoid sending null pointer to memcpy as reported by Undefined Behaviour
Sanitizer. Replaces the std::memcpy calls in SpliceVectors with
std::copy calls. Opting to replace all the memcpy's with copy's.

Co-authored-by: LC <mathew1800@gmail.com>
2021-04-05 00:44:38 -04:00
ameerj b675c44e49 rebase, fix name shadowing, more const 2021-02-13 13:07:56 -05:00
ameerj 09722cb4a7 streamline cdma_pusher/command_classes 2021-02-13 13:07:56 -05:00
ameerj 77564f987c streamline cdma_pusher/command_classes 2021-02-13 13:07:53 -05:00
ameerj ac265a72ce nvdec cleanup 2021-02-13 13:07:31 -05:00
ReinUsesLisp 82c2601555 video_core: Reimplement the buffer cache
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.

- Bindings are cached, allowing to skip work when the game changes few
  bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
  from the GPU to emulate constant buffers, instead GL_EXT_memory_object
  is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
  glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
  theory this should save one hash table resolve inside the driver
  compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
  that are not Nvidia's proprietary, due to their low performance on
  partial glBufferSubData calls synchronized with 3D rendering (that
  some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
  cache more bindings than before, skipping unnecesarry work.

This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
2021-02-13 02:17:22 -03:00
Lioncash 8620de6b20 common/bit_util: Replace CLZ/CTZ operations with standardized ones
Makes for less code that we need to maintain.
2021-01-15 02:15:32 -05:00
Ameer J 16392a23cc remove inaccurate reference
Co-authored-by: LC <mathew1800@gmail.com>
2021-01-07 14:33:45 -05:00
ameerj 06cef3355e fix for nvdec disabled, cleanup host1x 2021-01-07 14:33:45 -05:00
ameerj 2c27127d04 nvdec syncpt incorporation
laying the groundwork for async gpu, although this does not fully implement async nvdec operations
2021-01-07 14:33:45 -05:00