ltx-2.3-spatial-upscaler-x2-1.0 causes unwanted text/overlay
Using the 2-stage workflow, the upscaler causes an unwanted text overlay or "endscreen"-looking garbled AI text at the end of every 20-second video. Shorter videos don't seem to be affected, and I don't have the hardware to try longer ones. I believe I have narrowed it down to the upscaler being the cause, but I can't be certain: if I remove the upscaler and generate the 20 seconds in a 1-stage workflow, I don't get the issue, so that is my reasoning. I don't know how to solve this. I tried the older ltx2 upscaler, but obviously it no longer works with the updated model...
I have had the same problem. You have to use the new sigmas and preprocessing values, then it will work.
Using a 1.5x upscaler worked without any problems. There didn't seem to be any artifacts.
However, since 768x512 doesn't divide evenly at 1.5x, I specified 768x480. In the first stage, instead of resizing to 0.5x, I specified 512x320 in the Resize v2 node.
Having the same issue. The last 10 frames with the x2 upscaler will always have something completely random in them.
The 1.5x doesn't have that issue.
I have tried the suggestion of using different sigmas and preprocess values, but that didn't get rid of the issue. It "changed" it slightly, so it's now more of a slight "flash" of something popping up, but it's still there. I've tried many different sigmas but can't get rid of it. I will use the 1.5x upscaler until this is addressed.
Same. See the screenshot: either a slight flash (left) or some kind of logo/unwanted text overlay (right).
The new manual sigma values for the upscaler are 0.85, 0.7250, 0.4219, 0.0, and the preprocess value is 18, I think. The manual sigmas for the 8-step first sampler haven't changed, according to the workflows from https://github.com/Lightricks/ComfyUI-LTXVideo/
This is exactly the issue I'm seeing, and I still haven't gotten it to go away. I've resorted to a first-middle-last-frame workflow: I set the "middle" frame to whatever frame I actually want the video to end on, then cut everything between the middle and final frame. For example, if I want 480 frames at 24 fps, I set my total frames to 504, put my middle frame at 480, and cut frames 481-504. That seems to be working for now. Obviously, if you don't use img2vid, I don't know. Either way, this needs to be addressed, or we need an explanation of why it's happening.
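The pad-and-trim arithmetic can be sketched in a few lines. Note the assumptions here: the 8*k + 1 valid-frame-count constraint is inferred from the 121/241/481/961 counts mentioned in this thread (not from any docs), and the 24-frame safety margin is just a guess with room to spare over the worst case reported below.

```python
# Sketch of the "generate extra frames, then trim" workaround described above.
# Assumption (not from the docs): valid frame counts have the form 8*k + 1,
# matching the 121/241/481/961 counts mentioned in this thread.

def padded_frame_count(wanted_frames: int, margin: int = 24) -> int:
    """Smallest frame count of the form 8*k + 1 that leaves at least
    `margin` throwaway frames after the frame we actually want to end on."""
    total = wanted_frames + margin
    k = -(-(total - 1) // 8)  # ceil((total - 1) / 8)
    return 8 * k + 1

# Want a clean 480-frame (20 s at 24 fps) video: generate 505 frames,
# keep frames 1..480, discard the tail where the endscreen artifact lands.
print(padded_frame_count(480))  # 505
```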
I'm also getting these artifacts at the end of the video. At first I thought it was the workflow I was using, but then I tried some other workflows and got the same result, with artifacts at the end. In my case, it happens when my video is longer than 15 seconds.
I guess all we can do for now is generate longer than needed and trim the offending frames. Even with the 1.5x upscaler this happens from time to time. It seems like the training data was polluted with endscreens, logos, ads, or something. Such a shame for an otherwise excellent model.
It seems to be always the final 14 frames that are affected.
Yes, and why didn't they notice it? There's also always some background music added at the end, which also seems to come from endscreens or ads. And the fonts of the letters seem to be Vietnamese?
It changes based on length. For me:
121 frames = 6 affected frames.
481 frames = 13 affected frames.
961 frames = 16 affected frames.
Same here. Will there be a fix? Right now I am using the 1.5x upscaler...
I asked an AI and it told me:
Set ManualSigmas
instead of "1.0 ... 0.421875, 0.0"
to
1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.05
and it worked for me.
You changed only the value for the first sampler but not for the upscaler?
Tried that. No problem through the first sampler, but then the upsampler stops at 50%, so I had to change it there too, ending at 0.0500. That somehow tripled the time until it finished. But there's still a logo at the end of a 20-second video. @UelivonWerdenberg: I'm using a dev model. Do you use a dev or a distilled one, and how long are the videos you generated where it worked for you?
Only the second. I have the standard ComfyUI workflow video_ltx2_3_i2v, and in it that's the #211 node.
Ok, it doesn't work for the dev model.
"the #211 node"... that must be from ChatGPT ;-)
Use the euler sampler with the Linear Quadratic (Mochi) scheduler.
That's it. That's the magic pairing that solved the problem for me. I generated hundreds of videos using different combinations of samplers and schedulers. This is the only one that completely avoids both the white border for the entirety of the video and the logo added to the end.
After hundreds of generations, my conclusion is that the first step in the upscaler is too weak.
You can increase the noise of the first step and that fixes it, but then you lose a lot of the input video's guidance.
Using the 8-step upscale sigmas UelivonWerdenberg posted worked, but it's twice as slow because of the extra steps.
So I started playing with the sigma values, which basically control how much noise to apply per step.
Here is a comparison.
First video is the default workflow of comfyui with my prompt.
https://www.youtube.com/watch?v=7WeydM1aHJk
Here are the values of the videos.
TopLeft 0.987, 0.85, 0.725, 0.422, 0.0
TopRight 0.987, 0.85, 0.725, 0.422, 0.05
BottomLeft 1.0, 0.9875, 0.85, 0.421875, 0.05
BottomRight 1.0, 0.9875, 0.85, 0.421875, 0.0
0.987, 0.85, 0.725, 0.422, 0.0
would probably be my choice, since that 0.0 last step is a pure denoise instead of leaving a slight amount of grain at 0.05.
I need to spend more time reducing just that first value until the logo comes back, then increasing it a bit again to find the best value.
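Since the sigmas control how much noise is applied per step, the per-step denoise amount is just the difference between consecutive values. A quick sketch for comparing the four schedules above (the dictionary keys mirror the TopLeft/TopRight/BottomLeft/BottomRight labels):

```python
# Per-step sigma drops for the four schedules compared above.
schedules = {
    "TopLeft":     [0.987, 0.85, 0.725, 0.422, 0.0],
    "TopRight":    [0.987, 0.85, 0.725, 0.422, 0.05],
    "BottomLeft":  [1.0, 0.9875, 0.85, 0.421875, 0.05],
    "BottomRight": [1.0, 0.9875, 0.85, 0.421875, 0.0],
}

for name, sig in schedules.items():
    # Difference between consecutive sigmas = noise removed at that step.
    drops = [round(a - b, 6) for a, b in zip(sig, sig[1:])]
    print(f"{name:12s} last={sig[-1]} drops={drops}")
```

The `last=` column makes the grain trade-off visible: schedules ending at 0.0 denoise fully, while those ending at 0.05 leave a little residual noise.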
Got to love how fast the model is on my 5090.
Had some batches at 481 frames and 961 frames finish, and it looks like that first sigma needs to differ based on length.
The logo shows for a single frame at 481 frames with a 0.9125 first sigma, so setting 0.9175 prevents it with a little room for error.
At 961 frames it showed for one frame at 0.9605. I've only run two more at 961 frames, but it looks to be gone with a first sigma of 0.9655.
So here is a quick table to test for the rest of them, assuming it's somewhat linear. I need to test more durations, but I'm out of time.
Then it wouldn't be hard to put together a node to calculate the value based on duration.
| Frames | Duration (s) | First sigma |
|---|---|---|
| 120 | 5 | 0.9055 |
| 241 | 10 | 0.9175 |
| 361 | 15 | 0.9295 |
| 481 | 20 | 0.9175 ✓ |
| 601 | 25 | 0.9295 |
| 721 | 30 | 0.9415 |
| 841 | 35 | 0.9525 |
| 961 | 40 | 0.9655 ✓ |
| 1081 | 45 | 0.9765 |
| 1201 | 50 | 0.9875 |
481 frames:
0.9175, 0.87, 0.735, 0.445, 0.0
961 frames:
0.9655, 0.87, 0.735, 0.445, 0.0
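The "node to calculate the value based on duration" idea could be sketched like this. To be clear about assumptions: only the 481 and 961 points were actually measured in this thread, the linear fit and the 0.9875 cap (from the last table row) are extrapolations, and both function names are hypothetical.

```python
# Hypothetical sketch of the "first sigma from frame count" node idea.
# Only (481 -> 0.9175) and (961 -> 0.9655) were measured in this thread;
# everything else is linear extrapolation, capped at the table's 0.9875.

def first_sigma(frames: int) -> float:
    f1, s1 = 481, 0.9175   # measured
    f2, s2 = 961, 0.9655   # measured
    slope = (s2 - s1) / (f2 - f1)  # ~0.0001 per frame
    return round(min(s1 + slope * (frames - f1), 0.9875), 4)

def upscaler_sigmas(frames: int) -> str:
    # The remaining steps follow the 481/961-frame strings posted above.
    tail = [0.87, 0.735, 0.445, 0.0]
    return ", ".join(str(v) for v in [first_sigma(frames)] + tail)

print(upscaler_sigmas(481))  # 0.9175, 0.87, 0.735, 0.445, 0.0
print(upscaler_sigmas(961))  # 0.9655, 0.87, 0.735, 0.445, 0.0
```

Wrapping this in a ComfyUI node would just mean exposing `frames` as an input and the string as an output for the manual-sigmas field.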
@bmgjet Thanks for the chart. It seems to be working. I tested generating a few videos using the values from the chart, and I don't see that annoying flashing distorted text at the end of the videos anymore. I'll do more testing later and let you guys know.
GPT 5.2 made me a custom node that automatically generates the correct sigmas based on the length. I tested it on the low and high part and it worked for me.
I am not sure whether I should post the code here, and I cannot upload the node file here.
Advice is appreciated.
Well, GPT made me one too, but it's nothing more than the chart above. It changes the value automatically, but only if you use exactly those frame counts; if you change the frame count by 20 frames, it invents settings that don't work. So you might as well use the chart above. Thank you for that one!
That table I made is just an estimation based on those two points.
What really needs to happen is to test each duration to find the lowest working first value, then go up two steps from that as a margin of error.
You want the lowest first value you can get so it preserves as much of the input video's guidance as possible.
Then you can accurately fill in all the steps in between.
I'll have a bit more time tonight, so I'll probably come up with some values later on.
That sounds great. By the way, it seems to be a purely frame-count-based problem: your values work very well at 24 fps and at 25 fps (which is what I use), so only the frame count matters. I get random problems if I go over 961 frames; at 961 it can work, but not always (I don't mean the endscreen, but other inconsistencies like mid-video artifacts, face distortions, etc.). By the way, it's about the same speed as Wan 2.2 with sage_attention nodes (also on a 5090).
> I cannot upload the node file here.
You can link to it.
Went through all the values and found the lowest before the logo flashed.
Then, when doing 10 generations of each length with random prompts, some still flashed a logo for a single frame.
The thing they all had in common was a short prompt. So it looks like if there isn't enough prompt to direct the whole video, it has a stronger tendency toward a logo at the end.
I came up with a custom node on my GitHub called ComfyUi-LTX23Sigmas.
I won't post a link since I don't know if that's allowed by the rules.
It takes the duration and input prompt and outputs the sigmas string.
So far it has passed my second round of 10 runs of each duration from 121 to 1081.
The next experiment would be to see if adding one more extra step helps when going above 481 frames, since that's the point where you need to add a lot more strength to the sigmas.
That would allow it to keep more of its input guidance.
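For anyone wanting to replicate the idea without the node, here is a rough, hypothetical sketch of a "sigmas from duration + prompt" helper. The linear fit uses the two measured points from earlier in the thread; the short-prompt bump (+0.005 below 30 words) is an invented placeholder illustrating the short-prompt observation above, not the value ComfyUi-LTX23Sigmas actually uses.

```python
# Rough sketch of a "sigmas from duration + prompt" helper in the spirit of
# ComfyUi-LTX23Sigmas. The short-prompt bump (+0.005 under 30 words) is an
# invented illustration of the observation that short prompts need a
# stronger first step; the linear fit uses the thread's two measured points.

def ltx_upscaler_sigmas(frames: int, prompt: str) -> str:
    base = 0.9175 + (0.9655 - 0.9175) / (961 - 481) * (frames - 481)
    if len(prompt.split()) < 30:
        base += 0.005  # short prompt -> push the first step harder
    first = round(min(base, 0.9875), 4)
    return ", ".join(str(v) for v in [first, 0.87, 0.735, 0.445, 0.0])

print(ltx_upscaler_sigmas(481, "a cat walking on a beach"))
```

The real node would need the per-duration measurements described above to pick safe values; treat the bump and threshold here purely as tunable placeholders.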

