The W3C Actions API replaced Appium's deprecated TouchAction class. It models gestures as sequences of pointer and keyboard inputs, following the same W3C WebDriver specification that browser automation uses for input. The result is more precise gesture control with less flakiness.
Core concepts
The W3C Actions API has two input source types for mobile:
- PointerInput: models a finger (or stylus). Each sequence of pointer moves, presses, and releases simulates one finger.
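- KeyInput: models key presses on a keyboard. Hardware buttons such as Back, Home, and Volume are usually sent through driver-specific key-event commands instead (for example, pressKey on AndroidDriver).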
Each input source gets its own Sequence of actions. Passing one or more sequences to driver.perform() sends them to Appium as a single action, and the sequences run in parallel, which is what makes multi-touch gestures possible.
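Because all sequences in one perform() call advance together, it can help to see how their timing lines up. Under the W3C WebDriver specification, the n-th action of every input source belongs to the same tick, and a tick lasts as long as its longest action. The sketch below models only that timing rule in plain Java; the TickModel class and its method names are hypothetical and not part of Appium or Selenium.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models W3C tick timing, not a real Appium/Selenium class.
final class TickModel {

    // Each inner list holds one input source's action durations in milliseconds,
    // in order (0 for pointer down/up, the move or pause duration otherwise).
    static List<Long> tickDurations(List<List<Long>> sequences) {
        int tickCount = 0;
        for (List<Long> sequence : sequences) {
            tickCount = Math.max(tickCount, sequence.size());
        }
        List<Long> ticks = new ArrayList<>();
        for (int tick = 0; tick < tickCount; tick++) {
            long longest = 0;
            for (List<Long> sequence : sequences) {
                // A shorter sequence simply has no action in later ticks.
                if (tick < sequence.size()) {
                    longest = Math.max(longest, sequence.get(tick));
                }
            }
            ticks.add(longest);
        }
        return ticks;
    }

    // Total time the combined gesture takes: the sum of all tick durations.
    static long totalDuration(List<List<Long>> sequences) {
        long total = 0;
        for (long tick : tickDurations(sequences)) {
            total += tick;
        }
        return total;
    }
}
```

For two fingers with action durations [0, 0, 600, 0] and [0, 0, 400, 0], the third tick lasts 600 ms, so the faster finger finishes its move and then waits for the tick to end before both fingers lift.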
Tap
A tap is a pointer press followed immediately by a pointer release:
import org.openqa.selenium.Dimension;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.PointerInput;
import org.openqa.selenium.interactions.Sequence;
import java.time.Duration;
import java.util.List;
// Assumes `driver` is an AppiumDriver field on the enclosing class.
public void tap(int x, int y) {
    PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
    Sequence tap = new Sequence(finger, 0)
        // Position the finger over the target without touching the screen
        .addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), x, y))
        // Touch down, then lift immediately
        .addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
        .addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    driver.perform(List.of(tap));
}
For tapping an element rather than a coordinate:
public void tap(WebElement element) {
    Point location = element.getLocation();
    Dimension size = element.getSize();
    int centerX = location.getX() + size.getWidth() / 2;
    int centerY = location.getY() + size.getHeight() / 2;
    tap(centerX, centerY);
}
Long press
A long press holds the pointer down for a duration before releasing:
public void longPress(int x, int y, Duration holdDuration) {
    PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
    Sequence longPress = new Sequence(finger, 0)
        .addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), x, y))
        .addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
        .addAction(new org.openqa.selenium.interactions.Pause(finger, holdDuration))
        .addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    driver.perform(List.of(longPress));
}
// Usage
longPress(300, 500, Duration.ofSeconds(2));
Swipe
A swipe is a pointer-down, move-to-destination, pointer-up sequence:
public void swipe(int startX, int startY, int endX, int endY) {
    PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
    Sequence swipe = new Sequence(finger, 0)
        .addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), startX, startY))
        .addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
        .addAction(finger.createPointerMove(Duration.ofMillis(600), PointerInput.Origin.viewport(), endX, endY))
        .addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    driver.perform(List.of(swipe));
}
The Duration.ofMillis(600) passed to the second createPointerMove sets how long the finger takes to travel from start to end; together with the distance, it determines the gesture speed. As a rule of thumb, many apps treat moves shorter than about 200 ms as flings and moves longer than about 800 ms as slow scrolls. A duration of 400–600 ms approximates a natural human swipe.
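Duration alone does not fix the speed, because the distance matters too. The sketch below computes the average velocity of a straight-line swipe and applies the duration thresholds quoted above. SwipeTiming, its enum, and the exact cutoffs are assumptions for illustration; real apps and platforms decide between fling and scroll using their own velocity thresholds.

```java
// Hypothetical helper for reasoning about swipe timing; not part of Appium.
final class SwipeTiming {

    enum Kind { FLING, SWIPE, SCROLL }

    // Average speed in pixels per second over a straight-line move.
    static double pixelsPerSecond(int startX, int startY, int endX, int endY, long durationMs) {
        if (durationMs <= 0) {
            throw new IllegalArgumentException("durationMs must be positive");
        }
        double distance = Math.hypot(endX - startX, endY - startY);
        return distance * 1000.0 / durationMs;
    }

    // Rule-of-thumb classification using the 200 ms and 800 ms cutoffs from the text.
    static Kind classify(long durationMs) {
        if (durationMs < 200) {
            return Kind.FLING;
        }
        if (durationMs > 800) {
            return Kind.SCROLL;
        }
        return Kind.SWIPE;
    }
}
```

An 800-pixel swipe over 600 ms averages about 1333 pixels per second, while the same distance over 150 ms is roughly 5333 pixels per second, which is why short durations tend to register as flings.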
Pinch and zoom
Two-finger gestures require two PointerInput sequences executed simultaneously:
public void pinchToZoom(WebElement element, double scaleFactor) {
    // getLocation() returns the element's top-left corner, so derive the center from it
    Point topLeft = element.getLocation();
    Dimension size = element.getSize();
    int cx = topLeft.getX() + size.getWidth() / 2;
    int cy = topLeft.getY() + size.getHeight() / 2;
    int offset = (int) (Math.min(size.getWidth(), size.getHeight()) * 0.3);
    PointerInput finger1 = new PointerInput(PointerInput.Kind.TOUCH, "finger1");
    PointerInput finger2 = new PointerInput(PointerInput.Kind.TOUCH, "finger2");
    // scaleFactor only selects the direction here, not the amount of zoom.
    // Zoom out (scaleFactor <= 1): fingers start at edges, move to center
    // Zoom in (scaleFactor > 1): fingers start at center, move to edges
    int startOffset = scaleFactor > 1 ? offset / 4 : offset;
    int endOffset = scaleFactor > 1 ? offset : offset / 4;
    Sequence pinch1 = new Sequence(finger1, 0)
        .addAction(finger1.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), cx - startOffset, cy))
        .addAction(finger1.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
        .addAction(finger1.createPointerMove(Duration.ofMillis(600), PointerInput.Origin.viewport(), cx - endOffset, cy))
        .addAction(finger1.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    Sequence pinch2 = new Sequence(finger2, 0)
        .addAction(finger2.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), cx + startOffset, cy))
        .addAction(finger2.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
        .addAction(finger2.createPointerMove(Duration.ofMillis(600), PointerInput.Origin.viewport(), cx + endOffset, cy))
        .addAction(finger2.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    // Both sequences go in one perform() call so the fingers move together
    driver.perform(List.of(pinch1, pinch2));
}
Encapsulating gestures in GestureUtils
Keep W3C Actions code in a dedicated utility class so page objects stay readable:
public class GestureUtils {
    private final AppiumDriver driver;
    public GestureUtils(AppiumDriver driver) {
        this.driver = driver;
    }
    public void tap(WebElement element) { ... }
    public void longPress(WebElement element, Duration hold) { ... }
    public void swipeUp() {
        Dimension screen = driver.manage().window().getSize();
        // Drag from 70% to 30% of the screen height along the vertical center line,
        // using the coordinate-based swipe(...) from the Swipe section
        swipe(screen.getWidth() / 2, (int) (screen.getHeight() * 0.7),
              screen.getWidth() / 2, (int) (screen.getHeight() * 0.3));
    }
    public void swipeDown() { ... }
    public void swipeLeft(WebElement container) { ... }
    public void swipeRight(WebElement container) { ... }
}
Page objects create their own GestureUtils from the driver passed to their constructor:
public class ProductPage extends BasePage {
    private final GestureUtils gestures;
    public ProductPage(AppiumDriver driver) {
        super(driver);
        this.gestures = new GestureUtils(driver);
    }
    public ProductPage scrollToReviews() {
        gestures.swipeUp();
        return this;
    }
}
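The directional helpers whose bodies are elided in GestureUtils all come down to picking a start and end point inside a rectangle, either the screen or a container element. One possible way to compute those points is sketched below, using the same 70% to 30% travel as swipeUp. SwipeGeometry, Direction, and the returned array layout are hypothetical, and this is not necessarily how the original helpers are implemented.

```java
// Hypothetical geometry helper; the names and the 70%/30% ratios are assumptions.
final class SwipeGeometry {

    enum Direction { UP, DOWN, LEFT, RIGHT }

    // Returns {startX, startY, endX, endY} for a swipe inside the rectangle
    // whose top-left corner is (x, y) and whose size is width x height.
    static int[] points(int x, int y, int width, int height, Direction direction) {
        int midX = x + width / 2;
        int midY = y + height / 2;
        int nearX = x + width * 3 / 10;   // 30% from the left edge
        int farX = x + width * 7 / 10;    // 70% from the left edge
        int nearY = y + height * 3 / 10;  // 30% from the top edge
        int farY = y + height * 7 / 10;   // 70% from the top edge
        switch (direction) {
            case UP:    return new int[] {midX, farY, midX, nearY};
            case DOWN:  return new int[] {midX, nearY, midX, farY};
            case LEFT:  return new int[] {farX, midY, nearX, midY};
            case RIGHT: return new int[] {nearX, midY, farX, midY};
            default:    throw new IllegalArgumentException("Unknown direction: " + direction);
        }
    }
}
```

The four values can be passed straight to the coordinate-based swipe method. For a container, x, y, width, and height would come from the element's location and size, which keeps the gesture inside the scrollable area.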