COMPARISON OF CLASSICAL AND INTERACTIVE MULTI-ROBOT EXPLORATION STRATEGIES IN POPULATED ENVIRONMENTS

Multi-robot exploration consists in coordinating robots for mapping an unknown environment. It raises several issues concerning task allocation, robot control, path planning and communication. We study exploration in populated environments, in which pedestrian flows can severely impact performance. However, humans have adaptive skills for taking advantage of these flows while moving. Therefore, in order to exploit these human abilities, we propose a novel exploration strategy that explicitly allows for human-robot interactions. Our model for exploration in populated environments combines the classical frontier-based strategy with our interactive approach. We implement interactions where robots can locally choose a human guide to follow, and we define a parametric heuristic to balance interaction and frontier assignments. Finally, we evaluate to what extent human presence impacts our exploration model in terms of coverage ratio, travelled distance and elapsed time to completion.


Introduction
Mobile robots are entering our daily lives and provide services (e.g., guidance, assistance [1,2]) or even leisure activities (e.g., providing company, dancing [3,4]). This intrusion of mobile robots into citizens' day-to-day lives must take people into account, whilst seeking social compliance. Human activities and motion patterns are already studied [5], so that a robot can learn a model of human behaviour and use it to generate a socially compliant control. For example, by observing pedestrians walking nearby, a robot could model the pedestrian dynamics and generate its own navigation control for navigating efficiently in populated environments.
Multi-Robot Exploration (MRE) consists in reconstructing all the reachable space of an unknown environment by controlling mobile robots. Introducing human presence awareness into a robotic exploration system for populated environments is an interesting direction of study. Indeed, it paves the way for human-robot interaction (HRI)-based exploration approaches. In populated environments, the exploration task raises new concerns regarding clean reconstruction and efficient robot coordination.
Concerning reconstruction quality, it is particularly difficult to separate the static aspects (background) from the dynamic aspects (people, robots) of the scene [6]. Mobile robot perceptions are biased by the dynamics of the environment, which hinders localisation and mapping. Regarding the selection of targets to explore, pedestrian movements make the reachability of known and unknown areas vary in space and time, which makes exploration tricky. In fact, the reachable space evolves dynamically according to the density of the human presence.
Nevertheless, humans can understand the dynamics of their environment, and can sense, decide and act adequately. In this sense, we can assume that every person has an adaptive heuristic, depending on the local environment, that allows him or her to walk readily through dense areas (e.g., crowds). We are interested in exploiting human skills as possible heuristics for the exploration task. We propose a weighted heuristic that incorporates human presence for selecting areas to explore or for initiating human-robot interactions.
The second section gives a brief state of the art in MRE and situates our approach among HRI applications in mobile robotics. In the third section, we formalise the multi-agent system for exploration in populated environments, and present the framework of our study. The fourth section defines the mixed exploration approach (robot-frontier/interaction) and proposes a human-aware exploration heuristic for establishing human-following interactions. We then perform several experiments with our mixed approach in simulation, to underline the variability of the performance depending on the environment. Finally, we discuss our results and perspectives regarding machine learning for adaptive heuristic parameterisation.

Related Work
First, this section presents previous work in the field of MRE. Then, we situate our study among mobile robotic applications of HRI.

Multi-Robot Exploration
The MRE problem consists in acquiring an accurate representation of an environment by efficiently coordinating the actions of robots within it. Representation accuracy refers to the degree of closeness to the ground truth. Coordination of the robots arises from the teamwork involved in solving the task. Coordination efficiency can be evaluated at several levels, e.g., energy consumption or trajectory overlap.
Thus, MRE solutions design efficient control of robots for accurately completing a chosen representation of the environment (e.g., a graph). The proposed solutions can be roughly classified into reactive, goal-based and utility-based agent designs.
Within reactive and bio-inspired approaches, the actions of an agent are hardwired to its perceptions, and simple navigation rules can be created, e.g., wall following or circular and boustrophedon patterns [7][8][9][10]. These approaches are usually only concerned with coverage. They do not always consider mapping, as typical reactive agents are memory-less.
For goal-based agents, frontier-based methods offer a good computation/performance tradeoff, making them particularly suitable for deployment in real embedded systems. The idea is to incrementally assign robots to frontiers (separating the known and free space from the unknown space), thus serializing the exploration task into subgoals. Various frontier evaluation and target selection methods are discussed in the literature [11][12][13].
In utility-based approaches, an agent makes decisions according to the value of world states. The information gain value is proposed in [14], and [15] considers curiosity, surprise and hunger as motivational agents.
In our present study, we consider goal-based agents with frontiers to reach and humans to interact with as subgoals during exploration. We are looking for a parametric heuristic that evaluates and balances frontiers and interactions in order to exploit human adaptive skills.

Human-Robot Interaction
HRI is defined as the study of humans, robots and their mutual influences [16]. To the best of our knowledge, the office-conversant robot Jijo-2 is the only HRI application of mobile robotic exploration. This robot exhibits socially embedded learning mechanisms [17] by gathering information while conversing with other people. Thus, it realises semi-supervised learning by incorporating local oracle heuristics while exploring.
We present an application in mobile robotics considering close interactions established by proximity or direct perception between humans and robots. This type of interaction belongs to the Intimate Interaction class defined by Takeda in his HRI classification [18].
Our study bridges intimate HRI applications and MRE goal-based algorithms.

Modelling the Environment and the Agents
We propose a model for representing the multi-agent system for populated environments (Fig. 1).
In (1), the environment to explore is described by a navigation map E, which evolves over time. This evolution results from the actions of the agents (humans H and robots R). At each time t, a robot $R_i$ from R has a configuration from which it observes the environment. $O_i^t$ is an observed subset of E; it corresponds to the robot's observation at time t.

Exploration and Completion
For the exploration task, we must represent the environment explored by the robots over time (2). Let $\theta_i^{0:t}$ be the set of observations, namely the local t-time history, that agent $R_i$ has experienced up to time t. Similarly, $\Theta^{0:t}$ is the global t-time history, which aggregates the local t-time histories from R.
It is of fundamental importance for the robots to know when the exploration is finished. The completion criterion determines this moment and can be defined locally on each robot. Robots determine exploration completion based upon the already explored space Θ. The mission is over as soon as there is no configuration in the already explored space that allows for new observations.
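The local and global histories referenced in (2) could be written as follows. This is a reconstruction from the verbal definitions only, assuming that observations are sets of cells, that histories are unions of observations, and that the team has n robots (the index τ and the count n are notation introduced here):

```latex
\theta_i^{0:t} = \bigcup_{\tau = 0}^{t} O_i^{\tau},
\qquad
\Theta^{0:t} = \bigcup_{i = 1}^{n} \theta_i^{0:t}
```

Under this reading, the completion criterion amounts to checking that no configuration inside $\Theta^{0:t}$ would add a new cell to $\Theta^{0:t}$.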

Instantiating the Multi-Agent System
We represent E as a discrete occupancy grid of l × w square cells. Each cell has 4 possible states: unknown (not observed), occupied (walls, objects), animated (humans, robots) and free (empty).
State transitions are illustrated in Fig. 2. In this grid representation, R becomes the set of cells animated by the robots and $R_i$ describes the position of one robot on the grid. The observation area of each robot is within a limited circle. An environment, a robot and a human are represented in Fig. 3a. The robot is located on cell $R_1$ at (1, 1) and the human on cell $H_1$ at (1, 2). The maximum field of view of $R_1$ is within the dashed arc in Fig. 3b. $O_1^1$ consists of 7 cells: 3 are free, 2 are occupied and 2 are animated. The explored environment $\theta_1^{0:1}$ is limited to this first observation.
We have provided an instance of a multi-agent system for exploration in populated environments. The environment is represented with a discrete occupancy grid, agents are characterised by their identifier and their coordinates on the grid, and observations are made by casting rays within the viewing range of a robot. Our study is based on this representation of a multi-agent system. In the next section, we present the frontier/interaction exploration approach.
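As an illustration of this grid instance, the following minimal sketch casts rays from a robot cell over a ground-truth grid and returns the observed cells. The names `Cell` and `observe`, the number of rays, the 0.25-cell step and the rule that occupied and animated cells block the line of sight are assumptions made for the example, not details taken from the study.

```python
import math
from enum import Enum


class Cell(Enum):
    UNKNOWN = 0   # not observed yet
    OCCUPIED = 1  # walls, objects
    ANIMATED = 2  # humans, robots
    FREE = 3      # empty


def observe(truth, robot, radius, n_rays=360):
    """Return {(row, col): Cell} for the cells seen from `robot` (a (row, col) tuple)."""
    rows, cols = len(truth), len(truth[0])
    seen = {}
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        step = 0.0
        while step <= radius:
            r = round(robot[0] + step * math.sin(angle))
            c = round(robot[1] + step * math.cos(angle))
            if not (0 <= r < rows and 0 <= c < cols):
                break  # the ray left the grid
            seen[(r, c)] = truth[r][c]
            # Assumption: walls and other agents stop the ray after being seen.
            if (r, c) != robot and truth[r][c] in (Cell.OCCUPIED, Cell.ANIMATED):
                break
            step += 0.25
    return seen
```

A local history can then be kept as a dictionary that starts empty (all cells unknown) and is updated with each new observation.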

Mixed Exploration Approach by Frontiers and Interactions
First, let us consider the MRE problem defined as a target allocation problem of robots in an unknown environment [12][13][14]. A solution to the MRE problem defines a way to explore an unknown space, i.e., how to assign robots from R to tasks/targets from T. To achieve this, we can look for an assignment matrix $A_{RT}$ that optimises the cost given by the cost matrix $C_{RT}$ (cf. Fig. 4).
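To make the allocation view concrete, here is a brute-force sketch that searches for the assignment with minimum total cost. It assumes that each robot takes exactly one target, that no target is shared, and that there are at least as many targets as robots; the function name `best_assignment` is hypothetical, and exhaustive search is only practical for very small teams.

```python
from itertools import permutations


def best_assignment(cost):
    """cost[i][j] is the cost for robot i to take target j.

    Returns (total_cost, targets) where targets[i] is the target of robot i.
    """
    n_robots, n_targets = len(cost), len(cost[0])
    best = None
    # Enumerate every way of giving distinct targets to the robots.
    for targets in permutations(range(n_targets), n_robots):
        total = sum(cost[i][j] for i, j in enumerate(targets))
        if best is None or total < best[0]:
            best = (total, targets)
    return best
```

For the cost matrix `[[1, 2], [1, 3]]`, sending both robots to their individually cheapest target is not allowed under these assumptions, and the search returns robot 0 on target 1 and robot 1 on target 0.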

Various Approaches for Multi-Robot Exploration
We show how different sets of targets define the classical frontier-based exploration, our new interactive approach and the mixed approach (frontier/interaction).

Frontier-Based Exploration
A frontier is the observed boundary between an explored space and an unexplored space [11]. Classical frontier-based exploration is defined by choosing the targets from the set of frontiers F, i.e., T = F (3).
In populated environments, this approach can fail when the path to a chosen frontier is congested by humans.
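On an occupancy grid, frontier cells are commonly extracted as free cells that touch unknown space. The sketch below follows that convention, assuming 4-connectivity and a grid encoded as strings with "F" for free, "U" for unknown, "O" for occupied and "A" for animated; both the encoding and the function name are illustrative choices.

```python
def frontier_cells(grid):
    """Return the free cells that have at least one unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "F":
                continue  # only free cells can belong to a frontier
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "U":
                    frontiers.append((r, c))
                    break
    return frontiers
```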

Interactive Exploration
Human-robot interaction is defined as the reciprocal influence between a human and a robot, followed by one or more effects. We introduce an interactive approach that takes into account the presence of humans for establishing human-robot interactions (opening a door, guiding through a crowd, etc.). Targets are now chosen from the set of humans H, i.e., T = H (4).
A purely interactive approach can be inefficient in sparsely populated environments. Indeed, without any perception of human presence, the robots adopt a wait-and-see policy and pause the exploration.

Mixed Exploration
Mixed exploration makes it possible both to initiate interactions and to reach frontiers. Thus, we combine the two target sets (frontiers and humans) to define a new set G, i.e., G = F ∪ H (5).
This approach requires a careful balance between interaction and frontier assignments in order to overcome the two above-mentioned issues (the wait-and-see policy and the congested frontier).

Mixed Cost Model
In this study, robots can interact only by following pedestrians. The optimisation criterion is to explore a possibly populated environment with minimum distance and time. Thus, we define mixed costs using distances and weighted penalties (see Fig. 5). The weight σ balances interaction and frontier penalties. First we detail distances, then we introduce penalties and explain the different weights used in the cost formula.

Distance
First, we incorporate distances between robots and targets as immediate costs (Fig. 5a). Thus, we initialise $C_{RG}$ with normalised robot-frontier and robot-human distances ($D_{RF}$, $D_{RH}$) in (6).
Distance costs have multiple drawbacks. Two examples follow.
• A robot travels towards a frontier, but a crowd hinders its navigation: the robot cannot adapt the exploration depending on navigation feasibility. Because their distance cost is prohibitive, remote but reachable frontiers are not re-evaluated as good options, and the next target is always chosen among the last frontier and nearby humans. A solution is to use a planned distance, which is set to infinity when a target is momentarily unreachable.
• A robot follows a pedestrian walking nearby, but the person stops to talk with other people: the robot cannot decide whether to maintain or stop the current interaction depending on the human activity. With distances alone, the robot will resume exploration only if one person moves again. This also causes growing unease for the people. A solution is to make an a priori evaluation of an interaction and to update this evaluation a posteriori, while the interaction is taking place.

Penalty
We tackle these two drawbacks with a heuristic that associates a penalty with each robot-frontier/human pair (Fig. 5b).
A penalty $p_{R_i X_j}$ is defined as the sum of a time penalty and an orientation penalty. The time penalty $t_{R_i X_j}$ is the time elapsed since a frontier was discovered or since a human became idle. The orientation penalty $o_{R_i X_j}$ is the smallest unsigned angle between the orientation of a robot and the orientation of a frontier/human (a frontier is oriented towards the unknown). Thus, we define $P_{RG}$ with normalised robot-frontier and robot-human penalties ($P_{RF}$, $P_{RH}$) in (7).
Parameter σ sets more or less weight on the frontier penalties or on the interaction penalties. When this parameter is high, it increases the frontier costs and decreases the interaction costs. This results in favouring interactions over frontiers.
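The penalty of a single robot-target pair can be sketched as follows. The text states that penalties are normalised but not how, so the scaling used here (idle time clipped against a hypothetical `max_idle_time`, angle divided by π) is an assumption, as are the function names.

```python
import math


def unsigned_angle(a, b):
    """Smallest unsigned angle between two headings in radians, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def penalty(idle_time, robot_heading, target_heading, max_idle_time):
    """Sum of a time penalty and an orientation penalty, each scaled to [0, 1]."""
    # Assumed normalisation: idle times beyond max_idle_time saturate at 1.
    time_penalty = min(idle_time / max_idle_time, 1.0)
    # Assumed normalisation: the largest possible unsigned angle is pi.
    orientation_penalty = unsigned_angle(robot_heading, target_heading) / math.pi
    return time_penalty + orientation_penalty
```

With this scaling, a target that has been idle for a long time and lies behind the robot receives the largest penalty, which matches the intent of discouraging stale frontiers and stationary pedestrians.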

Distance and Penalty
The mixed cost matrix $C_{RG}$, which incorporates distances $D_{RG}$ and penalties $P_{RG}$, is represented in (8). Parameter α modulates the immediate distance cost and the information coming from the penalty heuristic. When α is high (resp. low), the importance of the penalties is reduced (resp. increased). Distances and penalties are counterbalanced with α, while σ sets more or less focus on frontiers or on interactions. We present the influence of α and σ on the cost formula in Fig. 6. The values range from 0 to 1 for each parameter and the formula written on each side is obtained when one parameter is set to its extreme value.
We have adopted a mixed approach, and have defined a parametric cost matrix based on a penalty heuristic. Now, we evaluate the exploration performance of this heuristic for two greedy optimisation methods, assuming different values of α and σ.
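One cost formula consistent with this description is a convex combination: each cost is α times the distance plus (1 − α) times the weighted penalty, where frontier penalties are weighted by σ and interaction penalties by (1 − σ). This exact form is an assumption inferred from the stated roles of α and σ, not a transcription of (8). A minimal sketch with nested lists, one row per robot:

```python
def mixed_cost(d_rf, d_rh, p_rf, p_rh, alpha, sigma):
    """Build the robot x (frontiers + humans) cost matrix.

    d_rf, p_rf: normalised robot-frontier distances and penalties.
    d_rh, p_rh: normalised robot-human distances and penalties.
    """
    cost = []
    for i in range(len(d_rf)):
        # A high sigma raises frontier costs (assumed weighting).
        frontier_costs = [alpha * d + (1.0 - alpha) * sigma * p
                          for d, p in zip(d_rf[i], p_rf[i])]
        # A high sigma lowers interaction costs (assumed weighting).
        human_costs = [alpha * d + (1.0 - alpha) * (1.0 - sigma) * p
                       for d, p in zip(d_rh[i], p_rh[i])]
        cost.append(frontier_costs + human_costs)
    return cost
```

Under this form, α = 1 reduces the costs to pure distances, and σ = 0 removes the frontier penalties so that frontiers are ranked by distance only while interactions carry their full penalty.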

Experimental Framework
We use the V-REP robotic simulator for our experiments [19]. The environment is discretised with 0.5 m square cells. The robots share their exploration map, so the frontiers that are discovered are known by every robot. Contiguous frontier cells are grouped together into a frontier area. Inside a frontier area, the targeted cell minimises the sum of distances to the other cells.
Assignments are computed locally by each robot. To optimise its assignment, a robot takes into account the entire set of frontiers known so far, but only the robots and pedestrians perceived locally (within a 2 m radius). Planning is done using a potential field propagated on the grid.
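The grouping of frontier cells and the choice of a target cell can be sketched as below. The use of 8-connectivity for "contiguous" and of Euclidean distance between cell indices are assumptions; the function names are illustrative.

```python
import math


def frontier_areas(cells):
    """Group frontier cells into areas of 8-connected cells."""
    remaining = set(cells)
    areas = []
    while remaining:
        seed = remaining.pop()
        area, stack = [seed], [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        area.append(neighbour)
                        stack.append(neighbour)
        areas.append(area)
    return areas


def target_cell(area):
    """Cell of the area that minimises the sum of distances to the other cells."""
    return min(area, key=lambda p: sum(math.dist(p, q) for q in area))
```

For a straight three-cell frontier, the selected target is the middle cell, since it has the smallest total distance to the two ends.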
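A potential field on a grid is often implemented as a wavefront: a breadth-first propagation of the distance to the goal through traversable cells, which the robot then descends. The sketch below takes that interpretation as an assumption, with "F" marking free cells and 4-connected moves; unreachable cells keep the value `None`, which plays the role of an infinite planned distance.

```python
from collections import deque


def propagate(grid, goal):
    """Breadth-first distance-to-goal field over free cells ("F")."""
    rows, cols = len(grid), len(grid[0])
    field = [[None] * cols for _ in range(rows)]
    field[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "F" and field[nr][nc] is None):
                field[nr][nc] = field[r][c] + 1
                queue.append((nr, nc))
    return field
```

A robot standing on a cell with a finite value can reach the goal by repeatedly moving to a neighbour with a smaller value.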

Protocol
The parameters are as follows:
• Map: Three environments are considered. The first contains no obstacles (empty), the second has unstructured obstacles (unstructured) and the third is composed of a corridor and three rooms (structured). The maps are shown in Fig. 7.
• Population density: Environments are populated by humans at 0 or 30% occupation. Each human agent moves in a straight line and avoids obstacles by stopping and rotating.
• Number of robots: Two explorers are used for each experiment; they are represented as cylinders.
• Optimisation method: We use two different cost optimisation methods. The first is a local greedy method, where each robot chooses the minimum-cost target among only its own possible targets, as in [11] for distances. The second is a group greedy method, where, for all locally visible robots at each time step, the robot-target assignment with minimum cost is recursively discarded until the local robot is assigned.
• Modulators: α and σ are discretised from 0 to 1 with a step of 0.25.
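The two optimisation methods listed above can be sketched on a cost matrix with one row per locally visible robot. In the group greedy sketch, discarding an assignment is interpreted as removing both the robot and the target from further consideration, which is an assumption about the intended procedure; the function names and the index `me` for the local robot are illustrative.

```python
def local_greedy(cost, me):
    """The local robot takes the minimum-cost target in its own row."""
    row = cost[me]
    return min(range(len(row)), key=row.__getitem__)


def group_greedy(cost, me):
    """Repeatedly fix the cheapest robot-target pair until `me` is assigned."""
    robots = set(range(len(cost)))
    targets = set(range(len(cost[0])))
    while robots and targets:
        i, j = min(((i, j) for i in robots for j in targets),
                   key=lambda pair: cost[pair[0]][pair[1]])
        if i == me:
            return j
        # Assumption: an assigned pair removes both its robot and its target.
        robots.remove(i)
        targets.remove(j)
    return None  # no target left for the local robot
```

With the cost matrix `[[1, 4], [2, 9]]`, local greedy sends both robots to target 0, whereas group greedy gives target 0 to robot 0 and leaves target 1 to robot 1.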

Metrics
Each scenario is evaluated with classical MRE metrics: coverage, distance and time. In addition, we use a common metric in HRI, called the Robotic Attention Demand (RAD), which measures the autonomy of a robot during its task [20,21]. Here we consider the number of interactions initiated during exploration.

Results
First, let us consider environments without humans.
We study the influence of α by fixing σ to 1. This allows us to adjust only the balance between distances and frontier penalties. This is legitimate, since no human implies no interaction penalty. The performances averaged over 10 runs are plotted in Fig. 8 for local greedy, and in Fig. 9 for group greedy.
For local greedy in Fig. 8, regarding the empty and structured maps, we distinguish two steps. In the first step, distance and time increase, and in the second step they both decrease until α = 1. The unstructured map for local greedy and all maps for group greedy (Fig. 9) present only one step, where distance and time decrease when α increases. In these cases, when α is high, penalties fade and robots make fewer round trips between remote frontiers in the scene. Thus, in non-populated environments, our heuristic does not give better performance.
Now we consider the presence of pedestrians. The maps are populated at 30%, up to 1 human/m², thus enabling human-robot interactions. Figs. 10 and 11 give the mean performances of local greedy and group greedy, respectively, for 10 runs of each (α, σ) combination. When σ increases, the penalty of the interactions is reduced, favouring interactions to the detriment of frontiers.
For local greedy in the empty case (Fig. 10), full coverage with the shortest distance and time is obtained at (α, σ) = (0, 0). Only penalties are used, and frontiers are preferred over interactions. No interaction was initiated (RAD), but an average of 28 frontiers were assigned. In the unstructured case, the best average performance is at (0.75, 0). Distances are overweighted compared to penalties, and interactions are heavily penalised. Nevertheless, an average of 18 interactions (RAD) were initiated against 26 frontier assignments. In the structured environment, the best performances and lowest standard deviations are at (0.5, 0). Distances and penalties are equally weighted, and again interactions are penalised. An average of 31 frontier assignments and 18 interaction assignments took place. Here, interactions are interesting, because a robot can discover the corridor by following someone.
For group greedy (Fig. 11), the best performance is located at (0.25, 0) for the empty scene. Penalties have more weight than distances, and frontiers are preferred over interactions. An average of 16 frontier assignments and 7 interaction assignments was observed. The unstructured environment has maximum coverage, with minimum travelled distance and time, at (0.5, 0). The average number of frontiers assigned is 8 times the number of interactions (only 4). For the last map, the best average performance is at (0.25, 0), for a frontier/interaction ratio of 29/4.
These results show that distance alone does not suffice for choosing the best targets in the presence of humans. Instead, a smart equilibrium with our penalty heuristic always gives the best performance (α ≠ 1). Here, with σ = 0, the frontiers were chosen considering only distances, but interactions were chosen carefully by adding heavy penalties. Thus, our heuristic is already sufficient for selecting interactions only when necessary, but it is not yet able to promote them.

Conclusion
In this paper, we have defined interaction-based exploration by targeting the humans perceived by the robots. Interactive exploration paves the way for exploiting natural human heuristics, for a better understanding of the dynamics of a populated environment. The mixed approach, based upon frontier and interactive exploration, aims at bringing out the best of both approaches. For this purpose, we designed a parametric heuristic to balance frontier and interaction (pedestrian following) assignments. This heuristic considers penalties for the idle state of the targets (frontier, human) and their orientation.
We have shown in simulation that, in some cases, incorporating an interactive aspect into exploration can be beneficial, even with this simplistic heuristic. To enable efficient dynamic exploration, it is therefore paramount to discover these particular cases. In this sense, machine learning and online tuning of weights might be of interest for achieving a robotic heuristic adaptation. This work opens up prospects for exploiting human adaptiveness in robotic exploration of populated environments.

Figure 4. Multi-Robot Exploration as a Task Allocation Problem.

Figure 5. Distances and penalties considered in the system.